Shiyunee committed 9 days ago

Commit 13da99b (verified) · 1 parent: a089290

Batch upload 2/2

This view is limited to 50 files because the commit contains too many changes; see the raw diff for the rest.

Files changed (50)

lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs10_weightdecay0.1_r8_alpha16_loradrpout0.0/best-checkpoint/lora_epoch_best/README.md +206 -0
lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs10_weightdecay0.1_r8_alpha16_loradrpout0.0/best-checkpoint/lora_epoch_best/adapter_config.json +44 -0
lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs10_weightdecay0.1_r8_alpha16_loradrpout0.0/best-checkpoint/lora_epoch_best/adapter_model.safetensors +3 -0
lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs10_weightdecay0.1_r8_alpha16_loradrpout0.0/best-checkpoint/vector_head_epoch_best.pt +3 -0
lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs15_weightdecay0.1_r8_alpha16_loradrpout0.0_20k_training_samples/best-checkpoint/lora_epoch_best/README.md +206 -0
lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs15_weightdecay0.1_r8_alpha16_loradrpout0.0_20k_training_samples/best-checkpoint/lora_epoch_best/adapter_config.json +44 -0
lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs15_weightdecay0.1_r8_alpha16_loradrpout0.0_20k_training_samples/best-checkpoint/lora_epoch_best/adapter_model.safetensors +3 -0
lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs15_weightdecay0.1_r8_alpha16_loradrpout0.0_20k_training_samples/best-checkpoint/vector_head_epoch_best.pt +3 -0
lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs15_weightdecay0.1_r8_alpha16_loradrpout0.0_30k_training_samples/best-checkpoint/lora_epoch_best/README.md +206 -0
lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs15_weightdecay0.1_r8_alpha16_loradrpout0.0_30k_training_samples/best-checkpoint/lora_epoch_best/adapter_config.json +44 -0
lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs15_weightdecay0.1_r8_alpha16_loradrpout0.0_30k_training_samples/best-checkpoint/lora_epoch_best/adapter_model.safetensors +3 -0
lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs15_weightdecay0.1_r8_alpha16_loradrpout0.0_30k_training_samples/best-checkpoint/vector_head_epoch_best.pt +3 -0
lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs15_weightdecay0.1_r8_alpha16_loradrpout0.0_50k_training_samples/best-checkpoint/lora_epoch_best/README.md +206 -0
lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs15_weightdecay0.1_r8_alpha16_loradrpout0.0_50k_training_samples/best-checkpoint/lora_epoch_best/adapter_config.json +44 -0
lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs15_weightdecay0.1_r8_alpha16_loradrpout0.0_50k_training_samples/best-checkpoint/lora_epoch_best/adapter_model.safetensors +3 -0
lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs15_weightdecay0.1_r8_alpha16_loradrpout0.0_50k_training_samples/best-checkpoint/vector_head_epoch_best.pt +3 -0
lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs15_weightdecay0.1_r8_alpha16_loradrpout0.0_80k_training_samples/best-checkpoint/lora_epoch_best/README.md +206 -0
lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs15_weightdecay0.1_r8_alpha16_loradrpout0.0_80k_training_samples/best-checkpoint/lora_epoch_best/adapter_config.json +44 -0
lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs15_weightdecay0.1_r8_alpha16_loradrpout0.0_80k_training_samples/best-checkpoint/lora_epoch_best/adapter_model.safetensors +3 -0
lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs15_weightdecay0.1_r8_alpha16_loradrpout0.0_80k_training_samples/best-checkpoint/vector_head_epoch_best.pt +3 -0
lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs50_weightdecay0.1_r8_alpha16_loradrpout0.0_10k_training_samples/best-checkpoint/lora_epoch_best/README.md +206 -0
lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs50_weightdecay0.1_r8_alpha16_loradrpout0.0_10k_training_samples/best-checkpoint/lora_epoch_best/adapter_config.json +44 -0
lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs50_weightdecay0.1_r8_alpha16_loradrpout0.0_10k_training_samples/best-checkpoint/lora_epoch_best/adapter_model.safetensors +3 -0
lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs50_weightdecay0.1_r8_alpha16_loradrpout0.0_10k_training_samples/best-checkpoint/vector_head_epoch_best.pt +3 -0
lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs50_weightdecay0.1_r8_alpha16_loradrpout0.0_1k_training_samples/best-checkpoint/lora_epoch_best/README.md +206 -0
lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs50_weightdecay0.1_r8_alpha16_loradrpout0.0_1k_training_samples/best-checkpoint/lora_epoch_best/adapter_config.json +44 -0
lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs50_weightdecay0.1_r8_alpha16_loradrpout0.0_1k_training_samples/best-checkpoint/lora_epoch_best/adapter_model.safetensors +3 -0
lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs50_weightdecay0.1_r8_alpha16_loradrpout0.0_1k_training_samples/best-checkpoint/vector_head_epoch_best.pt +3 -0
lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs50_weightdecay0.1_r8_alpha16_loradrpout0.0_4k_training_samples/best-checkpoint/lora_epoch_best/README.md +206 -0
lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs50_weightdecay0.1_r8_alpha16_loradrpout0.0_4k_training_samples/best-checkpoint/lora_epoch_best/adapter_config.json +44 -0
lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs50_weightdecay0.1_r8_alpha16_loradrpout0.0_4k_training_samples/best-checkpoint/lora_epoch_best/adapter_model.safetensors +3 -0
lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs50_weightdecay0.1_r8_alpha16_loradrpout0.0_4k_training_samples/best-checkpoint/vector_head_epoch_best.pt +3 -0
lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs50_weightdecay0.1_r8_alpha16_loradrpout0.0_6k_training_samples/best-checkpoint/lora_epoch_best/README.md +206 -0
lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs50_weightdecay0.1_r8_alpha16_loradrpout0.0_6k_training_samples/best-checkpoint/lora_epoch_best/adapter_config.json +44 -0
lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs50_weightdecay0.1_r8_alpha16_loradrpout0.0_8k_training_samples/best-checkpoint/lora_epoch_best/README.md +206 -0
lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs50_weightdecay0.1_r8_alpha16_loradrpout0.0_8k_training_samples/best-checkpoint/lora_epoch_best/adapter_config.json +44 -0
lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs50_weightdecay0.1_r8_alpha16_loradrpout0.0_8k_training_samples/best-checkpoint/lora_epoch_best/adapter_model.safetensors +3 -0
lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs50_weightdecay0.1_r8_alpha16_loradrpout0.0_8k_training_samples/best-checkpoint/vector_head_epoch_best.pt +3 -0
mlp/greedy_answer_conf/long_qa/batchsize16_accumulation8_epochs10_weightdecay0.1_mlp_1_layer/best-checkpoint/vector_head_epoch_best.pt +3 -0
mlp/greedy_answer_conf/long_qa/batchsize16_accumulation8_epochs10_weightdecay0.1_mlp_1_layer/test_losses.jsonl +10 -0
mlp/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs10_weightdecay0.1_mlp_1_layer/best-checkpoint/vector_head_epoch_best.pt +3 -0
mlp/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs10_weightdecay0.1_mlp_1_layer/test_losses.jsonl +10 -0
mlp/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs15_weightdecay0.1_mlp_1_layer_200k_training_samples/best-checkpoint/vector_head_epoch_best.pt +3 -0
mlp/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs15_weightdecay0.1_mlp_1_layer_200k_training_samples/test_losses.jsonl +15 -0
mlp/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs15_weightdecay0.1_mlp_1_layer_20k_training_samples/best-checkpoint/vector_head_epoch_best.pt +3 -0
mlp/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs15_weightdecay0.1_mlp_1_layer_20k_training_samples/test_losses.jsonl +15 -0
mlp/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs15_weightdecay0.1_mlp_1_layer_30k_training_samples/best-checkpoint/vector_head_epoch_best.pt +3 -0
mlp/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs15_weightdecay0.1_mlp_1_layer_30k_training_samples/test_losses.jsonl +15 -0
mlp/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs15_weightdecay0.1_mlp_1_layer_50k_training_samples/best-checkpoint/vector_head_epoch_best.pt +3 -0
mlp/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs15_weightdecay0.1_mlp_1_layer_50k_training_samples/test_losses.jsonl +15 -0
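The run directory names in the list above appear to encode the training hyperparameters (batch size, gradient-accumulation steps, epochs, weight decay, LoRA rank/alpha/dropout or MLP depth, and optionally a training-set size). The sketch below shows one way to read them back out. `parse_run_name` and the field names are hypothetical, and the meaning assigned to each field is inferred from the paths, not confirmed by the repository.

```python
import re

# Hypothetical mapping: field meanings are inferred from the directory names
# in this commit, not documented anywhere in the diff.
FIELD_PATTERNS = {
    "batch_size": r"batchsize(\d+)",
    "grad_accumulation": r"accumulation(\d+)",
    "epochs": r"epochs(\d+)",
    "weight_decay": r"weightdecay([\d.]+)",
    "lora_r": r"_r(\d+)",
    "lora_alpha": r"alpha(\d+)",
    "lora_dropout": r"loradrpout([\d.]+)",  # spelled this way in the paths
    "mlp_layers": r"mlp_(\d+)_layer",
    "training_samples_k": r"_(\d+)k_training_samples",
}


def parse_run_name(name: str) -> dict:
    """Extract whichever known fields are present in a run directory name."""
    fields = {}
    for key, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, name)
        if match:
            raw = match.group(1)
            fields[key] = float(raw) if "." in raw else int(raw)
    # Assumption: one device, so the effective batch is batch x accumulation.
    if "batch_size" in fields and "grad_accumulation" in fields:
        fields["effective_batch_size"] = (
            fields["batch_size"] * fields["grad_accumulation"]
        )
    return fields


run = (
    "batchsize16_accumulation8_epochs15_weightdecay0.1"
    "_r8_alpha16_loradrpout0.0_20k_training_samples"
)
parsed = parse_run_name(run)
print(parsed)
```

Under that reading, every run in this commit uses an effective batch of 16 x 8 = 128 samples per optimizer step on a single device; multi-GPU training would scale that number further.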

lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs10_weightdecay0.1_r8_alpha16_loradrpout0.0/best-checkpoint/lora_epoch_best/README.md ADDED

	@@ -0,0 +1,206 @@

+---
+base_model: /mnt/bn/motor-nlp-team/models/LLM/base_models/Qwen2.5-7B-Instruct
+library_name: peft
+tags:
+- base_model:adapter:/mnt/bn/motor-nlp-team/models/LLM/base_models/Qwen2.5-7B-Instruct
+- lora
+- transformers
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.16.0

lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs10_weightdecay0.1_r8_alpha16_loradrpout0.0/best-checkpoint/lora_epoch_best/adapter_config.json ADDED

	@@ -0,0 +1,44 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": {
+    "base_model_class": "Qwen2Model",
+    "parent_library": "transformers.models.qwen2.modeling_qwen2"
+  },
+  "base_model_name_or_path": "/mnt/bn/motor-nlp-team/models/LLM/base_models/Qwen2.5-7B-Instruct",
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_bias": false,
+  "lora_dropout": 0.0,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "qalora_group_size": 16,
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "up_proj",
+    "down_proj",
+    "gate_proj",
+    "v_proj",
+    "o_proj",
+    "q_proj",
+    "k_proj"
+  ],
+  "task_type": null,
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}
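As a back-of-envelope reading of the config above: with `r` = 8 and `lora_alpha` = 16 (and `use_rslora` false), the LoRA update is scaled by `lora_alpha / r`, and each adapted linear layer adds `r * (in_features + out_features)` parameters. The sketch below totals that over the seven `target_modules`. The layer dimensions are an assumption: they are the published Qwen2.5-7B-Instruct sizes and do not appear anywhere in this diff.

```python
# Values taken from adapter_config.json in this diff.
R = 8
LORA_ALPHA = 16

# ASSUMPTION: Qwen2.5-7B-Instruct dimensions, not stated in the diff.
HIDDEN = 3584
INTERMEDIATE = 18944
KV_DIM = 512  # 4 key/value heads x 128 head dim
NUM_LAYERS = 28

# (in_features, out_features) for each target module in one decoder layer.
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, r: int) -> int:
    """Parameters in one LoRA pair: A is (r x in), B is (out x r)."""
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o, R) for i, o in SHAPES.values())
total_params = per_layer * NUM_LAYERS
scaling = LORA_ALPHA / R
fp32_bytes = total_params * 4

print(f"scaling          = {scaling}")
print(f"params per layer = {per_layer:,}")
print(f"total params     = {total_params:,}")
print(f"fp32 size        = {fp32_bytes:,} bytes")
```

Under these assumptions the total is 20,185,088 parameters, or 80,740,352 bytes in fp32. That is within about 50 KB of the 80,789,744-byte `adapter_model.safetensors` recorded in this commit, which would be consistent with fp32 adapter weights plus a file header, though the diff itself does not state the dtype.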

lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs10_weightdecay0.1_r8_alpha16_loradrpout0.0/best-checkpoint/lora_epoch_best/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c692817440b9b4c146bee2fbc46c393c682b87bcd53b92d2e9299c8e4220fa73
+size 80789744
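The three added lines above are a Git LFS pointer, not the weights themselves: the repository stores only the spec version, the SHA-256 of the real file, and its size in bytes. A minimal sketch of reading such a pointer follows; `parse_lfs_pointer` is a hypothetical helper written for illustration, not part of Git LFS or the Hub tooling.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file made of 'key value' lines."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form '<hash algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer contents copied from the hunk above (without the leading '+').
pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:c692817440b9b4c146bee2fbc46c393c682b87bcd53b92d2e9299c8e4220fa73
size 80789744
"""

pointer = parse_lfs_pointer(pointer_text)
print(pointer["hash_algo"], pointer["size"])
```

After downloading the actual file, its byte length and SHA-256 digest can be compared against `size` and `digest` to confirm the transfer was complete.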

lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs10_weightdecay0.1_r8_alpha16_loradrpout0.0/best-checkpoint/vector_head_epoch_best.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ebe12a6076262b89a2889c4226d13b796d57033958d504a3021d853cf2bfb1bb
+size 16050

lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs15_weightdecay0.1_r8_alpha16_loradrpout0.0_20k_training_samples/best-checkpoint/lora_epoch_best/README.md ADDED

	@@ -0,0 +1,206 @@

+---
+base_model: /mnt/bn/motor-nlp-team/models/LLM/base_models/Qwen2.5-7B-Instruct
+library_name: peft
+tags:
+- base_model:adapter:/mnt/bn/motor-nlp-team/models/LLM/base_models/Qwen2.5-7B-Instruct
+- lora
+- transformers
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.16.0

lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs15_weightdecay0.1_r8_alpha16_loradrpout0.0_20k_training_samples/best-checkpoint/lora_epoch_best/adapter_config.json ADDED

	@@ -0,0 +1,44 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": {
+    "base_model_class": "Qwen2Model",
+    "parent_library": "transformers.models.qwen2.modeling_qwen2"
+  },
+  "base_model_name_or_path": "/mnt/bn/motor-nlp-team/models/LLM/base_models/Qwen2.5-7B-Instruct",
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_bias": false,
+  "lora_dropout": 0.0,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "qalora_group_size": 16,
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "k_proj",
+    "o_proj",
+    "v_proj",
+    "up_proj",
+    "gate_proj",
+    "down_proj",
+    "q_proj"
+  ],
+  "task_type": null,
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}
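The `target_modules` arrays in the `adapter_config.json` files of this commit list the same seven projections in different orders. A plausible explanation, offered here as an assumption and not something the diff confirms, is that the list is held as an unordered set before it is serialized, so the order carries no meaning. The sketch below compares the three orderings visible in this view as sets.

```python
# target_modules copied from the three adapter_config.json hunks in this view.
epochs10_run = [
    "up_proj", "down_proj", "gate_proj", "v_proj", "o_proj", "q_proj", "k_proj",
]
samples_20k_run = [
    "k_proj", "o_proj", "v_proj", "up_proj", "gate_proj", "down_proj", "q_proj",
]
samples_30k_run = [
    "v_proj", "o_proj", "up_proj", "gate_proj", "k_proj", "q_proj", "down_proj",
]

# Order-insensitive comparison: the runs adapt the same modules.
same_modules = (
    set(epochs10_run) == set(samples_20k_run) == set(samples_30k_run)
)
print(same_modules, sorted(set(epochs10_run)))
```

When diffing these configs across runs, sorting the array first avoids spurious differences.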

lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs15_weightdecay0.1_r8_alpha16_loradrpout0.0_20k_training_samples/best-checkpoint/lora_epoch_best/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:578bea544ff97879ee370c264b9a983f7055abee6b5d91834ad8a22c36f175aa
+size 80789744

lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs15_weightdecay0.1_r8_alpha16_loradrpout0.0_20k_training_samples/best-checkpoint/vector_head_epoch_best.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c73d66a1887a02cdd42c4183c040e814757ec074222deeeb7962fb6493fdf77b
+size 16050

lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs15_weightdecay0.1_r8_alpha16_loradrpout0.0_30k_training_samples/best-checkpoint/lora_epoch_best/README.md ADDED

	@@ -0,0 +1,206 @@

+---
+base_model: /mnt/bn/motor-nlp-team/models/LLM/base_models/Qwen2.5-7B-Instruct
+library_name: peft
+tags:
+- base_model:adapter:/mnt/bn/motor-nlp-team/models/LLM/base_models/Qwen2.5-7B-Instruct
+- lora
+- transformers
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.16.0

lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs15_weightdecay0.1_r8_alpha16_loradrpout0.0_30k_training_samples/best-checkpoint/lora_epoch_best/adapter_config.json ADDED

	@@ -0,0 +1,44 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": {
+    "base_model_class": "Qwen2Model",
+    "parent_library": "transformers.models.qwen2.modeling_qwen2"
+  },
+  "base_model_name_or_path": "/mnt/bn/motor-nlp-team/models/LLM/base_models/Qwen2.5-7B-Instruct",
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_bias": false,
+  "lora_dropout": 0.0,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "qalora_group_size": 16,
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "v_proj",
+    "o_proj",
+    "up_proj",
+    "gate_proj",
+    "k_proj",
+    "q_proj",
+    "down_proj"
+  ],
+  "task_type": null,
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}

lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs15_weightdecay0.1_r8_alpha16_loradrpout0.0_30k_training_samples/best-checkpoint/lora_epoch_best/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f98b96a9d6bb00423718274c53afcde746ca8fb3dde2de6293792bdf58a0ae41
+size 80789744

lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs15_weightdecay0.1_r8_alpha16_loradrpout0.0_30k_training_samples/best-checkpoint/vector_head_epoch_best.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2b441fa1e5f22615740f536d70718923ac63cb3bf5f7be5262f1cb886a9bd34e
+size 16050

lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs15_weightdecay0.1_r8_alpha16_loradrpout0.0_50k_training_samples/best-checkpoint/lora_epoch_best/README.md ADDED

	@@ -0,0 +1,206 @@

+---
+base_model: /mnt/bn/motor-nlp-team/models/LLM/base_models/Qwen2.5-7B-Instruct
+library_name: peft
+tags:
+- base_model:adapter:/mnt/bn/motor-nlp-team/models/LLM/base_models/Qwen2.5-7B-Instruct
+- lora
+- transformers
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.16.0

lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs15_weightdecay0.1_r8_alpha16_loradrpout0.0_50k_training_samples/best-checkpoint/lora_epoch_best/adapter_config.json ADDED

	@@ -0,0 +1,44 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": {
+    "base_model_class": "Qwen2Model",
+    "parent_library": "transformers.models.qwen2.modeling_qwen2"
+  },
+  "base_model_name_or_path": "/mnt/bn/motor-nlp-team/models/LLM/base_models/Qwen2.5-7B-Instruct",
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_bias": false,
+  "lora_dropout": 0.0,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "qalora_group_size": 16,
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "gate_proj",
+    "v_proj",
+    "o_proj",
+    "q_proj",
+    "down_proj",
+    "up_proj",
+    "k_proj"
+  ],
+  "task_type": null,
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}
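A minimal sketch of how the fields above might be read, assuming the usual PEFT convention that the low-rank update is scaled by `lora_alpha / r` (or `lora_alpha / sqrt(r)` when `use_rslora` is true). The `lora_scale` helper and the inlined JSON excerpt are illustrative and not part of this commit. The second check compares this config's `target_modules` with the list from the 30k-sample run's config, since only the serialization order differs between them.

```python
import json
import math

# Excerpt of the adapter_config.json fields shown above (50k-sample run).
CONFIG_50K = json.loads("""
{
  "lora_alpha": 16,
  "r": 8,
  "use_rslora": false,
  "target_modules": ["gate_proj", "v_proj", "o_proj", "q_proj",
                     "down_proj", "up_proj", "k_proj"]
}
""")

# target_modules as serialized in the 30k-sample run's adapter_config.json.
TARGETS_30K = ["v_proj", "o_proj", "up_proj", "gate_proj",
               "k_proj", "q_proj", "down_proj"]


def lora_scale(cfg: dict) -> float:
    """Scaling factor applied to the update B @ A (hypothetical helper)."""
    if cfg.get("use_rslora", False):
        return cfg["lora_alpha"] / math.sqrt(cfg["r"])
    return cfg["lora_alpha"] / cfg["r"]


scale = lora_scale(CONFIG_50K)
# Order is not meaningful, so compare the module names as sets.
same_targets = set(CONFIG_50K["target_modules"]) == set(TARGETS_30K)
print(scale, same_targets)
```

With `lora_alpha` 16 and `r` 8 the scale is 2.0, and both runs adapt the same seven projection modules.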

lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs15_weightdecay0.1_r8_alpha16_loradrpout0.0_50k_training_samples/best-checkpoint/lora_epoch_best/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c094b48560a5c44d9bfb5754f7aad5a9a9caba5d80cc079ba14fe5b1146cfa24
+size 80789744
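As a rough sanity check on the 80789744-byte adapter, the sketch below counts LoRA parameters for rank 8 across the seven targeted projections. The model dimensions are assumptions, not stated anywhere in this diff: they are the commonly published Qwen2.5-7B values (hidden size 3584, MLP size 18944, 28 layers, 4 key/value heads of dimension 128) and should be checked against the base model's own `config.json`. The 4-bytes-per-parameter (float32) storage is likewise an assumption.

```python
# Assumed Qwen2.5-7B-Instruct dimensions (not stated in this diff).
HIDDEN = 3584
INTERMEDIATE = 18944
KV_DIM = 4 * 128          # num_key_value_heads * head_dim
NUM_LAYERS = 28
RANK = 8                  # "r" in adapter_config.json

# (in_features, out_features) for each targeted projection.
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, r: int) -> int:
    """A is (r x in_features) and B is (out_features x r)."""
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in PROJECTIONS.values())
total_params = per_layer * NUM_LAYERS
fp32_bytes = total_params * 4

pointer_size = 80789744   # "size" field of the LFS pointer above
remainder = pointer_size - fp32_bytes
print(total_params, fp32_bytes, remainder)
```

Under these assumptions the tensors account for 80,740,352 bytes, leaving roughly 49 KB that would plausibly be the safetensors header. That is consistent with, but does not prove, float32 adapter weights.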

lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs15_weightdecay0.1_r8_alpha16_loradrpout0.0_50k_training_samples/best-checkpoint/vector_head_epoch_best.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d78d3558fb5c0ee087923a153509f4527f45e761b9f1ddfc9f39af3a33d0a6d9
+size 16050
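The three added lines above are a Git LFS pointer rather than the checkpoint itself: they record the spec version, the SHA-256 of the real file, and its size in bytes. A small sketch of reading such a pointer follows; `parse_lfs_pointer` is a hypothetical helper written for illustration, not part of Git LFS or of this repository.

```python
# Pointer text copied from the vector_head_epoch_best.pt entry above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:d78d3558fb5c0ee087923a153509f4527f45e761b9f1ddfc9f39af3a33d0a6d9
size 16050
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line, then separate the hash algorithm from the digest."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


pointer = parse_lfs_pointer(POINTER)
print(pointer["hash_algo"], pointer["size"])
```

After downloading the real file, its byte length and SHA-256 digest can be compared with `size` and `digest` to confirm the download is intact.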

lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs15_weightdecay0.1_r8_alpha16_loradrpout0.0_80k_training_samples/best-checkpoint/lora_epoch_best/README.md ADDED

	@@ -0,0 +1,206 @@

+---
+base_model: /mnt/bn/motor-nlp-team/models/LLM/base_models/Qwen2.5-7B-Instruct
+library_name: peft
+tags:
+- base_model:adapter:/mnt/bn/motor-nlp-team/models/LLM/base_models/Qwen2.5-7B-Instruct
+- lora
+- transformers
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.16.0

lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs15_weightdecay0.1_r8_alpha16_loradrpout0.0_80k_training_samples/best-checkpoint/lora_epoch_best/adapter_config.json ADDED

	@@ -0,0 +1,44 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": {
+    "base_model_class": "Qwen2Model",
+    "parent_library": "transformers.models.qwen2.modeling_qwen2"
+  },
+  "base_model_name_or_path": "/mnt/bn/motor-nlp-team/models/LLM/base_models/Qwen2.5-7B-Instruct",
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_bias": false,
+  "lora_dropout": 0.0,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "qalora_group_size": 16,
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "v_proj",
+    "k_proj",
+    "q_proj",
+    "up_proj",
+    "down_proj",
+    "o_proj",
+    "gate_proj"
+  ],
+  "task_type": null,
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}

lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs15_weightdecay0.1_r8_alpha16_loradrpout0.0_80k_training_samples/best-checkpoint/lora_epoch_best/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:90c9857b7e1fcff396c22aa069c65d77bdc230a3da815590ca34738ebfc393f1
+size 80789744

lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs15_weightdecay0.1_r8_alpha16_loradrpout0.0_80k_training_samples/best-checkpoint/vector_head_epoch_best.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3c28df59ade44e6d5f9d504aa28862620828792cc6b116c2cb875c4d9f520f73
+size 16050

lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs50_weightdecay0.1_r8_alpha16_loradrpout0.0_10k_training_samples/best-checkpoint/lora_epoch_best/README.md ADDED

	@@ -0,0 +1,206 @@

+---
+base_model: /mnt/bn/motor-nlp-team/models/LLM/base_models/Qwen2.5-7B-Instruct
+library_name: peft
+tags:
+- base_model:adapter:/mnt/bn/motor-nlp-team/models/LLM/base_models/Qwen2.5-7B-Instruct
+- lora
+- transformers
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.16.0

lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs50_weightdecay0.1_r8_alpha16_loradrpout0.0_10k_training_samples/best-checkpoint/lora_epoch_best/adapter_config.json ADDED

	@@ -0,0 +1,44 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": {
+    "base_model_class": "Qwen2Model",
+    "parent_library": "transformers.models.qwen2.modeling_qwen2"
+  },
+  "base_model_name_or_path": "/mnt/bn/motor-nlp-team/models/LLM/base_models/Qwen2.5-7B-Instruct",
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_bias": false,
+  "lora_dropout": 0.0,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "qalora_group_size": 16,
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "k_proj",
+    "q_proj",
+    "o_proj",
+    "down_proj",
+    "up_proj",
+    "gate_proj",
+    "v_proj"
+  ],
+  "task_type": null,
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}

lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs50_weightdecay0.1_r8_alpha16_loradrpout0.0_10k_training_samples/best-checkpoint/lora_epoch_best/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:fb68937b55662867f1653aa67911d1929684db752c6bb887cd94563630c5ae42
+size 80789744

lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs50_weightdecay0.1_r8_alpha16_loradrpout0.0_10k_training_samples/best-checkpoint/vector_head_epoch_best.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1ac97dec0553f9de718a7ab362f9494ad0be296e7b51748bd8bc52c382adc3d4
+size 16050

lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs50_weightdecay0.1_r8_alpha16_loradrpout0.0_1k_training_samples/best-checkpoint/lora_epoch_best/README.md ADDED

	@@ -0,0 +1,206 @@

+---
+base_model: /mnt/bn/motor-nlp-team/models/LLM/base_models/Qwen2.5-7B-Instruct
+library_name: peft
+tags:
+- base_model:adapter:/mnt/bn/motor-nlp-team/models/LLM/base_models/Qwen2.5-7B-Instruct
+- lora
+- transformers
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.16.0

lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs50_weightdecay0.1_r8_alpha16_loradrpout0.0_1k_training_samples/best-checkpoint/lora_epoch_best/adapter_config.json ADDED

	@@ -0,0 +1,44 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": {
+    "base_model_class": "Qwen2Model",
+    "parent_library": "transformers.models.qwen2.modeling_qwen2"
+  },
+  "base_model_name_or_path": "/mnt/bn/motor-nlp-team/models/LLM/base_models/Qwen2.5-7B-Instruct",
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_bias": false,
+  "lora_dropout": 0.0,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "qalora_group_size": 16,
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "gate_proj",
+    "k_proj",
+    "v_proj",
+    "q_proj",
+    "down_proj",
+    "up_proj",
+    "o_proj"
+  ],
+  "task_type": null,
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}

lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs50_weightdecay0.1_r8_alpha16_loradrpout0.0_1k_training_samples/best-checkpoint/lora_epoch_best/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a9480d823e51db9e7fdce9caa1358b488d374d628fb3ffedb7998fedfe402de6
+size 80789744

lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs50_weightdecay0.1_r8_alpha16_loradrpout0.0_1k_training_samples/best-checkpoint/vector_head_epoch_best.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c2ab97c89a5035567d0570ee367afd72d7a4264bd15de8b95e9cf684bb8802a0
+size 16050

lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs50_weightdecay0.1_r8_alpha16_loradrpout0.0_4k_training_samples/best-checkpoint/lora_epoch_best/README.md ADDED

	@@ -0,0 +1,206 @@

+---
+base_model: /mnt/bn/motor-nlp-team/models/LLM/base_models/Qwen2.5-7B-Instruct
+library_name: peft
+tags:
+- base_model:adapter:/mnt/bn/motor-nlp-team/models/LLM/base_models/Qwen2.5-7B-Instruct
+- lora
+- transformers
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.16.0

lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs50_weightdecay0.1_r8_alpha16_loradrpout0.0_4k_training_samples/best-checkpoint/lora_epoch_best/adapter_config.json ADDED

	@@ -0,0 +1,44 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": {
+    "base_model_class": "Qwen2Model",
+    "parent_library": "transformers.models.qwen2.modeling_qwen2"
+  },
+  "base_model_name_or_path": "/mnt/bn/motor-nlp-team/models/LLM/base_models/Qwen2.5-7B-Instruct",
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_bias": false,
+  "lora_dropout": 0.0,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "qalora_group_size": 16,
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "up_proj",
+    "q_proj",
+    "k_proj",
+    "down_proj",
+    "o_proj",
+    "gate_proj",
+    "v_proj"
+  ],
+  "task_type": null,
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}

lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs50_weightdecay0.1_r8_alpha16_loradrpout0.0_4k_training_samples/best-checkpoint/lora_epoch_best/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4d40a1c980cc25dea3c0885db4801b4708704aaef62974c19ac0b96086e161a2
+size 80789744
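
As a rough sanity check on the adapter above, the sketch below counts the LoRA parameters implied by `r`, `lora_alpha`, and `target_modules`, and compares their fp32 footprint with the 80789744-byte LFS pointer. The layer dimensions (hidden size 3584, MLP size 18944, 28 layers, 4 key/value heads of dimension 128) are assumptions taken from the public Qwen2.5-7B-Instruct configuration, not from this commit, and fp32 storage is likewise assumed. `lora_param_count` is a hypothetical helper.

```python
# Assumed Qwen2.5-7B-Instruct dimensions (not stated in this commit).
HIDDEN = 3584
INTERMEDIATE = 18944
NUM_LAYERS = 28
KV_DIM = 4 * 128  # 4 key/value heads x head_dim 128

# Values read from adapter_config.json.
R = 8
LORA_ALPHA = 16

# (in_features, out_features) of each targeted linear layer.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(shapes, r, num_layers):
    """LoRA adds A (r x in) and B (out x r) for each targeted layer."""
    per_layer = sum(r * (fan_in + fan_out) for fan_in, fan_out in shapes.values())
    return per_layer * num_layers


total_params = lora_param_count(TARGET_SHAPES, R, NUM_LAYERS)
fp32_bytes = total_params * 4
scaling = LORA_ALPHA / R  # plain alpha / r scaling, since use_rslora is false

print(total_params, fp32_bytes, scaling)
```

Under these assumptions the adapter holds 20,185,088 parameters, or 80,740,352 bytes in fp32, which sits just below the reported file size. The remainder would be consistent with a safetensors header, though the commit does not confirm the storage dtype.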

lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs50_weightdecay0.1_r8_alpha16_loradrpout0.0_4k_training_samples/best-checkpoint/vector_head_epoch_best.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:317c7b5a62777651ffc22ffff7ceba45844c73c57decad93fe82e3f9bd4f00a2
+size 16050

lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs50_weightdecay0.1_r8_alpha16_loradrpout0.0_6k_training_samples/best-checkpoint/lora_epoch_best/README.md ADDED

	@@ -0,0 +1,206 @@

+---
+base_model: /mnt/bn/motor-nlp-team/models/LLM/base_models/Qwen2.5-7B-Instruct
+library_name: peft
+tags:
+- base_model:adapter:/mnt/bn/motor-nlp-team/models/LLM/base_models/Qwen2.5-7B-Instruct
+- lora
+- transformers
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.16.0

lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs50_weightdecay0.1_r8_alpha16_loradrpout0.0_6k_training_samples/best-checkpoint/lora_epoch_best/adapter_config.json ADDED

	@@ -0,0 +1,44 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": {
+    "base_model_class": "Qwen2Model",
+    "parent_library": "transformers.models.qwen2.modeling_qwen2"
+  },
+  "base_model_name_or_path": "/mnt/bn/motor-nlp-team/models/LLM/base_models/Qwen2.5-7B-Instruct",
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_bias": false,
+  "lora_dropout": 0.0,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "qalora_group_size": 16,
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "v_proj",
+    "k_proj",
+    "down_proj",
+    "gate_proj",
+    "up_proj",
+    "q_proj",
+    "o_proj"
+  ],
+  "task_type": null,
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}

lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs50_weightdecay0.1_r8_alpha16_loradrpout0.0_8k_training_samples/best-checkpoint/lora_epoch_best/README.md ADDED

	@@ -0,0 +1,206 @@

+---
+base_model: /mnt/bn/motor-nlp-team/models/LLM/base_models/Qwen2.5-7B-Instruct
+library_name: peft
+tags:
+- base_model:adapter:/mnt/bn/motor-nlp-team/models/LLM/base_models/Qwen2.5-7B-Instruct
+- lora
+- transformers
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.16.0

lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs50_weightdecay0.1_r8_alpha16_loradrpout0.0_8k_training_samples/best-checkpoint/lora_epoch_best/adapter_config.json ADDED

	@@ -0,0 +1,44 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": {
+    "base_model_class": "Qwen2Model",
+    "parent_library": "transformers.models.qwen2.modeling_qwen2"
+  },
+  "base_model_name_or_path": "/mnt/bn/motor-nlp-team/models/LLM/base_models/Qwen2.5-7B-Instruct",
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_bias": false,
+  "lora_dropout": 0.0,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "qalora_group_size": 16,
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "gate_proj",
+    "k_proj",
+    "v_proj",
+    "down_proj",
+    "up_proj",
+    "q_proj",
+    "o_proj"
+  ],
+  "task_type": null,
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}

lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs50_weightdecay0.1_r8_alpha16_loradrpout0.0_8k_training_samples/best-checkpoint/lora_epoch_best/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ccabfe57066670ea68ed7dceb65542a570240ed33f0943daa6674207264ddc0e
+size 80789744

lora/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs50_weightdecay0.1_r8_alpha16_loradrpout0.0_8k_training_samples/best-checkpoint/vector_head_epoch_best.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5172d1277401d017a6ac86297a6bb977b2b7d834f1132957e51cc0da94cd58e4
+size 16050

mlp/greedy_answer_conf/long_qa/batchsize16_accumulation8_epochs10_weightdecay0.1_mlp_1_layer/best-checkpoint/vector_head_epoch_best.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:cb1e68af5c6641004f895fac7e2cfb775fae6e6cefc93ffa1199e36eb6f9f072
+size 16050

mlp/greedy_answer_conf/long_qa/batchsize16_accumulation8_epochs10_weightdecay0.1_mlp_1_layer/test_losses.jsonl ADDED

	@@ -0,0 +1,10 @@

+{"epoch": 1, "test_loss": 0.1149066910147667}
+{"epoch": 2, "test_loss": 0.0958828404545784}
+{"epoch": 3, "test_loss": 0.09103523194789886}
+{"epoch": 4, "test_loss": 0.08869684487581253}
+{"epoch": 5, "test_loss": 0.08744823932647705}
+{"epoch": 6, "test_loss": 0.0866256132721901}
+{"epoch": 7, "test_loss": 0.08622399717569351}
+{"epoch": 8, "test_loss": 0.08594836294651031}
+{"epoch": 9, "test_loss": 0.08569417893886566}
+{"epoch": 10, "test_loss": 0.08563829213380814}

mlp/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs10_weightdecay0.1_mlp_1_layer/best-checkpoint/vector_head_epoch_best.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2de9e756bb65078f057780f4b722a9cffcde119a7a57e248b55f5564a62e7495
+size 16050

mlp/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs10_weightdecay0.1_mlp_1_layer/test_losses.jsonl ADDED

	@@ -0,0 +1,10 @@

+{"epoch": 1, "test_loss": 0.10135433822870255}
+{"epoch": 2, "test_loss": 0.09748891741037369}
+{"epoch": 3, "test_loss": 0.097267746925354}
+{"epoch": 4, "test_loss": 0.09653977304697037}
+{"epoch": 5, "test_loss": 0.0959627628326416}
+{"epoch": 6, "test_loss": 0.09610576927661896}
+{"epoch": 7, "test_loss": 0.09564800560474396}
+{"epoch": 8, "test_loss": 0.09605950862169266}
+{"epoch": 9, "test_loss": 0.09578048437833786}
+{"epoch": 10, "test_loss": 0.09579289704561234}
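
The `best-checkpoint` directory suggests that one epoch was selected per run. A minimal sketch of how such a choice could be read off `test_losses.jsonl`, assuming selection is by lowest test loss (the commit does not state the criterion); `best_epoch` is a hypothetical helper, and the records are copied from the log above:

```python
import json

# The ten records from the test_losses.jsonl shown above.
JSONL = """\
{"epoch": 1, "test_loss": 0.10135433822870255}
{"epoch": 2, "test_loss": 0.09748891741037369}
{"epoch": 3, "test_loss": 0.097267746925354}
{"epoch": 4, "test_loss": 0.09653977304697037}
{"epoch": 5, "test_loss": 0.0959627628326416}
{"epoch": 6, "test_loss": 0.09610576927661896}
{"epoch": 7, "test_loss": 0.09564800560474396}
{"epoch": 8, "test_loss": 0.09605950862169266}
{"epoch": 9, "test_loss": 0.09578048437833786}
{"epoch": 10, "test_loss": 0.09579289704561234}
"""


def best_epoch(jsonl_text):
    """Return the record with the lowest test_loss (assumed selection rule)."""
    records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    return min(records, key=lambda rec: rec["test_loss"])


best = best_epoch(JSONL)
print(best["epoch"], best["test_loss"])
```

For this run the minimum falls at epoch 7 rather than at the final epoch, since the loss fluctuates slightly after epoch 5.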

mlp/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs15_weightdecay0.1_mlp_1_layer_200k_training_samples/best-checkpoint/vector_head_epoch_best.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ebc7d1c2be4d7ab97cff76828e4fccffadc52fd382a7fd2f3863cc169310e06b
+size 16050

mlp/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs15_weightdecay0.1_mlp_1_layer_200k_training_samples/test_losses.jsonl ADDED

	@@ -0,0 +1,15 @@

+{"epoch": 1, "test_loss": 0.1086520329117775}
+{"epoch": 2, "test_loss": 0.10104358941316605}
+{"epoch": 3, "test_loss": 0.09919017553329468}
+{"epoch": 4, "test_loss": 0.09786633402109146}
+{"epoch": 5, "test_loss": 0.0978197380900383}
+{"epoch": 6, "test_loss": 0.09747838973999023}
+{"epoch": 7, "test_loss": 0.09742674231529236}
+{"epoch": 8, "test_loss": 0.09723884612321854}
+{"epoch": 9, "test_loss": 0.09661206603050232}
+{"epoch": 10, "test_loss": 0.09651821106672287}
+{"epoch": 11, "test_loss": 0.09624306857585907}
+{"epoch": 12, "test_loss": 0.09612324833869934}
+{"epoch": 13, "test_loss": 0.09641342610120773}
+{"epoch": 14, "test_loss": 0.09636340290307999}
+{"epoch": 15, "test_loss": 0.096262626349926}

mlp/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs15_weightdecay0.1_mlp_1_layer_20k_training_samples/best-checkpoint/vector_head_epoch_best.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:bafc5dc6622c2fdca01e53eb9f8e083d2b25a9998ab61e247d09b62caf51cd92
+size 16050

mlp/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs15_weightdecay0.1_mlp_1_layer_20k_training_samples/test_losses.jsonl ADDED

	@@ -0,0 +1,15 @@

+{"epoch": 1, "test_loss": 0.13309381902217865}
+{"epoch": 2, "test_loss": 0.12217529863119125}
+{"epoch": 3, "test_loss": 0.11441642791032791}
+{"epoch": 4, "test_loss": 0.11019955575466156}
+{"epoch": 5, "test_loss": 0.1076476126909256}
+{"epoch": 6, "test_loss": 0.10716648399829865}
+{"epoch": 7, "test_loss": 0.10696237534284592}
+{"epoch": 8, "test_loss": 0.10518893599510193}
+{"epoch": 9, "test_loss": 0.10568984597921371}
+{"epoch": 10, "test_loss": 0.10488048940896988}
+{"epoch": 11, "test_loss": 0.1041213646531105}
+{"epoch": 12, "test_loss": 0.10402290523052216}
+{"epoch": 13, "test_loss": 0.10422961413860321}
+{"epoch": 14, "test_loss": 0.1038474440574646}
+{"epoch": 15, "test_loss": 0.10392986983060837}

mlp/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs15_weightdecay0.1_mlp_1_layer_30k_training_samples/best-checkpoint/vector_head_epoch_best.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ed24270282fc6109d0ec796a2810c0bf43f70037a70912403d2cee1f99096208
+size 16050

mlp/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs15_weightdecay0.1_mlp_1_layer_30k_training_samples/test_losses.jsonl ADDED

	@@ -0,0 +1,15 @@

+{"epoch": 1, "test_loss": 0.12959711253643036}
+{"epoch": 2, "test_loss": 0.11596647650003433}
+{"epoch": 3, "test_loss": 0.10890799760818481}
+{"epoch": 4, "test_loss": 0.10620059818029404}
+{"epoch": 5, "test_loss": 0.1056753620505333}
+{"epoch": 6, "test_loss": 0.10588867962360382}
+{"epoch": 7, "test_loss": 0.1045738235116005}
+{"epoch": 8, "test_loss": 0.1029469221830368}
+{"epoch": 9, "test_loss": 0.10293364524841309}
+{"epoch": 10, "test_loss": 0.10241842269897461}
+{"epoch": 11, "test_loss": 0.10193388909101486}
+{"epoch": 12, "test_loss": 0.10189251601696014}
+{"epoch": 13, "test_loss": 0.10223434120416641}
+{"epoch": 14, "test_loss": 0.10185457020998001}
+{"epoch": 15, "test_loss": 0.10189803689718246}

mlp/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs15_weightdecay0.1_mlp_1_layer_50k_training_samples/best-checkpoint/vector_head_epoch_best.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:aca88c166c769fcca19290e125b76f67726f6b282eb3ab2a4706d9213e72e762
+size 16050

mlp/hybrid_answer_conf/long_qa/batchsize16_accumulation8_epochs15_weightdecay0.1_mlp_1_layer_50k_training_samples/test_losses.jsonl ADDED

	@@ -0,0 +1,15 @@

+{"epoch": 1, "test_loss": 0.12583285570144653}
+{"epoch": 2, "test_loss": 0.11041396111249924}
+{"epoch": 3, "test_loss": 0.1059633269906044}
+{"epoch": 4, "test_loss": 0.10363608598709106}
+{"epoch": 5, "test_loss": 0.10322992503643036}
+{"epoch": 6, "test_loss": 0.10201622545719147}
+{"epoch": 7, "test_loss": 0.10097873955965042}
+{"epoch": 8, "test_loss": 0.10107120871543884}
+{"epoch": 9, "test_loss": 0.10017844289541245}
+{"epoch": 10, "test_loss": 0.09985523670911789}
+{"epoch": 11, "test_loss": 0.10001035779714584}
+{"epoch": 12, "test_loss": 0.10001560300588608}
+{"epoch": 13, "test_loss": 0.09998462349176407}
+{"epoch": 14, "test_loss": 0.0996992364525795}
+{"epoch": 15, "test_loss": 0.09952692687511444}
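
The four 15-epoch `hybrid_answer_conf` MLP runs above have directory names that differ only in the training-sample count. The sketch below tabulates the lowest test loss of each run, with values copied from the logs above. It assumes the runs were evaluated on the same test set, which the commit does not state; `is_monotonically_improving` is a hypothetical helper.

```python
# Lowest test_loss per run, copied from the test_losses.jsonl files above.
# Keys are the training-sample counts taken from the directory names.
BEST_TEST_LOSS = {
    20_000: 0.1038474440574646,    # epoch 14
    30_000: 0.10185457020998001,   # epoch 14
    50_000: 0.09952692687511444,   # epoch 15
    200_000: 0.09612324833869934,  # epoch 12
}


def is_monotonically_improving(loss_by_size):
    """True if the best loss strictly decreases as the sample count grows."""
    ordered = [loss_by_size[n] for n in sorted(loss_by_size)]
    return all(later < earlier for earlier, later in zip(ordered, ordered[1:]))


for n_samples in sorted(BEST_TEST_LOSS):
    print(f"{n_samples:>7} samples: {BEST_TEST_LOSS[n_samples]:.5f}")

improving = is_monotonically_improving(BEST_TEST_LOSS)
print("improves with more data:", improving)
```

If the shared-test-set assumption holds, the best loss falls steadily from the 20k run to the 200k run, with the largest run still improving on the 50k run by a small margin.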