---
tags:
- merge
- task_wise
- llm-adamerge
base_model: deepseek-ai/deepseek-coder-7b-base-v1.5
---

# Merged Model using LLM-AdaMerge (task_wise)

This model was created by merging multiple fine-tuned models using the LLM-AdaMerge approach with task_wise merging.

## Merge Details

- **Merge Type**: task_wise
- **Base Model**: deepseek-ai/deepseek-coder-7b-base-v1.5
- **Number of Models Merged**: 2
- **Models Merged**: math, code
- **Final Training Loss**: N/A
- **Training Epochs**: 0

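Task-wise merging in the AdaMerging family adds each fine-tuned model's task vector (its difference from the base weights) to the base model, scaled by one learned coefficient per task. The snippet below is a minimal sketch of that rule, not the script used to produce this checkpoint; the expert repo ids and lambda values are placeholders, with the learned values stored in `learned_lambdas.json`.

```python
import torch
from transformers import AutoModelForCausalLM

# Placeholder expert repo ids and illustrative lambda values;
# the learned coefficients live in learned_lambdas.json.
BASE_ID = "deepseek-ai/deepseek-coder-7b-base-v1.5"
EXPERT_IDS = {"math": "your-username/math-model", "code": "your-username/code-model"}
LAMBDAS = {"math": 0.5, "code": 0.5}

base = AutoModelForCausalLM.from_pretrained(BASE_ID, torch_dtype=torch.bfloat16)
base_state = base.state_dict()
merged_state = {name: param.clone() for name, param in base_state.items()}

# Assumes every expert is a full fine-tune of the same base architecture.
for task, repo_id in EXPERT_IDS.items():
    expert_state = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16).state_dict()
    for name, base_param in base_state.items():
        if torch.is_floating_point(base_param):
            # theta_merged = theta_base + sum_t lambda_t * (theta_t - theta_base)
            merged_state[name] += LAMBDAS[task] * (expert_state[name] - base_param)

base.load_state_dict(merged_state)
base.save_pretrained("merged-model")
```

Everything is held in memory at once, so this illustrates the merging rule rather than a memory-efficient implementation.
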
## Lambda Coefficients

The task-wise lambda coefficients learned during training are stored in the `learned_lambdas.json` file included with this model.

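As a minimal sketch of inspecting them, assuming the file maps each task name (`math`, `code`) to a scalar coefficient (the repo id is a placeholder and the actual schema may differ):

```python
import json

from huggingface_hub import hf_hub_download

# Fetch learned_lambdas.json from the model repo (placeholder repo id).
path = hf_hub_download(repo_id="your-username/model-name", filename="learned_lambdas.json")
with open(path) as f:
    lambdas = json.load(f)  # assumed schema: {"math": <float>, "code": <float>}

for task, lam in lambdas.items():
    print(f"{task}: {lam}")
```
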
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("your-username/model-name")
tokenizer = AutoTokenizer.from_pretrained("your-username/model-name")

# Use the model
inputs = tokenizer("Hello, how are you?", return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0]))
```

## Training Configuration

See the uploaded `training_config.json` file for the full training configuration.

## Citation

If you use this model, please cite the LLM-AdaMerge paper:

```bibtex
@article{llmadamerge2024,
  title={LLM-AdaMerge: Adaptive Model Merging for Large Language Models},
  author={...},
  year={2024}
}
```