Upload folder using huggingface_hub
README.md CHANGED
@@ -6,7 +6,7 @@ pipeline_tag: text-generation
 <div align="center">
 <h1>Llama-3-8B-Instruct-80K-QLoRA</h1>
 
-<a href="https://github.com/FlagOpen/FlagEmbedding/tree/master/Long_LLM/">[Data&Code]</a>
+<a href="https://github.com/FlagOpen/FlagEmbedding/tree/master/Long_LLM/longllm_qlora">[Data&Code]</a>
 </div>
 
 We extend the context length of Llama-3-8B-Instruct to 80K using QLoRA and 3.5K long-context training data synthesized from GPT-4. The entire training cycle is highly efficient, taking 8 hours on an 8xA800 (80G) machine, yet the resulting model achieves remarkable performance on a series of downstream long-context evaluation benchmarks.
@@ -14,7 +14,7 @@ We extend the context length of Llama-3-8B-Instruct to 80K using QLoRA and 3.5K
 
 # Evaluation
 
-All the following evaluation results can be reproduced following the instructions [here](https://github.com/FlagOpen/FlagEmbedding/tree/master/Long_LLM/).
+All the following evaluation results can be reproduced following the instructions [here](https://github.com/FlagOpen/FlagEmbedding/tree/master/Long_LLM/longllm_qlora).
 
 ## Needle in a Haystack
 We evaluate the model on the Needle-In-A-Haystack task using the official setting. The blue vertical line indicates the training context length, i.e. 80K.
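Since the card above describes a QLoRA adapter trained on top of Llama-3-8B-Instruct, a minimal loading sketch with Transformers and PEFT might look like the following. This is an illustration only: both repository ids (`meta-llama/Meta-Llama-3-8B-Instruct` as the base and a hypothetical `namespace/Llama-3-8B-Instruct-80K-QLoRA` adapter repo) and the generation settings are assumptions, not taken from this diff.

```python
# Minimal usage sketch, assuming the adapter is published as a standard PEFT repo.
# Both repo ids below are assumptions for illustration; they are not given in this diff.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_ID = "meta-llama/Meta-Llama-3-8B-Instruct"          # base model the adapter extends
ADAPTER_ID = "namespace/Llama-3-8B-Instruct-80K-QLoRA"   # hypothetical adapter repo id

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
# Attach the LoRA weights produced by QLoRA training on top of the base model.
model = PeftModel.from_pretrained(base_model, ADAPTER_ID)

messages = [{"role": "user", "content": "Summarize the following long document: ..."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(base_model.device)

output_ids = model.generate(input_ids=input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

This sketch does not by itself guarantee the full 80K window; the RoPE settings and any adapter merging used during training are documented in the linked longllm_qlora repository.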