feihu.hf committed · Commit cee317b · 1 parent: 89109db

update README.md

README.md CHANGED
@@ -18,13 +18,12 @@ tags:
 
 ## Introduction
 
-Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen). As of now, Qwen2.5-Coder covers six mainstream model sizes, 0.5, 1.5, 3, 7, 14, and 32 billion parameters, to meet the needs of different developers
+Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen). As of now, Qwen2.5-Coder covers six mainstream model sizes, 0.5, 1.5, 3, 7, 14, and 32 billion parameters, to meet the needs of different developers. Qwen2.5-Coder brings the following improvements upon CodeQwen1.5:
 
 - Significant improvements in **code generation**, **code reasoning**, and **code fixing**. Building on the strong Qwen2.5, we scale the training tokens up to 5.5 trillion, including source code, text-code grounding data, synthetic data, etc. Qwen2.5-Coder-32B is currently the state-of-the-art open-source code LLM, with coding abilities matching those of GPT-4o.
 - A more comprehensive foundation for real-world applications such as **Code Agents**. It not only enhances coding capabilities but also maintains strengths in mathematics and general competencies.
 - **Long-context Support** up to 128K tokens.
 
-
 **This repo contains the 7B Qwen2.5-Coder model**, which has the following features:
 - Type: Causal Language Models
 - Training Stage: Pretraining
@@ -34,8 +33,8 @@ Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (
 - Number of Layers: 28
 - Number of Attention Heads (GQA): 28 for Q and 4 for KV
 - Context Length: Full 131,072 tokens
-
-
+  - Please refer to [this section](#processing-long-texts) for detailed instructions on how to deploy Qwen2.5 for handling long texts.
+
 **We do not recommend using base language models for conversations.** Instead, you can apply post-training, e.g., SFT, RLHF, continued pretraining, etc., or fill-in-the-middle (FIM) tasks on this model.
 
 For more details, please refer to our [blog](https://qwenlm.github.io/blog/qwen2.5-coder-family/), [GitHub](https://github.com/QwenLM/Qwen2.5-Coder), [Documentation](https://qwen.readthedocs.io/en/latest/), and [Arxiv](https://arxiv.org/abs/2409.12186).
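
The hunk above steers base-model usage toward post-training or fill-in-the-middle (FIM) tasks rather than chat. Below is a minimal FIM sketch with Transformers; it assumes the `<|fim_prefix|>` / `<|fim_suffix|>` / `<|fim_middle|>` special tokens used by the Qwen2.5-Coder family and the `Qwen/Qwen2.5-Coder-7B` model ID, so treat it as an illustrative starting point rather than the repo's official snippet.

```python
# Minimal fill-in-the-middle (FIM) sketch for the base model.
# Assumes the Qwen2.5-Coder FIM special tokens; verify against the official docs.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-Coder-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

prefix = "def quicksort(arr):\n    if len(arr) <= 1:\n        return arr\n"
suffix = "\n    return quicksort(left) + middle + quicksort(right)\n"

# The model is asked to generate the code that belongs between prefix and suffix.
prompt = f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
completion = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(completion)
```
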
@@ -66,9 +65,9 @@ For supported frameworks, you could add the following to `config.json` to enable
 }
 ```
 
-For deployment, we recommend using vLLM.
+For deployment, we recommend using vLLM. 
 Please refer to our [Documentation](https://qwen.readthedocs.io/en/latest/deployment/vllm.html) for usage if you are not familiar with vLLM.
-Presently, vLLM only supports static YaRN, which means the scaling factor remains constant regardless of input length, **potentially impacting performance on shorter texts**.
+Presently, vLLM only supports static YaRN, which means the scaling factor remains constant regardless of input length, **potentially impacting performance on shorter texts**. 
 We advise adding the `rope_scaling` configuration only when processing long contexts is required.
 
 ## Evaluation & Performance
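
The hunk above closes the `config.json` snippet that enables YaRN and advises adding `rope_scaling` only when long contexts are actually required. As a minimal sketch of that edit, the following assumes the YaRN values commonly documented for Qwen2.5 (`"type": "yarn"`, `"factor": 4.0`, `"original_max_position_embeddings": 32768`); those values are not shown verbatim in this diff, so confirm them against the README section the hunk header references.

```python
# Sketch: add a YaRN rope_scaling entry to a locally downloaded config.json.
# The factor / original_max_position_embeddings values are assumptions taken from
# the Qwen2.5 long-context documentation; check the README before relying on them.
import json
from pathlib import Path

config_path = Path("Qwen2.5-Coder-7B/config.json")  # hypothetical local snapshot path
config = json.loads(config_path.read_text())

config["rope_scaling"] = {
    "type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}

config_path.write_text(json.dumps(config, indent=2))
print("rope_scaling set to:", config["rope_scaling"])
```

Because the scaling factor is static, it is applied to every request regardless of length, which is why the README keeps `rope_scaling` off by default for shorter texts.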
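
For the vLLM deployment recommendation in the same hunk, here is a rough offline-inference sketch using vLLM's public `LLM` / `SamplingParams` API with the `Qwen/Qwen2.5-Coder-7B` model ID; it is only a starting point alongside the linked deployment documentation.

```python
# Sketch: offline code completion with vLLM, per the deployment note above.
from vllm import LLM, SamplingParams

# max_model_len is a deliberately modest assumption; raise it (and add rope_scaling)
# only when long-context processing is actually required.
llm = LLM(model="Qwen/Qwen2.5-Coder-7B", max_model_len=8192)
params = SamplingParams(temperature=0.0, max_tokens=256)

outputs = llm.generate(["# Merge two sorted lists into one sorted list\ndef merge_sorted(a, b):"], params)
print(outputs[0].outputs[0].text)
```

Recent vLLM releases also expose an OpenAI-compatible server (for example via `vllm serve`), which the linked documentation covers in more detail.
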
@@ -83,10 +82,10 @@ If you find our work helpful, feel free to give us a cite.
 
 ```
 @article{hui2024qwen2,
-
-
-
-
+      title={Qwen2.5-Coder Technical Report},
+      author={Hui, Binyuan and Yang, Jian and Cui, Zeyu and Yang, Jiaxi and Liu, Dayiheng and Zhang, Lei and Liu, Tianyu and Zhang, Jiajun and Yu, Bowen and Dang, Kai and others},
+      journal={arXiv preprint arXiv:2409.12186},
+      year={2024}
 }
 @article{qwen2,
       title={Qwen2 Technical Report},
