Update README.md

README.md CHANGED

@@ -18,6 +18,8 @@ Compared with the state-of-the-art opensource language models, including the pre
 Qwen2-7B-Instruct-GPTQ-Int4 supports a context length of up to 131,072 tokens, enabling the processing of extensive inputs. Please refer to [this section](#processing-long-texts) for detailed instructions on how to deploy Qwen2 for handling long texts.
 
 For more details, please refer to our [blog](https://qwenlm.github.io/blog/qwen2/) and [GitHub](https://github.com/QwenLM/Qwen2).
+
+**Note**: If you encounter ``RuntimeError: probability tensor contains either `inf`, `nan` or element < 0`` during inference with ``transformers``, we recommend installing ``autogptq>=0.7.1`` or [deploying this model with vLLM](https://qwen.readthedocs.io/en/latest/deployment/vllm.html).
 <br>
 
 ## Model Details
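The note added by this commit offers two workarounds for the `RuntimeError`. A minimal sketch of each, assuming the standard PyPI package names (`auto-gptq` for AutoGPTQ, `vllm` for vLLM) and vLLM's OpenAI-compatible server entrypoint:

```shell
# Workaround 1: upgrade AutoGPTQ so GPTQ inference through
# transformers no longer produces the bad probability tensor
pip install "auto-gptq>=0.7.1"

# Workaround 2: serve the model with vLLM instead of raw transformers
pip install vllm
python -m vllm.entrypoints.openai.api_server \
    --model Qwen/Qwen2-7B-Instruct-GPTQ-Int4
```

The vLLM server exposes an OpenAI-compatible API on port 8000 by default, so existing OpenAI client code can be pointed at it without changes beyond the base URL.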