wanzhenchn/Qwen2.5-VL-7B-Instruct-gptqmodel-int8

Image-Text-to-Text

text-generation-inference

8-bit precision


wanzhenchn committed on Apr 9

Commit 583dacf · verified · 1 Parent(s): 863a5a6

Update README.md

Files changed (1)

README.md +101 -3


@@ -1,3 +1,101 @@
----
-license: mit
----

+---
+license: mit
+language:
+- en
+- zh
+base_model:
+- Qwen/Qwen2.5-VL-7B-Instruct
+pipeline_tag: image-text-to-text
+library_name: transformers
+tags:
+- text-generation-inference
+---
+# Qwen2.5-VL-7B-Instruct-gptqmodel-int8
+This is a GPTQ INT8 quantization of [Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct), produced with the [GPTQModel](https://github.com/ModelCloud/GPTQModel) toolkit.
+## How to quantize
+### Install
+```bash
+# Python 3.10.x or above
+pip3 install -v "gptqmodel>=2.2.0" --no-build-isolation
+```
+### Quantize
+```bash
+python3 gptqmodel_quantize.py /path/to/Qwen2.5-VL-7B-Instruct/ /path/to/Qwen2.5-VL-7B-Instruct-gptqmodel-int8 8
+```
+```python
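+# gptqmodel_quantize.py
+import os
+
+import fire
+from datasets import load_dataset
+from gptqmodel import GPTQModel, QuantizeConfig
+from gptqmodel.models.definitions.base_qwen2_vl import BaseQwen2VLGPTQ
+
+# Order CUDA devices by PCI bus ID so indices match nvidia-smi.
+os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
+# Let the PyTorch CUDA allocator use expandable segments to reduce fragmentation.
+os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
+# Request Python UTF-8 mode.
+os.environ["PYTHONUTF8"] = "1"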
+def format_qwen2_vl_dataset(image, assistant):
+    return [
+        {
+            "role": "user",
+            "content": [
+                {"type": "image", "image": image},
+                {"type": "text", "text": "generate a caption for this image"},
+            ],
+        },
+        {"role": "assistant", "content": assistant},
+    ]
+def prepare_dataset(format_func, n_sample: int = 20) -> list[list[dict]]:
+    # Build chat-style calibration samples from the first `n_sample` rows.
+    dataset = load_dataset(
+        "laion/220k-GPT4Vision-captions-from-LIVIS", split=f"train[:{n_sample}]"
+    )
+    return [
+        format_func(sample["url"], sample["caption"])
+        for sample in dataset
+    ]
+def get_calib_dataset(model):
+    if isinstance(model, BaseQwen2VLGPTQ):
+        return prepare_dataset(format_qwen2_vl_dataset, n_sample=256)
+    raise NotImplementedError(f"Unsupported MODEL: {model.__class__}")
+def quantize(model_path: str,
+             output_path: str,
+             bit: int):
+    quant_config = QuantizeConfig(bits=bit, group_size=128)
+    model = GPTQModel.load(model_path, quant_config)
+    calibration_dataset = get_calib_dataset(model)
+    # increase `batch_size` to match gpu/vram specs to speed up quantization
+    model.quantize(calibration_dataset, batch_size=8)
+    model.save(output_path)
+    # test post-quant inference
+    model = GPTQModel.load(output_path)
+    result = model.generate("Uncovering deep insights begins with")[0] # tokens
+    print(model.tokenizer.decode(result)) # string output
+if __name__ == "__main__":
+    fire.Fire(quantize)
+```
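
The script above passes `bits=8` and `group_size=128` to `QuantizeConfig`. As a rough illustration of what those two numbers mean for the stored weights, the sketch below quantizes one row of weights in groups of 128, each group sharing its own scale and minimum. It uses plain round-to-nearest rather than the GPTQ algorithm, which additionally uses calibration data to compensate for rounding error, and it is not GPTQModel code. The function names `quantize_group`, `quantize_row`, and `dequantize_row` are hypothetical and exist only for this example.

```python
# Illustrative sketch only: round-to-nearest group quantization, NOT GPTQ itself.
# All function names here are hypothetical and not part of GPTQModel.

def quantize_group(weights, bits=8):
    """Map one group of floats to integer codes in [0, 2**bits - 1]."""
    qmax = 2 ** bits - 1
    lo, hi = min(weights), max(weights)
    # A constant group has zero range; fall back to scale 1.0 to avoid dividing by zero.
    scale = (hi - lo) / qmax or 1.0
    codes = [min(qmax, max(0, round((w - lo) / scale))) for w in weights]
    return codes, scale, lo


def quantize_row(row, bits=8, group_size=128):
    """Split a row into consecutive groups; each group gets its own scale and minimum."""
    return [
        quantize_group(row[start:start + group_size], bits)
        for start in range(0, len(row), group_size)
    ]


def dequantize_row(groups):
    """Reconstruct approximate floats from the per-group codes, scales, and minimums."""
    restored = []
    for codes, scale, lo in groups:
        restored.extend(lo + c * scale for c in codes)
    return restored
```

With 8 bits each group has 256 levels, so the reconstruction error per weight in this sketch is at most about half of that group's scale. A smaller `group_size` gives tighter per-group ranges at the cost of storing more scales.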