OpenGVLab/InternVL-Chat-V1-1

czczup committed on Feb 3, 2024

Commit d1a7f0a · verified · 1 Parent(s): f0bf9a7

Update README.md

Files changed (1)

README.md +30 -2

README.md CHANGED

@@ -43,10 +43,38 @@ It is _**the largest open-source vision/vision-language foundation model (14B)**
 We will provide a minimum code example to run InternVL-Chat using only the `transformers` library.
-Before this is completed, you can use our [online demo](https://internvl.opengvlab.com/) for a quick experience of this model.
 ```python
-TODO
 ```
 ## Examples

 We will provide a minimum code example to run InternVL-Chat using only the `transformers` library.
+You can also use our [online demo](https://internvl.opengvlab.com/) for a quick experience of this model.
 ```python
+import torch
+from PIL import Image
+from transformers import AutoModel, CLIPImageProcessor
+from transformers import AutoTokenizer
+path = "OpenGVLab/InternVL-Chat-Chinese-V1-1"
+model = AutoModel.from_pretrained(
+    path,
+    torch_dtype=torch.bfloat16,
+    low_cpu_mem_usage=True,
+    trust_remote_code=True,
+    device_map='auto').eval()
+tokenizer = AutoTokenizer.from_pretrained(path)
+image = Image.open('./examples/image2.jpg').convert('RGB')
+image = image.resize((448, 448))
+image_processor = CLIPImageProcessor.from_pretrained(path)
+pixel_values = image_processor(images=image, return_tensors='pt').pixel_values
+pixel_values = pixel_values.to(torch.bfloat16).cuda()
+generation_config = dict(
+    num_beams=1,
+    max_new_tokens=512,
+    do_sample=False,
+)
+question = "请详细描述图片"  # "Please describe the image in detail"
+response = model.chat(tokenizer, pixel_values, question, generation_config)
 ```
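
The `generation_config` added in this commit sets `num_beams=1` and `do_sample=False`. Assuming `model.chat` forwards these options to the standard `transformers` `generate` method (the remote code is not shown in this diff), that combination corresponds to greedy decoding. The sketch below only illustrates how the two flags map to a decoding mode; `decoding_strategy` is a hypothetical helper written for this note, not part of `transformers` or of this repository.

```python
def decoding_strategy(config: dict) -> str:
    """Name the decoding mode implied by `num_beams` and `do_sample`.

    Hypothetical helper for illustration only. Missing keys fall back to
    num_beams=1 and do_sample=False, matching the values in the snippet above.
    """
    num_beams = config.get("num_beams", 1)
    do_sample = config.get("do_sample", False)
    if num_beams < 1:
        raise ValueError("num_beams must be at least 1")
    if num_beams == 1:
        # A single beam: either pick the most likely token or sample one.
        return "multinomial sampling" if do_sample else "greedy search"
    # Several beams: keep the top candidates, optionally with sampling.
    return "beam-search sampling" if do_sample else "beam search"


# Same settings as the README snippet.
generation_config = dict(
    num_beams=1,
    max_new_tokens=512,
    do_sample=False,
)
print(decoding_strategy(generation_config))
```

With these settings the output is deterministic for a given image and question. Setting `do_sample=True` or raising `num_beams` would select a different mode, at the cost of reproducibility or speed respectively.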
 ## Examples