Update README.md
README.md
CHANGED
@@ -10,10 +10,7 @@ datasets:
pipeline_tag: visual-question-answering
---
-#
-<p align="center">
-  <img src="https://cdn-uploads.huggingface.co/production/uploads/64119264f0f81eb569e0d569/X8AXMkOlKeUpNcoJIXKna.webp" alt="Image Description" width="300" height="300">
-</p>
+# InternVL-Chat-V1-2-Plus
[\[Blog\]](https://internvl.github.io/blog/) [\[InternVL 1.0 Paper\]](https://arxiv.org/abs/2312.14238) [\[InternVL 1.5 Report\]](https://arxiv.org/abs/2404.16821) [\[Chat Demo\]](https://internvl.opengvlab.com/)
@@ -40,18 +37,6 @@ InternVL-Chat-V1-2-Plus uses the same model architecture as [InternVL-Chat-V1-2]
- Learnable Component: ViT + MLP + LLM
- Data: 12 million SFT samples.

-## Released Models
-
-| Model | Vision Foundation Model | Release Date | Note |
-| :---: | :---: | :---: | :--- |
-| InternVL-Chat-V1-5 ([HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5)) | InternViT-6B-448px-V1-5 ([HF link](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-5)) | 2024.04.18 | supports 4K images; very strong OCR; approaches the performance of GPT-4V and Gemini Pro on benchmarks such as MMMU, DocVQA, ChartQA, and MathVista (new) |
-| InternVL-Chat-V1-2-Plus ([HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-2-Plus)) | InternViT-6B-448px-V1-2 ([HF link](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-2)) | 2024.02.21 | more SFT data and stronger performance |
-| InternVL-Chat-V1-2 ([HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-2)) | InternViT-6B-448px-V1-2 ([HF link](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-2)) | 2024.02.11 | scales the LLM up to 34B |
-| InternVL-Chat-V1-1 ([HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-1)) | InternViT-6B-448px-V1-0 ([HF link](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-0)) | 2024.01.24 | supports Chinese; stronger OCR |
-
-
-
-
## Performance
\* Proprietary Model   † Training Set Observed
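
For context on the card being edited here, a minimal usage sketch for the visual-question-answering pipeline it describes. This is an assumption, not part of the diff: it presumes the `trust_remote_code` loading path, CLIP-style 448x448 preprocessing, and the custom `model.chat()` helper bundled with other InternVL-Chat releases, whose exact signature may differ.

```python
# Hypothetical usage sketch (not from this commit): single-image VQA with
# InternVL-Chat-V1-2-Plus via transformers' trust_remote_code path.
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer, CLIPImageProcessor

path = "OpenGVLab/InternVL-Chat-V1-2-Plus"
model = AutoModel.from_pretrained(
    path, torch_dtype=torch.bfloat16, trust_remote_code=True
).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
image_processor = CLIPImageProcessor.from_pretrained(path)

# The V1-2 family uses a fixed 448x448 input resolution.
image = Image.open("example.jpg").convert("RGB").resize((448, 448))
pixel_values = image_processor(images=image, return_tensors="pt").pixel_values
pixel_values = pixel_values.to(torch.bfloat16).cuda()

# model.chat() comes from the repo's bundled remote code; argument names
# are assumed here and may differ between releases.
generation_config = dict(num_beams=1, max_new_tokens=512, do_sample=False)
response = model.chat(tokenizer, pixel_values, "Describe the image briefly.", generation_config)
print(response)
```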
@@ -153,9 +138,3 @@ If you find this project useful in your research, please consider citing:
## License
This project is released under the MIT license. Parts of this project contain code and models (e.g., LLaMA2) from other sources, which are subject to their respective licenses.
-
-Llama 2 is licensed under the LLAMA 2 Community License, Copyright (c) Meta Platforms, Inc. All Rights Reserved.
-
-## Acknowledgement
-
-InternVL is built with reference to the code of the following projects: [OpenAI CLIP](https://github.com/openai/CLIP), [Open CLIP](https://github.com/mlfoundations/open_clip), [CLIP Benchmark](https://github.com/LAION-AI/CLIP_benchmark), [EVA](https://github.com/baaivision/EVA/tree/master), [InternImage](https://github.com/OpenGVLab/InternImage), [ViT-Adapter](https://github.com/czczup/ViT-Adapter), [MMSegmentation](https://github.com/open-mmlab/mmsegmentation), [Transformers](https://github.com/huggingface/transformers), [DINOv2](https://github.com/facebookresearch/dinov2), [BLIP-2](https://github.com/salesforce/LAVIS/tree/main/projects/blip2), [Qwen-VL](https://github.com/QwenLM/Qwen-VL/tree/master/eval_mm), and [LLaVA-1.5](https://github.com/haotian-liu/LLaVA). Thanks for their awesome work!