Update README.md
README.md
CHANGED
@@ -10,10 +10,7 @@ datasets:
pipeline_tag: visual-question-answering
---
-#
-<p align="center">
-  <img src="https://cdn-uploads.huggingface.co/production/uploads/64119264f0f81eb569e0d569/X8AXMkOlKeUpNcoJIXKna.webp" alt="Image Description" width="300" height="300">
-</p>
+# InternVL-Chat-V1-2-Plus
[\[Blog\]](https://internvl.github.io/blog/) [\[InternVL 1.0 Paper\]](https://arxiv.org/abs/2312.14238) [\[InternVL 1.5 Report\]](https://arxiv.org/abs/2404.16821) [\[Chat Demo\]](https://internvl.opengvlab.com/)
@@ -40,18 +37,6 @@ InternVL-Chat-V1-2-Plus uses the same model architecture as [InternVL-Chat-V1-2]
- Learnable Component: ViT + MLP + LLM
- Data: 12 million SFT samples.

-## Released Models
-
-| Model | Vision Foundation Model | Release Date | Note |
-| :---: | :---: | :---: | :--- |
-| InternVL-Chat-V1-5 ([HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5)) | InternViT-6B-448px-V1-5 ([HF link](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-5)) | 2024.04.18 | supports 4K images; very strong OCR; approaches the performance of GPT-4V and Gemini Pro on benchmarks such as MMMU, DocVQA, ChartQA, and MathVista (new) |
-| InternVL-Chat-V1-2-Plus ([HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-2-Plus)) | InternViT-6B-448px-V1-2 ([HF link](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-2)) | 2024.02.21 | more SFT data and stronger performance |
-| InternVL-Chat-V1-2 ([HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-2)) | InternViT-6B-448px-V1-2 ([HF link](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-2)) | 2024.02.11 | scales the LLM up to 34B |
-| InternVL-Chat-V1-1 ([HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-1)) | InternViT-6B-448px-V1-0 ([HF link](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-0)) | 2024.01.24 | supports Chinese; stronger OCR |
-
-
-
-
## Performance
\* Proprietary Model   † Training Set Observed
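
For context on the card being edited here, a minimal usage sketch for the visual-question-answering pipeline it describes. This is an assumption, not part of the diff: it presumes the `trust_remote_code` loading path, CLIP-style 448x448 preprocessing, and the custom `model.chat()` helper bundled with other InternVL-Chat releases, whose exact signature may differ.

```python
# Hypothetical usage sketch (not from this commit): single-image VQA with
# InternVL-Chat-V1-2-Plus via transformers' trust_remote_code path.
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer, CLIPImageProcessor

path = "OpenGVLab/InternVL-Chat-V1-2-Plus"
model = AutoModel.from_pretrained(
    path, torch_dtype=torch.bfloat16, trust_remote_code=True
).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
image_processor = CLIPImageProcessor.from_pretrained(path)

# The V1-2 family uses a fixed 448x448 input resolution.
image = Image.open("example.jpg").convert("RGB").resize((448, 448))
pixel_values = image_processor(images=image, return_tensors="pt").pixel_values
pixel_values = pixel_values.to(torch.bfloat16).cuda()

# model.chat() comes from the repo's bundled remote code; argument names
# are assumed here and may differ between releases.
generation_config = dict(num_beams=1, max_new_tokens=512, do_sample=False)
response = model.chat(tokenizer, pixel_values, "Describe the image briefly.", generation_config)
print(response)
```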
@@ -153,9 +138,3 @@ If you find this project useful in your research, please consider citing:
## License
This project is released under the MIT license. Parts of this project contain code and models (e.g., LLaMA2) from other sources, which are subject to their respective licenses.
-
-Llama 2 is licensed under the LLAMA 2 Community License, Copyright (c) Meta Platforms, Inc. All Rights Reserved.
-
-## Acknowledgement
-
-InternVL is built with reference to the code of the following projects: [OpenAI CLIP](https://github.com/openai/CLIP), [Open CLIP](https://github.com/mlfoundations/open_clip), [CLIP Benchmark](https://github.com/LAION-AI/CLIP_benchmark), [EVA](https://github.com/baaivision/EVA/tree/master), [InternImage](https://github.com/OpenGVLab/InternImage), [ViT-Adapter](https://github.com/czczup/ViT-Adapter), [MMSegmentation](https://github.com/open-mmlab/mmsegmentation), [Transformers](https://github.com/huggingface/transformers), [DINOv2](https://github.com/facebookresearch/dinov2), [BLIP-2](https://github.com/salesforce/LAVIS/tree/main/projects/blip2), [Qwen-VL](https://github.com/QwenLM/Qwen-VL/tree/master/eval_mm), and [LLaVA-1.5](https://github.com/haotian-liu/LLaVA). Thanks for their awesome work!