openbmb/MiniCPM-V

@@ -3,11 +3,11 @@ pipeline_tag: text-generation
 ---
 ## MiniCPM-V
-**MiniCPM-V** (i.e., OmniLMM-3B)is an efficient version with promising performance for deployment. The model is built based on [MiniCPM-2B](https://huggingface.co/openbmb/MiniCPM-2B-sft-bf16) and SigLip-400M, connected by a perceiver resampler. Notable features of MiniCPM-V include:
-- 🚀 **High Efficiency.**
-  MiniCPM-V can be **efficiently deployed on most GPU cards and personal computers**, and **even on edge devices such as mobile phones**. In terms of visual encoding, we compress the image representations into 64 tokens via a perceiver resampler, which is significantly fewer than other LMMs based on MLP architecture (typically > 512 tokens). This allows MiniCPM-V to operate with **much less memory cost and higher speed during inference**.
 - 🔥 **Promising Performance.**
@@ -15,7 +15,7 @@ pipeline_tag: text-generation
 - 🙌 **Bilingual Support.**
-  MiniCPM-V is **the first edge-deployable LMM supporting bilingual multimodal interaction in English and Chinese**. This is achieved by generalizing multimodal capabilities across languages, a technique from our ICLR 2024 spotlight [paper](https://arxiv.org/abs/2308.12038).
 ### Evaluation
@@ -107,6 +107,9 @@ pipeline_tag: text-generation
 ## Demo
 Click here to try out the Demo of [MiniCPM-V](http://120.92.209.146:80).
 ## Usage
 Requirements: tested on Python 3.10
@@ -146,6 +149,7 @@ print(res)
 ## License
 #### Model License
 * The code in this repo is released according to [Apache-2.0](https://github.com/OpenBMB/MiniCPM/blob/main/LICENSE)
 * The usage of MiniCPM-V's parameters is subject to ["General Model License Agreement - Source Notes - Publicity Restrictions - Commercial License"](https://github.com/OpenBMB/General-Model-License/blob/main/)

 ---
 ## MiniCPM-V
+**MiniCPM-V** (i.e., OmniLMM-3B) is an efficient version with promising performance for deployment. The model is built on SigLip-400M and [MiniCPM-2.4B](https://github.com/OpenBMB/MiniCPM/), connected by a perceiver resampler. Notable features of OmniLMM-3B include:
+- ⚡️ **High Efficiency.**
+  MiniCPM-V can be **efficiently deployed on most GPU cards and personal computers**, and **even on end devices such as mobile phones**. In terms of visual encoding, we compress the image representations into 64 tokens via a perceiver resampler, which is significantly fewer than in other LMMs based on an MLP architecture (typically more than 512 tokens). This allows OmniLMM-3B to operate with **much lower memory cost and higher speed during inference**.
 - 🔥 **Promising Performance.**
 - 🙌 **Bilingual Support.**
+  MiniCPM-V is **the first end-deployable LMM supporting bilingual multimodal interaction in English and Chinese**. This is achieved by generalizing multimodal capabilities across languages, a technique from the ICLR 2024 spotlight [paper](https://arxiv.org/abs/2308.12038).
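
The token compression described under High Efficiency can be illustrated with a minimal sketch of the resampling idea: a fixed set of query vectors cross-attends over all image patch features, so the number of output tokens equals the number of queries regardless of how many patches the encoder produced. This is not the MiniCPM-V implementation. The function `resample`, the toy dimension, and the patch count are hypothetical, and the learned key/value projections, multiple heads, and feed-forward layers of a real perceiver resampler are omitted. Only the figure of 64 output tokens comes from the README.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def resample(patch_features, queries):
    """Single-head cross-attention: one output token per query vector.

    Hypothetical helper for illustration only; a real perceiver resampler
    also applies learned projections, several heads, and stacked layers.
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for q in queries:
        # Scaled dot-product score between this query and every patch.
        scores = [scale * sum(qi * pi for qi, pi in zip(q, p))
                  for p in patch_features]
        weights = softmax(scores)
        # Output token is the attention-weighted average of the patches.
        outputs.append([sum(w * p[d] for w, p in zip(weights, patch_features))
                        for d in range(dim)])
    return outputs


random.seed(0)
DIM = 8             # toy feature size (assumption, far smaller than a real model)
NUM_PATCHES = 729   # hypothetical number of patch features from the vision encoder
NUM_QUERIES = 64    # number of visual tokens stated in the README

patches = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_PATCHES)]
# In a trained model the queries are learned parameters; here they are random.
queries = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_QUERIES)]

visual_tokens = resample(patches, queries)
print(len(patches), "patch features ->", len(visual_tokens), "visual tokens")
```

Because the output length is set by the queries alone, the language model always receives 64 visual tokens per image in this sketch, which is where the memory and speed savings over passing every patch through an MLP projector would come from.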
 ### Evaluation
 ## Demo
 Click here to try out the Demo of [MiniCPM-V](http://120.92.209.146:80).
+## Deployment on Mobile Phone
+Currently, MiniCPM-V (i.e., OmniLMM-3B) can be deployed on mobile phones running the Android and Harmony operating systems. 🚀 Try it out [here](https://github.com/OpenBMB/mlc-MiniCPM).
 ## Usage
 Requirements: tested on Python 3.10
 ## License
 #### Model License
 * The code in this repo is released according to [Apache-2.0](https://github.com/OpenBMB/MiniCPM/blob/main/LICENSE)
 * The usage of MiniCPM-V's parameters is subject to ["General Model License Agreement - Source Notes - Publicity Restrictions - Commercial License"](https://github.com/OpenBMB/General-Model-License/blob/main/)