Copy from mknolan/internvl25-image-analyzer
README.md
CHANGED
|
@@ -1,10 +1,58 @@
|
|
| 1 |
---
|
| 2 |
-
title:
|
| 3 |
-
emoji:
|
| 4 |
-
colorFrom:
|
| 5 |
-
colorTo:
|
| 6 |
-
sdk:
|
| 7 |
pinned: false
|
| 8 |
---
|
| 9 |
|
| 10 |
-
|
| 1 |
---
|
| 2 |
+
title: InternVL2.5 Image Analyzer
|
| 3 |
+
emoji: 🖼️
|
| 4 |
+
colorFrom: blue
|
| 5 |
+
colorTo: purple
|
| 6 |
+
sdk: gradio
|
| 7 |
+
sdk_version: 3.50.0
|
| 8 |
+
app_file: app.py
|
| 9 |
pinned: false
|
| 10 |
---
|
| 11 |
|
| 12 |
+
# InternVL2.5 Image Analyzer
|
| 13 |
+
|
| 14 |
+
This Hugging Face Space demonstrates [InternVL2.5-8B](https://huggingface.co/OpenGVLab/InternVL2_5-8B), a multimodal model that analyzes images and answers questions about them.
|
| 15 |
+
|
| 16 |
+
## Features
|
| 17 |
+
|
| 18 |
+
- Upload your own images for analysis
|
| 19 |
+
- Choose from predefined prompts or create your own
|
| 20 |
+
- Detailed image understanding and description
|
| 21 |
+
- Text recognition in images
|
| 22 |
+
- Visual reasoning capabilities
|
| 23 |
+
|
| 24 |
+
## Model Details
|
| 25 |
+
|
| 26 |
+
This Space uses InternVL2.5-8B, a multimodal large language model (MLLM) with approximately 8.1 billion parameters, developed by OpenGVLab. It handles visual understanding tasks such as image description, text recognition, and visual reasoning.
|
| 27 |
+
|
| 28 |
+
### Architecture
|
| 29 |
+
|
| 30 |
+
InternVL2.5 pairs a vision encoder (based on the InternViT architecture) with a language model: the encoder turns the image into visual tokens, which the language model processes together with the text prompt.
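
As a rough illustration of how the vision encoder feeds the language model, the sketch below estimates how many visual tokens a single image tile contributes. The tile size (448), patch size (14), and downsample ratio (0.5) are assumptions taken from the publicly released InternVL2.5 configuration, not from this README, and the function name `visual_tokens_per_tile` is hypothetical.

```python
def visual_tokens_per_tile(tile_size: int = 448,
                           patch_size: int = 14,
                           downsample_ratio: float = 0.5) -> int:
    """Estimate the visual tokens passed to the language model for one tile.

    All default values are assumptions (see the paragraph above), not
    values stated in this README.
    """
    # Number of ViT patches along one edge of the square tile.
    patches_per_side = tile_size // patch_size
    # Total patches produced by the vision encoder for the tile.
    patches = patches_per_side ** 2
    # Neighbouring patches are merged before reaching the language model,
    # which shrinks the token count by downsample_ratio squared.
    return int(patches * downsample_ratio ** 2)


if __name__ == "__main__":
    print(visual_tokens_per_tile())
```

With these assumed defaults, a tile yields 32 × 32 = 1024 patches, which are merged down to 256 visual tokens.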
|
| 31 |
+
|
| 32 |
+
## Example Prompts
|
| 33 |
+
|
| 34 |
+
Here are some prompts you can try:
|
| 35 |
+
|
| 36 |
+
1. Describe this image in detail.
|
| 37 |
+
2. What can you tell me about this image?
|
| 38 |
+
3. Is there any text in this image? If so, can you read it?
|
| 39 |
+
4. What is the main subject of this image?
|
| 40 |
+
5. What emotions or feelings does this image convey?
|
| 41 |
+
6. Describe the composition and visual elements of this image.
|
| 42 |
+
7. Summarize what you see in this image in one paragraph.
|
| 43 |
+
|
| 44 |
+
## Usage
|
| 45 |
+
|
| 46 |
+
1. Upload an image using the file uploader
|
| 47 |
+
2. Select a prompt from the dropdown or write your own
|
| 48 |
+
3. Click "Submit" to get the analysis
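
The prompt choice in step 2 could be handled roughly as follows. This is a minimal sketch, not the Space's actual `app.py`: `PREDEFINED_PROMPTS` and `resolve_prompt` are hypothetical names, and the rule that a non-empty custom prompt overrides the dropdown selection is an assumption.

```python
from typing import Optional

# Prompts copied from the "Example Prompts" section of this README.
PREDEFINED_PROMPTS = [
    "Describe this image in detail.",
    "What can you tell me about this image?",
    "Is there any text in this image? If so, can you read it?",
    "What is the main subject of this image?",
    "What emotions or feelings does this image convey?",
    "Describe the composition and visual elements of this image.",
    "Summarize what you see in this image in one paragraph.",
]


def resolve_prompt(selected: Optional[str], custom: Optional[str]) -> str:
    """Pick the prompt to send to the model (hypothetical helper).

    Assumed rule: a non-empty custom prompt wins; otherwise use the
    dropdown selection, falling back to the first predefined prompt.
    """
    custom_text = (custom or "").strip()
    if custom_text:
        return custom_text
    if selected in PREDEFINED_PROMPTS:
        return selected
    return PREDEFINED_PROMPTS[0]


if __name__ == "__main__":
    print(resolve_prompt(PREDEFINED_PROMPTS[2], ""))
    print(resolve_prompt(None, "  How many people are in the photo?  "))
```

Keeping this logic in a plain function, separate from any UI code, makes it easy to test without launching the interface.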
|
| 49 |
+
|
| 50 |
+
## Credits
|
| 51 |
+
|
| 52 |
+
This application uses the InternVL2.5 model by OpenGVLab. For more information about the model, check out:
|
| 53 |
+
- [OpenGVLab/InternVL Repository](https://github.com/OpenGVLab/InternVL)
|
| 54 |
+
- [InternVL Documentation](https://internvl.readthedocs.io/en/latest/)
|
| 55 |
+
|
| 56 |
+
## License
|
| 57 |
+
|
| 58 |
+
The InternVL2.5 model is licensed under the MIT License.
|