OMNI-VIS-ASSIST

OMNI-VIS-ASSIST is an advanced multimodal instruction-following AI assistant that interprets both images and text prompts to generate detailed, structured, and insightful explanations.

🚀 Features

Understands visual + textual input
Performs image captioning, chart summarization, and visual reasoning
Converts image content into Markdown, tables, and Mermaid flowcharts
Works with large multimodal models (Qwen3-VL, BLIP, etc.) or fallback captioners

🧠 Example usage

python inference.py --image examples/sample_image.png --prompt "Explain the chart in this image."

⚙️ Installation

pip install -r requirements.txt

🧩 License

Apache-2.0

Downloads last month: 22

Inference Providers NEW

Image-Text-to-Text

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for hmnshudhmn24/omni-vis-assist

Base model

Qwen/Qwen3-VL-4B-Instruct

Finetuned

(40)

this model