# OMNI-VIS-ASSIST
OMNI-VIS-ASSIST is a multimodal, instruction-following AI assistant that interprets images together with text prompts and generates detailed, structured explanations.
## Features
- Understands visual + textual input
- Performs image captioning, chart summarization, and visual reasoning
- Converts image content into Markdown, tables, and Mermaid flowcharts
- Works with large multimodal models (Qwen3-VL, BLIP, etc.) or fallback captioners (see the sketch after this list)
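
The fallback-captioner path is not documented in detail in this card. As a rough illustration of what it could look like, the snippet below drives a lightweight BLIP captioner through the standard `transformers` API; the model ID `Salesforce/blip-image-captioning-base` and the image path are assumptions for the example, not necessarily what this project ships.

```python
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Load a small, widely available captioning model as the fallback.
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

# Caption a single image (path assumed for illustration).
image = Image.open("examples/sample_image.png").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=50)
print(processor.decode(out[0], skip_special_tokens=True))
```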
## Example usage

```bash
python inference.py --image examples/sample_image.png --prompt "Explain the chart in this image."
```
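
`inference.py` itself is not reproduced in this card. The following is a minimal sketch of how such a script could wire a vision-language backbone to the CLI above, using the `transformers` `AutoProcessor` / `AutoModelForVision2Seq` APIs; the default model ID, argument handling, and generation settings are assumptions rather than the project's actual code.

```python
import argparse
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--image", required=True, help="Path to the input image")
    parser.add_argument("--prompt", required=True, help="Instruction to apply to the image")
    # Default backbone is an assumption based on the base-model listing below.
    parser.add_argument("--model", default="Qwen/Qwen3-VL-4B-Instruct",
                        help="Hugging Face model ID of the multimodal backbone")
    args = parser.parse_args()

    processor = AutoProcessor.from_pretrained(args.model)
    model = AutoModelForVision2Seq.from_pretrained(args.model, device_map="auto")

    # Build a single-turn chat message containing the image and the text prompt.
    image = Image.open(args.image).convert("RGB")
    messages = [{"role": "user",
                 "content": [{"type": "image"},
                             {"type": "text", "text": args.prompt}]}]
    text = processor.apply_chat_template(messages, add_generation_prompt=True)
    inputs = processor(text=text, images=[image], return_tensors="pt").to(model.device)

    # Generate and print only the newly produced tokens.
    output = model.generate(**inputs, max_new_tokens=512)
    new_tokens = output[0][inputs["input_ids"].shape[-1]:]
    print(processor.decode(new_tokens, skip_special_tokens=True))


if __name__ == "__main__":
    main()
```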
## Installation

```bash
pip install -r requirements.txt
```
## License
Apache-2.0
## Base model

Qwen/Qwen3-VL-4B-Instruct