OMNI-VIS-ASSIST

OMNI-VIS-ASSIST is a multimodal instruction-following AI assistant that takes an image and a text prompt and generates detailed, structured explanations.

🚀 Features

  • Understands visual + textual input
  • Performs image captioning, chart summarization, and visual reasoning
  • Converts image content into Markdown, tables, and Mermaid flowcharts
  • Works with large multimodal models (Qwen3-VL, BLIP, etc.) or fallback captioners
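
The Markdown-table and Mermaid-flowchart outputs listed above can be pictured with a small sketch. This is not the project's own code: `to_markdown_table` and `to_mermaid_flowchart` are hypothetical helper names, and the sketch assumes the model's answer has already been parsed into plain Python lists.

```python
from typing import Sequence


def to_markdown_table(headers: Sequence[str], rows: Sequence[Sequence[str]]) -> str:
    """Render headers and rows as a pipe-delimited Markdown table (hypothetical helper)."""
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(cell) for cell in row) + " |")
    return "\n".join(lines)


def to_mermaid_flowchart(steps: Sequence[str]) -> str:
    """Render an ordered list of steps as a top-down Mermaid flowchart (hypothetical helper)."""
    lines = ["flowchart TD"]
    # Declare one node per step, then chain consecutive nodes with arrows.
    for i, step in enumerate(steps):
        lines.append(f'    S{i}["{step}"]')
    for i in range(len(steps) - 1):
        lines.append(f"    S{i} --> S{i + 1}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_markdown_table(["Year", "Sales"], [["2023", "10"], ["2024", "14"]]))
    print(to_mermaid_flowchart(["Load image", "Run model", "Format answer"]))
```

Keeping the formatting step separate from the model call would let the same renderers be reused whichever backend produced the content, but that split is a design assumption, not something this card states.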

🧠 Example usage

python inference.py --image examples/sample_image.png --prompt "Explain the chart in this image."
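
As a rough illustration of the command above, here is how a script could accept the `--image` and `--prompt` flags. Only the two flag names come from this card. `build_parser` and `validate_args` are hypothetical names, and the actual `inference.py` may be organized differently.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the two flags shown in the example command."""
    parser = argparse.ArgumentParser(
        description="Explain an image according to a text prompt."
    )
    parser.add_argument("--image", type=Path, required=True,
                        help="Path to the input image file.")
    parser.add_argument("--prompt", required=True,
                        help="Instruction describing what to explain.")
    return parser


def validate_args(args: argparse.Namespace) -> None:
    """Fail early with a readable message if the image path does not exist."""
    if not args.image.is_file():
        raise SystemExit(f"Image not found: {args.image}")


# Example: parse the same arguments as the command above without running a model.
example = build_parser().parse_args(
    ["--image", "examples/sample_image.png",
     "--prompt", "Explain the chart in this image."]
)
```

In a real script, `validate_args(example)` would run before the image is handed to the model, so a mistyped path is reported before any weights are loaded.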

βš™οΈ Installation

pip install -r requirements.txt

🧩 License

Apache-2.0
