Engage in multimedia chat with LLMs and other ML models
Read text from images using scene text recognition models