- MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
  Paper • 2403.09611 • Published • 129
- Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset
  Paper • 2403.09029 • Published • 56
- GiT: Towards Generalist Vision Transformer through Universal Language Interface
  Paper • 2403.09394 • Published • 26
Xijia Tao
Cie1
AI & ML interests
Multimodal tool-calling agents, Diffusion large language models
Recent Activity
upvoted a paper about 10 hours ago
Dream-VL & Dream-VLA: Open Vision-Language and Vision-Language-Action Models with Diffusion Language Model Backbone
liked a model 7 days ago
Dream-org/Dream-VLA-7B
liked a model 7 days ago
Dream-org/Dream-VL-7B