Glyph: Scaling Context Windows via Visual-Text Compression • Paper • 2510.17800 • Published 7 days ago • 60
Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs • Paper • 2510.18876 • Published 6 days ago • 35
InfiMed-ORBIT: Aligning LLMs on Open-Ended Complex Tasks via Rubric-Based Incremental Training • Paper • 2510.15859 • Published 10 days ago • 10
Diffusion Transformers with Representation Autoencoders • Paper • 2510.11690 • Published 14 days ago • 158
Continuously Augmented Discrete Diffusion model for Categorical Generative Modeling • Paper • 2510.01329 • Published 26 days ago • 5
Discrete Diffusion for Reflective Vision-Language-Action Models in Autonomous Driving • Paper • 2509.20109 • Published Sep 24 • 3
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning • Paper • 2509.07980 • Published Sep 9 • 98
Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search • Paper • 2509.07969 • Published Sep 9 • 59
UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning • Paper • 2509.02544 • Published Sep 2 • 122
Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies • Paper • 2508.20072 • Published Aug 27 • 30
NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale • Paper • 2508.10711 • Published Aug 14 • 142
Dream-Coder 7B • Collection • https://hkunlp.github.io/blog/2025/dream-coder • 2 items • Updated Jul 15 • 5
InfiGUI-G1: Advancing GUI Grounding with Adaptive Exploration Policy Optimization • Paper • 2508.05731 • Published Aug 7 • 25
villa-X: Enhancing Latent Action Modeling in Vision-Language-Action Models • Paper • 2507.23682 • Published Jul 31 • 23