Taming Hallucinations: Boosting MLLMs' Video Understanding via Counterfactual Video Generation
Paper • 2512.24271 • Published • 63

FlowBlending: Stage-Aware Multi-Model Sampling for Fast and High-Fidelity Video Generation
Paper • 2512.24724 • Published • 7

Pretraining Frame Preservation in Autoregressive Video Memory Compression
Paper • 2512.23851 • Published • 25

PhyGDPO: Physics-Aware Groupwise Direct Preference Optimization for Physically Consistent Text-to-Video Generation
Paper • 2512.24551 • Published • 21

JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation
Paper • 2512.22905 • Published • 20

Forging Spatial Intelligence: A Roadmap of Multi-Modal Data Pre-Training for Autonomous Systems
Paper • 2512.24385 • Published • 8

Factorized Learning for Temporally Grounded Video-Language Models
Paper • 2512.24097 • Published • 7

SurgWorld: Learning Surgical Robot Policies from Videos via World Modeling
Paper • 2512.23162 • Published • 13

Video-BrowseComp: Benchmarking Agentic Video Research on Open Web
Paper • 2512.23044 • Published • 10

Knot Forcing: Taming Autoregressive Video Diffusion Models for Real-time Infinite Interactive Portrait Animation
Paper • 2512.21734 • Published • 5