Latent Diffusion Model without Variational Autoencoder Paper • 2510.15301 • Published 13 days ago • 47
VR-Thinker: Boosting Video Reward Models through Thinking-with-Image Reasoning Paper • 2510.10518 • Published 18 days ago • 17
PhysMaster: Mastering Physical Representation for Video Generation via Reinforcement Learning Paper • 2510.13809 • Published 14 days ago • 36
AVoCaDO: An Audiovisual Video Captioner Driven by Temporal Orchestration Paper • 2510.10395 • Published 18 days ago • 28
UniMMVSR: A Unified Multi-Modal Framework for Cascaded Video Super-Resolution Paper • 2510.08143 • Published 21 days ago • 20
UniVideo: Unified Understanding, Generation, and Editing for Videos Paper • 2510.08377 • Published 20 days ago • 67
VideoCanvas: Unified Video Completion from Arbitrary Spatiotemporal Patches via In-Context Conditioning Paper • 2510.08555 • Published 20 days ago • 62
Kling-Avatar: Grounding Multimodal Instructions for Cascaded Long-Duration Avatar Animation Synthesis Paper • 2509.09595 • Published Sep 11 • 48
Easier Painting Than Thinking: Can Text-to-Image Models Set the Stage, but Not Direct the Play? Paper • 2509.03516 • Published Sep 3 • 11
MIDAS: Multimodal Interactive Digital-human Synthesis via Real-time Autoregressive Video Generation Paper • 2508.19320 • Published Aug 26 • 28
VMoBA: Mixture-of-Block Attention for Video Diffusion Models Paper • 2506.23858 • Published Jun 30 • 31
SimpleGVR: A Simple Baseline for Latent-Cascaded Video Super-Resolution Paper • 2506.19838 • Published Jun 24 • 13
Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control Paper • 2506.01943 • Published Jun 2 • 25
RICO: Improving Accuracy and Completeness in Image Recaptioning via Visual Reconstruction Paper • 2505.22613 • Published May 28 • 8
MME-VideoOCR: Evaluating OCR-Based Capabilities of Multimodal LLMs in Video Scenarios Paper • 2505.21333 • Published May 27 • 38
Training-Free Efficient Video Generation via Dynamic Token Carving Paper • 2505.16864 • Published May 22 • 24
Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation Paper • 2503.24379 • Published Mar 31 • 76