SpotEdit: Selective Region Editing in Diffusion Transformers • Paper • 2512.22323 • Published 8 days ago • 36
WorldWarp: Propagating 3D Geometry with Asynchronous Video Diffusion • Paper • 2512.19678 • Published 12 days ago • 29
In-Video Instructions: Visual Signals as Generative Control • Paper • 2511.19401 • Published Nov 24, 2025 • 30
MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling • Paper • 2511.11793 • Published Nov 14, 2025 • 165
WEAVE: Unleashing and Benchmarking the In-context Interleaved Comprehension and Generation • Paper • 2511.11434 • Published Nov 14, 2025 • 44
Parallel Loop Transformer for Efficient Test-Time Computation Scaling • Paper • 2510.24824 • Published Oct 28, 2025 • 16
Scaling Latent Reasoning via Looped Language Models • Paper • 2510.25741 • Published Oct 29, 2025 • 221
LightBagel: A Light-weighted, Double Fusion Framework for Unified Multimodal Understanding and Generation • Paper • 2510.22946 • Published Oct 27, 2025 • 16
Seed3D 1.0: From Images to High-Fidelity Simulation-Ready 3D Assets • Paper • 2510.19944 • Published Oct 22, 2025 • 19
Trace Anything: Representing Any Video in 4D via Trajectory Fields • Paper • 2510.13802 • Published Oct 15, 2025 • 30
Generative Universal Verifier as Multimodal Meta-Reasoner • Paper • 2510.13804 • Published Oct 15, 2025 • 25
Artificial Hippocampus Networks for Efficient Long-Context Modeling • Paper • 2510.07318 • Published Oct 8, 2025 • 30
Discrete Diffusion in Large Language and Multimodal Models: A Survey • Paper • 2506.13759 • Published Jun 16, 2025 • 43
VeriThinker: Learning to Verify Makes Reasoning Model Efficient • Paper • 2505.17941 • Published May 23, 2025 • 25
Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding • Paper • 2505.16990 • Published May 22, 2025 • 22
Emerging Properties in Unified Multimodal Pretraining • Paper • 2505.14683 • Published May 20, 2025 • 133