RoboVIP: Multi-View Video Generation with Visual Identity Prompting Augments Robot Manipulation Paper • 2601.05241 • Published 1 day ago • 21
Act2Goal: From World Model To General Goal-conditioned Policy Paper • 2512.23541 • Published 12 days ago • 21
LoGoPlanner: Localization Grounded Navigation Policy with Metric-aware Visual Geometry Paper • 2512.19629 • Published 19 days ago • 25
VLASH: Real-Time VLAs via Future-State-Aware Asynchronous Inference Paper • 2512.01031 • Published Nov 30, 2025 • 23
RoboMP$^2$: A Robotic Multimodal Perception-Planning Framework with Multimodal Large Language Models Paper • 2404.04929 • Published Apr 7, 2024
Decision Mamba: A Multi-Grained State Space Model with Self-Evolution Regularization for Offline RL Paper • 2406.05427 • Published Jun 8, 2024
3D-AffordanceLLM: Harnessing Large Language Models for Open-Vocabulary Affordance Detection in 3D Worlds Paper • 2502.20041 • Published Feb 27, 2025
STAR: Learning Diverse Robot Skill Abstractions through Rotation-Augmented Vector Quantization Paper • 2506.03863 • Published Jun 4, 2025
F1: A Vision-Language-Action Model Bridging Understanding and Generation to Actions Paper • 2509.06951 • Published Sep 8, 2025 • 32 • 2
Hume: Introducing System-2 Thinking in Visual-Language-Action Model Paper • 2505.21432 • Published May 27, 2025 • 4
EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control Paper • 2508.21112 • Published Aug 28, 2025 • 77
StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling Paper • 2507.05240 • Published Jul 7, 2025 • 47