Ming Chen

ChenMing-thu14

AI & ML interests

3D Human Pose Estimation

Recent Activity

upvoted a paper 4 days ago

Every Question Has Its Own Value: Reinforcement Learning with Explicit Human Values

upvoted a paper 5 days ago

StreamingVLM: Real-Time Understanding for Infinite Video Streams

upvoted a paper 5 days ago

RL makes MLLMs see better than SFT

Organizations

None yet

upvoted a paper 4 days ago

Every Question Has Its Own Value: Reinforcement Learning with Explicit Human Values

Paper • 2510.20187 • Published 5 days ago • 17

upvoted 3 papers 5 days ago

authored a paper 5 days ago

Kling-Avatar: Grounding Multimodal Instructions for Cascaded Long-Duration Avatar Animation Synthesis

Paper • 2509.09595 • Published Sep 11 • 48

upvoted 2 papers 7 days ago

Embody 3D: A Large-scale Multimodal Motion and Behavior Dataset

Paper • 2510.16258 • Published 10 days ago • 6

FineVision: Open Data Is All You Need

Paper • 2510.17269 • Published 8 days ago • 54

upvoted 3 papers 8 days ago

Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model

Paper • 2510.12276 • Published 13 days ago • 141

OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM

Paper • 2510.15870 • Published 10 days ago • 77

Latent Diffusion Model without Variational Autoencoder

Paper • 2510.15301 • Published 11 days ago • 45

upvoted 2 papers 11 days ago

Ponimator: Unfolding Interactive Pose for Versatile Human-human Interaction Animation

Paper • 2510.14976 • Published 11 days ago • 3

From Pixels to Words -- Towards Native Vision-Language Primitives at Scale

Paper • 2510.14979 • Published 11 days ago • 64

upvoted a paper 12 days ago

UniFusion: Vision-Language Model as Unified Encoder in Image Generation

Paper • 2510.12789 • Published 13 days ago • 16

upvoted a paper 13 days ago

D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI

Paper • 2510.05684 • Published 20 days ago • 132

upvoted a paper 17 days ago

UniVideo: Unified Understanding, Generation, and Editing for Videos

Paper • 2510.08377 • Published 18 days ago • 66

upvoted a paper 28 days ago

Rolling Forcing: Autoregressive Long Video Diffusion in Real Time

Paper • 2509.25161 • Published 28 days ago • 23

upvoted a paper 29 days ago

LongLive: Real-time Interactive Long Video Generation

Paper • 2509.22622 • Published Sep 26 • 176

upvoted 3 papers about 1 month ago

Video models are zero-shot learners and reasoners

Paper • 2509.20328 • Published Sep 24 • 95

Qwen3-Omni Technical Report

Paper • 2509.17765 • Published Sep 22 • 132

OmniInsert: Mask-Free Video Insertion of Any Reference via Diffusion Transformer Models

Paper • 2509.17627 • Published Sep 22 • 65
