Sirui Zhang

zsr200901

AI & ML interests

None yet

Recent Activity

upvoted 2 papers 4 days ago

DeepSeek-OCR: Contexts Optical Compression

Paper • 2510.18234 • Published 7 days ago • 59

RL makes MLLMs see better than SFT

Paper • 2510.16333 • Published 10 days ago • 44

updated a collection about 1 month ago

VLA

2 items • Updated Sep 15

upvoted a paper about 2 months ago

R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning

Paper • 2508.21113 • Published Aug 28 • 109

upvoted 4 papers 3 months ago

The Promise of RL for Autoregressive Image Editing

Paper • 2508.01119 • Published Aug 1 • 11

Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens

Paper • 2508.01191 • Published Aug 2 • 236

EDGE-GRPO: Entropy-Driven GRPO with Guided Error Correction for Advantage Diversity

Paper • 2507.21848 • Published Jul 29 • 8

Gaussian Variation Field Diffusion for High-fidelity Video-to-4D Synthesis

Paper • 2507.23785 • Published Jul 31 • 18

upvoted a collection 3 months ago

ReasonGen-R1

Model and Datasets for the paper "ReasonGen-R1: CoT for Autoregressive Image generation models through SFT and RL" • 7 items • Updated Jun 2 • 6

upvoted 3 papers 3 months ago

X-Omni: Reinforcement Learning Makes Discrete Autoregressive Image Generative Models Great Again

Paper • 2507.22058 • Published Jul 29 • 38

Phi-Ground Tech Report: Advancing Perception in GUI Grounding

Paper • 2507.23779 • Published Jul 31 • 44

Pixels, Patterns, but No Poetry: To See The World like Humans

Paper • 2507.16863 • Published Jul 21 • 68

upvoted a paper 6 months ago

DDT: Decoupled Diffusion Transformer

Paper • 2504.05741 • Published Apr 8 • 76

upvoted 5 papers 7 months ago

Unified Reward Model for Multimodal Understanding and Generation

Paper • 2503.05236 • Published Mar 7 • 123

Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model

Paper • 2503.24290 • Published Mar 31 • 62

Tokenize Image as a Set

Paper • 2503.16425 • Published Mar 20 • 16

Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation

Paper • 2503.16430 • Published Mar 20 • 34

Equivariant Image Modeling

Paper • 2503.18948 • Published Mar 24 • 15

upvoted 2 papers 8 months ago

UniTok: A Unified Tokenizer for Visual Generation and Understanding

Paper • 2502.20321 • Published Feb 27 • 31

ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation

Paper • 2502.18364 • Published Feb 25 • 37