
Haoze Wu

WaitHZ

https://waithz.github.io/

AI & ML interests

Modular DL, Complex Reasoning

Recent Activity

upvoted a paper 21 days ago

InnoGym: Benchmarking the Innovation Potential of AI Agents

upvoted a paper 21 days ago

DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models

upvoted a paper 24 days ago

DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning


upvoted 2 papers 21 days ago

InnoGym: Benchmarking the Innovation Potential of AI Agents

Paper • 2512.01822 • Published 24 days ago • 34

DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models

Paper • 2512.02556 • Published 23 days ago • 229

upvoted a paper 24 days ago

DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning

Paper • 2511.22570 • Published 28 days ago • 79

upvoted 2 papers about 2 months ago

From f(x) and g(x) to f(g(x)): LLMs Learn New Skills in RL by Composing Old Ones

Paper • 2509.25123 • Published Sep 29 • 20

The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution

Paper • 2510.25726 • Published Oct 29 • 45

upvoted a paper 2 months ago

LightMem: Lightweight and Efficient Memory-Augmented Generation

Paper • 2510.18866 • Published Oct 21 • 110

upvoted a collection 3 months ago

DeepSeek-V3.2

Collection • 4 items • Updated 24 days ago • 509

upvoted a paper 3 months ago

WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents

Paper • 2509.06501 • Published Sep 8 • 79

upvoted a paper 4 months ago

Model-Task Alignment Drives Distinct RL Outcomes

Paper • 2508.21188 • Published Aug 28 • 8

upvoted a paper 10 months ago

HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization

Paper • 2503.04598 • Published Mar 6 • 21

upvoted an article 10 months ago

Open-R1: Update #1

Article • Feb 2 • 305

upvoted a paper 10 months ago

Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More

Paper • 2502.07490 • Published Feb 11 • 10

upvoted 2 articles 11 months ago

How to generate text: using different decoding methods for language generation with Transformers

Article • Mar 1, 2020 • 275

You could have designed state of the art positional encoding

Article • Nov 25, 2024 • 423

upvoted 2 papers 11 months ago

Sigma: Differential Rescaling of Query, Key and Value for Efficient Language Models

Paper • 2501.13629 • Published Jan 23 • 48

Autonomy-of-Experts Models

Paper • 2501.13074 • Published Jan 22 • 44

upvoted 4 papers over 1 year ago

Mixture-of-Depths: Dynamically allocating compute in transformer-based language models

Paper • 2404.02258 • Published Apr 2, 2024 • 107
