arxiv:2509.08721

Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing

Published on Sep 10
Submitted by Ben on Sep 10
#1 Paper of the day
Authors:

Abstract

Swarm sAmpling Policy Optimization (SAPO) is a decentralized, asynchronous RL algorithm for post-training language models without supervised fine-tuning, achieving significant reward gains and scaling across diverse hardware.

AI-generated summary

Post-training language models (LMs) with reinforcement learning (RL) can enhance their complex reasoning capabilities without supervised fine-tuning, as demonstrated by DeepSeek-R1-Zero. However, effectively utilizing RL for LMs requires significant parallelization to scale up inference, which introduces non-trivial technical challenges (e.g., latency, memory, and reliability) alongside ever-growing financial costs. We present Swarm sAmpling Policy Optimization (SAPO), a fully decentralized and asynchronous RL post-training algorithm. SAPO is designed for decentralized networks of heterogeneous compute nodes, where each node manages its own policy model(s) while "sharing" rollouts with others in the network; no explicit assumptions about latency, model homogeneity, or hardware are required, and nodes can operate in silos if desired. As a result, the algorithm avoids common bottlenecks in scaling RL post-training while also allowing (and even encouraging) new possibilities. By sampling rollouts "shared" across the network, it enables "Aha moments" to propagate, thereby bootstrapping the learning process. In this paper we show that SAPO achieved cumulative reward gains of up to 94% in controlled experiments. We also share insights from tests on a network with thousands of nodes contributed by Gensyn community members running the algorithm on diverse hardware and models during an open-source demo.

Community

Paper author · Paper submitter

We introduce SAPO (Swarm sAmpling Policy Optimization) - a decentralised RL post-training method where models share experiences to learn faster, together.

The problem: Scaling RL for LMs is costly and fragile.

Clusters must stay in sync, communication bottlenecks grow, and infrastructure overhead skyrockets.

SAPO flips the model - instead of syncing weights, nodes share decoded rollouts. Lightweight, async, and resilient.
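
A minimal sketch of that loop in Python may help; every name below (`Rollout`, `Swarm`, `node_step`) is illustrative and assumed, not the actual rl-swarm API:

```python
import random
from dataclasses import dataclass, field

# Illustrative sketch only: a rollout is decoded text plus the task it
# answers, so any node can re-score it with its own reward function,
# regardless of model architecture or hardware.

@dataclass
class Rollout:
    task: str
    completion: str

@dataclass
class Swarm:
    """Stand-in for the decentralized rollout pool shared across nodes."""
    pool: list = field(default_factory=list)

    def publish(self, rollouts):
        # Nodes share decoded rollouts, never weights or gradients.
        self.pool.extend(rollouts)

    def sample(self, k):
        # Asynchronous by design: take whatever happens to be available.
        return random.sample(self.pool, min(k, len(self.pool)))

def node_step(generate, reward_fn, update, tasks, swarm,
              n_local=4, n_external=4):
    # 1. Roll out locally with this node's own policy.
    local = [Rollout(t, generate(t)) for t in random.sample(tasks, n_local)]
    # 2. Publish the decoded rollouts to the swarm.
    swarm.publish(local)
    # 3. Mix in rollouts produced elsewhere in the network.
    batch = local + swarm.sample(n_external)
    # 4. Re-score everything with this node's own reward function and take
    #    a local policy-gradient step (e.g. a GRPO-style update).
    rewards = [reward_fn(r) for r in batch]
    update(batch, rewards)
```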

Why it matters:
– No synchronisation overhead
– Works across heterogeneous devices (servers, laptops, anything)
– “Aha moments” on one node propagate through the swarm
– Opens RL post-training to maximum scale

Results:
– Controlled experiments saw up to 94% cumulative reward improvement over the baseline with balanced sharing (4 local / 4 external rollouts; see the toy driver after this list)
– Thousands of community nodes validated SAPO in a live demo
– Collective training = faster, stronger learning
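
As a toy illustration of the balanced 4 local / 4 external split, here is how the sketch above might be driven. All stubs are hypothetical, and with a single toy node the "external" samples are just its own earlier rollouts; a real swarm would have many nodes publishing into the shared pool:

```python
# Hypothetical toy driver for the sketch above, using the balanced
# 4 local / 4 external split described in the results.
swarm = Swarm()
tasks = [f"task-{i}" for i in range(100)]

def toy_generate(task):           # stand-in for an LM decode
    return f"answer to {task}"

def toy_reward(rollout):          # stand-in for a verifiable reward
    return float(rollout.task in rollout.completion)

def toy_update(batch, rewards):   # stand-in for the policy update
    avg = sum(rewards) / len(rewards)
    print(f"update on {len(batch)} rollouts, mean reward {avg:.2f}")

for _ in range(5):
    node_step(toy_generate, toy_reward, toy_update, tasks, swarm,
              n_local=4, n_external=4)
```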

SAPO shows that sharing experience beats scaling alone.

Decentralised communities of models — and people — can push reasoning further than any single system.

Participate in future research by running an RL Swarm node on your own hardware: https://github.com/gensyn-ai/rl-swarm

We work harder because we trust the team.

Interesting how SAPO addresses latency and synchronization in distributed RL for LMs.

The idea of “sharing rollouts” seems like a simple way to scale without huge infrastructure costs.

The decentralized and asynchronous approach looks promising for heterogeneous hardware.

we trusted

The possibility of spreading “Aha moments” between nodes is a cool idea that could speed up learning.

Achieving up to 94% reward improvement in controlled settings is impressive.

Curious how SAPO handles heterogeneous hardware: different nodes with different capacities.

Innovative approach to decentralized RL post-training.

Gensyn has a solid team, crazy work, LFG

I don't understand too much, but it looks good, let's goo

SAPO is the way to goooo

I don't know at all what is happening here, but I definitely know that the team is going to do something great.

Solid work team! LFG

Too many bots in a single post

Paper author

We have an enthusiastic + open community (who contributed to this research by helping us scale the experiments in a fully open + collaborative way) - likely not bots but participants.

Agree that insubstantial comments drown out the interesting discussion though, sadly.

I believe in the team

This looks high-tech!

Really exciting work. SAPO's decentralized approach tackles the scalability bottlenecks in RL post-training head-on. The idea of propagating ‘aha moments’ across heterogeneous nodes feels like a big step toward more open and efficient collective model improvement.

Great team, nice job.

Really enjoying the testnet so far! The idea of sharing rollouts across nodes feels very natural, and it’s exciting to see how well SAPO scales with different hardware setups.

It's very exciting to see AIs take off like this, and I'm very happy to see Gensyn achieve this

The team is working very well, keep going

We will find the heart of artificial intelligence with Gensyn. Great work. Thank you.

Great things are happening, the team worked really well.

LFG team

W paper

Wen AMA?

More work, more models, more AI, and always Gensyn

Just finished reading. Honestly, that chart showing Qwen2.5-0.5B with SAPO vs. isolated training is mind-blowing. I helped train it 😎

I believe in the team too

hardworking and successful team.

we support the team and their efforts

A bot army is flooding the comment section. The comment section is supposed to hold discussion of the paper itself, not anything else.

Paper author

Commented elsewhere but just a note that this work was done as a huge open collaboration with many participants volunteering their time, effort, and devices.

Our community lives here and is open to anyone to join!

As an AI enthusiast, I find this approach really exciting. The idea of models improving by sharing their experiences across different machines feels like a big step toward more collaborative and scalable AI.

I went through this paper and it’s really interesting for Gensyn. The idea of sharing rollouts instead of syncing models fits perfectly with community-driven compute. It’s smart because it lets people with different hardware contribute without much coordination. The demo with thousands of nodes shows it’s working in real life, which is great.
However, trust and privacy are big challenges — the system needs ways to filter bad data and protect sensitive info. Also, it would be good to see more tests across tasks and models. Overall, this approach can help Gensyn scale better if safety and reward structures are handled well.

Really cool to see SAPO in action with real community hardware! Love that it works even when nodes are slow or go offline. Feels like a step toward truly open and resilient RL training. Big fan of the "Aha moments" idea - almost like the network is learning together, not just scaling up. LFG

SAPO in action

SAPO is a game-changer for LLMs

In Gensyn's experiments, SAPO improved cumulative rewards by up to 94%. It's like a boost button for learning. They shared cool insights from a big open-source demo where everyone pitched in. This shows SAPO can handle a huge set of machines, making AI training cheaper and faster.
