pascalmusabyimana
pascal-maker
18 followers · 87 following
https://pascal-maker.github.io/developedbypascalmusabyimana/
PascalMusabyim1
pascal-maker
pascal-musabyimana-573b66178
AI & ML interests
Computer vision, NLP, machine learning, and deep learning
Recent Activity
Reacted to Kseniase's post with 🔥 · about 19 hours ago
11 Fascinating new Policy Optimization techniques

Policy optimization (PO) algorithms are central to training AI models with preference-based feedback. In recent weeks, numerous new PO methods have emerged that build on or replace the popular PPO and GRPO, addressing their shortcomings. Here are 11 of them:

1. BAlanced Policy Optimization (BAPO) → https://huggingface.co/papers/2510.18927
Dynamically adjusts the clipping bounds in PPO-style updates to balance positive and negative gradients and prevent entropy collapse.

2. Training-Free GRPO → https://huggingface.co/papers/2510.08191
Instead of using numeric rewards, compares rollouts semantically to distill useful knowledge into a token prior, which is then applied during inference to guide the model's behavior.

3. Asymmetric Importance Sampling Policy Optimization (ASPO) → https://huggingface.co/papers/2510.06062
Fixes imbalanced token weighting in LLM training: it flips the importance-sampling ratios for positive tokens to correct over- and under-updates, and adds a soft dual-clipping step to keep gradients stable.

4. In-Context Steered Policy Optimization (ICPO) → https://arxiv.org/abs/2510.26519
Uses a model's own in-context learning ability to guide training with existing data. It combines Mixed-Policy GRPO with Implicit Expert Forcing to expand exploration, and adds Expert Region Reject Sampling and Annealed Expert-Bonus Reward Shaping to ensure stability and balanced expert influence.

5. Graph-Enhanced Policy Optimization (GEPO) → https://arxiv.org/abs/2510.26270
Builds a graph of an agent's experiences to understand how different states connect, guide exploration, and assign rewards more effectively.

6. Information Gain-based Policy Optimization (IGPO) → https://huggingface.co/papers/2510.14967
Uses the model's own belief updates to create dense, informative feedback for smoother multi-turn learning.

Read further below ⬇️

If you like this, also subscribe to the Turing Post: https://www.turingpost.com/subscribe
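Several of the methods above (BAPO, ASPO) modify the clipping step of the standard PPO surrogate objective. As a point of reference, here is a minimal sketch of that PPO-style clipped loss with separately adjustable lower and upper clip bounds; the fixed bounds and function name are illustrative assumptions, not the actual BAPO or ASPO algorithms, which set their bounds dynamically during training.

```python
import numpy as np

def clipped_surrogate(ratios, advantages, clip_low=0.2, clip_high=0.28):
    """PPO-style clipped surrogate loss with asymmetric clip bounds.

    Standard PPO uses a single epsilon (clip_low == clip_high). Methods such
    as BAPO instead adjust the two bounds dynamically to balance positive and
    negative gradient contributions; the constants here are illustrative only.

    ratios:     pi_new(a|s) / pi_old(a|s), per sample
    advantages: estimated advantages, per sample
    """
    unclipped = ratios * advantages
    # Clip the probability ratio into [1 - clip_low, 1 + clip_high].
    clipped = np.clip(ratios, 1.0 - clip_low, 1.0 + clip_high) * advantages
    # PPO takes the pessimistic (elementwise minimum) surrogate, then negates
    # the mean so the result can be minimized by gradient descent.
    return -np.mean(np.minimum(unclipped, clipped))
```

With a positive advantage and a ratio far above the upper bound, the clipped term caps the update, which is the behavior the asymmetric-clipping methods tune.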
Liked a model · 3 days ago: meituan-longcat/LongCat-Flash-Omni
Liked a model · 4 days ago: moonshotai/Kimi-Linear-48B-A3B-Instruct
Organizations

pascal-maker's Spaces (7)
pinned · Paused · My Argilla ✍
Sleeping · Agentscomparison Dashboard 🚀 · Display project metrics with real-time updates
Paused · Medical VLM with SAM-2 and CheXagent 🚀 · A comprehensive medical imaging analysis tool
Paused · Medical Imaging Analysis 🏆
Paused · medicalaiapp 🚀
Paused · luminus 🚀
Paused · Debugcode 🔥