Policy optimization (PO) algorithms are central to training AI models with preference-based feedback. In recent weeks, numerous new PO methods have emerged that build on or replace the popular PPO and GRPO, addressing their known weaknesses. Here are 11 of them:
3. Asymmetric Importance Sampling Policy Optimization (ASPO) → ASPO: Asymmetric Importance Sampling Policy Optimization (2510.06062) Fixes imbalanced token weighting in LLM training. It flips the importance sampling ratios for positive-advantage tokens to correct over- and under-updates, and adds a soft dual-clipping step to keep gradients stable (see the toy sketch after this list)
4. In-Context Steered Policy Optimization (ICPO) → https://arxiv.org/abs/2510.26519 Uses a model’s own in-context learning ability to guide training with existing data. It combines Mixed-Policy GRPO with Implicit Expert Forcing to expand exploration and adds Expert Region Reject Sampling and Annealed Expert-Bonus Reward Shaping to ensure stability and balanced expert influence
5. Graph-Enhanced Policy Optimization (GEPO) → https://arxiv.org/abs/2510.26270 Builds a graph of an agent’s experiences to understand how different states connect, guide exploration and assign rewards more effectively
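To make the ASPO idea above a bit more concrete, here is a toy sketch of a per-token loss that flips the importance-sampling ratio for positive-advantage tokens and softly caps the result. It is a loose reading of the description above, not the paper's exact objective; the function name, constants, and the tanh-based "soft dual clip" are illustrative.

```python
import torch

def aspo_style_loss(logp_new, logp_old, advantages, clip_eps=0.2, clip_ceiling=3.0):
    """Toy per-token policy loss with an ASPO-style asymmetric ratio (illustrative)."""
    ratio = torch.exp(logp_new - logp_old)          # standard importance-sampling ratio
    flipped = 1.0 / ratio.clamp(min=1e-6)           # inverted ratio
    # flip the ratio only where the advantage is positive
    used_ratio = torch.where(advantages > 0, flipped, ratio)

    # PPO-style clipping of the (possibly flipped) ratio
    unclipped = used_ratio * advantages
    clipped = torch.clamp(used_ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    per_token = torch.minimum(unclipped, clipped)

    # soft "dual clip": smoothly cap the magnitude instead of a hard cutoff
    per_token = clip_ceiling * torch.tanh(per_token / clip_ceiling)
    return -per_token.mean()
```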
Coding is one field where AI has been welcomed with open arms. Here’s a collection to help you take your AI-assisted coding workflows to the next level of convenience and efficiency:
1. Smol Developer → https://github.com/smol-ai/developer A lightweight AI “junior dev” that takes your product spec and automatically scaffolds or helps you build full codebases
2. Tabby → https://github.com/TabbyML/tabby A self-hosted AI coding assistant that runs locally as an alternative to GitHub Copilot. Easy to integrate, GPU-friendly, and doesn’t rely on the cloud
3. Beads (bd) Issue Tracker → https://github.com/steveyegge/beads Gives coding agents long-term memory, letting them organize, plan, and execute complex tasks reliably across sessions
4. MetaGPT → https://github.com/FoundationAgents/MetaGPT A multi-agent framework that imitates a software company team using LLMs. It assigns AI agents roles like PM, Architect, and Developer to produce user stories, designs, specs, and final code
5. Open Interpreter → https://github.com/openinterpreter/open-interpreter Gives you ChatGPT’s coding power with full local control – no limits, no sandbox – so you can automate, analyze, and create anything right from your desktop through a chat interface (see the minimal usage sketch after this list)
6. OpenSpec → https://github.com/Fission-AI/OpenSpec A lightweight, spec-driven development tool that helps humans and AI agree on what to build before any code is written
7. PR-Agent → https://github.com/qodo-ai/pr-agent An AI code reviewer that automatically reviews, describes, and improves pull requests across GitHub, GitLab, and other platforms
8. BabyAGI → https://github.com/yoheinakajima/babyagi An experimental framework that gives agents the ability to write, manage, and refine their own functions, turning them from passive tools into active, self-building systems
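As a taste of how lightweight some of these are to pick up, here is a minimal Open Interpreter (item 5) usage sketch, assuming the package's documented Python entry point; the prompt is just an example.

```python
# pip install open-interpreter
from interpreter import interpreter

# Ask the local agent to write and run code on your machine.
# By default it shows each code block and asks for confirmation before executing it.
interpreter.chat("Plot the normalized stock prices of AAPL and META for the last year")
```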
If you want to understand the multifaceted AI landscape in 2025 and see where the field is heading – start with (or revisit) these legendary talks. They can help you capture what’s happening in AI from multiple angles:
1. Andrej Karpathy: Software Is Changing (Again) → https://www.youtube.com/watch?v=LCEmiRjPEtQ Unveils Software 3.0 – a paradigm where LLMs are the new computers, programmed with prompts instead of code. The key: developers must now master coding, training, and prompting as AI becomes the heart of software building
2. Richard Sutton, The OaK Architecture: A Vision of SuperIntelligence from Experience → https://www.youtube.com/watch?v=gEbbGyNkR2U Unveils the OaK (Options and Knowledge) architecture – a model-based RL framework for continual intelligence, where every component learns, meta-learns & builds hierarchical abstractions
3. GTC March 2025 Keynote with NVIDIA CEO Jensen Huang → https://www.youtube.com/watch?v=_waPvOwL9Z8 Dives into accelerated computing and the importance of Physical AI. From the Blackwell GPU architecture & AI factories to breakthroughs in agentic AI & robotics, Jensen Huang explains how NVIDIA aims to power every layer of the AI ecosystem
4. Yann LeCun "Mathematical Obstacles on the Way to Human-Level AI" → https://www.youtube.com/watch?v=ETZfkkv6V7 Yann LeCun has long argued that we need a new path to machines that reason about the world – not LLMs or RL. This lecture covers self-supervised systems with world models, planning, memory, and energy-based learning
5. Andrew Ng: State of AI Agents → https://www.youtube.com/watch?v=4pYzYmSdSH4 Highlights one of the most pressing topics of 2025 – agents, explaining why most effective AI agents rely on simple, linear workflows built from modular “Lego-brick” tasks + what predicts AI startup success in the new agent era
Since Sora 2 has been on fire these past weeks, reminding us what high-quality video generation should look like, we decided you really need this list of video generation tools – great alternatives or complements to it.
1. Sora 2 → https://openai.com/sora/ It needs no introduction: OpenAI’s text-to-video model produces short, ultra-realistic clips across styles (cinematic, photorealistic, animated, etc.) with synced audio
2. Google Veo 3 (Gemini Video Generation) → https://aistudio.google.com/models/veo-3 Part of Gemini AI. Generates 8-second high-fidelity videos from text or images with native sound: background soundtracks and realistic voices with near-perfect lip sync
3. Runway (Gen-4 by Runway ML) → https://runwayml.com/ Text, image, or video-to-video generation with advanced editing like changing lighting, weather, camera angles or replacing objects. Popular in AI filmmaking
4. Pika Labs → https://pollo.ai/m/pika-ai Provides creative, often stylized short videos – from cinematic mini-scenes to cartoon-like animations. Ideal for social media clips and visual storytelling. Plus, you can add playful effects to manipulate objects in the generated videos
5. Luma’s Dream Machine → https://lumalabs.ai/dream-machine Powered by Luma AI’s latest Ray 3 model, it quickly visualizes story ideas, animated concept art, or abstract motion videos. It supports consistent custom characters and seamless looping
Reinforcement learning is having a moment - and not just this week. Some of its directions are already showing huge promise, while others are still early but exciting. Here’s a look at what’s happening right now in RL:
1. Reinforcement Pre-Training (RPT) → Reinforcement Pre-Training (2506.08007) Reframes next-token pretraining as RL with verifiable rewards, yielding scalable reasoning gains
2. Reinforcement Learning from Human Feedback (RLHF) → Deep reinforcement learning from human preferences (1706.03741) The classic, most widely used approach. It trains a model using human preference feedback, building a reward model and then optimizing the policy to generate outputs people prefer
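To make the RLHF recipe concrete, here is a minimal sketch of its first stage: fitting a reward model on pairwise human preferences with the usual Bradley-Terry loss. The `reward_model` and the token-id inputs are placeholders, and the second stage (optimizing the policy, e.g. with PPO, against the frozen reward model) is omitted.

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_model, chosen_ids, rejected_ids):
    """Bradley-Terry pairwise loss: the human-preferred ("chosen") response
    should score higher than the rejected one.
    reward_model: any callable mapping token ids -> scalar reward per sample."""
    r_chosen = reward_model(chosen_ids)      # shape: (batch,)
    r_rejected = reward_model(rejected_ids)  # shape: (batch,)
    # -log sigmoid(r_chosen - r_rejected), averaged over the batch
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# The fitted reward model is then frozen and used as the reward signal when
# optimizing the policy so it generates outputs people prefer.
```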
The family of MCP (Model Context Protocol) servers keeps expanding to bridge agents, models, tools, web, data and apps. Here are 12 useful MCP servers that will help you create convenient agentic ecosystems:
1. Chrome DevTools MCP → https://github.com/ChromeDevTools/chrome-devtools-mcp Lets your coding agent (Gemini, Claude, Cursor, Copilot) control a live Chrome browser with full DevTools access for automation, debugging, and performance analysis
2. Windows-MCP → https://github.com/CursorTouch/Windows-MCP Lets agents interact with Windows, handling file navigation, app control, UI actions, and QA testing
4. MetaMCP → https://github.com/metatool-ai/metamcp A proxy that aggregates multiple MCP servers into one, with middleware support. Works as a standard MCP server for any client
6. Playwright MCP → https://github.com/microsoft/playwright-mcp Lets LLMs interact with web pages via structured accessibility snapshots, no need for screenshots or visually-tuned models
8. Browserbase MCP Server → https://github.com/browserbase/mcp-server-browserbase Connects LLMs to external data and tools, adding cloud browser automation via Browserbase and Stagehand. It enables LLMs to browse, capture, extract, and act on web pages with precision
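And if you want to add your own server to an agentic setup like this, the official MCP Python SDK makes it a few lines. Here is a minimal sketch, assuming the `mcp` package's FastMCP helper; the server name, tool, and logic are just examples.

```python
# pip install "mcp[cli]"
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-tools")  # server name shown to MCP clients

@mcp.tool()
def word_count(text: str) -> int:
    """Count the words in a piece of text."""
    return len(text.split())

if __name__ == "__main__":
    mcp.run()  # serves over stdio, so any MCP client can connect to it
```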
Low-Rank Adaptation (LoRA) is the go-to method for efficient model fine-tuning: it adds small trainable low-rank matrices instead of retraining full models (see the minimal sketch after the list below). The field isn’t standing still – new LoRA variants push the limits of efficiency, generalization, and personalization. So we’re sharing 10 of the latest LoRA approaches you should know about:
4. aLoRA (Activated LoRA) → Activated LoRA: Fine-tuned LLMs for Intrinsics (2504.12397) Only applies LoRA after invocation, letting the model reuse the base model’s KV cache instead of recomputing the full turn’s KV cache. Efficient in multi-turn conversations
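As a quick refresher on the mechanism all of these variants build on, here is a minimal sketch of a plain LoRA layer: the frozen weight is augmented with a trainable low-rank update B·A scaled by alpha/r. This is a toy sketch, not any specific library's implementation; ranks and dimensions are illustrative.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update (toy sketch)."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():   # freeze the pretrained weights
            p.requires_grad = False
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scale = alpha / r

    def forward(self, x):
        # y = base(x) + scale * x @ A^T @ B^T
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(768, 768))
out = layer(torch.randn(2, 768))           # only A and B receive gradients
```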
6 Recent & free sources to master Reinforcement Learning
Almost every week new research and resources on RL come out. Knowledge needs to be constantly refreshed and updated with the latest trends. So today, we’re sharing 6 free sources to help you stay on track with RL:
1. A Survey of Continual Reinforcement Learning → https://arxiv.org/abs/2506.21872 Covers continual RL (CRL): how agents can keep learning and adapt to new tasks without forgetting past ones. It analyses methods, benchmarks, evaluation metrics & challenges
3. Reinforcement Learning Specialization (Coursera, University of Alberta) → https://www.coursera.org/specializations/reinforcement-learning A 4-course series introducing foundational RL, implementing different algorithms, culminating in a capstone. It's a great structured path
5. A Survey of Reinforcement Learning for Software Engineering → https://arxiv.org/abs/2507.12483 Good if you're interested in RL-applied domains. Examines how RL is used in software engineering tasks: maintenance, development, evaluation. Covering 115 papers since the introduction of deep RL, it summarizes trends, gaps & challenges
6. A Survey of Reinforcement Learning for LRMs → https://arxiv.org/abs/2509.08827 Traces the path from LLMs to LRMs (large reasoning models) via RL. Covers reward design, policy optimization, use cases, and future directions like continual, memory-based, model-based RL and more
Models need feedback on what makes outputs “good” or “bad.” Policy optimization (PO) turns preferences and rewards into actual training signals. This field is evolving quickly, moving far beyond classics like PPO and GRPO. So here is our overview of 10 of the newest PO methods:
3. DCPO (Dynamic Clipping Policy Optimization) → DCPO: Dynamic Clipping Policy Optimization (2509.02333) Uses dynamic clipping, which adjusts probability limits per token for better token exploration, and smooth reward standardization to balance rewards over training steps and prevent wasted updates
4. ARPO (Agentic Reinforced Policy Optimization) → Agentic Reinforced Policy Optimization (2507.19849) Optimizes multi-turn LLM agents that use external tools. It uses an entropy-based adaptive rollout to explore post-tool use and an advantage attribution method to better assign credit across steps, leading to more efficient tool use with fewer resources
5. GRPO-RoC (Group Relative Policy Optimization with Resampling-on-Correct) → rStar2-Agent: Agentic Reasoning Technical Report (2508.20722) Oversamples rollouts, then resamples them to keep diverse mistakes and only the highest-quality correct answers. It reduces noise and yields stronger reasoning in code environments
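Here is a toy sketch of that resample-on-correct selection step as described above: oversample a group of rollouts, keep only the cleanest correct ones, and sample incorrect ones uniformly to preserve diverse mistakes. The quality score, group sizes, and the 50/50 split are placeholders, not the paper's exact settings.

```python
import random

def resample_on_correct(rollouts, group_size):
    """rollouts: list of dicts with 'is_correct' and 'quality' fields
    (e.g. fewer tool errors / cleaner formatting = higher quality).
    Toy sketch of GRPO-RoC-style selection from an oversampled group."""
    correct = [r for r in rollouts if r["is_correct"]]
    incorrect = [r for r in rollouts if not r["is_correct"]]

    # keep only the highest-quality correct rollouts...
    correct = sorted(correct, key=lambda r: r["quality"], reverse=True)
    keep_correct = correct[: group_size // 2]
    # ...and sample incorrect ones uniformly to keep diverse mistakes
    n_incorrect = min(len(incorrect), group_size - len(keep_correct))
    keep_incorrect = random.sample(incorrect, n_incorrect)
    return keep_correct + keep_incorrect
```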
Everyone is buzzing about image generation this week, or more specifically, about Google's Nano-Banana. So today we want to share a list of models that make a great toolkit for image generation + editing + multi-turn refinement.
1. Gemini 2.5 Flash Image, or Nano-Banana → https://deepmind.google/models/gemini/image/ Google’s newest image model with conversational editing, character consistency, and multi-image fusion. Available in AI Studio and the Gemini API. Price: $2.50 per 1M tokens
2. FLUX (Black Forest Labs) → https://bfl.ai/ A family of models known for rich detail, excellent prompt adherence, and fast iterative generation. Offered in several variants, from Pro to open-source, it's accessible via Hugging Face, Replicate, Azure AI Foundry, etc., and used as a base in many pipelines (a minimal diffusers sketch follows this list). Price: $0.025-0.08 per image
3. Midjourney v7 → https://www.midjourney.com/ Enhanced image fidelity, prompt comprehension, and anatomical coherence (hands, bodies, objects) + provides a smart lightbox editor. The Omni-reference tool improves character and object consistency in your images. It remains accessible via Discord with a supporting web interface. Price: $10-60/month
4. Stable Diffusion 3.5 (Stability AI) → https://stability.ai/stable-image Open-weights line with improved text rendering, photorealism, and prompt adherence compared to earlier versions. It introduces technical innovations through its MMDiT architecture. Price: $0.025-0.065 per image
5. OpenAI GPT-Image-1 → https://platform.openai.com/docs/guides/image-generation?image-generation-model=gpt-image-1 It's the same multimodal model that powers ChatGPT's image capabilities, offering high-fidelity image generation, precise edits, including inpainting, and accurate text rendering. Available via the Images API. Price: $40 per 1M tokens
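If you want to try one of these programmatically, the open-weight FLUX models (item 2) run in a few lines of diffusers code. A minimal sketch, following the FLUX.1-schnell model card's documented usage (check the card for current hardware and settings):

```python
# pip install diffusers transformers accelerate
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # helps when GPU memory is tight

image = pipe(
    "a watercolor lighthouse at dawn, soft light",
    num_inference_steps=4,   # schnell is distilled for few-step generation
    guidance_scale=0.0,      # schnell is meant to run without classifier-free guidance
).images[0]
image.save("flux_lighthouse.png")
```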
Want to learn to build an AI Agent? I put together a cookbook for creating your own news research agent with OpenAI GPT-OSS:
- Searches headlines & specific sites
- Pulls full articles when you need depth
- Summarizes with clickable sources
- Runs in a simple Gradio chat UI
- No GPU, no local setup – just open-weight GPT-OSS models via Hugging Face
If you’ve been wanting to try agents but weren’t sure where to start, this is an end-to-end example you can fork, run, and adapt.
What can OpenAI’s new open models do with the news? I built a News Agent to find out.
It can answer questions about the news in real time, and every answer comes with original source links so you can dive deeper.
Ask it things like:
- "What are the top news stories today?"
- "What's the latest on artificial intelligence?"
- Follow-up questions on specific stories
Runs with Hugging Face inference providers, letting you compare results from the OpenAI 20B and 120B models
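This isn't the cookbook's exact code, just a stripped-down sketch of the loop: fetch some headlines with a placeholder `search_news` helper (swap in a real news/search API), then let gpt-oss answer with sources inside a Gradio chat. The model choice and helper are illustrative.

```python
# pip install huggingface_hub gradio
import gradio as gr
from huggingface_hub import InferenceClient

client = InferenceClient("openai/gpt-oss-20b")  # served via HF inference providers

def search_news(query: str) -> str:
    """Placeholder: replace with a real news/search API returning titles + URLs."""
    return "1. Example headline – https://example.com/story"

def respond(message, history):
    context = search_news(message)
    messages = [
        {"role": "system", "content": "Answer using the provided headlines and cite the source links."},
        {"role": "user", "content": f"Headlines:\n{context}\n\nQuestion: {message}"},
    ]
    out = client.chat_completion(messages=messages, max_tokens=512)
    return out.choices[0].message.content

gr.ChatInterface(respond).launch()
```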
So far, I’m quite impressed by the capabilities of even the smaller 20B model. Definitely not a perfect project, but curious to hear your thoughts!
OpenAI’s GPT-OSS has sparked ~400 new models on Hugging Face and racked up 5M downloads in less than a week, already outpacing DeepSeek R1’s first-week numbers.
For comparison: when R1 launched, I tracked 550 derivatives (across 8 base models) in a week, with ~3M downloads. GPT-OSS is ahead on adoption and engagement.
It’s also the most-liked release of any major LLM this summer. The 20B and 120B versions quickly shot past Kimi K2, GLM 4.5, and others in likes.
Most-downloaded GPT-OSS models include LM Studio and Unsloth AI versions:
1️⃣ openai/gpt-oss-20b - 2.0M
2️⃣ lmstudio-community/gpt-oss-20b-MLX-8bit - 750K
3️⃣ openai/gpt-oss-120b - 430K
4️⃣ unsloth/gpt-oss-20b-GGUF - 380K
5️⃣ lmstudio-community/gpt-oss-20b-GGUF - 330K
The 20B version is clearly finding its audience, showing the power of smaller, faster, more memory- and energy-efficient models. (These numbers don’t include calls to the models via inference providers, so the real usage is likely even bigger, especially for the 120B version)
Open-weight models let anyone build on top. Empower the builders, and innovation takes off. 🚀
Sharing some free, useful resources for you. In this collection, we’ve gathered the most recent books to give you up-to-date information on key fundamental topics. Hope this helps you master AI and machine learning:
1. Machine Learning Systems by Vijay Janapa Reddi → https://www.mlsysbook.ai/ Provides a framework for building effective ML solutions, covering data engineering, optimization, hardware-aware training, inference acceleration, architecture choice, and other key principles
2. Generative Diffusion Modeling: A Practical Handbook by Zihan Ding, Chi Jin → https://arxiv.org/abs/2412.17162 Offers a unified view of diffusion models: probabilistic, score-based, consistency, rectified flow, pre/post-training. It aligns notations with code to close the “paper-to-code” gap.
3. Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges → https://arxiv.org/abs/2104.13478 Explores unified geometric principles for analyzing neural network architectures (CNNs, RNNs, GNNs, Transformers) and for guiding the design of future ones
4. Mathematical Foundations of Geometric Deep Learning by Haitz Saez de Ocariz Borde and Michael Bronstein → https://arxiv.org/abs/2508.02723 Dives into the key math concepts behind geometric deep learning: geometric and analytical structures, vector calculus, differential geometry, etc.
5. Interpretable Machine Learning by Christoph Molnar → https://github.com/christophM/interpretable-ml-book Practical guide to simple, transparent models (e.g., decision trees) and model-agnostic methods like LIME, Shapley values, permutation importance, and accumulated local effects.
6. Understanding Deep Learning by Simon J.D. Prince → https://udlbook.github.io/udlbook/ Explores core deep learning concepts: models, training, evaluation, RL, architectures for images, text, and graphs, addressing open theoretical questions
New interactive viz from AI World showing OpenAI's new open model gpt-oss-120b breaking into the top 50 most liked models of all time on the Hub in under a day! ☄️☄️☄️
World models are one of the most challenging areas in AI, pushing the boundaries of reasoning, perception, and planning. They're gen AI systems that help models and agents learn internal representations of real-world environments.
Today, we invite you to take a look at 12 standout examples:
1. WorldVLA → WorldVLA: Towards Autoregressive Action World Model (2506.21539) This autoregressive world model integrates action prediction and visual world modeling in a single framework, allowing each to enhance the other. It introduces an attention masking strategy to reduce action prediction errors
2. SimuRA → https://arxiv.org/abs/2507.23773 A generalized world model that uses a language-based world model to simulate and plan actions before execution, enabling more general and flexible reasoning
3. PAN (Physical, Agentic, and Nested) world models → Critiques of World Models (2507.05169) Has a hybrid architecture that combines discrete concept-based reasoning (via LLMs) with continuous perceptual simulation (via diffusion models), enabling rich multi-level, multimodal understanding and prediction
5. WorldMem → WORLDMEM: Long-term Consistent World Simulation with Memory (2504.12369) Uses a memory bank with attention over time-stamped frames and states to maintain long-term and 3D spatial consistency in scene generation. So it can reconstruct past scenes and simulate dynamic world changes across large temporal gaps
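To give a feel for the general memory-bank idea behind WorldMem, here is a toy sketch: attend from the current frame's state over a bank of time-stamped memory entries and pull back a weighted mix as context. Dimensions, the timestamp encoding, and the function name are made up for illustration; this is not the paper's architecture.

```python
import torch
import torch.nn.functional as F

def retrieve_from_memory(query_state, mem_states, mem_timestamps, d_model=256):
    """Toy memory-bank attention: score stored frame states (plus a crude
    timestamp feature) against the current state, return a weighted mix."""
    # fold the timestamp into each memory entry as an extra feature
    keys = mem_states + mem_timestamps.unsqueeze(-1) * 0.01
    scores = (query_state @ keys.T) / d_model ** 0.5   # (num_memories,)
    weights = F.softmax(scores, dim=-1)
    return weights @ mem_states                        # retrieved context vector

memory = torch.randn(128, 256)          # 128 stored frame states
timestamps = torch.arange(128).float()
context = retrieve_from_memory(torch.randn(256), memory, timestamps)
```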
Reinforcement learning (RL) isn't stuck in the same old PPO loop – in the last two months alone, researchers have introduced a new wave of techniques reshaping how we train and fine-tune LLMs, VLMs, and agents.
Here are 9 fresh policy optimization techniques worth knowing:
1. GSPO: Group Sequence Policy Optimization → Group Sequence Policy Optimization (2507.18071) Shifts from token-level to sequence-level optimization, clipping, and rewarding to capture the full picture and increase stability compared to GRPO. The GSPO-token variant also allows token-level fine-tuning (a toy sketch of the sequence-level ratio follows this list).
3. HBPO: Hierarchical Budget Policy Optimization → Hierarchical Budget Policy Optimization for Adaptive Reasoning (2507.15844) This one trains a model to adapt its reasoning depth based on problem complexity. It divides training samples into subgroups with different token budgets, using budget-aware rewards to align reasoning effort with task difficulty.
5. RePO: Replay-Enhanced Policy Optimization → RePO: Replay-Enhanced Policy Optimization (2506.09340) Introduces a replay buffer into on-policy RL for LLMs, retrieving diverse off-policy samples for each prompt to broaden the training data per prompt
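To make GSPO's sequence-level shift (item 1) concrete, here is a toy sketch of a length-normalized sequence-level importance ratio with sequence-level clipping, in contrast to PPO/GRPO's per-token ratios. It is a loose reading of the idea, not the paper's exact objective; names and the clip value are illustrative.

```python
import torch

def gspo_style_loss(logp_new, logp_old, seq_advantages, mask, clip_eps=0.1):
    """logp_new / logp_old: (batch, seq_len) token log-probs; mask: 1 for real tokens.
    Toy sketch: one length-normalized importance ratio per sequence, clipped
    at the sequence level, weighted by a sequence-level advantage."""
    lengths = mask.sum(dim=1).clamp(min=1)
    # geometric-mean ratio over the sequence = exp(mean token log-ratio)
    seq_log_ratio = ((logp_new - logp_old) * mask).sum(dim=1) / lengths
    ratio = torch.exp(seq_log_ratio)                   # (batch,)

    unclipped = ratio * seq_advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * seq_advantages
    return -torch.minimum(unclipped, clipped).mean()
```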
This is what Hugging Face is all about. We want everyone – hobbyists, researchers, and industry alike – to be able to contribute to AI, because everyone is affected by it. Kudos to HF's @irenesolaiman for spreading the word!🔥🤗