projects-sandbox (Sandbox)

Kseniase

posted an update about 7 hours ago

Post

150

11 Fascinating new Policy Optimization techniques

Policy optimization (PO) algorithms are central to training AI models with preference-based feedback. In recent weeks, numerous new PO methods have emerged that build on or replace the popular PPO and GRPO, solving their issues. Here are 11 of them:

1. BAlanced Policy Optimization (BAPO) → BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping (2510.18927)
Dynamically adjusting the clipping bounds in PPO-style updates to balance positive and negative gradients and prevent entropy collapse

2. Training-Free GRPO → Training-Free Group Relative Policy Optimization (2510.08191)
Instead of using numeric rewards, it compares rollouts semantically to distill useful knowledge as a token prior, which is then applied during inference to guide the model’s behavior

3. Asymmetric Importance Sampling Policy Optimization (ASPO) → ASPO: Asymmetric Importance Sampling Policy Optimization (2510.06062)
Fixes imbalanced token weighting in LLM training. It flips the importance sampling ratios for positive tokens to correct over- and under-updates, and adds a soft dual-clipping step to keep gradients stable

4. In-Context Steered Policy Optimization (ICPO) → https://arxiv.org/abs/2510.26519
Uses a model’s own in-context learning ability to guide training with existing data. It combines Mixed-Policy GRPO with Implicit Expert Forcing to expand exploration and adds Expert Region Reject Sampling and Annealed Expert-Bonus Reward Shaping to ensure stability and balanced expert influence

5. Graph-Enhanced Policy Optimization (GEPO) → https://arxiv.org/abs/2510.26270
Builds a graph of an agent’s experiences to understand how different states connect, guide exploration and assign rewards more effectively

6. Information Gain-based Policy Optimization (IGPO) → Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn LLM Agents (2510.14967)
Uses the model’s own belief updates to create dense, informative feedback for smoother multi-turn learning

Read further below ⬇️
If you like this, also subscribe to the Turing post: https://www.turingpost.com/subscribe

1 reply

·

Kseniase

posted an update 7 days ago

Post

733

12 Awesome GitHub repos to upgrade your AI coding

Coding is the field where AI is welcomed with open arms. Here’s a collection to help you take your AI-assisted coding workflows to the next level of convenience and efficiency:

1. Smol Developer → https://github.com/smol-ai/developer
A lightweight AI “junior dev” that takes your product spec and automatically scaffolds or helps you build full codebases

2. Tabby → https://github.com/TabbyML/tabby
A self-hosted AI coding assistant that runs locally as an alternative to GitHub Copilot. Easy to integrate, GPU-friendly, and doesn’t rely on the cloud

3. Beads (bd) Issue Tracker → https://github.com/steveyegge/beads
Gives coding agents long-term memory, letting them organize, plan, and execute complex tasks reliably across sessions

4. MetaGPT → https://github.com/FoundationAgents/MetaGPT
A multi-agent framework that imitates a software company team using LLMs. It assigns AI agents roles like PM, Architect, and Developer to produce user stories, designs, specs, and final code

5. Open Interpreter → https://github.com/openinterpreter/open-interpreter
Gives you ChatGPT’s coding power with full local control – no limits, no sandbox – so you can automate, analyze, and create anything right from your desktop through a chat interface

6. OpenSpec → https://github.com/Fission-AI/OpenSpec
A lightweight, spec-driven development tool that helps humans and AI agree on what to build before any code is written

7. PR-Agent → https://github.com/qodo-ai/pr-agent
An AI code reviewer that automatically reviews, describes, and improves pull requests across GitHub, GitLab, and other platforms

8. BabyAGI → https://github.com/yoheinakajima/babyagi
A self-building AI framework that gives agents the ability to write, manage, and refine their own functions, turning them from passive tools into active, self-building systems

9 ...⬇️

Subscribe to the Turing Post: https://www.turingpost.com/subscribe – your shortcut to deep, clear AI analysis

2 replies

·

Kseniase

posted an update 14 days ago

Post

3952

5 Lectures and keynotes defining AI right now

If you want to understand the multifaceted AI landscape in 2025 and see where the field is heading – start with (or revisit) these legendary talks. They can help you capture what’s happening in AI from multiple angles:

1. Andrej Karpathy: Software Is Changing (Again) → https://www.youtube.com/watch?v=LCEmiRjPEtQ
Unveils Software 3.0 – a paradigm where LLMs are the new computers, programmed with prompts instead of code. The key: developers must now master coding, training, and prompting as AI becomes the heart of software building

2. Richard Sutton, The OaK Architecture: A Vision of SuperIntelligence from Experience → https://www.youtube.com/watch?v=gEbbGyNkR2U
Unveils the OaK (Options and Knowledge) architecture – a model-based RL framework for continual intelligence, where every component learns, meta-learns & builds hierarchical abstractions

3. GTC March 2025 Keynote with NVIDIA CEO Jensen Huang → https://www.youtube.com/watch?v=_waPvOwL9Z8
Dives into the accelerated computing and the importance of Physical AI. From the Blackwell GPU architecture & AI factories to breakthroughs in agentic AI & robotics, Jensen Huang explains how NVIDIA aims to power every layer of the AI ecosystem

4. Yann LeCun "Mathematical Obstacles on the Way to Human-Level AI" → https://www.youtube.com/watch?v=ETZfkkv6V7
Yann LeCun always argues we need a new path to machines that reason about the world – not LLMs or RL. So this lecture is about self-supervised systems with world models, planning, memory and energy-based learning

5. Andrew Ng: State of AI Agents → https://www.youtube.com/watch?v=4pYzYmSdSH4
Highlights one of the most pressing topics of 2025 – agents, explaining why most effective AI agents rely on simple, linear workflows built from modular “Lego-brick” tasks + what predicts AI startup success in the new agent era

Subscribe to the Turing Post: https://www.turingpost.com/subscribe –your shortcut to deep, clear AI analysis

Kseniase

posted an update 21 days ago

Post

3129

9 Powerful AI Video Generation Tools

Since Sora 2 is on fire these weeks, reminding us what high-quality video generation should look like, we decided you really need this list of video generation tools – great alternatives or complements to it.

1. Sora 2 → https://openai.com/sora/
It needs no introduction, but this OpenAI’s text-to-video model produces short, ultra-realistic clips across styles (cinematic, photorealistic, animated, etc.) with synced audio

2. Google Veo 3 (Gemini Video Generation) → https://aistudio.google.com/models/veo-3
Part of Gemini AI. Generates 8-second high-fidelity videos from text or images with native sound: background soundtracks and realistic voices with near-perfect lip sync

3. Runway (Gen-4 by Runway ML) → https://runwayml.com/
Text, image, or video-to-video generation with advanced editing like changing lighting, weather, camera angles or replacing objects. Popular in AI filmmaking

4. Pika Labs → https://pollo.ai/m/pika-ai
Provides creative, often stylized short videos – from cinematic mini-scenes to cartoon-like animations. Ideal for social media clips and visual storytelling. Plus, you can add playful effects to manipulate objects in the generated videos

5. Luma’s Dream Machine → https://lumalabs.ai/dream-machine
Powered by Luma AI’s latest Ray 3 model, it quickly visualizes story ideas, animated concept art, or abstract motion videos. It supports consistent custom characters and seamless looping

Read further below ⬇️
If you like it, also subscribe to the Turing Post https://www.turingpost.com/subscribe

1 reply

·

Kseniase

posted an update 28 days ago

Post

3784

8 Emerging trends in Reinforcement Learning

Reinforcement learning is having a moment - and not just this week. Some of its directions are already showing huge promise, while others are still early but exciting. Here’s a look at what’s happening right now in RL:

1. Reinforcement Pre-Training (RPT) → Reinforcement Pre-Training (2506.08007)
Reframes next-token pretraining as RL with verifiable rewards, yielding scalable reasoning gains

2. Reinforcement Learning from Human Feedback (RLHF) → Deep reinforcement learning from human preferences (1706.03741)
The top approach. It trains a model using human preference feedback, building a reward model and then optimizing the policy to generate outputs people prefer

3. Reinforcement Learning with Verifiable Rewards (RLVR) → Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs (2506.14245)
Moves from subjective (human-labeled) rewards to objective ones that can be automatically verified, like in math, code, or rubrics as reward, for example → Reinforcement Learning with Rubric Anchors (2508.12790), Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains (2507.17746)

4. Multi-objective RL → Pareto Multi-Objective Alignment for Language Models (2508.07768)
Trains LMs to balance multiple goals at once, like being helpful but also concise or creative, ensuring that improving one goal doesn’t ruin another

5. Parallel thinking RL → Parallel-R1: Towards Parallel Thinking via Reinforcement Learning (2509.07980)
Trains parallel chains of thought, boosting math accuracy and final ceilings. It first teaches the model “parallel thinking” skill on easier problems, then uses RL to refine it on harder ones

Read further below ⬇️
And if you like this, subscribe to the Turing post: https://www.turingpost.com/subscribe

Also, check out our recent guide about the past, present and future of RL: https://www.turingpost.com/p/rlguide

3 replies

·

Kseniase

posted an update about 1 month ago

Post

4553

12 Excellent MCP Servers

The family of MCP (Model Context Protocol) servers keeps expanding to bridge agents, models, tools, web, data and apps. Here are 12 useful MCP servers that will help you create convenient agentic ecosystems:

1. Chrome DevTools MCP → https://github.com/ChromeDevTools/chrome-devtools-mcp
Lets your coding agent (Gemini, Claude, Cursor, Copilot) control a live Chrome browser with full DevTools access for automation, debugging, and performance analysis

2. Windows-MCP → https://github.com/CursorTouch/Windows-MCP
Provides interaction between agents and Windows, handling file navigation, app control, UI actions, QA testing

3. MCPControl → https://github.com/claude-did-this/MCPControl
Windows control server for programmatic control of mouse, keyboard, window management, and screen capture

4. MetaMCP → https://github.com/metatool-ai/metamcp
A proxy that aggregates multiple MCP servers into one, with middleware support. Works as a standard MCP server for any client

5. MindsDB → https://github.com/mindsdb/mindsdb
Humans, models, agents and apps get accurate answers from large-scale data sources

6. Playwright MCP → https://github.com/microsoft/playwright-mcp
Lets LLMs interact with web pages via structured accessibility snapshots, no need for screenshots or visually-tuned models

7. MCP Access Point → https://github.com/sxhxliang/mcp-access-point
Bridges MCP clients with HTTP services, no server-side changes needed

8. Browserbase MCP Server → https://github.com/browserbase/mcp-server-browserbase
Connects LLMs to external data and tools, adding cloud browser automation via Browserbase and Stagehand. It enables LLMs to browse, capture, extract, and act on web pages with precision

9. Yutu → https://github.com/eat-pray-ai/yutu
Automates YouTube workflows, managing videos, playlists, channels, comments, captions, etc.

3 more below ↓
Also, subscribe to the Turing Post: https://www.turingpost.com/subscribe

1 reply

·

Kseniase

posted an update about 1 month ago

Post

6127

10 awesome advanced LoRA approaches

Low-Rank Adaptation (LoRA) is the go-to method for efficient model fine-tuning that adds small low-rank matrices instead of retraining full models. The field isn’t standing still – new LoRA variants push the limits of efficiency, generalization, and personalization. So we’re sharing 10 of the latest LoRA approaches you should know about:

1. Mixture-of-LoRA-experts → Mixture of Low-Rank Adapter Experts in Generalizable Audio Deepfake Detection (2509.13878)
Adds multiple low-rank adapters (LoRA) into a model’s layers, and a routing mechanism activates the most suitable ones for each input. This lets the model adapt better to new unseen conditions

2. Amortized Bayesian Meta-Learning for LoRA (ABMLL) → Amortized Bayesian Meta-Learning for Low-Rank Adaptation of Large Language Models (2508.14285)
Balances global and task-specific parameters within a Bayesian framework to improve uncertainty calibration and generalization to new tasks without high memory or compute costs

3. AutoLoRA → AutoLoRA: Automatic LoRA Retrieval and Fine-Grained Gated Fusion for Text-to-Image Generation (2508.02107)
Automatically retrieves and dynamically aggregates public LoRAs for stronger T2I generation

4. aLoRA (Activated LoRA) → Activated LoRA: Fine-tuned LLMs for Intrinsics (2504.12397)
Only applies LoRA after invocation, letting the model reuse the base model’s KV cache instead of recomputing the full turn’s KV cache. Efficient in multi-turn conversations

5. LiLoRA (LoRA in LoRA) → LoRA in LoRA: Towards Parameter-Efficient Architecture Expansion for Continual Visual Instruction Tuning (2508.06202)
Shares the LoRA matrix A across tasks and additionally low-rank-decomposes matrix B to cut parameters in continual vision-text MLLMs

6. Sensitivity-LoRA → Sensitivity-LoRA: Low-Load Sensitivity-Based Fine-Tuning for Large Language Models (2509.09119)
Dynamically assigns ranks to weight matrices based on their sensitivity, measured using second-order derivatives

Read further below ↓
Also, subscribe to the Turing Post: https://www.turingpost.com/subscribe

3 replies

·

Kseniase

posted an update about 2 months ago

Post

6331

6 Recent & free sources to master Reinforcement Learning

Almost every week new research and resources on RL come out. Knowledge needs to be constantly refreshed and updated with the latest trends. So today, we’re sharing 6 free sources to help you stay on track with RL:

1. A Survey of Continual Reinforcement Learning → https://arxiv.org/abs/2506.21872
Covers continual RL (CRL): how agents can keep learning and adapt to new tasks without forgetting past ones. It analyses methods, benchmarks, evaluation metrics &challenges

2. The Deep Reinforcement Learning course by Hugging Face → https://huggingface.co/learn/deep-rl-course/unit0/introduction
This is a popular free course, regularly updated. Includes community interaction, exercises, leaderboards, etc.

3. Reinforcement Learning Specialization (Coursera, University of Alberta) → https://www.coursera.org/specializations/reinforcement-learning
A 4-course series introducing foundational RL, implementing different algorithms, culminating in a capstone. It's a great structured path

4. A Technical Survey of Reinforcement Learning Techniques for LLMs → A Technical Survey of Reinforcement Learning Techniques for Large Language Models (2507.04136)
Looks at how RL is being used for/with LLMs for alignment, reasoning, preference signals, etc. Covers methods like RLHF, RLAIF, DPO, PPO, GRPO & applications from code gen to tool use

5. A Survey of Reinforcement Learning for Software Engineering → https://arxiv.org/abs/2507.12483
Good if you're interested in RL-applied domains. Examines how RL is used in software engineering tasks: maintenance, development, evaluation. Covering 115 papers since DRL introduction, it summarizes trends, gaps & challenges

6. A Survey of Reinforcement Learning for LRMs → https://arxiv.org/abs/2509.08827
Tracks the way from LLMs to LRMs via RL. Covers reward design, policy optimization, use cases and future approaches like continual, memory, model-based RL and more

If you liked this, subscribe to The Turing Post https://www.turingpost.com/subscribe

1 reply

·

Kseniase

posted an update about 2 months ago

Post

7018

10 Latest Preference Optimization Techniques

Models need feedback on what makes outputs “good” or “bad.” Policy optimization (PO) turns preferences and rewards into actual training signals. This field is evolving quickly, moving far beyond classics like PPO and GRPO. So here is our overview of 10 newest PO methods:

1. Pref-GRPO → Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning (2508.20751)
Stabilizes text-to-image reinforcement learning (RL) with pairwise preference rewards and a unified UNIGENBENCH benchmark

2. PVPO (Policy with Value Preference Optimization) → PVPO: Pre-Estimated Value-Based Policy Optimization for Agentic Reasoning (2508.21104)
This critic-free RL method uses a pre-trained model as a reference anchor to reduce bias and guide learning, selecting high-value examples through data pre-sampling

3. DCPO (Dynamic Clipping Policy Optimization) → DCPO: Dynamic Clipping Policy Optimization (2509.02333)
Uses dynamic clipping, which adjusts probability limits per token for better token exploration, and smooth reward standardization to balance rewards over training steps and prevent wasted updates

4. ARPO (Agentic Reinforced Policy Optimization) → Agentic Reinforced Policy Optimization (2507.19849)
Optimizes multi-turn LLM agents that use external tools. It uses an entropy-based adaptive rollout to explore post-tool use and an advantage attribution method to better assign credit across steps, leading to more efficient tool use with fewer resources

5. GRPO-RoC (Group Relative Policy Optimization with Resampling-on-Correct) → rStar2-Agent: Agentic Reasoning Technical Report (2508.20722)
Oversamples rollouts, then resamples them to keep diverse mistakes and only the highest-quality correct answers. It reduces noises and ends up with stronger reasoning in a code environment

Read further below ⬇️
If you like this, also subscribe to the Turing post: https://www.turingpost.com/subscribe

1 reply

·

Kseniase

posted an update 2 months ago

Post

444

11 Powerful Image Models

Everyone is buzzing around image generation this week, or more specifically, Google's Nano-Banana. So today we want to share a list of models that can be your great toolkit for image generation + editing + multi-turn refinement.

1. Gemini 2.5 Flash Image, or Nano-Banana →
https://deepmind.google/models/gemini/image/
Google’s newest image model with conversational editing, character consistency, and multi-image fusion. Available in AI Studio and the Gemini API. Price: $2.50 per 1M tokens

2. FLUX (Black Forest Labs) → https://bfl.ai/
A family of models known for rich detail and, excellent prompt adherence, and fast iterative generation. Offered in several variants, from Pro to open-source, it's accessible via Hugging Face, Replicate, Azure AI Foundry, etc., and used as a base in many pipelines. Price: $0.025-0.08 per image

3. Midjourney v7 → https://www.midjourney.com/
Enhanced image fidelity, prompt comprehension, and anatomical coherence (hands, bodies, objects) + provides a smart lightbox editor. The Omni-reference tool improves character and object consistency in your images. It remains accessible via Discord with a supporting web interface. Price: $10-60/month

4. Stable Diffusion 3.5 (Stability AI) → https://stability.ai/stable-image
Open-weights line with improved text rendering, photorealism, and
prompt adherence compared to earlier versions. It introduces technical innovations through its MMDiT architecture. Price: $0.025-0.065 per image

5. OpenAI GPT-Image-1 →https://platform.openai.com/docs/guides/image-generation?image-generation-model=gpt-image-1
It's the same multimodal model that powers ChatGPT's image capabilities, offering high-fidelity image generation, precise edits, including inpainting, and accurate text rendering. Available via the Images API. Price: $40 per 1M tokens

Read further below ⬇️
If you like this, also subscribe to the Turing post: https://www.turingpost.com/subscribe

1 reply

·

fdaudens

posted an update 3 months ago

Post

5962

Want to learn to build an AI Agent? I put together a cookbook for creating your own news research agent with OpenAI GPT-OSS:

- Searches headlines & specific sites
- Pulls full articles when you need depth
- Summarizes with clickable sources
- Runs in a simple Gradio chat UI
- No GPU, no local setup — just open-weight GPT-OSS models via Hugging Face

If you’ve been wanting to try agents but weren’t sure where to start, this is an end-to-end example you can fork, run, and adapt.

Full guide + code https://huggingface.co/blog/fdaudens/openai-gpt-oss-agent-inference-providers

3 replies

·

fdaudens

posted an update 3 months ago

Post

548

What can OpenAI’s new open models do with the news? I built a News Agent to find out.

It can answer questions about the news in real time, and every answer comes with original source links so you can dive deeper.

Ask it things like:
- "What are the top news stories today?"
- "What's the latest on artificial intelligence?"
- Follow-up questions on specific stories

Runs with Hugging Face inference providers, letting you compare results from the OpenAI 20B and 120B models

So far, I’m quite impressed by the capabilities of even the smaller 20B model. Definitely not a perfect project, but curious to hear your thoughts!

https://huggingface.co/spaces/fdaudens/gpt-oss-news-agent

2 replies

·

BrigitteTousi

posted an update 3 months ago

Post

753

On Wednesday, August 13 at 11am EDT, join @clem for a no bullshit AMA on Discord. Prep all your HF questions and meet us there! 🤗☄️⚡️

https://discord.com/invite/6r5TEXyk?event=1404451892179763311

fdaudens

posted an update 3 months ago

Post

3414

OpenAI’s GPT-OSS has sparked ~400 new models on Hugging Face and racked up 5M downloads in less than a week, already outpacing DeepSeek R1’s first-week numbers.

For comparison: when R1 launched, I tracked 550 derivatives (across 8 base models) in a week, with ~3M downloads. GPT-OSS is ahead on adoption and engagement.

It’s also the most-liked release of any major LLM this summer. The 20B and 120B versions quickly shot past Kimi K2, GLM 4.5, and others in likes.

Most-downloaded GPT-OSS models include LM Studio and Unsloth AI versions:
1️⃣ openai/gpt-oss-20b - 2.0M
2️⃣ lmstudio-community/gpt-oss-20b-MLX-8bit - 750K
3️⃣ openai/gpt-oss-120b - 430K
4️⃣ unsloth/gpt-oss-20b-GGUF - 380K
5️⃣ lmstudio-community/gpt-oss-20b-GGUF - 330K

The 20B version is clearly finding its audience, showing the power of smaller, faster, more memory- and energy-efficient models. (These numbers don’t include calls to the models via inference providers, so the real usage is likely even bigger, especially for the 120B version)

Open-weight models let anyone build on top. Empower the builders, and innovation takes off. 🚀

1 reply

·

Kseniase

posted an update 3 months ago

Post

3641

6 Must-read books about AI and Machine Learning:

Sharing some free, useful resources for you. In this collection, we’ve gathered the most recent books to give you up-to-date information on key fundamental topics. Hope this helps you master AI and machine learning:

1. Machine Learning Systems by Vijay Janapa Reddi → https://www.mlsysbook.ai/
Provides a framework for building effective ML solutions, covering data engineering, optimization, hardware-aware training, inference acceleration, architecture choice, and other key principles

2. Generative Diffusion Modeling: A Practical Handbook by Zihan Ding, Chi Jin → https://arxiv.org/abs/2412.17162
Offers a unified view of diffusion models: probabilistic, score-based, consistency, rectified flow, pre/post-training. It aligns notations with code to close the “paper-to-code” gap.

3. Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges → https://arxiv.org/abs/2104.13478
Explores unified geometric principles to analyze neural networks' architectures: CNNs, RNNs, GNNs, Transformers, and guide the design of the future ones

4. Mathematical Foundations of Geometric Deep Learning by Haitz Saez de Ocariz Borde and Michael Bronstein → https://arxiv.org/abs/2508.02723
Dives into the the key math concepts behind geometric Deep Learning: geometric and analytical structures, vector calculus, differential geometry, etc.

5. Interpretable Machine Learning by Christoph Molnar → https://github.com/christophM/interpretable-ml-book
Practical guide to simple, transparent models (e.g., decision trees) and model-agnostic methods like LIME, Shapley values, permutation importance, and accumulated local effects.

6. Understanding Deep Learning by Simon J.D. Prince → https://udlbook.github.io/udlbook/
Explores core deep learning concenpts: models, training, evaluation, RL, architectures for images, text, and graphs, addressing open theoretical questions

Also, subscribe to the Turing Post: https://www.turingpost.com/subscribe

2 replies

·

BrigitteTousi

posted an update 3 months ago

Post

603

New interactive viz from AI World showing OpenAI's new open model gpt-oss-120b breaking into the top 50 most liked models of all time on the Hub in under a day! ☄️☄️☄️

fdaudens

posted an update 3 months ago

Post

2665

Well, it took just 2 hours for openai/gpt-oss-120b to hit #1 on Hugging Face. Don’t remember seeing anything rise that fast!

1 reply

·

Kseniase

posted an update 3 months ago

Post

3483

12 Powerful World Models

World models are one of the most challenging areas in AI, pushing the boundaries of reasoning, perception, and planning. They're gen AI systems that help models and agents learn internal representations of real-world environments.

Today, we invite you to take a look at 12 standout examples:

1. WorldVLA → WorldVLA: Towards Autoregressive Action World Model (2506.21539)
This autoregressive world model integrates action prediction and visual world modeling in a single framework, allowing each to enhance the other. It introduces an attention masking strategy to reduce action prediction errors

2. SimuRA → https://arxiv.org/abs/2507.23773
A generalized world model that uses a language-based world model to simulate and plan actions before execution, enabling more general and flexible reasoning

3. PAN (Physical, Agentic, and Nested) world models → Critiques of World Models (2507.05169)
Has a hybrid architecture that combines discrete concept-based reasoning (via LLMs) with continuous perceptual simulation (via diffusion models), enabling rich multi-level, multimodal understanding and prediction

4. MineWorld by Microsoft Research → MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft (2504.08388)
Enables real-time, interactive world modeling in Minecraft by combining visual and action tokenization within an autoregressive Transformer. It uses parallel decoding for fast scene generation (4–7 FPS)

5. WorldMem → WORLDMEM: Long-term Consistent World Simulation with Memory (2504.12369)
Uses a memory bank with attention over time-stamped frames and states to maintain long-term and 3D spatial consistency in scene generation. So it reconstruct past scenes and simulate dynamic world changes across large temporal gaps

Read further below ⬇️

If you like this, also subscribe to the Turing post: https://www.turingpost.com/subscribe

Plus explore this article for a comprehensive overview of the history and current evolution of world models: https://www.turingpost.com/p/topic-35-what-are-world-models

1 reply

·

Kseniase

posted an update 3 months ago

Post

5076

9 new policy optimization techniques

Reinforcement Learning (RL) won't stuck in the same old PPO loop - in the last two months alone, researchers have introduced a new wave of techniques, reshaping how we train and fine-tune LLMs, VLMs, and agents.

Here are 9 fresh policy optimization techniques worth knowing:

1. GSPO: Group Sequence Policy Optimization → Group Sequence Policy Optimization (2507.18071)
Shifts from token-level to sequence-level optimization, clipping, and rewarding to capture the full picture and increase stability compared to GRPO. GSPO-token variation also allows token-level fine-tuning.

2. LAPO: Length-Adaptive Policy Optimization → LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization (2507.15758)
A two-stage RL framework that trains models to adaptively control reasoning length by learning typical solution lengths for shorter and more efficient reasoning.

3. HBPO: Hierarchical Budget Policy Optimization → Hierarchical Budget Policy Optimization for Adaptive Reasoning (2507.15844)
This one trains model to adapt reasoning depth based on problem complexity. It divides training samples into subgroups with different token budgets, using budget-aware rewards to align reasoning effort with task difficulty.

4. SOPHIA: Semi-off-policy reinforcement learning → Semi-off-Policy Reinforcement Learning for Vision-Language Slow-thinking Reasoning (2507.16814)
Combines on-policy visual understanding from the Vision Language Models (VLMs) with off-policy reasoning from an LM, assigning outcome-based rewards and propagating visual rewards backward through the reasoning steps.

5. RePO: Replay-Enhanced Policy Optimization → RePO: Replay-Enhanced Policy Optimization (2506.09340)
Introduces a replay buffer into on-policy RL for LLMs, retrieving diverse off-policy samples for each prompt to broaden the training data per prompt

Read further below ⬇️
If you like it, also subscribe to the Turing Post: https://www.turingpost.com/subscribe

1 reply

·

BrigitteTousi

posted an update 3 months ago

Post

642

This is what Hugging Face is all about. We want everyone, hobbyists, researchers and industry alike, to be able to contribute to AI because everyone is affected by it. Kudos to HF's @irenesolaiman for spreading the word!🔥🤗

AI & ML interests

Team members 5

projects-sandbox's activity