Online training methods (e.g., GRPO) require real-time generation, a compute- and memory-heavy bottleneck.
TRL has built-in vLLM support, and in this new recipe we show how to leverage it for efficient online training. Run it on Colab ⚡, then scale to multi-GPU/multi-node!
🧑‍🍳 recipe: https://huggingface.co/learn/cookbook/grpo_vllm_online_training
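
For a rough idea of what this looks like in code, here is a minimal sketch and not the recipe itself. It assumes a recent TRL release where `GRPOConfig` accepts `use_vllm` and `vllm_mode`. The model name, dataset, output directory, and toy reward function are illustrative placeholders, not taken from the recipe.

```python
def reward_len(completions, **kwargs):
    """Toy reward (an assumption for illustration): prefer ~20-character completions."""
    return [float(-abs(20 - len(c))) for c in completions]


def build_trainer():
    # Imported lazily so the reward function can be tried without TRL installed.
    from datasets import load_dataset
    from trl import GRPOConfig, GRPOTrainer

    # Placeholder dataset; any prompt dataset in a TRL-supported format should work.
    dataset = load_dataset("trl-lib/tldr", split="train")

    args = GRPOConfig(
        output_dir="grpo-vllm-demo",  # hypothetical output path
        use_vllm=True,                # generate rollouts with vLLM instead of model.generate
        vllm_mode="colocate",         # assumes a recent TRL version; share the training GPU(s)
    )
    return GRPOTrainer(
        model="Qwen/Qwen2-0.5B-Instruct",  # placeholder small model
        reward_funcs=reward_len,
        args=args,
        train_dataset=dataset,
    )
```

Calling `build_trainer().train()` would start training. It needs a GPU with `trl`, `vllm`, and `datasets` installed. For multi-GPU or multi-node runs, check the recipe for the exact launch setup.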