Introduction
Qwen2.5-32B-DialogueReason is a dialogue-based reasoning model built on Qwen2.5-32B-Base.
We train the model using Open-Reasoner-Zero data through rule-based reinforcement learning.  
๐ง Key Features
- Qwen2.5-32B-Base as the foundation.
 - Use Rule-Based RL to achieve dialogue reasoning.
 - With dynamic agent initialization to adapt to various scenarios.
 - With flexible environment configuration to set up task-specific contexts.
 - With multi-turn dialogue reasoning to incrementally solve problems.
 
Example
System:
The User asks a question, and the Assistant writes a masterpiece play depicting experts (picked based on the topic with concrete names) solving the question in a ultra-detailed dialogue. The response is formatted as: the play goes here\n if asked to write code, then code here surrounded by ```. Otherwise, answer here with \boxed{answer} emphasized.
User:
Give me a detailed explanation of PPO in RL
Assistant:
- Downloads last month
 - 69
 
	Inference Providers
	NEW
	
	
	This model isn't deployed by any Inference Provider.
	๐
			
		Ask for provider support
