TARS-7B

Overview

TARS-7B is an open-source reasoning model trained for safety with TARS (Training Adaptive Reasoners for Safety), the method introduced in the paper "Reasoning as an Adaptive Defense for Safety", to facilitate research on reasoning models for LLM safety. The model is trained with a mixing ratio of λ = 0.5 between harmful and harmless prompts, starting from the base model Qwen2.5-7B-Instruct.

TARS is a simple but effective online reinforcement learning (RL) method that trains models to reason adaptively, achieving low refusal rates and safe behavior through three key ingredients:

🔑 Key Ingredients

  • Ingredient 1: Lightweight supervised fine-tuning (SFT) for diverse generations
  • Ingredient 2: Mixing in harmless prompts during RL training
  • Ingredient 3: Decoupled reward model for better exploration
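Ingredient 2 above (mixing harmless prompts into RL training at ratio λ) can be illustrated with a minimal sketch. This is a hypothetical sampler, not the authors' implementation: the function name `mix_prompts` and the per-example Bernoulli(λ) sampling scheme are assumptions for illustration only.

```python
import random

def mix_prompts(harmful, harmless, lam, batch_size, rng=None):
    """Sample a training batch in which each prompt is drawn from the
    harmful pool with probability lam and from the harmless pool
    otherwise. Hypothetical sketch of lambda-mixing; the actual TARS
    sampler may differ (e.g., fixed per-batch proportions)."""
    rng = rng or random.Random(0)
    batch = []
    for _ in range(batch_size):
        pool = harmful if rng.random() < lam else harmless
        batch.append(rng.choice(pool))
    return batch

# Toy pools standing in for real prompt datasets.
harmful_pool = ["harmful_prompt_1", "harmful_prompt_2"]
harmless_pool = ["harmless_prompt_1", "harmless_prompt_2"]

# With lam=0.5 (as used for TARS-7B), roughly half of each batch is harmful.
batch = mix_prompts(harmful_pool, harmless_pool, lam=0.5, batch_size=100)
```

With λ = 0.5 each batch is, in expectation, an even split, which is the setting this checkpoint was trained with.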

For full details, please see our paper or blog post.


📖 Citation

If you use TARS-7B in your work, please cite us:

@article{kim2025reasoning,
  title={Reasoning as an Adaptive Defense for Safety},
  author={Kim, Taeyoun and Tajwar, Fahim and Raghunathan, Aditi and Kumar, Aviral},
  journal={arXiv preprint arXiv:2507.00971},
  year={2025}
}