LLM and LRM can be easily distracted by hidden instructions or irrelevant tasks. We curated SFT and DPO data that model can finetune to avoid distract
-
groupfairnessllm/tulu-3-preference-data-with-distraction
Viewer • Updated • 1.5k • 19 -
groupfairnessllm/tulu-3-sft-with-distraction
Viewer • Updated • 5.1k • 15 -
Distractor Injection Attacks on Large Reasoning Models: Characterization and Defense
Paper • 2510.16259 • Published • 3 -
allenai/tulu-3-sft-personas-instruction-following
Viewer • Updated • 30k • 897 • 50