arXiv:2412.14780

Disentangling Reasoning Tokens and Boilerplate Tokens For Language Model Fine-tuning

Published on Dec 19, 2024

Abstract

AI-generated summary: A novel Shuffle-Aware Discriminator (SHAD) and a Reasoning-highlighted Fine-Tuning (RFT) method improve LLM performance by differentiating and emphasizing reasoning tokens during fine-tuning.

When using agent-task datasets to enhance agent capabilities for Large Language Models (LLMs), current methodologies often treat all tokens within a sample equally. However, we argue that tokens serving different roles, specifically reasoning tokens versus boilerplate tokens (e.g., those governing output format), differ significantly in importance and learning complexity, necessitating their disentanglement and distinct treatment. To address this, we propose a novel Shuffle-Aware Discriminator (SHAD) for adaptive token discrimination. SHAD classifies tokens by exploiting predictability differences observed after shuffling input-output combinations across samples: boilerplate tokens, due to their repetitive nature among samples, maintain predictability, whereas reasoning tokens do not. Using SHAD, we propose the Reasoning-highlighted Fine-Tuning (RFT) method, which adaptively emphasizes reasoning tokens during fine-tuning, yielding notable performance gains over common Supervised Fine-Tuning (SFT).
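
The abstract describes SHAD and RFT only at a high level. The sketch below shows one plausible reading of the idea in PyTorch, assuming a Hugging Face-style causal LM interface: a reference model fine-tuned on shuffled input-output pairings scores each token's predictability, tokens the reference still predicts well are treated as boilerplate, and the remaining (reasoning) tokens are upweighted in the fine-tuning loss. The threshold tau, the weight values, and the helper names are illustrative assumptions, not the paper's exact formulation.

    import torch
    import torch.nn.functional as F

    def token_losses(model, input_ids, labels):
        # Per-token next-token cross-entropy (no reduction) for a causal LM.
        logits = model(input_ids).logits[:, :-1, :]   # position t predicts token t+1
        targets = labels[:, 1:]
        losses = F.cross_entropy(
            logits.reshape(-1, logits.size(-1)),
            targets.reshape(-1),
            reduction="none",
            ignore_index=-100,   # padded / non-target positions contribute 0
        )
        return losses.view(targets.shape)

    def rft_loss(model, shuffled_ref, input_ids, labels, tau=1.0, boost=2.0):
        # `shuffled_ref` stands for a reference model trained on samples whose
        # input-output pairings were shuffled across the dataset (the SHAD idea):
        # boilerplate tokens stay predictable after shuffling, reasoning tokens do not.
        loss = token_losses(model, input_ids, labels)        # trainable model
        with torch.no_grad():
            ref_loss = token_losses(shuffled_ref, input_ids, labels)
        # Hypothetical rule: tokens the shuffled reference fails to predict
        # (high ref_loss) are classified as reasoning tokens and upweighted.
        weights = torch.where(ref_loss > tau,
                              torch.full_like(loss, boost),
                              torch.ones_like(loss))
        mask = (labels[:, 1:] != -100).float()
        return (weights * loss * mask).sum() / mask.sum()

In this reading, ordinary SFT is recovered as the special case boost = 1.0; the paper's actual discriminator and weighting scheme may differ.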
