Update README.md
# Diff Interpretation Tuning: Weight Diffs and Adapters
This repository contains the weight diffs and DIT adapters used in the paper [Learning to Interpret Weight Differences in Language Models (Goel et al. 2025)](https://arxiv.org/abs/2510.05092).
This paper introduces *Diff Interpretation Tuning*, a method that trains a LoRA adapter that can be applied to a model to get it to describe its own finetuning-induced modifications.

To play around with the weight diffs and DIT adapters from the paper, please check out our [Google Colab demo notebook](https://colab.research.google.com/drive/12YD_9GRT-y_hFOBqXzyI4eN_lJGKiXwN?usp=sharing#forceEdit=true&sandboxMode=true).
This notebook shows how to load the weight diffs and adapters from this repo.
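If you just want a rough idea of what the loading code looks like, here is a minimal sketch using `transformers` and `peft`, assuming the weight diffs and DIT adapters are stored as LoRA adapters. The repo id, subfolder names, base model, and prompt below are illustrative placeholders rather than the exact values used in the paper; see the Colab notebook for the canonical loading code.

```python
# Minimal sketch (placeholder repo id, subfolders, base model, and prompt).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

REPO_ID = "diff-interpretation-tuning/weight-diffs"  # placeholder: use this repo's id
BASE_MODEL = "Qwen/Qwen3-4B"                         # assumed base model for the qwen3-4b folders

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# 1) Apply a weight diff (assumed to be a LoRA adapter) and merge it into the
#    base weights, recovering the finetuned model.
model = PeftModel.from_pretrained(model, REPO_ID, subfolder="hidden-topic/qwen3-4b/example-diff")
model = model.merge_and_unload()

# 2) Apply the DIT adapter on top of the finetuned model.
model = PeftModel.from_pretrained(model, REPO_ID, subfolder="hidden-topic/qwen3-4b/dit-adapter")

# 3) Ask the model to describe its own finetuning-induced modification.
prompt = "Describe how your weights were modified by finetuning."  # illustrative prompt
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```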
The code used to train and evaluate our weight diffs and DIT adapters can be found at [github.com/Aviously/diff-interpretation-tuning](https://github.com/Aviously/diff-interpretation-tuning).
Some of the large data files used for training can be found at [hf.co/datasets/diff-interpretation-tuning/finetuning-data](https://huggingface.co/datasets/diff-interpretation-tuning/finetuning-data).
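As a hedged sketch, you can snapshot that dataset repo locally with `huggingface_hub` (the file formats are not documented here, so this just downloads the raw files for inspection):

```python
# Download the raw training data files for local inspection.
from huggingface_hub import snapshot_download

data_path = snapshot_download(
    repo_id="diff-interpretation-tuning/finetuning-data",
    repo_type="dataset",
)
print(data_path)
```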
## Method overview
A diagrammatic overview of Diff Interpretation Tuning is shown below:
<img src="dit-diagram.png" alt="Diagram of Diff Interpretation Tuning" width="600"/>

## Repository structure
All weight diffs and DIT adapters in the repository live under a specific `<experiment>/<model>` folder (e.g. [hidden-topic/qwen3-4b](hidden-topic/qwen3-4b)).
Please consult [the paper](https://arxiv.org/abs/2510.05092) to understand what each experiment refers to.
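To fetch just the files under one `<experiment>/<model>` folder, one option (a sketch; the repo id below is a placeholder for this repository's Hub id) is:

```python
# Download a single <experiment>/<model> folder from the Hub.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="diff-interpretation-tuning/weight-diffs",  # placeholder: use this repo's id
    allow_patterns=["hidden-topic/qwen3-4b/*"],         # one experiment/model folder
)
print(local_path)
```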