Diff Interpretation Tuning
ttw committed · Commit 5e689d6 (verified) · Parent: e7ac230

Update README.md

Files changed (1)
  1. README.md +0 -6
README.md CHANGED
@@ -10,18 +10,12 @@ datasets:
 
 # Diff Interpretation Tuning: Weight Diffs and Adapters
 This repository contains the weight diffs and DIT adapters used in the paper [Learning to Interpret Weight Differences in Language Models (Goel et al. 2025)](https://arxiv.org/abs/2510.05092).
- This paper introduces *Diff Interpretation Tuning*, a method that trains a LoRA adapter that can be applied to a model to get it to describe its own finetuning-induced modifications.
-
 To play around with the weight diffs and DIT adapters from the paper, please check out our [Google Colab demo notebook](https://colab.research.google.com/drive/12YD_9GRT-y_hFOBqXzyI4eN_lJGKiXwN?usp=sharing#forceEdit=true&sandboxMode=true).
 This notebook shows how to load the weight diffs and adapters from this repo.
 
 The code used to train and evaluate our weight diffs and DIT adapters can be found at [github.com/Aviously/diff-interpretation-tuning](https://github.com/Aviously/diff-interpretation-tuning).
 Some of the large data files used for training can be found at [hf.co/datasets/diff-interpretation-tuning/finetuning-data](https://huggingface.co/datasets/diff-interpretation-tuning/finetuning-data).
 
- ## Method overview
- A diagrammatic overview of Diff Interpretation Tuning is shown below:
- <img src="dit-diagram.png" alt="Diagram of Diff Interpretation Tuning" width="600"/>
-
 ## Repository structure
 All weight diffs and DIT adapters in the repository live under a specific `<experiment>/<model>` folder (e.g. [hidden-topic/qwen3-4b](hidden-topic/qwen3-4b)).
 Please consult [the paper](https://arxiv.org/abs/2510.05092) to understand what each experiment refers to.
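
For orientation, here is a minimal sketch of how an adapter stored under one of the `<experiment>/<model>` folders mentioned above (e.g. `hidden-topic/qwen3-4b`) might be downloaded and attached using `huggingface_hub` and `peft`. The repo id, the base-model pairing, and the PEFT directory layout are assumptions rather than details taken from this repo, and the step of first applying the corresponding weight diff to the base model is omitted; the Colab notebook linked above remains the authoritative loading code.

```python
# Hedged sketch: fetch a DIT adapter from an assumed <experiment>/<model> folder
# and attach it to an assumed base model. Repo id, file layout, and base-model
# pairing are illustrative only; applying the matching weight diff first is omitted.
from huggingface_hub import snapshot_download
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

REPO_ID = "diff-interpretation-tuning/weight-diffs"  # hypothetical repo id
SUBFOLDER = "hidden-topic/qwen3-4b"                  # <experiment>/<model> folder from the README
BASE_MODEL = "Qwen/Qwen3-4B"                         # assumed base model for this folder

# Download only the files under the chosen <experiment>/<model> folder.
local_dir = snapshot_download(repo_id=REPO_ID, allow_patterns=[f"{SUBFOLDER}/*"])

# Load the base model and attach the adapter, assuming it is saved in PEFT format.
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)
model = PeftModel.from_pretrained(model, f"{local_dir}/{SUBFOLDER}")

# Ask the adapted model to describe its finetuning-induced modification.
inputs = tokenizer("What was changed about you during finetuning?", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```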
 