zhuangxialie committed on
Commit 78c7f5f · verified · 1 Parent(s): 74fb84a

Model save

Files changed (5)
  1. README.md +57 -0
  2. all_results.json +8 -0
  3. generation_config.json +14 -0
  4. train_results.json +8 -0
  5. trainer_state.json +3018 -0
README.md ADDED
@@ -0,0 +1,57 @@
+ ---
+ library_name: transformers
+ model_name: Qwen-code-7B-SFT-100k-v2-cots
+ tags:
+ - generated_from_trainer
+ - trl
+ - sft
+ licence: license
+ ---
+
+ # Model Card for Qwen-code-7B-SFT-100k-v2-cots
+
+ This model is a fine-tuned version of [None](https://huggingface.co/None).
+ It has been trained using [TRL](https://github.com/huggingface/trl).
+
+ ## Quick start
+
+ ```python
+ from transformers import pipeline
+
+ question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
+ generator = pipeline("text-generation", model="ZhuangXialie/Qwen-code-7B-SFT-100k-v2-cots", device="cuda")
+ output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
+ print(output["generated_text"])
+ ```
+
+ ## Training procedure
+
+ [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/dyx_team/huggingface/runs/ofegsz5g)
+
+
+ This model was trained with SFT.
+
+ ### Framework versions
+
+ - TRL: 0.16.0.dev0
+ - Transformers: 4.49.0
+ - Pytorch: 2.6.0
+ - Datasets: 3.5.1
+ - Tokenizers: 0.21.1
+
+ ## Citations
+
+
+
+ Cite TRL as:
+
+ ```bibtex
+ @misc{vonwerra2022trl,
+ title = {{TRL: Transformer Reinforcement Learning}},
+ author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
+ year = 2020,
+ journal = {GitHub repository},
+ publisher = {GitHub},
+ howpublished = {\url{https://github.com/huggingface/trl}}
+ }
+ ```
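The Quick start in the card above goes through the high-level `pipeline` helper. For completeness, here is a minimal lower-level sketch of the same call using `AutoModelForCausalLM` and the tokenizer's chat template; the prompt and `max_new_tokens` value are illustrative, and it assumes the repository id from the card resolves and a CUDA device is available.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ZhuangXialie/Qwen-code-7B-SFT-100k-v2-cots"  # repository id from the card above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).to("cuda")

# Build the prompt with the tokenizer's chat template, as the pipeline does internally.
messages = [{"role": "user", "content": "Write a Python function that checks whether a string is a palindrome."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# Sampling settings default to the values stored in generation_config.json below.
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```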
all_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "total_flos": 966947082862592.0,
+ "train_loss": 0.34282420668550717,
+ "train_runtime": 10626.5662,
+ "train_samples": 98973,
+ "train_samples_per_second": 2.802,
+ "train_steps_per_second": 0.175
+ }
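These summary metrics are internally consistent: multiplying `train_steps_per_second` by `train_runtime` recovers roughly the 1860 optimizer steps recorded as `global_step` in trainer_state.json below. A small sanity-check sketch, assuming `all_results.json` sits in the working directory:

```python
import json

# Load the training summary written by the Trainer (local path is an assumption).
with open("all_results.json") as f:
    results = json.load(f)

# steps_per_second is total optimizer steps over wall-clock runtime,
# so multiplying them back gives the step count.
approx_steps = results["train_runtime"] * results["train_steps_per_second"]
print(round(approx_steps))  # ~1860, matching global_step in trainer_state.json
```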
generation_config.json ADDED
@@ -0,0 +1,14 @@
+ {
+ "bos_token_id": 151643,
+ "do_sample": true,
+ "eos_token_id": [
+ 151645,
+ 151643
+ ],
+ "pad_token_id": 151643,
+ "repetition_penalty": 1.1,
+ "temperature": 0.7,
+ "top_k": 20,
+ "top_p": 0.8,
+ "transformers_version": "4.49.0"
+ }
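These defaults (sampling enabled, temperature 0.7, top_p 0.8, top_k 20, repetition_penalty 1.1) are applied automatically by `generate`. A short sketch of loading them explicitly and overriding a field; the repository id is taken from the card above and the override value is just an example:

```python
from transformers import GenerationConfig

model_id = "ZhuangXialie/Qwen-code-7B-SFT-100k-v2-cots"  # repository id from the model card

# Load the defaults stored in generation_config.json.
gen_config = GenerationConfig.from_pretrained(model_id)
print(gen_config.temperature, gen_config.top_p, gen_config.top_k)  # 0.7 0.8 20

# Override an individual field; pass the result to model.generate(generation_config=...).
cooler_config = GenerationConfig.from_pretrained(model_id, temperature=0.2)
```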
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "total_flos": 966947082862592.0,
+ "train_loss": 0.34282420668550717,
+ "train_runtime": 10626.5662,
+ "train_samples": 98973,
+ "train_samples_per_second": 2.802,
+ "train_steps_per_second": 0.175
+ }
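trainer_state.json, added below, carries the full per-step log history: loss, learning rate, gradient norm, and mean token accuracy every five optimizer steps. A minimal plotting sketch for that log, assuming the file is available locally and matplotlib is installed:

```python
import json
import matplotlib.pyplot as plt

# Read the per-step log written by the Trainer (local path is an assumption).
with open("trainer_state.json") as f:
    state = json.load(f)

# Each log_history entry holds step, loss, learning_rate, grad_norm, mean_token_accuracy.
train_logs = [entry for entry in state["log_history"] if "loss" in entry]
steps = [entry["step"] for entry in train_logs]
losses = [entry["loss"] for entry in train_logs]

plt.plot(steps, losses)
plt.xlabel("optimizer step")
plt.ylabel("training loss")
plt.title("SFT loss curve (from trainer_state.json)")
plt.savefig("loss_curve.png")
```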
trainer_state.json ADDED
@@ -0,0 +1,3018 @@
1
+ {
2
+ "best_metric": null,
3
+ "best_model_checkpoint": null,
4
+ "epoch": 3.992481203007519,
5
+ "eval_steps": 500,
6
+ "global_step": 1860,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 0.010741138560687433,
13
+ "grad_norm": 2.6824158480436027,
14
+ "learning_rate": 1.3440860215053765e-06,
15
+ "loss": 0.8294,
16
+ "mean_token_accuracy": 0.8010891914367676,
17
+ "step": 5
18
+ },
19
+ {
20
+ "epoch": 0.021482277121374866,
21
+ "grad_norm": 1.0834186154450132,
22
+ "learning_rate": 2.688172043010753e-06,
23
+ "loss": 0.7976,
24
+ "mean_token_accuracy": 0.8042729198932648,
25
+ "step": 10
26
+ },
27
+ {
28
+ "epoch": 0.0322234156820623,
29
+ "grad_norm": 0.9912101287518572,
30
+ "learning_rate": 4.032258064516129e-06,
31
+ "loss": 0.7318,
32
+ "mean_token_accuracy": 0.8116350173950195,
33
+ "step": 15
34
+ },
35
+ {
36
+ "epoch": 0.04296455424274973,
37
+ "grad_norm": 0.6161547535500218,
38
+ "learning_rate": 5.376344086021506e-06,
39
+ "loss": 0.6796,
40
+ "mean_token_accuracy": 0.8214974880218506,
41
+ "step": 20
42
+ },
43
+ {
44
+ "epoch": 0.05370569280343716,
45
+ "grad_norm": 0.4711983922639431,
46
+ "learning_rate": 6.720430107526882e-06,
47
+ "loss": 0.6403,
48
+ "mean_token_accuracy": 0.8289329469203949,
49
+ "step": 25
50
+ },
51
+ {
52
+ "epoch": 0.0644468313641246,
53
+ "grad_norm": 0.3514340436445771,
54
+ "learning_rate": 8.064516129032258e-06,
55
+ "loss": 0.6101,
56
+ "mean_token_accuracy": 0.8344561219215393,
57
+ "step": 30
58
+ },
59
+ {
60
+ "epoch": 0.07518796992481203,
61
+ "grad_norm": 0.2900558861500113,
62
+ "learning_rate": 9.408602150537635e-06,
63
+ "loss": 0.5849,
64
+ "mean_token_accuracy": 0.8396502792835235,
65
+ "step": 35
66
+ },
67
+ {
68
+ "epoch": 0.08592910848549946,
69
+ "grad_norm": 0.2722947727971047,
70
+ "learning_rate": 1.0752688172043012e-05,
71
+ "loss": 0.5701,
72
+ "mean_token_accuracy": 0.8420377433300018,
73
+ "step": 40
74
+ },
75
+ {
76
+ "epoch": 0.0966702470461869,
77
+ "grad_norm": 0.25248070544882645,
78
+ "learning_rate": 1.2096774193548388e-05,
79
+ "loss": 0.561,
80
+ "mean_token_accuracy": 0.8443691551685333,
81
+ "step": 45
82
+ },
83
+ {
84
+ "epoch": 0.10741138560687433,
85
+ "grad_norm": 0.2504332745775819,
86
+ "learning_rate": 1.3440860215053763e-05,
87
+ "loss": 0.5601,
88
+ "mean_token_accuracy": 0.8441641569137573,
89
+ "step": 50
90
+ },
91
+ {
92
+ "epoch": 0.11815252416756176,
93
+ "grad_norm": 0.21685484456472007,
94
+ "learning_rate": 1.4784946236559142e-05,
95
+ "loss": 0.5455,
96
+ "mean_token_accuracy": 0.8471231937408448,
97
+ "step": 55
98
+ },
99
+ {
100
+ "epoch": 0.1288936627282492,
101
+ "grad_norm": 0.23513981149675298,
102
+ "learning_rate": 1.6129032258064517e-05,
103
+ "loss": 0.5486,
104
+ "mean_token_accuracy": 0.8462919056415558,
105
+ "step": 60
106
+ },
107
+ {
108
+ "epoch": 0.13963480128893663,
109
+ "grad_norm": 0.21971215723488632,
110
+ "learning_rate": 1.7473118279569895e-05,
111
+ "loss": 0.5372,
112
+ "mean_token_accuracy": 0.8488749146461487,
113
+ "step": 65
114
+ },
115
+ {
116
+ "epoch": 0.15037593984962405,
117
+ "grad_norm": 0.22582010917696982,
118
+ "learning_rate": 1.881720430107527e-05,
119
+ "loss": 0.5341,
120
+ "mean_token_accuracy": 0.8489724159240722,
121
+ "step": 70
122
+ },
123
+ {
124
+ "epoch": 0.1611170784103115,
125
+ "grad_norm": 0.2505238494065726,
126
+ "learning_rate": 2.0161290322580645e-05,
127
+ "loss": 0.5288,
128
+ "mean_token_accuracy": 0.8500843226909638,
129
+ "step": 75
130
+ },
131
+ {
132
+ "epoch": 0.17185821697099893,
133
+ "grad_norm": 0.2485546682065235,
134
+ "learning_rate": 2.1505376344086024e-05,
135
+ "loss": 0.5265,
136
+ "mean_token_accuracy": 0.8504622042179107,
137
+ "step": 80
138
+ },
139
+ {
140
+ "epoch": 0.18259935553168635,
141
+ "grad_norm": 0.25134861732181085,
142
+ "learning_rate": 2.28494623655914e-05,
143
+ "loss": 0.5245,
144
+ "mean_token_accuracy": 0.8512703776359558,
145
+ "step": 85
146
+ },
147
+ {
148
+ "epoch": 0.1933404940923738,
149
+ "grad_norm": 0.2607421207193637,
150
+ "learning_rate": 2.4193548387096777e-05,
151
+ "loss": 0.5225,
152
+ "mean_token_accuracy": 0.8512581944465637,
153
+ "step": 90
154
+ },
155
+ {
156
+ "epoch": 0.20408163265306123,
157
+ "grad_norm": 0.2571937237076843,
158
+ "learning_rate": 2.5537634408602152e-05,
159
+ "loss": 0.5169,
160
+ "mean_token_accuracy": 0.8526618123054505,
161
+ "step": 95
162
+ },
163
+ {
164
+ "epoch": 0.21482277121374865,
165
+ "grad_norm": 0.2559454741361629,
166
+ "learning_rate": 2.6881720430107527e-05,
167
+ "loss": 0.5087,
168
+ "mean_token_accuracy": 0.8544329702854156,
169
+ "step": 100
170
+ },
171
+ {
172
+ "epoch": 0.22556390977443608,
173
+ "grad_norm": 0.25657620243689094,
174
+ "learning_rate": 2.822580645161291e-05,
175
+ "loss": 0.5069,
176
+ "mean_token_accuracy": 0.8545464932918548,
177
+ "step": 105
178
+ },
179
+ {
180
+ "epoch": 0.23630504833512353,
181
+ "grad_norm": 0.3084326429216429,
182
+ "learning_rate": 2.9569892473118284e-05,
183
+ "loss": 0.5109,
184
+ "mean_token_accuracy": 0.8538104116916656,
185
+ "step": 110
186
+ },
187
+ {
188
+ "epoch": 0.24704618689581095,
189
+ "grad_norm": 0.2964885334930525,
190
+ "learning_rate": 3.091397849462366e-05,
191
+ "loss": 0.5026,
192
+ "mean_token_accuracy": 0.8555706679821015,
193
+ "step": 115
194
+ },
195
+ {
196
+ "epoch": 0.2577873254564984,
197
+ "grad_norm": 0.2640055744535602,
198
+ "learning_rate": 3.2258064516129034e-05,
199
+ "loss": 0.4952,
200
+ "mean_token_accuracy": 0.8576966226100922,
201
+ "step": 120
202
+ },
203
+ {
204
+ "epoch": 0.26852846401718583,
205
+ "grad_norm": 0.28061492437295604,
206
+ "learning_rate": 3.360215053763441e-05,
207
+ "loss": 0.4983,
208
+ "mean_token_accuracy": 0.8568866074085235,
209
+ "step": 125
210
+ },
211
+ {
212
+ "epoch": 0.27926960257787325,
213
+ "grad_norm": 0.3222080670739919,
214
+ "learning_rate": 3.494623655913979e-05,
215
+ "loss": 0.4919,
216
+ "mean_token_accuracy": 0.8582496762275695,
217
+ "step": 130
218
+ },
219
+ {
220
+ "epoch": 0.2900107411385607,
221
+ "grad_norm": 0.3018861867966521,
222
+ "learning_rate": 3.6290322580645165e-05,
223
+ "loss": 0.4921,
224
+ "mean_token_accuracy": 0.858267605304718,
225
+ "step": 135
226
+ },
227
+ {
228
+ "epoch": 0.3007518796992481,
229
+ "grad_norm": 0.27298497353963225,
230
+ "learning_rate": 3.763440860215054e-05,
231
+ "loss": 0.4897,
232
+ "mean_token_accuracy": 0.858799421787262,
233
+ "step": 140
234
+ },
235
+ {
236
+ "epoch": 0.31149301825993553,
237
+ "grad_norm": 0.29189277480966186,
238
+ "learning_rate": 3.8978494623655915e-05,
239
+ "loss": 0.4831,
240
+ "mean_token_accuracy": 0.8604558348655701,
241
+ "step": 145
242
+ },
243
+ {
244
+ "epoch": 0.322234156820623,
245
+ "grad_norm": 0.28012276855965057,
246
+ "learning_rate": 4.032258064516129e-05,
247
+ "loss": 0.4834,
248
+ "mean_token_accuracy": 0.8607946753501892,
249
+ "step": 150
250
+ },
251
+ {
252
+ "epoch": 0.33297529538131043,
253
+ "grad_norm": 0.2822021421564993,
254
+ "learning_rate": 4.166666666666667e-05,
255
+ "loss": 0.4822,
256
+ "mean_token_accuracy": 0.8607180714607239,
257
+ "step": 155
258
+ },
259
+ {
260
+ "epoch": 0.34371643394199786,
261
+ "grad_norm": 0.2669043120039336,
262
+ "learning_rate": 4.301075268817205e-05,
263
+ "loss": 0.4709,
264
+ "mean_token_accuracy": 0.8635617375373841,
265
+ "step": 160
266
+ },
267
+ {
268
+ "epoch": 0.3544575725026853,
269
+ "grad_norm": 0.26430063130872034,
270
+ "learning_rate": 4.435483870967742e-05,
271
+ "loss": 0.4759,
272
+ "mean_token_accuracy": 0.8624868154525757,
273
+ "step": 165
274
+ },
275
+ {
276
+ "epoch": 0.3651987110633727,
277
+ "grad_norm": 0.2768300795347462,
278
+ "learning_rate": 4.56989247311828e-05,
279
+ "loss": 0.4698,
280
+ "mean_token_accuracy": 0.863774424791336,
281
+ "step": 170
282
+ },
283
+ {
284
+ "epoch": 0.37593984962406013,
285
+ "grad_norm": 0.27300710251352905,
286
+ "learning_rate": 4.704301075268818e-05,
287
+ "loss": 0.4688,
288
+ "mean_token_accuracy": 0.8640853643417359,
289
+ "step": 175
290
+ },
291
+ {
292
+ "epoch": 0.3866809881847476,
293
+ "grad_norm": 0.28130219154214986,
294
+ "learning_rate": 4.8387096774193554e-05,
295
+ "loss": 0.4616,
296
+ "mean_token_accuracy": 0.8659515857696534,
297
+ "step": 180
298
+ },
299
+ {
300
+ "epoch": 0.39742212674543503,
301
+ "grad_norm": 0.28040903261236555,
302
+ "learning_rate": 4.973118279569893e-05,
303
+ "loss": 0.4652,
304
+ "mean_token_accuracy": 0.8656746566295623,
305
+ "step": 185
306
+ },
307
+ {
308
+ "epoch": 0.40816326530612246,
309
+ "grad_norm": 0.32637783754316196,
310
+ "learning_rate": 4.999936604372673e-05,
311
+ "loss": 0.4584,
312
+ "mean_token_accuracy": 0.8662971913814544,
313
+ "step": 190
314
+ },
315
+ {
316
+ "epoch": 0.4189044038668099,
317
+ "grad_norm": 0.3235247316768069,
318
+ "learning_rate": 4.9996790657593474e-05,
319
+ "loss": 0.4652,
320
+ "mean_token_accuracy": 0.865262484550476,
321
+ "step": 195
322
+ },
323
+ {
324
+ "epoch": 0.4296455424274973,
325
+ "grad_norm": 0.2756975255703871,
326
+ "learning_rate": 4.999223444591954e-05,
327
+ "loss": 0.4533,
328
+ "mean_token_accuracy": 0.8687061607837677,
329
+ "step": 200
330
+ },
331
+ {
332
+ "epoch": 0.44038668098818473,
333
+ "grad_norm": 0.26466440633632593,
334
+ "learning_rate": 4.998569780987594e-05,
335
+ "loss": 0.4521,
336
+ "mean_token_accuracy": 0.8684524893760681,
337
+ "step": 205
338
+ },
339
+ {
340
+ "epoch": 0.45112781954887216,
341
+ "grad_norm": 0.25138863961089425,
342
+ "learning_rate": 4.997718132500857e-05,
343
+ "loss": 0.4456,
344
+ "mean_token_accuracy": 0.8701819539070129,
345
+ "step": 210
346
+ },
347
+ {
348
+ "epoch": 0.46186895810955964,
349
+ "grad_norm": 0.3025611470224811,
350
+ "learning_rate": 4.9966685741187544e-05,
351
+ "loss": 0.447,
352
+ "mean_token_accuracy": 0.8699068784713745,
353
+ "step": 215
354
+ },
355
+ {
356
+ "epoch": 0.47261009667024706,
357
+ "grad_norm": 0.24615962175136596,
358
+ "learning_rate": 4.995421198254114e-05,
359
+ "loss": 0.4445,
360
+ "mean_token_accuracy": 0.8706246316432953,
361
+ "step": 220
362
+ },
363
+ {
364
+ "epoch": 0.4833512352309345,
365
+ "grad_norm": 0.23780094613136366,
366
+ "learning_rate": 4.9939761147374455e-05,
367
+ "loss": 0.444,
368
+ "mean_token_accuracy": 0.8709352612495422,
369
+ "step": 225
370
+ },
371
+ {
372
+ "epoch": 0.4940923737916219,
373
+ "grad_norm": 0.26418243428675386,
374
+ "learning_rate": 4.992333450807268e-05,
375
+ "loss": 0.4428,
376
+ "mean_token_accuracy": 0.8712534010410309,
377
+ "step": 230
378
+ },
379
+ {
380
+ "epoch": 0.5048335123523093,
381
+ "grad_norm": 0.2452687330812135,
382
+ "learning_rate": 4.990493351098908e-05,
383
+ "loss": 0.4375,
384
+ "mean_token_accuracy": 0.8728318750858307,
385
+ "step": 235
386
+ },
387
+ {
388
+ "epoch": 0.5155746509129968,
389
+ "grad_norm": 0.2688160648750715,
390
+ "learning_rate": 4.9884559776317644e-05,
391
+ "loss": 0.4353,
392
+ "mean_token_accuracy": 0.8730437099933624,
393
+ "step": 240
394
+ },
395
+ {
396
+ "epoch": 0.5263157894736842,
397
+ "grad_norm": 0.25960118051112435,
398
+ "learning_rate": 4.986221509795043e-05,
399
+ "loss": 0.4317,
400
+ "mean_token_accuracy": 0.8739780306816101,
401
+ "step": 245
402
+ },
403
+ {
404
+ "epoch": 0.5370569280343717,
405
+ "grad_norm": 0.23341024093650933,
406
+ "learning_rate": 4.98379014433196e-05,
407
+ "loss": 0.4352,
408
+ "mean_token_accuracy": 0.8733076274394989,
409
+ "step": 250
410
+ },
411
+ {
412
+ "epoch": 0.547798066595059,
413
+ "grad_norm": 0.25741008352215955,
414
+ "learning_rate": 4.981162095322421e-05,
415
+ "loss": 0.4324,
416
+ "mean_token_accuracy": 0.8738310694694519,
417
+ "step": 255
418
+ },
419
+ {
420
+ "epoch": 0.5585392051557465,
421
+ "grad_norm": 0.23274342659284017,
422
+ "learning_rate": 4.9783375941641696e-05,
423
+ "loss": 0.4321,
424
+ "mean_token_accuracy": 0.8742413520812988,
425
+ "step": 260
426
+ },
427
+ {
428
+ "epoch": 0.569280343716434,
429
+ "grad_norm": 0.2451922230157493,
430
+ "learning_rate": 4.9753168895524136e-05,
431
+ "loss": 0.4202,
432
+ "mean_token_accuracy": 0.8772394955158234,
433
+ "step": 265
434
+ },
435
+ {
436
+ "epoch": 0.5800214822771214,
437
+ "grad_norm": 0.2681975618828881,
438
+ "learning_rate": 4.9721002474579285e-05,
439
+ "loss": 0.4265,
440
+ "mean_token_accuracy": 0.8758379638195037,
441
+ "step": 270
442
+ },
443
+ {
444
+ "epoch": 0.5907626208378088,
445
+ "grad_norm": 0.22840035689897775,
446
+ "learning_rate": 4.968687951103638e-05,
447
+ "loss": 0.4209,
448
+ "mean_token_accuracy": 0.8775071561336517,
449
+ "step": 275
450
+ },
451
+ {
452
+ "epoch": 0.6015037593984962,
453
+ "grad_norm": 0.22300755601220718,
454
+ "learning_rate": 4.965080300939675e-05,
455
+ "loss": 0.4153,
456
+ "mean_token_accuracy": 0.8784702062606812,
457
+ "step": 280
458
+ },
459
+ {
460
+ "epoch": 0.6122448979591837,
461
+ "grad_norm": 0.22676783176605783,
462
+ "learning_rate": 4.961277614616931e-05,
463
+ "loss": 0.4168,
464
+ "mean_token_accuracy": 0.8779775381088257,
465
+ "step": 285
466
+ },
467
+ {
468
+ "epoch": 0.6229860365198711,
469
+ "grad_norm": 0.24574274186354764,
470
+ "learning_rate": 4.957280226959083e-05,
471
+ "loss": 0.4119,
472
+ "mean_token_accuracy": 0.8798301517963409,
473
+ "step": 290
474
+ },
475
+ {
476
+ "epoch": 0.6337271750805585,
477
+ "grad_norm": 0.2281072685520932,
478
+ "learning_rate": 4.953088489933117e-05,
479
+ "loss": 0.4176,
480
+ "mean_token_accuracy": 0.878108823299408,
481
+ "step": 295
482
+ },
483
+ {
484
+ "epoch": 0.644468313641246,
485
+ "grad_norm": 0.2606268344040068,
486
+ "learning_rate": 4.948702772618332e-05,
487
+ "loss": 0.4114,
488
+ "mean_token_accuracy": 0.879868882894516,
489
+ "step": 300
490
+ },
491
+ {
492
+ "epoch": 0.6552094522019334,
493
+ "grad_norm": 0.2192902541038699,
494
+ "learning_rate": 4.944123461173849e-05,
495
+ "loss": 0.4141,
496
+ "mean_token_accuracy": 0.879179573059082,
497
+ "step": 305
498
+ },
499
+ {
500
+ "epoch": 0.6659505907626209,
501
+ "grad_norm": 0.21550855803478997,
502
+ "learning_rate": 4.9393509588046036e-05,
503
+ "loss": 0.4053,
504
+ "mean_token_accuracy": 0.8814833164215088,
505
+ "step": 310
506
+ },
507
+ {
508
+ "epoch": 0.6766917293233082,
509
+ "grad_norm": 0.23830421980148422,
510
+ "learning_rate": 4.934385685725851e-05,
511
+ "loss": 0.4068,
512
+ "mean_token_accuracy": 0.8807245373725892,
513
+ "step": 315
514
+ },
515
+ {
516
+ "epoch": 0.6874328678839957,
517
+ "grad_norm": 0.22141238716961,
518
+ "learning_rate": 4.9292280791261595e-05,
519
+ "loss": 0.4023,
520
+ "mean_token_accuracy": 0.8820916056632996,
521
+ "step": 320
522
+ },
523
+ {
524
+ "epoch": 0.6981740064446831,
525
+ "grad_norm": 0.23798938808653466,
526
+ "learning_rate": 4.9238785931289225e-05,
527
+ "loss": 0.4042,
528
+ "mean_token_accuracy": 0.882178908586502,
529
+ "step": 325
530
+ },
531
+ {
532
+ "epoch": 0.7089151450053706,
533
+ "grad_norm": 0.22152782163874513,
534
+ "learning_rate": 4.918337698752367e-05,
535
+ "loss": 0.4038,
536
+ "mean_token_accuracy": 0.8820820569992065,
537
+ "step": 330
538
+ },
539
+ {
540
+ "epoch": 0.719656283566058,
541
+ "grad_norm": 0.2238393672437065,
542
+ "learning_rate": 4.912605883868088e-05,
543
+ "loss": 0.4094,
544
+ "mean_token_accuracy": 0.8803297877311707,
545
+ "step": 335
546
+ },
547
+ {
548
+ "epoch": 0.7303974221267454,
549
+ "grad_norm": 0.2251835579056735,
550
+ "learning_rate": 4.906683653158086e-05,
551
+ "loss": 0.4022,
552
+ "mean_token_accuracy": 0.8820242047309875,
553
+ "step": 340
554
+ },
555
+ {
556
+ "epoch": 0.7411385606874329,
557
+ "grad_norm": 0.21096516273893903,
558
+ "learning_rate": 4.9005715280703295e-05,
559
+ "loss": 0.3963,
560
+ "mean_token_accuracy": 0.8838990330696106,
561
+ "step": 345
562
+ },
563
+ {
564
+ "epoch": 0.7518796992481203,
565
+ "grad_norm": 0.20550443098708907,
566
+ "learning_rate": 4.8942700467728505e-05,
567
+ "loss": 0.3955,
568
+ "mean_token_accuracy": 0.8842245638370514,
569
+ "step": 350
570
+ },
571
+ {
572
+ "epoch": 0.7626208378088077,
573
+ "grad_norm": 0.2058867389466749,
574
+ "learning_rate": 4.88777976410635e-05,
575
+ "loss": 0.3995,
576
+ "mean_token_accuracy": 0.8830176711082458,
577
+ "step": 355
578
+ },
579
+ {
580
+ "epoch": 0.7733619763694952,
581
+ "grad_norm": 0.20958669116131587,
582
+ "learning_rate": 4.8811012515353456e-05,
583
+ "loss": 0.3911,
584
+ "mean_token_accuracy": 0.8853914678096771,
585
+ "step": 360
586
+ },
587
+ {
588
+ "epoch": 0.7841031149301826,
589
+ "grad_norm": 0.20397609182823062,
590
+ "learning_rate": 4.874235097097861e-05,
591
+ "loss": 0.393,
592
+ "mean_token_accuracy": 0.8846873760223388,
593
+ "step": 365
594
+ },
595
+ {
596
+ "epoch": 0.7948442534908701,
597
+ "grad_norm": 0.21645535614809533,
598
+ "learning_rate": 4.8671819053536415e-05,
599
+ "loss": 0.3922,
600
+ "mean_token_accuracy": 0.8847495734691619,
601
+ "step": 370
602
+ },
603
+ {
604
+ "epoch": 0.8055853920515574,
605
+ "grad_norm": 0.22258952481615085,
606
+ "learning_rate": 4.859942297330932e-05,
607
+ "loss": 0.3982,
608
+ "mean_token_accuracy": 0.8832435965538025,
609
+ "step": 375
610
+ },
611
+ {
612
+ "epoch": 0.8163265306122449,
613
+ "grad_norm": 0.2024612867389681,
614
+ "learning_rate": 4.8525169104717846e-05,
615
+ "loss": 0.3903,
616
+ "mean_token_accuracy": 0.8853883922100068,
617
+ "step": 380
618
+ },
619
+ {
620
+ "epoch": 0.8270676691729323,
621
+ "grad_norm": 0.20556087856635372,
622
+ "learning_rate": 4.844906398575944e-05,
623
+ "loss": 0.3964,
624
+ "mean_token_accuracy": 0.8837718069553375,
625
+ "step": 385
626
+ },
627
+ {
628
+ "epoch": 0.8378088077336198,
629
+ "grad_norm": 0.20809549331239957,
630
+ "learning_rate": 4.8371114317432726e-05,
631
+ "loss": 0.3941,
632
+ "mean_token_accuracy": 0.8842520952224732,
633
+ "step": 390
634
+ },
635
+ {
636
+ "epoch": 0.8485499462943072,
637
+ "grad_norm": 0.21820552680801697,
638
+ "learning_rate": 4.8291326963147524e-05,
639
+ "loss": 0.3891,
640
+ "mean_token_accuracy": 0.8858624398708344,
641
+ "step": 395
642
+ },
643
+ {
644
+ "epoch": 0.8592910848549946,
645
+ "grad_norm": 0.20709264624327767,
646
+ "learning_rate": 4.820970894812053e-05,
647
+ "loss": 0.3845,
648
+ "mean_token_accuracy": 0.886957323551178,
649
+ "step": 400
650
+ },
651
+ {
652
+ "epoch": 0.8700322234156821,
653
+ "grad_norm": 0.21155796049345174,
654
+ "learning_rate": 4.812626745875673e-05,
655
+ "loss": 0.3909,
656
+ "mean_token_accuracy": 0.8852347731590271,
657
+ "step": 405
658
+ },
659
+ {
660
+ "epoch": 0.8807733619763695,
661
+ "grad_norm": 0.20230194258239817,
662
+ "learning_rate": 4.804100984201667e-05,
663
+ "loss": 0.3888,
664
+ "mean_token_accuracy": 0.8856496810913086,
665
+ "step": 410
666
+ },
667
+ {
668
+ "epoch": 0.8915145005370569,
669
+ "grad_norm": 0.1914371442320018,
670
+ "learning_rate": 4.795394360476955e-05,
671
+ "loss": 0.3927,
672
+ "mean_token_accuracy": 0.885220056772232,
673
+ "step": 415
674
+ },
675
+ {
676
+ "epoch": 0.9022556390977443,
677
+ "grad_norm": 0.21955921021321853,
678
+ "learning_rate": 4.7865076413132234e-05,
679
+ "loss": 0.3862,
680
+ "mean_token_accuracy": 0.8869829177856445,
681
+ "step": 420
682
+ },
683
+ {
684
+ "epoch": 0.9129967776584318,
685
+ "grad_norm": 0.19993088700133185,
686
+ "learning_rate": 4.777441609179428e-05,
687
+ "loss": 0.389,
688
+ "mean_token_accuracy": 0.8861649572849274,
689
+ "step": 425
690
+ },
691
+ {
692
+ "epoch": 0.9237379162191193,
693
+ "grad_norm": 0.20214442771764315,
694
+ "learning_rate": 4.768197062332898e-05,
695
+ "loss": 0.3805,
696
+ "mean_token_accuracy": 0.8884122192859649,
697
+ "step": 430
698
+ },
699
+ {
700
+ "epoch": 0.9344790547798066,
701
+ "grad_norm": 0.1936799045011743,
702
+ "learning_rate": 4.758774814749046e-05,
703
+ "loss": 0.3825,
704
+ "mean_token_accuracy": 0.8876857936382294,
705
+ "step": 435
706
+ },
707
+ {
708
+ "epoch": 0.9452201933404941,
709
+ "grad_norm": 0.19325903425845148,
710
+ "learning_rate": 4.749175696049706e-05,
711
+ "loss": 0.3826,
712
+ "mean_token_accuracy": 0.8881516516208648,
713
+ "step": 440
714
+ },
715
+ {
716
+ "epoch": 0.9559613319011815,
717
+ "grad_norm": 0.19255187762230458,
718
+ "learning_rate": 4.739400551430077e-05,
719
+ "loss": 0.3811,
720
+ "mean_token_accuracy": 0.8880790531635284,
721
+ "step": 445
722
+ },
723
+ {
724
+ "epoch": 0.966702470461869,
725
+ "grad_norm": 0.19450067956842618,
726
+ "learning_rate": 4.7294502415843105e-05,
727
+ "loss": 0.3783,
728
+ "mean_token_accuracy": 0.8890111207962036,
729
+ "step": 450
730
+ },
731
+ {
732
+ "epoch": 0.9774436090225563,
733
+ "grad_norm": 0.20174438790639918,
734
+ "learning_rate": 4.719325642629722e-05,
735
+ "loss": 0.378,
736
+ "mean_token_accuracy": 0.8890378654003144,
737
+ "step": 455
738
+ },
739
+ {
740
+ "epoch": 0.9881847475832438,
741
+ "grad_norm": 0.17832896478111976,
742
+ "learning_rate": 4.7090276460296555e-05,
743
+ "loss": 0.3843,
744
+ "mean_token_accuracy": 0.8872815728187561,
745
+ "step": 460
746
+ },
747
+ {
748
+ "epoch": 0.9989258861439313,
749
+ "grad_norm": 0.1913931630832869,
750
+ "learning_rate": 4.6985571585149876e-05,
751
+ "loss": 0.3796,
752
+ "mean_token_accuracy": 0.8887166023254395,
753
+ "step": 465
754
+ },
755
+ {
756
+ "epoch": 1.0085929108485499,
757
+ "grad_norm": 0.20263869484120534,
758
+ "learning_rate": 4.687915102004286e-05,
759
+ "loss": 0.3614,
760
+ "mean_token_accuracy": 0.8926012317339579,
761
+ "step": 470
762
+ },
763
+ {
764
+ "epoch": 1.0193340494092373,
765
+ "grad_norm": 0.19678722825673817,
766
+ "learning_rate": 4.677102413522645e-05,
767
+ "loss": 0.3495,
768
+ "mean_token_accuracy": 0.8955722391605377,
769
+ "step": 475
770
+ },
771
+ {
772
+ "epoch": 1.0300751879699248,
773
+ "grad_norm": 0.20376503491728473,
774
+ "learning_rate": 4.666120045119174e-05,
775
+ "loss": 0.3507,
776
+ "mean_token_accuracy": 0.8951772391796112,
777
+ "step": 480
778
+ },
779
+ {
780
+ "epoch": 1.0408163265306123,
781
+ "grad_norm": 0.2019062903436488,
782
+ "learning_rate": 4.654968963783171e-05,
783
+ "loss": 0.3531,
784
+ "mean_token_accuracy": 0.8947476446628571,
785
+ "step": 485
786
+ },
787
+ {
788
+ "epoch": 1.0515574650912998,
789
+ "grad_norm": 0.18722603018624961,
790
+ "learning_rate": 4.643650151358983e-05,
791
+ "loss": 0.3526,
792
+ "mean_token_accuracy": 0.894485878944397,
793
+ "step": 490
794
+ },
795
+ {
796
+ "epoch": 1.062298603651987,
797
+ "grad_norm": 0.19481656873843595,
798
+ "learning_rate": 4.632164604459553e-05,
799
+ "loss": 0.3468,
800
+ "mean_token_accuracy": 0.8964617013931274,
801
+ "step": 495
802
+ },
803
+ {
804
+ "epoch": 1.0730397422126745,
805
+ "grad_norm": 0.18585853331072713,
806
+ "learning_rate": 4.620513334378669e-05,
807
+ "loss": 0.3512,
808
+ "mean_token_accuracy": 0.8950131058692932,
809
+ "step": 500
810
+ },
811
+ {
812
+ "epoch": 1.083780880773362,
813
+ "grad_norm": 0.1930388596228489,
814
+ "learning_rate": 4.608697367001921e-05,
815
+ "loss": 0.3479,
816
+ "mean_token_accuracy": 0.895933198928833,
817
+ "step": 505
818
+ },
819
+ {
820
+ "epoch": 1.0945220193340495,
821
+ "grad_norm": 0.1978189680563173,
822
+ "learning_rate": 4.596717742716372e-05,
823
+ "loss": 0.3532,
824
+ "mean_token_accuracy": 0.8942179441452026,
825
+ "step": 510
826
+ },
827
+ {
828
+ "epoch": 1.1052631578947367,
829
+ "grad_norm": 0.2198969141563894,
830
+ "learning_rate": 4.584575516318954e-05,
831
+ "loss": 0.3492,
832
+ "mean_token_accuracy": 0.8957188785076141,
833
+ "step": 515
834
+ },
835
+ {
836
+ "epoch": 1.1160042964554242,
837
+ "grad_norm": 0.19175977623621587,
838
+ "learning_rate": 4.5722717569235924e-05,
839
+ "loss": 0.3553,
840
+ "mean_token_accuracy": 0.8938140749931336,
841
+ "step": 520
842
+ },
843
+ {
844
+ "epoch": 1.1267454350161117,
845
+ "grad_norm": 0.1995625771811619,
846
+ "learning_rate": 4.559807547867071e-05,
847
+ "loss": 0.3493,
848
+ "mean_token_accuracy": 0.8954446971416473,
849
+ "step": 525
850
+ },
851
+ {
852
+ "epoch": 1.1374865735767992,
853
+ "grad_norm": 0.1915734911527379,
854
+ "learning_rate": 4.5471839866136475e-05,
855
+ "loss": 0.3491,
856
+ "mean_token_accuracy": 0.8957653522491456,
857
+ "step": 530
858
+ },
859
+ {
860
+ "epoch": 1.1482277121374866,
861
+ "grad_norm": 0.19836797519712018,
862
+ "learning_rate": 4.5344021846584205e-05,
863
+ "loss": 0.3539,
864
+ "mean_token_accuracy": 0.8943828701972961,
865
+ "step": 535
866
+ },
867
+ {
868
+ "epoch": 1.158968850698174,
869
+ "grad_norm": 0.18808462761740152,
870
+ "learning_rate": 4.521463267429464e-05,
871
+ "loss": 0.3497,
872
+ "mean_token_accuracy": 0.8953365862369538,
873
+ "step": 540
874
+ },
875
+ {
876
+ "epoch": 1.1697099892588614,
877
+ "grad_norm": 0.19280122016496182,
878
+ "learning_rate": 4.508368374188731e-05,
879
+ "loss": 0.3496,
880
+ "mean_token_accuracy": 0.8953313529491425,
881
+ "step": 545
882
+ },
883
+ {
884
+ "epoch": 1.1804511278195489,
885
+ "grad_norm": 0.19677371481260625,
886
+ "learning_rate": 4.4951186579317504e-05,
887
+ "loss": 0.3528,
888
+ "mean_token_accuracy": 0.8949146151542664,
889
+ "step": 550
890
+ },
891
+ {
892
+ "epoch": 1.1911922663802363,
893
+ "grad_norm": 0.18538032977972374,
894
+ "learning_rate": 4.481715285286098e-05,
895
+ "loss": 0.3541,
896
+ "mean_token_accuracy": 0.8939870595932007,
897
+ "step": 555
898
+ },
899
+ {
900
+ "epoch": 1.2019334049409238,
901
+ "grad_norm": 0.18481539602601102,
902
+ "learning_rate": 4.46815943640868e-05,
903
+ "loss": 0.3553,
904
+ "mean_token_accuracy": 0.8940768420696259,
905
+ "step": 560
906
+ },
907
+ {
908
+ "epoch": 1.212674543501611,
909
+ "grad_norm": 0.1861386211911988,
910
+ "learning_rate": 4.454452304881821e-05,
911
+ "loss": 0.3468,
912
+ "mean_token_accuracy": 0.8959418594837188,
913
+ "step": 565
914
+ },
915
+ {
916
+ "epoch": 1.2234156820622986,
917
+ "grad_norm": 0.18228266310501318,
918
+ "learning_rate": 4.440595097608168e-05,
919
+ "loss": 0.3467,
920
+ "mean_token_accuracy": 0.8962770164012909,
921
+ "step": 570
922
+ },
923
+ {
924
+ "epoch": 1.234156820622986,
925
+ "grad_norm": 0.1841361210717962,
926
+ "learning_rate": 4.426589034704428e-05,
927
+ "loss": 0.3536,
928
+ "mean_token_accuracy": 0.8943024933338165,
929
+ "step": 575
930
+ },
931
+ {
932
+ "epoch": 1.2448979591836735,
933
+ "grad_norm": 0.17281724579297167,
934
+ "learning_rate": 4.412435349393931e-05,
935
+ "loss": 0.3509,
936
+ "mean_token_accuracy": 0.8950875043869019,
937
+ "step": 580
938
+ },
939
+ {
940
+ "epoch": 1.255639097744361,
941
+ "grad_norm": 0.1772300668593227,
942
+ "learning_rate": 4.398135287898052e-05,
943
+ "loss": 0.3485,
944
+ "mean_token_accuracy": 0.8955003321170807,
945
+ "step": 585
946
+ },
947
+ {
948
+ "epoch": 1.2663802363050483,
949
+ "grad_norm": 0.17772581177798846,
950
+ "learning_rate": 4.383690109326477e-05,
951
+ "loss": 0.3459,
952
+ "mean_token_accuracy": 0.8965889751911164,
953
+ "step": 590
954
+ },
955
+ {
956
+ "epoch": 1.2771213748657357,
957
+ "grad_norm": 0.18596059716645308,
958
+ "learning_rate": 4.369101085566342e-05,
959
+ "loss": 0.3496,
960
+ "mean_token_accuracy": 0.8954894125461579,
961
+ "step": 595
962
+ },
963
+ {
964
+ "epoch": 1.2878625134264232,
965
+ "grad_norm": 0.17598132780016223,
966
+ "learning_rate": 4.354369501170246e-05,
967
+ "loss": 0.3479,
968
+ "mean_token_accuracy": 0.8960169315338135,
969
+ "step": 600
970
+ },
971
+ {
972
+ "epoch": 1.2986036519871107,
973
+ "grad_norm": 0.1804871594490513,
974
+ "learning_rate": 4.3394966532431433e-05,
975
+ "loss": 0.352,
976
+ "mean_token_accuracy": 0.8948932409286499,
977
+ "step": 605
978
+ },
979
+ {
980
+ "epoch": 1.3093447905477982,
981
+ "grad_norm": 0.1865297212423964,
982
+ "learning_rate": 4.3244838513281367e-05,
983
+ "loss": 0.3515,
984
+ "mean_token_accuracy": 0.8949047923088074,
985
+ "step": 610
986
+ },
987
+ {
988
+ "epoch": 1.3200859291084854,
989
+ "grad_norm": 0.18053270547327416,
990
+ "learning_rate": 4.309332417291172e-05,
991
+ "loss": 0.3505,
992
+ "mean_token_accuracy": 0.8953122675418854,
993
+ "step": 615
994
+ },
995
+ {
996
+ "epoch": 1.330827067669173,
997
+ "grad_norm": 0.1744036148367508,
998
+ "learning_rate": 4.294043685204651e-05,
999
+ "loss": 0.3474,
1000
+ "mean_token_accuracy": 0.8960575997829437,
1001
+ "step": 620
1002
+ },
1003
+ {
1004
+ "epoch": 1.3415682062298604,
1005
+ "grad_norm": 0.16842924897825143,
1006
+ "learning_rate": 4.278619001229962e-05,
1007
+ "loss": 0.3474,
1008
+ "mean_token_accuracy": 0.8961166024208069,
1009
+ "step": 625
1010
+ },
1011
+ {
1012
+ "epoch": 1.3523093447905479,
1013
+ "grad_norm": 0.17741079904542595,
1014
+ "learning_rate": 4.263059723498961e-05,
1015
+ "loss": 0.3474,
1016
+ "mean_token_accuracy": 0.8962021231651306,
1017
+ "step": 630
1018
+ },
1019
+ {
1020
+ "epoch": 1.3630504833512354,
1021
+ "grad_norm": 0.17634563486082044,
1022
+ "learning_rate": 4.247367221994377e-05,
1023
+ "loss": 0.352,
1024
+ "mean_token_accuracy": 0.8948638260364532,
1025
+ "step": 635
1026
+ },
1027
+ {
1028
+ "epoch": 1.3737916219119226,
1029
+ "grad_norm": 0.16514936818638581,
1030
+ "learning_rate": 4.2315428784291965e-05,
1031
+ "loss": 0.348,
1032
+ "mean_token_accuracy": 0.8962691247463226,
1033
+ "step": 640
1034
+ },
1035
+ {
1036
+ "epoch": 1.38453276047261,
1037
+ "grad_norm": 0.18156198450594868,
1038
+ "learning_rate": 4.215588086125001e-05,
1039
+ "loss": 0.3473,
1040
+ "mean_token_accuracy": 0.8962475776672363,
1041
+ "step": 645
1042
+ },
1043
+ {
1044
+ "epoch": 1.3952738990332976,
1045
+ "grad_norm": 0.17302374962454448,
1046
+ "learning_rate": 4.199504249889279e-05,
1047
+ "loss": 0.3499,
1048
+ "mean_token_accuracy": 0.8956164479255676,
1049
+ "step": 650
1050
+ },
1051
+ {
1052
+ "epoch": 1.4060150375939848,
1053
+ "grad_norm": 0.17009271559786848,
1054
+ "learning_rate": 4.18329278589175e-05,
1055
+ "loss": 0.3481,
1056
+ "mean_token_accuracy": 0.8962275862693787,
1057
+ "step": 655
1058
+ },
1059
+ {
1060
+ "epoch": 1.4167561761546725,
1061
+ "grad_norm": 0.17232579890547844,
1062
+ "learning_rate": 4.166955121539656e-05,
1063
+ "loss": 0.3452,
1064
+ "mean_token_accuracy": 0.8966892838478089,
1065
+ "step": 660
1066
+ },
1067
+ {
1068
+ "epoch": 1.4274973147153598,
1069
+ "grad_norm": 0.18931912307479049,
1070
+ "learning_rate": 4.150492695352086e-05,
1071
+ "loss": 0.3476,
1072
+ "mean_token_accuracy": 0.8961862683296203,
1073
+ "step": 665
1074
+ },
1075
+ {
1076
+ "epoch": 1.4382384532760473,
1077
+ "grad_norm": 0.1812257587896816,
1078
+ "learning_rate": 4.133906956833316e-05,
1079
+ "loss": 0.3451,
1080
+ "mean_token_accuracy": 0.8965191125869751,
1081
+ "step": 670
1082
+ },
1083
+ {
1084
+ "epoch": 1.4489795918367347,
1085
+ "grad_norm": 0.18448866093949617,
1086
+ "learning_rate": 4.1171993663451816e-05,
1087
+ "loss": 0.3453,
1088
+ "mean_token_accuracy": 0.8967220306396484,
1089
+ "step": 675
1090
+ },
1091
+ {
1092
+ "epoch": 1.459720730397422,
1093
+ "grad_norm": 0.16318177527247005,
1094
+ "learning_rate": 4.1003713949784905e-05,
1095
+ "loss": 0.3491,
1096
+ "mean_token_accuracy": 0.8957133948802948,
1097
+ "step": 680
1098
+ },
1099
+ {
1100
+ "epoch": 1.4704618689581095,
1101
+ "grad_norm": 0.19223128076002124,
1102
+ "learning_rate": 4.083424524423498e-05,
1103
+ "loss": 0.3475,
1104
+ "mean_token_accuracy": 0.8962952673435212,
1105
+ "step": 685
1106
+ },
1107
+ {
1108
+ "epoch": 1.481203007518797,
1109
+ "grad_norm": 0.17065645296533696,
1110
+ "learning_rate": 4.066360246839442e-05,
1111
+ "loss": 0.3495,
1112
+ "mean_token_accuracy": 0.8956079244613647,
1113
+ "step": 690
1114
+ },
1115
+ {
1116
+ "epoch": 1.4919441460794844,
1117
+ "grad_norm": 0.1613801844631258,
1118
+ "learning_rate": 4.049180064723164e-05,
1119
+ "loss": 0.3491,
1120
+ "mean_token_accuracy": 0.8964253485202789,
1121
+ "step": 695
1122
+ },
1123
+ {
1124
+ "epoch": 1.502685284640172,
1125
+ "grad_norm": 0.17729165960730092,
1126
+ "learning_rate": 4.031885490776811e-05,
1127
+ "loss": 0.3461,
1128
+ "mean_token_accuracy": 0.8965683281421661,
1129
+ "step": 700
1130
+ },
1131
+ {
1132
+ "epoch": 1.5134264232008592,
1133
+ "grad_norm": 0.16772417608227957,
1134
+ "learning_rate": 4.014478047774644e-05,
1135
+ "loss": 0.3486,
1136
+ "mean_token_accuracy": 0.8959019482135773,
1137
+ "step": 705
1138
+ },
1139
+ {
1140
+ "epoch": 1.5241675617615469,
1141
+ "grad_norm": 0.1654092742061062,
1142
+ "learning_rate": 3.99695926842896e-05,
1143
+ "loss": 0.3452,
1144
+ "mean_token_accuracy": 0.8970151007175445,
1145
+ "step": 710
1146
+ },
1147
+ {
1148
+ "epoch": 1.5349087003222341,
1149
+ "grad_norm": 0.1770663143483711,
1150
+ "learning_rate": 3.979330695255139e-05,
1151
+ "loss": 0.3504,
1152
+ "mean_token_accuracy": 0.8954713106155395,
1153
+ "step": 715
1154
+ },
1155
+ {
1156
+ "epoch": 1.5456498388829216,
1157
+ "grad_norm": 0.16250407421180885,
1158
+ "learning_rate": 3.9615938804358254e-05,
1159
+ "loss": 0.3403,
1160
+ "mean_token_accuracy": 0.8980903148651123,
1161
+ "step": 720
1162
+ },
1163
+ {
1164
+ "epoch": 1.556390977443609,
1165
+ "grad_norm": 0.1739734421973896,
1166
+ "learning_rate": 3.943750385684257e-05,
1167
+ "loss": 0.3452,
1168
+ "mean_token_accuracy": 0.8973391890525818,
1169
+ "step": 725
1170
+ },
1171
+ {
1172
+ "epoch": 1.5671321160042964,
1173
+ "grad_norm": 0.17020682906702797,
1174
+ "learning_rate": 3.9258017821067595e-05,
1175
+ "loss": 0.341,
1176
+ "mean_token_accuracy": 0.8981746196746826,
1177
+ "step": 730
1178
+ },
1179
+ {
1180
+ "epoch": 1.5778732545649838,
1181
+ "grad_norm": 0.17090518777542177,
1182
+ "learning_rate": 3.907749650064416e-05,
1183
+ "loss": 0.3475,
1184
+ "mean_token_accuracy": 0.8964370787143707,
1185
+ "step": 735
1186
+ },
1187
+ {
1188
+ "epoch": 1.5886143931256713,
1189
+ "grad_norm": 0.18226436070710383,
1190
+ "learning_rate": 3.889595579033907e-05,
1191
+ "loss": 0.3548,
1192
+ "mean_token_accuracy": 0.8943204343318939,
1193
+ "step": 740
1194
+ },
1195
+ {
1196
+ "epoch": 1.5993555316863588,
1197
+ "grad_norm": 0.16867971152976394,
1198
+ "learning_rate": 3.8713411674675706e-05,
1199
+ "loss": 0.3468,
1200
+ "mean_token_accuracy": 0.8964660108089447,
1201
+ "step": 745
1202
+ },
1203
+ {
1204
+ "epoch": 1.6100966702470463,
1205
+ "grad_norm": 0.1634124661472663,
1206
+ "learning_rate": 3.8529880226526504e-05,
1207
+ "loss": 0.3419,
1208
+ "mean_token_accuracy": 0.897741311788559,
1209
+ "step": 750
1210
+ },
1211
+ {
1212
+ "epoch": 1.6208378088077335,
1213
+ "grad_norm": 0.16728119897984747,
1214
+ "learning_rate": 3.834537760569779e-05,
1215
+ "loss": 0.3477,
1216
+ "mean_token_accuracy": 0.8964338660240173,
1217
+ "step": 755
1218
+ },
1219
+ {
1220
+ "epoch": 1.631578947368421,
1221
+ "grad_norm": 0.16636899767836238,
1222
+ "learning_rate": 3.815992005750691e-05,
1223
+ "loss": 0.3454,
1224
+ "mean_token_accuracy": 0.897176194190979,
1225
+ "step": 760
1226
+ },
1227
+ {
1228
+ "epoch": 1.6423200859291085,
1229
+ "grad_norm": 0.17370655470517776,
1230
+ "learning_rate": 3.7973523911351873e-05,
1231
+ "loss": 0.3457,
1232
+ "mean_token_accuracy": 0.8967864811420441,
1233
+ "step": 765
1234
+ },
1235
+ {
1236
+ "epoch": 1.6530612244897958,
1237
+ "grad_norm": 0.17387140846382934,
1238
+ "learning_rate": 3.7786205579273494e-05,
1239
+ "loss": 0.3461,
1240
+ "mean_token_accuracy": 0.896539443731308,
1241
+ "step": 770
1242
+ },
1243
+ {
1244
+ "epoch": 1.6638023630504835,
1245
+ "grad_norm": 0.17312244395133694,
1246
+ "learning_rate": 3.75979815545104e-05,
1247
+ "loss": 0.3469,
1248
+ "mean_token_accuracy": 0.8965823531150818,
1249
+ "step": 775
1250
+ },
1251
+ {
1252
+ "epoch": 1.6745435016111707,
1253
+ "grad_norm": 0.17134683681288093,
1254
+ "learning_rate": 3.740886841004678e-05,
1255
+ "loss": 0.3437,
1256
+ "mean_token_accuracy": 0.8972635090351104,
1257
+ "step": 780
1258
+ },
1259
+ {
1260
+ "epoch": 1.6852846401718582,
1261
+ "grad_norm": 0.1703220892784228,
1262
+ "learning_rate": 3.72188827971531e-05,
1263
+ "loss": 0.349,
1264
+ "mean_token_accuracy": 0.8958061695098877,
1265
+ "step": 785
1266
+ },
1267
+ {
1268
+ "epoch": 1.6960257787325457,
1269
+ "grad_norm": 0.15629690421483755,
1270
+ "learning_rate": 3.7028041443920106e-05,
1271
+ "loss": 0.345,
1272
+ "mean_token_accuracy": 0.8972305715084076,
1273
+ "step": 790
1274
+ },
1275
+ {
1276
+ "epoch": 1.706766917293233,
1277
+ "grad_norm": 0.16968855316404596,
1278
+ "learning_rate": 3.6836361153785735e-05,
1279
+ "loss": 0.3391,
1280
+ "mean_token_accuracy": 0.8984034955501556,
1281
+ "step": 795
1282
+ },
1283
+ {
1284
+ "epoch": 1.7175080558539206,
1285
+ "grad_norm": 0.1613956545932139,
1286
+ "learning_rate": 3.6643858804055764e-05,
1287
+ "loss": 0.3418,
1288
+ "mean_token_accuracy": 0.8975095868110656,
1289
+ "step": 800
1290
+ },
1291
+ {
1292
+ "epoch": 1.728249194414608,
1293
+ "grad_norm": 0.16488649273144998,
1294
+ "learning_rate": 3.6450551344417656e-05,
1295
+ "loss": 0.347,
1296
+ "mean_token_accuracy": 0.8963462889194489,
1297
+ "step": 805
1298
+ },
1299
+ {
1300
+ "epoch": 1.7389903329752954,
1301
+ "grad_norm": 0.18336562912600562,
1302
+ "learning_rate": 3.625645579544824e-05,
1303
+ "loss": 0.3417,
1304
+ "mean_token_accuracy": 0.8978760004043579,
1305
+ "step": 810
1306
+ },
1307
+ {
1308
+ "epoch": 1.7497314715359829,
1309
+ "grad_norm": 0.16442030655020706,
1310
+ "learning_rate": 3.606158924711498e-05,
1311
+ "loss": 0.3418,
1312
+ "mean_token_accuracy": 0.8984208166599273,
1313
+ "step": 815
1314
+ },
1315
+ {
1316
+ "epoch": 1.76047261009667,
1317
+ "grad_norm": 0.1648466060868627,
1318
+ "learning_rate": 3.586596885727126e-05,
1319
+ "loss": 0.346,
1320
+ "mean_token_accuracy": 0.8967172205448151,
1321
+ "step": 820
1322
+ },
1323
+ {
1324
+ "epoch": 1.7712137486573578,
1325
+ "grad_norm": 0.16380950472689287,
1326
+ "learning_rate": 3.5669611850145676e-05,
1327
+ "loss": 0.3404,
1328
+ "mean_token_accuracy": 0.8981300175189972,
1329
+ "step": 825
1330
+ },
1331
+ {
1332
+ "epoch": 1.781954887218045,
1333
+ "grad_norm": 0.16476649720519732,
1334
+ "learning_rate": 3.54725355148254e-05,
1335
+ "loss": 0.3417,
1336
+ "mean_token_accuracy": 0.8978650271892548,
1337
+ "step": 830
1338
+ },
1339
+ {
1340
+ "epoch": 1.7926960257787325,
1341
+ "grad_norm": 0.16250342083791575,
1342
+ "learning_rate": 3.5274757203733906e-05,
1343
+ "loss": 0.3429,
1344
+ "mean_token_accuracy": 0.8977679431438446,
1345
+ "step": 835
1346
+ },
1347
+ {
1348
+ "epoch": 1.80343716433942,
1349
+ "grad_norm": 0.1666333005283665,
1350
+ "learning_rate": 3.507629433110311e-05,
1351
+ "loss": 0.3437,
1352
+ "mean_token_accuracy": 0.8972832322120666,
1353
+ "step": 840
1354
+ },
1355
+ {
1356
+ "epoch": 1.8141783029001073,
1357
+ "grad_norm": 0.1615387362712691,
1358
+ "learning_rate": 3.4877164371440075e-05,
1359
+ "loss": 0.3453,
1360
+ "mean_token_accuracy": 0.8970289349555969,
1361
+ "step": 845
1362
+ },
1363
+ {
1364
+ "epoch": 1.824919441460795,
1365
+ "grad_norm": 0.16676447906725542,
1366
+ "learning_rate": 3.467738485798836e-05,
1367
+ "loss": 0.3451,
1368
+ "mean_token_accuracy": 0.8969220995903016,
1369
+ "step": 850
1370
+ },
1371
+ {
1372
+ "epoch": 1.8356605800214822,
1373
+ "grad_norm": 0.16168843045380168,
1374
+ "learning_rate": 3.447697338118425e-05,
1375
+ "loss": 0.3395,
1376
+ "mean_token_accuracy": 0.898131811618805,
1377
+ "step": 855
1378
+ },
1379
+ {
1380
+ "epoch": 1.8464017185821697,
1381
+ "grad_norm": 0.15334942056157058,
1382
+ "learning_rate": 3.427594758710794e-05,
1383
+ "loss": 0.3422,
1384
+ "mean_token_accuracy": 0.8975472927093506,
1385
+ "step": 860
1386
+ },
1387
+ {
1388
+ "epoch": 1.8571428571428572,
1389
+ "grad_norm": 0.1672358555124429,
1390
+ "learning_rate": 3.407432517592979e-05,
1391
+ "loss": 0.3403,
1392
+ "mean_token_accuracy": 0.8983366131782532,
1393
+ "step": 865
1394
+ },
1395
+ {
1396
+ "epoch": 1.8678839957035445,
1397
+ "grad_norm": 0.161941088262071,
1398
+ "learning_rate": 3.3872123900351835e-05,
1399
+ "loss": 0.3408,
1400
+ "mean_token_accuracy": 0.8978644967079162,
1401
+ "step": 870
1402
+ },
1403
+ {
1404
+ "epoch": 1.8786251342642322,
1405
+ "grad_norm": 0.1519842470665007,
1406
+ "learning_rate": 3.3669361564044735e-05,
1407
+ "loss": 0.3396,
1408
+ "mean_token_accuracy": 0.898490047454834,
1409
+ "step": 875
1410
+ },
1411
+ {
1412
+ "epoch": 1.8893662728249194,
1413
+ "grad_norm": 0.16037110333088753,
1414
+ "learning_rate": 3.346605602008007e-05,
1415
+ "loss": 0.3417,
1416
+ "mean_token_accuracy": 0.8977841079235077,
1417
+ "step": 880
1418
+ },
1419
+ {
1420
+ "epoch": 1.900107411385607,
1421
+ "grad_norm": 0.16442639618093918,
1422
+ "learning_rate": 3.326222516935847e-05,
1423
+ "loss": 0.3437,
1424
+ "mean_token_accuracy": 0.8971070289611817,
1425
+ "step": 885
1426
+ },
1427
+ {
1428
+ "epoch": 1.9108485499462944,
1429
+ "grad_norm": 0.15289173675825762,
1430
+ "learning_rate": 3.3057886959033426e-05,
1431
+ "loss": 0.3416,
1432
+ "mean_token_accuracy": 0.8984978437423706,
1433
+ "step": 890
1434
+ },
1435
+ {
1436
+ "epoch": 1.9215896885069816,
1437
+ "grad_norm": 0.14450841113047458,
1438
+ "learning_rate": 3.285305938093108e-05,
1439
+ "loss": 0.3392,
1440
+ "mean_token_accuracy": 0.8983058393001556,
1441
+ "step": 895
1442
+ },
1443
+ {
1444
+ "epoch": 1.9323308270676691,
1445
+ "grad_norm": 0.15549384924856993,
1446
+ "learning_rate": 3.264776046996602e-05,
1447
+ "loss": 0.3394,
1448
+ "mean_token_accuracy": 0.8985956251621247,
1449
+ "step": 900
1450
+ },
1451
+ {
1452
+ "epoch": 1.9430719656283566,
1453
+ "grad_norm": 0.162459823198956,
1454
+ "learning_rate": 3.2442008302553346e-05,
1455
+ "loss": 0.34,
1456
+ "mean_token_accuracy": 0.8984286248683929,
1457
+ "step": 905
1458
+ },
1459
+ {
1460
+ "epoch": 1.953813104189044,
1461
+ "grad_norm": 0.15039221824995944,
1462
+ "learning_rate": 3.223582099501704e-05,
1463
+ "loss": 0.3374,
1464
+ "mean_token_accuracy": 0.8987222969532013,
1465
+ "step": 910
1466
+ },
1467
+ {
1468
+ "epoch": 1.9645542427497316,
1469
+ "grad_norm": 0.1564002589458454,
1470
+ "learning_rate": 3.202921670199485e-05,
1471
+ "loss": 0.3369,
1472
+ "mean_token_accuracy": 0.8994980156421661,
1473
+ "step": 915
1474
+ },
1475
+ {
1476
+ "epoch": 1.9752953813104188,
1477
+ "grad_norm": 0.17459425481905663,
1478
+ "learning_rate": 3.182221361483981e-05,
1479
+ "loss": 0.3426,
1480
+ "mean_token_accuracy": 0.8977073311805726,
1481
+ "step": 920
1482
+ },
1483
+ {
1484
+ "epoch": 1.9860365198711063,
1485
+ "grad_norm": 0.15953782868809285,
1486
+ "learning_rate": 3.161482996001842e-05,
1487
+ "loss": 0.3406,
1488
+ "mean_token_accuracy": 0.8983509004116058,
1489
+ "step": 925
1490
+ },
1491
+ {
1492
+ "epoch": 1.9967776584317938,
1493
+ "grad_norm": 0.15713432539772912,
1494
+ "learning_rate": 3.140708399750594e-05,
1495
+ "loss": 0.3421,
1496
+ "mean_token_accuracy": 0.8979579448699951,
1497
+ "step": 930
1498
+ },
1499
+ {
1500
+ "epoch": 2.0064446831364124,
1501
+ "grad_norm": 0.16209947632099436,
1502
+ "learning_rate": 3.11989940191785e-05,
1503
+ "loss": 0.3137,
1504
+ "mean_token_accuracy": 0.9049130148357816,
1505
+ "step": 935
1506
+ },
1507
+ {
1508
+ "epoch": 2.0171858216970997,
1509
+ "grad_norm": 0.18807228831939848,
1510
+ "learning_rate": 3.09905783472026e-05,
1511
+ "loss": 0.305,
1512
+ "mean_token_accuracy": 0.9070174276828766,
1513
+ "step": 940
1514
+ },
1515
+ {
1516
+ "epoch": 2.0279269602577874,
1517
+ "grad_norm": 0.1647631068534088,
1518
+ "learning_rate": 3.07818553324218e-05,
1519
+ "loss": 0.3039,
1520
+ "mean_token_accuracy": 0.9071334481239319,
1521
+ "step": 945
1522
+ },
1523
+ {
1524
+ "epoch": 2.0386680988184747,
1525
+ "grad_norm": 0.16628057896853762,
1526
+ "learning_rate": 3.057284335274097e-05,
1527
+ "loss": 0.3026,
1528
+ "mean_token_accuracy": 0.9071128606796265,
1529
+ "step": 950
1530
+ },
1531
+ {
1532
+ "epoch": 2.0494092373791624,
1533
+ "grad_norm": 0.16953299184244167,
1534
+ "learning_rate": 3.036356081150813e-05,
1535
+ "loss": 0.3034,
1536
+ "mean_token_accuracy": 0.9072185814380646,
1537
+ "step": 955
1538
+ },
1539
+ {
1540
+ "epoch": 2.0601503759398496,
1541
+ "grad_norm": 0.16119678084859076,
1542
+ "learning_rate": 3.0154026135894043e-05,
1543
+ "loss": 0.2994,
1544
+ "mean_token_accuracy": 0.9083474159240723,
1545
+ "step": 960
1546
+ },
1547
+ {
1548
+ "epoch": 2.070891514500537,
1549
+ "grad_norm": 0.16680753647576305,
1550
+ "learning_rate": 2.9944257775269686e-05,
1551
+ "loss": 0.3046,
1552
+ "mean_token_accuracy": 0.9070303261280059,
1553
+ "step": 965
1554
+ },
1555
+ {
1556
+ "epoch": 2.0816326530612246,
1557
+ "grad_norm": 0.1557469947598615,
1558
+ "learning_rate": 2.9734274199581857e-05,
1559
+ "loss": 0.3028,
1560
+ "mean_token_accuracy": 0.9075248777866364,
1561
+ "step": 970
1562
+ },
1563
+ {
1564
+ "epoch": 2.092373791621912,
1565
+ "grad_norm": 0.15821336281763043,
1566
+ "learning_rate": 2.9524093897726875e-05,
1567
+ "loss": 0.2992,
1568
+ "mean_token_accuracy": 0.9085965514183044,
1569
+ "step": 975
1570
+ },
1571
+ {
1572
+ "epoch": 2.1031149301825995,
1573
+ "grad_norm": 0.16912179860419502,
1574
+ "learning_rate": 2.931373537592264e-05,
1575
+ "loss": 0.3059,
1576
+ "mean_token_accuracy": 0.9063934266567231,
1577
+ "step": 980
1578
+ },
1579
+ {
1580
+ "epoch": 2.113856068743287,
1581
+ "grad_norm": 0.1568909903521791,
1582
+ "learning_rate": 2.9103217156079183e-05,
1583
+ "loss": 0.3017,
1584
+ "mean_token_accuracy": 0.9079225361347198,
1585
+ "step": 985
1586
+ },
1587
+ {
1588
+ "epoch": 2.124597207303974,
1589
+ "grad_norm": 0.17149311680209844,
1590
+ "learning_rate": 2.8892557774167843e-05,
1591
+ "loss": 0.3023,
1592
+ "mean_token_accuracy": 0.9075566232204437,
1593
+ "step": 990
1594
+ },
1595
+ {
1596
+ "epoch": 2.1353383458646618,
1597
+ "grad_norm": 0.1730679539636109,
1598
+ "learning_rate": 2.8681775778589164e-05,
1599
+ "loss": 0.3031,
1600
+ "mean_token_accuracy": 0.9074501514434814,
1601
+ "step": 995
1602
+ },
1603
+ {
1604
+ "epoch": 2.146079484425349,
1605
+ "grad_norm": 0.168662599711155,
1606
+ "learning_rate": 2.8470889728539725e-05,
1607
+ "loss": 0.302,
1608
+ "mean_token_accuracy": 0.9077127814292908,
1609
+ "step": 1000
1610
+ },
1611
+ {
1612
+ "epoch": 2.1568206229860367,
1613
+ "grad_norm": 0.16226284047590997,
1614
+ "learning_rate": 2.8259918192378038e-05,
1615
+ "loss": 0.3041,
1616
+ "mean_token_accuracy": 0.9070930540561676,
1617
+ "step": 1005
1618
+ },
1619
+ {
1620
+ "epoch": 2.167561761546724,
1621
+ "grad_norm": 0.1576781128963043,
1622
+ "learning_rate": 2.804887974598959e-05,
1623
+ "loss": 0.3022,
1624
+ "mean_token_accuracy": 0.907502681016922,
1625
+ "step": 1010
1626
+ },
1627
+ {
1628
+ "epoch": 2.1783029001074112,
1629
+ "grad_norm": 0.15997962819428427,
1630
+ "learning_rate": 2.7837792971151268e-05,
1631
+ "loss": 0.3018,
1632
+ "mean_token_accuracy": 0.9079727530479431,
1633
+ "step": 1015
1634
+ },
1635
+ {
1636
+ "epoch": 2.189044038668099,
1637
+ "grad_norm": 0.16962861365112525,
1638
+ "learning_rate": 2.7626676453895238e-05,
1639
+ "loss": 0.3031,
1640
+ "mean_token_accuracy": 0.9071884095668793,
1641
+ "step": 1020
1642
+ },
1643
+ {
1644
+ "epoch": 2.199785177228786,
1645
+ "grad_norm": 0.16322576238996814,
1646
+ "learning_rate": 2.7415548782872468e-05,
1647
+ "loss": 0.3057,
1648
+ "mean_token_accuracy": 0.9065694689750672,
1649
+ "step": 1025
1650
+ },
1651
+ {
1652
+ "epoch": 2.2105263157894735,
1653
+ "grad_norm": 0.16909277271966566,
1654
+ "learning_rate": 2.7204428547716027e-05,
1655
+ "loss": 0.3052,
1656
+ "mean_token_accuracy": 0.9069810092449189,
1657
+ "step": 1030
1658
+ },
1659
+ {
1660
+ "epoch": 2.221267454350161,
1661
+ "grad_norm": 0.16098166127750824,
1662
+ "learning_rate": 2.699333433740422e-05,
1663
+ "loss": 0.3034,
1664
+ "mean_token_accuracy": 0.907333254814148,
1665
+ "step": 1035
1666
+ },
1667
+ {
1668
+ "epoch": 2.2320085929108484,
1669
+ "grad_norm": 0.17075220096927826,
1670
+ "learning_rate": 2.678228473862391e-05,
1671
+ "loss": 0.3059,
1672
+ "mean_token_accuracy": 0.9066526055335998,
1673
+ "step": 1040
1674
+ },
1675
+ {
1676
+ "epoch": 2.242749731471536,
1677
+ "grad_norm": 0.16370207033646628,
1678
+ "learning_rate": 2.6571298334133947e-05,
1679
+ "loss": 0.3049,
1680
+ "mean_token_accuracy": 0.9068757057189941,
1681
+ "step": 1045
1682
+ },
1683
+ {
1684
+ "epoch": 2.2534908700322234,
1685
+ "grad_norm": 0.1611010495321633,
1686
+ "learning_rate": 2.6360393701128968e-05,
1687
+ "loss": 0.3058,
1688
+ "mean_token_accuracy": 0.9067712783813476,
1689
+ "step": 1050
1690
+ },
1691
+ {
1692
+ "epoch": 2.264232008592911,
1693
+ "grad_norm": 0.16970228504955862,
1694
+ "learning_rate": 2.614958940960369e-05,
1695
+ "loss": 0.3052,
1696
+ "mean_token_accuracy": 0.9068210601806641,
1697
+ "step": 1055
1698
+ },
1699
+ {
1700
+ "epoch": 2.2749731471535983,
1701
+ "grad_norm": 0.1677663409783765,
1702
+ "learning_rate": 2.593890402071784e-05,
1703
+ "loss": 0.303,
1704
+ "mean_token_accuracy": 0.9071888148784637,
1705
+ "step": 1060
1706
+ },
1707
+ {
1708
+ "epoch": 2.2857142857142856,
1709
+ "grad_norm": 0.1594126722501793,
1710
+ "learning_rate": 2.5728356085161864e-05,
1711
+ "loss": 0.2979,
1712
+ "mean_token_accuracy": 0.9088397026062012,
1713
+ "step": 1065
1714
+ },
1715
+ {
1716
+ "epoch": 2.2964554242749733,
1717
+ "grad_norm": 0.15755295908932457,
1718
+ "learning_rate": 2.5517964141523525e-05,
1719
+ "loss": 0.3009,
1720
+ "mean_token_accuracy": 0.9078912615776062,
1721
+ "step": 1070
1722
+ },
1723
+ {
1724
+ "epoch": 2.3071965628356605,
1725
+ "grad_norm": 0.15824119025266686,
1726
+ "learning_rate": 2.5307746714655634e-05,
1727
+ "loss": 0.3065,
1728
+ "mean_token_accuracy": 0.9067668735980987,
1729
+ "step": 1075
1730
+ },
1731
+ {
1732
+ "epoch": 2.317937701396348,
1733
+ "grad_norm": 0.1593424773763769,
1734
+ "learning_rate": 2.509772231404493e-05,
1735
+ "loss": 0.3072,
1736
+ "mean_token_accuracy": 0.9063262104988098,
1737
+ "step": 1080
1738
+ },
1739
+ {
1740
+ "epoch": 2.3286788399570355,
1741
+ "grad_norm": 0.16745585583895234,
1742
+ "learning_rate": 2.4887909432182316e-05,
1743
+ "loss": 0.3205,
1744
+ "mean_token_accuracy": 0.9050490736961365,
1745
+ "step": 1085
1746
+ },
1747
+ {
1748
+ "epoch": 2.3394199785177228,
1749
+ "grad_norm": 0.18108073198198416,
1750
+ "learning_rate": 2.4678326542934667e-05,
1751
+ "loss": 0.3048,
1752
+ "mean_token_accuracy": 0.9068881213665009,
1753
+ "step": 1090
1754
+ },
1755
+ {
1756
+ "epoch": 2.3501611170784105,
1757
+ "grad_norm": 0.17241262713318053,
1758
+ "learning_rate": 2.4468992099918138e-05,
1759
+ "loss": 0.3032,
1760
+ "mean_token_accuracy": 0.9073716223239898,
1761
+ "step": 1095
1762
+ },
1763
+ {
1764
+ "epoch": 2.3609022556390977,
1765
+ "grad_norm": 0.16397300617763141,
1766
+ "learning_rate": 2.4259924534873385e-05,
1767
+ "loss": 0.3061,
1768
+ "mean_token_accuracy": 0.9062675356864929,
1769
+ "step": 1100
1770
+ },
1771
+ {
1772
+ "epoch": 2.3716433941997854,
1773
+ "grad_norm": 0.1700811614554712,
1774
+ "learning_rate": 2.4051142256042697e-05,
1775
+ "loss": 0.3011,
1776
+ "mean_token_accuracy": 0.90796759724617,
1777
+ "step": 1105
1778
+ },
1779
+ {
1780
+ "epoch": 2.3823845327604727,
1781
+ "grad_norm": 0.16924471517889025,
1782
+ "learning_rate": 2.3842663646549085e-05,
1783
+ "loss": 0.3025,
1784
+ "mean_token_accuracy": 0.9076179921627044,
1785
+ "step": 1110
1786
+ },
1787
+ {
1788
+ "epoch": 2.39312567132116,
1789
+ "grad_norm": 0.582746886765867,
1790
+ "learning_rate": 2.3634507062777726e-05,
1791
+ "loss": 0.3036,
1792
+ "mean_token_accuracy": 0.9076011419296265,
1793
+ "step": 1115
1794
+ },
1795
+ {
1796
+ "epoch": 2.4038668098818476,
1797
+ "grad_norm": 0.15789580559295846,
1798
+ "learning_rate": 2.3426690832759652e-05,
1799
+ "loss": 0.2997,
1800
+ "mean_token_accuracy": 0.9084276914596557,
1801
+ "step": 1120
1802
+ },
1803
+ {
1804
+ "epoch": 2.414607948442535,
1805
+ "grad_norm": 0.15924353242995867,
1806
+ "learning_rate": 2.3219233254558025e-05,
1807
+ "loss": 0.3029,
1808
+ "mean_token_accuracy": 0.9074055433273316,
1809
+ "step": 1125
1810
+ },
1811
+ {
1812
+ "epoch": 2.425349087003222,
1813
+ "grad_norm": 0.16646800963930639,
1814
+ "learning_rate": 2.3012152594656982e-05,
1815
+ "loss": 0.3043,
1816
+ "mean_token_accuracy": 0.9070705771446228,
1817
+ "step": 1130
1818
+ },
1819
+ {
1820
+ "epoch": 2.43609022556391,
1821
+ "grad_norm": 0.16197886055551655,
1822
+ "learning_rate": 2.2805467086353268e-05,
1823
+ "loss": 0.2983,
1824
+ "mean_token_accuracy": 0.9087878286838531,
1825
+ "step": 1135
1826
+ },
1827
+ {
1828
+ "epoch": 2.446831364124597,
1829
+ "grad_norm": 0.16381004501438137,
1830
+ "learning_rate": 2.2599194928150842e-05,
1831
+ "loss": 0.3037,
1832
+ "mean_token_accuracy": 0.9073452115058899,
1833
+ "step": 1140
1834
+ },
1835
+ {
1836
+ "epoch": 2.457572502685285,
1837
+ "grad_norm": 0.16540282102993875,
1838
+ "learning_rate": 2.239335428215849e-05,
1839
+ "loss": 0.3042,
1840
+ "mean_token_accuracy": 0.9071446895599365,
1841
+ "step": 1145
1842
+ },
1843
+ {
1844
+ "epoch": 2.468313641245972,
1845
+ "grad_norm": 0.16037824203377551,
1846
+ "learning_rate": 2.2187963272490676e-05,
1847
+ "loss": 0.3022,
1848
+ "mean_token_accuracy": 0.9079298913478852,
1849
+ "step": 1150
1850
+ },
1851
+ {
1852
+ "epoch": 2.4790547798066593,
1853
+ "grad_norm": 0.15882572997154093,
1854
+ "learning_rate": 2.198303998367171e-05,
1855
+ "loss": 0.3067,
1856
+ "mean_token_accuracy": 0.9064932882785797,
1857
+ "step": 1155
1858
+ },
1859
+ {
1860
+ "epoch": 2.489795918367347,
1861
+ "grad_norm": 0.15831447424850761,
1862
+ "learning_rate": 2.1778602459043452e-05,
1863
+ "loss": 0.3039,
1864
+ "mean_token_accuracy": 0.9070046961307525,
1865
+ "step": 1160
1866
+ },
1867
+ {
1868
+ "epoch": 2.5005370569280343,
1869
+ "grad_norm": 0.16081532493077333,
1870
+ "learning_rate": 2.157466869917658e-05,
1871
+ "loss": 0.3041,
1872
+ "mean_token_accuracy": 0.9073209702968598,
1873
+ "step": 1165
1874
+ },
1875
+ {
1876
+ "epoch": 2.511278195488722,
1877
+ "grad_norm": 0.15516248272553126,
1878
+ "learning_rate": 2.1371256660285655e-05,
1879
+ "loss": 0.3044,
1880
+ "mean_token_accuracy": 0.9070526838302613,
1881
+ "step": 1170
1882
+ },
1883
+ {
1884
+ "epoch": 2.5220193340494093,
1885
+ "grad_norm": 0.1587382733948704,
1886
+ "learning_rate": 2.1168384252648117e-05,
1887
+ "loss": 0.2999,
1888
+ "mean_token_accuracy": 0.9086295425891876,
1889
+ "step": 1175
1890
+ },
1891
+ {
1892
+ "epoch": 2.5327604726100965,
1893
+ "grad_norm": 0.15919430172381277,
1894
+ "learning_rate": 2.0966069339027256e-05,
1895
+ "loss": 0.3017,
1896
+ "mean_token_accuracy": 0.9076282560825348,
1897
+ "step": 1180
1898
+ },
1899
+ {
1900
+ "epoch": 2.543501611170784,
1901
+ "grad_norm": 0.1602383119084914,
1902
+ "learning_rate": 2.0764329733099446e-05,
1903
+ "loss": 0.2998,
1904
+ "mean_token_accuracy": 0.9084926426410675,
1905
+ "step": 1185
1906
+ },
1907
+ {
1908
+ "epoch": 2.5542427497314715,
1909
+ "grad_norm": 0.16156220155082493,
1910
+ "learning_rate": 2.0563183197885653e-05,
1911
+ "loss": 0.3068,
1912
+ "mean_token_accuracy": 0.9063272118568421,
1913
+ "step": 1190
1914
+ },
1915
+ {
1916
+ "epoch": 2.5649838882921587,
1917
+ "grad_norm": 0.15676424327787444,
1918
+ "learning_rate": 2.03626474441874e-05,
1919
+ "loss": 0.304,
1920
+ "mean_token_accuracy": 0.9073390066623688,
1921
+ "step": 1195
1922
+ },
1923
+ {
1924
+ "epoch": 2.5757250268528464,
1925
+ "grad_norm": 0.16064943066993936,
1926
+ "learning_rate": 2.016274012902737e-05,
1927
+ "loss": 0.3031,
1928
+ "mean_token_accuracy": 0.9080215394496918,
1929
+ "step": 1200
1930
+ },
1931
+ {
1932
+ "epoch": 2.5864661654135337,
1933
+ "grad_norm": 0.15163324815906554,
1934
+ "learning_rate": 1.996347885409468e-05,
1935
+ "loss": 0.2995,
1936
+ "mean_token_accuracy": 0.9081439912319184,
1937
+ "step": 1205
1938
+ },
1939
+ {
1940
+ "epoch": 2.5972073039742214,
1941
+ "grad_norm": 0.16245754277077917,
1942
+ "learning_rate": 1.9764881164195113e-05,
1943
+ "loss": 0.3015,
1944
+ "mean_token_accuracy": 0.907852166891098,
1945
+ "step": 1210
1946
+ },
1947
+ {
1948
+ "epoch": 2.6079484425349087,
1949
+ "grad_norm": 0.16043196872565563,
1950
+ "learning_rate": 1.956696454570629e-05,
1951
+ "loss": 0.3038,
1952
+ "mean_token_accuracy": 0.9070708453655243,
1953
+ "step": 1215
1954
+ },
1955
+ {
1956
+ "epoch": 2.6186895810955964,
1957
+ "grad_norm": 0.1518503511295408,
1958
+ "learning_rate": 1.9369746425037983e-05,
1959
+ "loss": 0.3031,
1960
+ "mean_token_accuracy": 0.9073640763759613,
1961
+ "step": 1220
1962
+ },
1963
+ {
1964
+ "epoch": 2.6294307196562836,
1965
+ "grad_norm": 0.16579054364092405,
1966
+ "learning_rate": 1.9173244167097766e-05,
1967
+ "loss": 0.3021,
1968
+ "mean_token_accuracy": 0.9075863361358643,
1969
+ "step": 1225
1970
+ },
1971
+ {
1972
+ "epoch": 2.640171858216971,
1973
+ "grad_norm": 0.16096483480946194,
1974
+ "learning_rate": 1.8977475073762042e-05,
1975
+ "loss": 0.3024,
1976
+ "mean_token_accuracy": 0.907714718580246,
1977
+ "step": 1230
1978
+ },
1979
+ {
1980
+ "epoch": 2.6509129967776586,
1981
+ "grad_norm": 0.16586554619371632,
1982
+ "learning_rate": 1.878245638235262e-05,
1983
+ "loss": 0.3032,
1984
+ "mean_token_accuracy": 0.9077441573143006,
1985
+ "step": 1235
1986
+ },
1987
+ {
1988
+ "epoch": 2.661654135338346,
1989
+ "grad_norm": 0.17145727431540336,
1990
+ "learning_rate": 1.8588205264118974e-05,
1991
+ "loss": 0.3007,
1992
+ "mean_token_accuracy": 0.9080956459045411,
1993
+ "step": 1240
1994
+ },
1995
+ {
1996
+ "epoch": 2.672395273899033,
1997
+ "grad_norm": 0.16247484247551466,
1998
+ "learning_rate": 1.8394738822726337e-05,
1999
+ "loss": 0.3078,
2000
+ "mean_token_accuracy": 0.9063467800617218,
2001
+ "step": 1245
2002
+ },
2003
+ {
2004
+ "epoch": 2.683136412459721,
2005
+ "grad_norm": 0.16303109945042918,
2006
+ "learning_rate": 1.8202074092749754e-05,
2007
+ "loss": 0.305,
2008
+ "mean_token_accuracy": 0.9077015459537506,
2009
+ "step": 1250
2010
+ },
2011
+ {
2012
+ "epoch": 2.693877551020408,
2013
+ "grad_norm": 0.15810829618004768,
2014
+ "learning_rate": 1.8010228038174154e-05,
2015
+ "loss": 0.3052,
2016
+ "mean_token_accuracy": 0.9069934606552124,
2017
+ "step": 1255
2018
+ },
2019
+ {
2020
+ "epoch": 2.7046186895810957,
2021
+ "grad_norm": 0.1572557171403785,
2022
+ "learning_rate": 1.781921755090072e-05,
2023
+ "loss": 0.3029,
2024
+ "mean_token_accuracy": 0.9075438380241394,
2025
+ "step": 1260
2026
+ },
2027
+ {
2028
+ "epoch": 2.715359828141783,
2029
+ "grad_norm": 0.15752257331645983,
2030
+ "learning_rate": 1.7629059449259565e-05,
2031
+ "loss": 0.2978,
2032
+ "mean_token_accuracy": 0.9092587411403656,
2033
+ "step": 1265
2034
+ },
2035
+ {
2036
+ "epoch": 2.7261009667024707,
2037
+ "grad_norm": 0.155952159894427,
2038
+ "learning_rate": 1.7439770476528894e-05,
2039
+ "loss": 0.3025,
2040
+ "mean_token_accuracy": 0.9076742231845856,
2041
+ "step": 1270
2042
+ },
2043
+ {
2044
+ "epoch": 2.736842105263158,
2045
+ "grad_norm": 0.1578844927904049,
2046
+ "learning_rate": 1.7251367299460735e-05,
2047
+ "loss": 0.3043,
2048
+ "mean_token_accuracy": 0.9071321785449982,
2049
+ "step": 1275
2050
+ },
2051
+ {
2052
+ "epoch": 2.7475832438238452,
2053
+ "grad_norm": 0.15643506287974016,
2054
+ "learning_rate": 1.7063866506813515e-05,
2055
+ "loss": 0.3014,
2056
+ "mean_token_accuracy": 0.9080881893634796,
2057
+ "step": 1280
2058
+ },
2059
+ {
2060
+ "epoch": 2.758324382384533,
2061
+ "grad_norm": 0.16188588270959753,
2062
+ "learning_rate": 1.687728460789136e-05,
2063
+ "loss": 0.3029,
2064
+ "mean_token_accuracy": 0.9077995300292969,
2065
+ "step": 1285
2066
+ },
2067
+ {
2068
+ "epoch": 2.76906552094522,
2069
+ "grad_norm": 0.15914290923730717,
2070
+ "learning_rate": 1.669163803109049e-05,
2071
+ "loss": 0.3039,
2072
+ "mean_token_accuracy": 0.9069546043872834,
2073
+ "step": 1290
2074
+ },
2075
+ {
2076
+ "epoch": 2.7798066595059074,
2077
+ "grad_norm": 0.1531939594797534,
2078
+ "learning_rate": 1.650694312245272e-05,
2079
+ "loss": 0.301,
2080
+ "mean_token_accuracy": 0.9082088112831116,
2081
+ "step": 1295
2082
+ },
2083
+ {
2084
+ "epoch": 2.790547798066595,
2085
+ "grad_norm": 0.14781879067353518,
2086
+ "learning_rate": 1.6323216144226218e-05,
2087
+ "loss": 0.3006,
2088
+ "mean_token_accuracy": 0.9082107961177825,
2089
+ "step": 1300
2090
+ },
2091
+ {
2092
+ "epoch": 2.8012889366272824,
2093
+ "grad_norm": 0.15796491533044651,
2094
+ "learning_rate": 1.614047327343358e-05,
2095
+ "loss": 0.3037,
2096
+ "mean_token_accuracy": 0.9073608994483948,
2097
+ "step": 1305
2098
+ },
2099
+ {
2100
+ "epoch": 2.8120300751879697,
2101
+ "grad_norm": 0.15342589995319128,
2102
+ "learning_rate": 1.5958730600447483e-05,
2103
+ "loss": 0.2982,
2104
+ "mean_token_accuracy": 0.9089851617813111,
2105
+ "step": 1310
2106
+ },
2107
+ {
2108
+ "epoch": 2.8227712137486574,
2109
+ "grad_norm": 0.15213716012041018,
2110
+ "learning_rate": 1.5778004127573954e-05,
2111
+ "loss": 0.3018,
2112
+ "mean_token_accuracy": 0.9082035005092621,
2113
+ "step": 1315
2114
+ },
2115
+ {
2116
+ "epoch": 2.833512352309345,
2117
+ "grad_norm": 0.15689344716817114,
2118
+ "learning_rate": 1.5598309767643355e-05,
2119
+ "loss": 0.3015,
2120
+ "mean_token_accuracy": 0.9079676389694213,
2121
+ "step": 1320
2122
+ },
2123
+ {
2124
+ "epoch": 2.8442534908700323,
2125
+ "grad_norm": 0.15560793520372218,
2126
+ "learning_rate": 1.5419663342609245e-05,
2127
+ "loss": 0.301,
2128
+ "mean_token_accuracy": 0.9079644203186035,
2129
+ "step": 1325
2130
+ },
2131
+ {
2132
+ "epoch": 2.8549946294307196,
2133
+ "grad_norm": 0.15762229912652725,
2134
+ "learning_rate": 1.524208058215536e-05,
2135
+ "loss": 0.3004,
2136
+ "mean_token_accuracy": 0.9081010043621063,
2137
+ "step": 1330
2138
+ },
2139
+ {
2140
+ "epoch": 2.8657357679914073,
2141
+ "grad_norm": 0.1492296564674764,
2142
+ "learning_rate": 1.5065577122310532e-05,
2143
+ "loss": 0.3038,
2144
+ "mean_token_accuracy": 0.9071996510028839,
2145
+ "step": 1335
2146
+ },
2147
+ {
2148
+ "epoch": 2.8764769065520945,
2149
+ "grad_norm": 0.15341782949091415,
2150
+ "learning_rate": 1.4890168504071986e-05,
2151
+ "loss": 0.3013,
2152
+ "mean_token_accuracy": 0.9081071972846985,
2153
+ "step": 1340
2154
+ },
2155
+ {
2156
+ "epoch": 2.887218045112782,
2157
+ "grad_norm": 0.15319646472290932,
2158
+ "learning_rate": 1.4715870172036961e-05,
2159
+ "loss": 0.2985,
2160
+ "mean_token_accuracy": 0.9089631140232086,
2161
+ "step": 1345
2162
+ },
2163
+ {
2164
+ "epoch": 2.8979591836734695,
2165
+ "grad_norm": 0.155104806503441,
2166
+ "learning_rate": 1.4542697473042855e-05,
2167
+ "loss": 0.3015,
2168
+ "mean_token_accuracy": 0.9081062614917755,
2169
+ "step": 1350
2170
+ },
2171
+ {
2172
+ "epoch": 2.9087003222341568,
2173
+ "grad_norm": 0.14997293337059112,
2174
+ "learning_rate": 1.4370665654815896e-05,
2175
+ "loss": 0.3016,
2176
+ "mean_token_accuracy": 0.9077993631362915,
2177
+ "step": 1355
2178
+ },
2179
+ {
2180
+ "epoch": 2.919441460794844,
2181
+ "grad_norm": 0.15836235770159765,
2182
+ "learning_rate": 1.4199789864628612e-05,
2183
+ "loss": 0.3025,
2184
+ "mean_token_accuracy": 0.9076350510120392,
2185
+ "step": 1360
2186
+ },
2187
+ {
2188
+ "epoch": 2.9301825993555317,
2189
+ "grad_norm": 0.15239559171871817,
2190
+ "learning_rate": 1.403008514796616e-05,
2191
+ "loss": 0.3002,
2192
+ "mean_token_accuracy": 0.9083379149436951,
2193
+ "step": 1365
2194
+ },
2195
+ {
2196
+ "epoch": 2.940923737916219,
2197
+ "grad_norm": 0.15596273472793287,
2198
+ "learning_rate": 1.3861566447201524e-05,
2199
+ "loss": 0.2989,
2200
+ "mean_token_accuracy": 0.9084150791168213,
2201
+ "step": 1370
2202
+ },
2203
+ {
2204
+ "epoch": 2.9516648764769067,
2205
+ "grad_norm": 0.15225411451673648,
2206
+ "learning_rate": 1.3694248600279886e-05,
2207
+ "loss": 0.3002,
2208
+ "mean_token_accuracy": 0.9083608329296112,
2209
+ "step": 1375
2210
+ },
2211
+ {
2212
+ "epoch": 2.962406015037594,
2213
+ "grad_norm": 0.15301962057571455,
2214
+ "learning_rate": 1.3528146339412146e-05,
2215
+ "loss": 0.3021,
2216
+ "mean_token_accuracy": 0.9078640341758728,
2217
+ "step": 1380
2218
+ },
2219
+ {
2220
+ "epoch": 2.9731471535982816,
2221
+ "grad_norm": 0.15353042988029672,
2222
+ "learning_rate": 1.3363274289777773e-05,
2223
+ "loss": 0.2992,
2224
+ "mean_token_accuracy": 0.9084159135818481,
2225
+ "step": 1385
2226
+ },
2227
+ {
2228
+ "epoch": 2.983888292158969,
2229
+ "grad_norm": 0.1565397591962354,
2230
+ "learning_rate": 1.3199646968237039e-05,
2231
+ "loss": 0.3019,
2232
+ "mean_token_accuracy": 0.9077640831470489,
2233
+ "step": 1390
2234
+ },
2235
+ {
2236
+ "epoch": 2.994629430719656,
2237
+ "grad_norm": 0.15512948456888964,
2238
+ "learning_rate": 1.3037278782052863e-05,
2239
+ "loss": 0.301,
2240
+ "mean_token_accuracy": 0.908068060874939,
2241
+ "step": 1395
2242
+ },
2243
+ {
2244
+ "epoch": 3.004296455424275,
2245
+ "grad_norm": 0.17611687143689977,
2246
+ "learning_rate": 1.2876184027622246e-05,
2247
+ "loss": 0.2837,
2248
+ "mean_token_accuracy": 0.9126578701866997,
2249
+ "step": 1400
2250
+ },
2251
+ {
2252
+ "epoch": 3.0150375939849625,
2253
+ "grad_norm": 0.23111560237426948,
2254
+ "learning_rate": 1.2716376889217446e-05,
2255
+ "loss": 0.2617,
2256
+ "mean_token_accuracy": 0.9192156255245209,
2257
+ "step": 1405
2258
+ },
2259
+ {
2260
+ "epoch": 3.0257787325456498,
2261
+ "grad_norm": 0.18975174760198046,
2262
+ "learning_rate": 1.2557871437737118e-05,
2263
+ "loss": 0.2613,
2264
+ "mean_token_accuracy": 0.9190598428249359,
2265
+ "step": 1410
2266
+ },
2267
+ {
2268
+ "epoch": 3.0365198711063375,
2269
+ "grad_norm": 0.17890147872689252,
2270
+ "learning_rate": 1.240068162946737e-05,
2271
+ "loss": 0.2584,
2272
+ "mean_token_accuracy": 0.91984983086586,
2273
+ "step": 1415
2274
+ },
2275
+ {
2276
+ "epoch": 3.0472610096670247,
2277
+ "grad_norm": 0.17315801700410546,
2278
+ "learning_rate": 1.2244821304852888e-05,
2279
+ "loss": 0.2557,
2280
+ "mean_token_accuracy": 0.9208986639976502,
2281
+ "step": 1420
2282
+ },
2283
+ {
2284
+ "epoch": 3.058002148227712,
2285
+ "grad_norm": 0.18517285000872677,
2286
+ "learning_rate": 1.2090304187278333e-05,
2287
+ "loss": 0.2604,
2288
+ "mean_token_accuracy": 0.9195366144180298,
2289
+ "step": 1425
2290
+ },
2291
+ {
2292
+ "epoch": 3.0687432867883997,
2293
+ "grad_norm": 0.16562595080311196,
2294
+ "learning_rate": 1.1937143881859981e-05,
2295
+ "loss": 0.2577,
2296
+ "mean_token_accuracy": 0.9203976690769196,
2297
+ "step": 1430
2298
+ },
2299
+ {
2300
+ "epoch": 3.079484425349087,
2301
+ "grad_norm": 0.17393143558685065,
2302
+ "learning_rate": 1.178535387424785e-05,
2303
+ "loss": 0.2574,
2304
+ "mean_token_accuracy": 0.9199799060821533,
2305
+ "step": 1435
2306
+ },
2307
+ {
2308
+ "epoch": 3.090225563909774,
2309
+ "grad_norm": 0.1645998735975408,
2310
+ "learning_rate": 1.163494752943822e-05,
2311
+ "loss": 0.2568,
2312
+ "mean_token_accuracy": 0.9204827189445496,
2313
+ "step": 1440
2314
+ },
2315
+ {
2316
+ "epoch": 3.100966702470462,
2317
+ "grad_norm": 0.16887936249293273,
2318
+ "learning_rate": 1.1485938090596918e-05,
2319
+ "loss": 0.2586,
2320
+ "mean_token_accuracy": 0.9197791635990142,
2321
+ "step": 1445
2322
+ },
2323
+ {
2324
+ "epoch": 3.111707841031149,
2325
+ "grad_norm": 0.17416795475633623,
2326
+ "learning_rate": 1.1338338677893261e-05,
2327
+ "loss": 0.2584,
2328
+ "mean_token_accuracy": 0.9200873076915741,
2329
+ "step": 1450
2330
+ },
2331
+ {
2332
+ "epoch": 3.122448979591837,
2333
+ "grad_norm": 0.1751550798568952,
2334
+ "learning_rate": 1.1192162287344806e-05,
2335
+ "loss": 0.2584,
2336
+ "mean_token_accuracy": 0.919762271642685,
2337
+ "step": 1455
2338
+ },
2339
+ {
2340
+ "epoch": 3.133190118152524,
2341
+ "grad_norm": 0.17592907174451083,
2342
+ "learning_rate": 1.1047421789673082e-05,
2343
+ "loss": 0.2597,
2344
+ "mean_token_accuracy": 0.9195389747619629,
2345
+ "step": 1460
2346
+ },
2347
+ {
2348
+ "epoch": 3.143931256713212,
2349
+ "grad_norm": 0.17327426676281532,
2350
+ "learning_rate": 1.0904129929170317e-05,
2351
+ "loss": 0.2556,
2352
+ "mean_token_accuracy": 0.9207349836826324,
2353
+ "step": 1465
2354
+ },
2355
+ {
2356
+ "epoch": 3.154672395273899,
2357
+ "grad_norm": 0.17320030271762202,
2358
+ "learning_rate": 1.0762299322577352e-05,
2359
+ "loss": 0.2573,
2360
+ "mean_token_accuracy": 0.9203036367893219,
2361
+ "step": 1470
2362
+ },
2363
+ {
2364
+ "epoch": 3.1654135338345863,
2365
+ "grad_norm": 0.1722311431748818,
2366
+ "learning_rate": 1.0621942457972692e-05,
2367
+ "loss": 0.26,
2368
+ "mean_token_accuracy": 0.9195259928703308,
2369
+ "step": 1475
2370
+ },
2371
+ {
2372
+ "epoch": 3.176154672395274,
2373
+ "grad_norm": 0.17238717747260024,
2374
+ "learning_rate": 1.0483071693672959e-05,
2375
+ "loss": 0.2556,
2376
+ "mean_token_accuracy": 0.9209478557109833,
2377
+ "step": 1480
2378
+ },
2379
+ {
2380
+ "epoch": 3.1868958109559613,
2381
+ "grad_norm": 0.17188960001484813,
2382
+ "learning_rate": 1.0345699257144787e-05,
2383
+ "loss": 0.2599,
2384
+ "mean_token_accuracy": 0.9196560025215149,
2385
+ "step": 1485
2386
+ },
2387
+ {
2388
+ "epoch": 3.1976369495166486,
2389
+ "grad_norm": 0.16939046145995434,
2390
+ "learning_rate": 1.0209837243928163e-05,
2391
+ "loss": 0.2569,
2392
+ "mean_token_accuracy": 0.9202696919441223,
2393
+ "step": 1490
2394
+ },
2395
+ {
2396
+ "epoch": 3.2083780880773363,
2397
+ "grad_norm": 0.1643698296522669,
2398
+ "learning_rate": 1.0075497616571402e-05,
2399
+ "loss": 0.2613,
2400
+ "mean_token_accuracy": 0.9193197846412658,
2401
+ "step": 1495
2402
+ },
2403
+ {
2404
+ "epoch": 3.2191192266380235,
2405
+ "grad_norm": 0.17523553700537306,
2406
+ "learning_rate": 9.942692203577937e-06,
2407
+ "loss": 0.2617,
2408
+ "mean_token_accuracy": 0.9192265450954438,
2409
+ "step": 1500
2410
+ },
2411
+ {
2412
+ "epoch": 3.2298603651987112,
2413
+ "grad_norm": 0.17674127090736955,
2414
+ "learning_rate": 9.811432698364748e-06,
2415
+ "loss": 0.2611,
2416
+ "mean_token_accuracy": 0.9191824972629548,
2417
+ "step": 1505
2418
+ },
2419
+ {
2420
+ "epoch": 3.2406015037593985,
2421
+ "grad_norm": 0.17789280108349984,
2422
+ "learning_rate": 9.681730658232796e-06,
2423
+ "loss": 0.2631,
2424
+ "mean_token_accuracy": 0.9186322450637817,
2425
+ "step": 1510
2426
+ },
2427
+ {
2428
+ "epoch": 3.2513426423200857,
2429
+ "grad_norm": 0.17266428476273013,
2430
+ "learning_rate": 9.553597503349415e-06,
2431
+ "loss": 0.2582,
2432
+ "mean_token_accuracy": 0.9197676658630372,
2433
+ "step": 1515
2434
+ },
2435
+ {
2436
+ "epoch": 3.2620837808807734,
2437
+ "grad_norm": 0.1756023449894313,
2438
+ "learning_rate": 9.427044515742773e-06,
2439
+ "loss": 0.2583,
2440
+ "mean_token_accuracy": 0.9203043103218078,
2441
+ "step": 1520
2442
+ },
2443
+ {
2444
+ "epoch": 3.2728249194414607,
2445
+ "grad_norm": 0.1705185261901335,
2446
+ "learning_rate": 9.302082838308494e-06,
2447
+ "loss": 0.2588,
2448
+ "mean_token_accuracy": 0.9197465479373932,
2449
+ "step": 1525
2450
+ },
2451
+ {
2452
+ "epoch": 3.2835660580021484,
2453
+ "grad_norm": 0.1863220207081355,
2454
+ "learning_rate": 9.178723473828517e-06,
2455
+ "loss": 0.2592,
2456
+ "mean_token_accuracy": 0.919755893945694,
2457
+ "step": 1530
2458
+ },
2459
+ {
2460
+ "epoch": 3.2943071965628357,
2461
+ "grad_norm": 0.18144578655920904,
2462
+ "learning_rate": 9.05697728400236e-06,
2463
+ "loss": 0.2588,
2464
+ "mean_token_accuracy": 0.9201307475566864,
2465
+ "step": 1535
2466
+ },
2467
+ {
2468
+ "epoch": 3.305048335123523,
2469
+ "grad_norm": 0.17313846247861978,
2470
+ "learning_rate": 8.936854988490695e-06,
2471
+ "loss": 0.2627,
2472
+ "mean_token_accuracy": 0.9188291728496552,
2473
+ "step": 1540
2474
+ },
2475
+ {
2476
+ "epoch": 3.3157894736842106,
2477
+ "grad_norm": 0.1801914802446693,
2478
+ "learning_rate": 8.818367163971535e-06,
2479
+ "loss": 0.2557,
2480
+ "mean_token_accuracy": 0.9207710027694702,
2481
+ "step": 1545
2482
+ },
2483
+ {
2484
+ "epoch": 3.326530612244898,
2485
+ "grad_norm": 0.16994847146506772,
2486
+ "learning_rate": 8.701524243208935e-06,
2487
+ "loss": 0.2598,
2488
+ "mean_token_accuracy": 0.9194996774196624,
2489
+ "step": 1550
2490
+ },
2491
+ {
2492
+ "epoch": 3.3372717508055856,
2493
+ "grad_norm": 0.16955583517854705,
2494
+ "learning_rate": 8.586336514134416e-06,
2495
+ "loss": 0.2566,
2496
+ "mean_token_accuracy": 0.9205721557140351,
2497
+ "step": 1555
2498
+ },
2499
+ {
2500
+ "epoch": 3.348012889366273,
2501
+ "grad_norm": 0.17107585176009693,
2502
+ "learning_rate": 8.472814118941111e-06,
2503
+ "loss": 0.2594,
2504
+ "mean_token_accuracy": 0.9197823405265808,
2505
+ "step": 1560
2506
+ },
2507
+ {
2508
+ "epoch": 3.35875402792696,
2509
+ "grad_norm": 0.17753792836827956,
2510
+ "learning_rate": 8.360967053190748e-06,
2511
+ "loss": 0.2595,
2512
+ "mean_token_accuracy": 0.9195821940898895,
2513
+ "step": 1565
2514
+ },
2515
+ {
2516
+ "epoch": 3.369495166487648,
2517
+ "grad_norm": 0.1663276449550015,
2518
+ "learning_rate": 8.250805164933576e-06,
2519
+ "loss": 0.2576,
2520
+ "mean_token_accuracy": 0.9204757869243622,
2521
+ "step": 1570
2522
+ },
2523
+ {
2524
+ "epoch": 3.380236305048335,
2525
+ "grad_norm": 0.1727926922684143,
2526
+ "learning_rate": 8.142338153841204e-06,
2527
+ "loss": 0.2613,
2528
+ "mean_token_accuracy": 0.9192953467369079,
2529
+ "step": 1575
2530
+ },
2531
+ {
2532
+ "epoch": 3.3909774436090228,
2533
+ "grad_norm": 0.16245992891648223,
2534
+ "learning_rate": 8.035575570352586e-06,
2535
+ "loss": 0.2603,
2536
+ "mean_token_accuracy": 0.9196378767490387,
2537
+ "step": 1580
2538
+ },
2539
+ {
2540
+ "epoch": 3.40171858216971,
2541
+ "grad_norm": 0.1728382431801045,
2542
+ "learning_rate": 7.930526814833114e-06,
2543
+ "loss": 0.2642,
2544
+ "mean_token_accuracy": 0.9182481050491333,
2545
+ "step": 1585
2546
+ },
2547
+ {
2548
+ "epoch": 3.4124597207303973,
2549
+ "grad_norm": 0.17059237401574356,
2550
+ "learning_rate": 7.827201136746903e-06,
2551
+ "loss": 0.2608,
2552
+ "mean_token_accuracy": 0.9196362137794495,
2553
+ "step": 1590
2554
+ },
2555
+ {
2556
+ "epoch": 3.423200859291085,
2557
+ "grad_norm": 0.17006814998266018,
2558
+ "learning_rate": 7.725607633842397e-06,
2559
+ "loss": 0.262,
2560
+ "mean_token_accuracy": 0.9188037991523743,
2561
+ "step": 1595
2562
+ },
2563
+ {
2564
+ "epoch": 3.4339419978517722,
2565
+ "grad_norm": 0.17763939677962118,
2566
+ "learning_rate": 7.625755251351302e-06,
2567
+ "loss": 0.2571,
2568
+ "mean_token_accuracy": 0.92064950466156,
2569
+ "step": 1600
2570
+ },
2571
+ {
2572
+ "epoch": 3.4446831364124595,
2573
+ "grad_norm": 0.16880550111530884,
2574
+ "learning_rate": 7.52765278120101e-06,
2575
+ "loss": 0.2619,
2576
+ "mean_token_accuracy": 0.919091010093689,
2577
+ "step": 1605
2578
+ },
2579
+ {
2580
+ "epoch": 3.455424274973147,
2581
+ "grad_norm": 0.17470127038229266,
2582
+ "learning_rate": 7.431308861240405e-06,
2583
+ "loss": 0.2611,
2584
+ "mean_token_accuracy": 0.9194313704967498,
2585
+ "step": 1610
2586
+ },
2587
+ {
2588
+ "epoch": 3.4661654135338344,
2589
+ "grad_norm": 0.18361814009538877,
2590
+ "learning_rate": 7.336731974479366e-06,
2591
+ "loss": 0.2606,
2592
+ "mean_token_accuracy": 0.9194453060626984,
2593
+ "step": 1615
2594
+ },
2595
+ {
2596
+ "epoch": 3.476906552094522,
2597
+ "grad_norm": 0.16896194278522544,
2598
+ "learning_rate": 7.2439304483418275e-06,
2599
+ "loss": 0.2567,
2600
+ "mean_token_accuracy": 0.9206092417240143,
2601
+ "step": 1620
2602
+ },
2603
+ {
2604
+ "epoch": 3.4876476906552094,
2605
+ "grad_norm": 0.16668518571688956,
2606
+ "learning_rate": 7.152912453932546e-06,
2607
+ "loss": 0.2595,
2608
+ "mean_token_accuracy": 0.9194850385189056,
2609
+ "step": 1625
2610
+ },
2611
+ {
2612
+ "epoch": 3.498388829215897,
2613
+ "grad_norm": 0.17386165770379072,
2614
+ "learning_rate": 7.063686005317651e-06,
2615
+ "loss": 0.2579,
2616
+ "mean_token_accuracy": 0.9201728105545044,
2617
+ "step": 1630
2618
+ },
2619
+ {
2620
+ "epoch": 3.5091299677765844,
2621
+ "grad_norm": 0.17090370338380814,
2622
+ "learning_rate": 6.976258958819e-06,
2623
+ "loss": 0.2583,
2624
+ "mean_token_accuracy": 0.9202900052070617,
2625
+ "step": 1635
2626
+ },
2627
+ {
2628
+ "epoch": 3.5198711063372716,
2629
+ "grad_norm": 0.1670190265056932,
2630
+ "learning_rate": 6.890639012322459e-06,
2631
+ "loss": 0.2547,
2632
+ "mean_token_accuracy": 0.9211665093898773,
2633
+ "step": 1640
2634
+ },
2635
+ {
2636
+ "epoch": 3.5306122448979593,
2637
+ "grad_norm": 0.17315381341418587,
2638
+ "learning_rate": 6.806833704600082e-06,
2639
+ "loss": 0.2561,
2640
+ "mean_token_accuracy": 0.9206245243549347,
2641
+ "step": 1645
2642
+ },
2643
+ {
2644
+ "epoch": 3.5413533834586466,
2645
+ "grad_norm": 0.17367639326439366,
2646
+ "learning_rate": 6.724850414646344e-06,
2647
+ "loss": 0.2554,
2648
+ "mean_token_accuracy": 0.9209690392017365,
2649
+ "step": 1650
2650
+ },
2651
+ {
2652
+ "epoch": 3.552094522019334,
2653
+ "grad_norm": 0.18356634723924625,
2654
+ "learning_rate": 6.644696361028427e-06,
2655
+ "loss": 0.2546,
2656
+ "mean_token_accuracy": 0.9211890578269959,
2657
+ "step": 1655
2658
+ },
2659
+ {
2660
+ "epoch": 3.5628356605800215,
2661
+ "grad_norm": 0.1686096868472299,
2662
+ "learning_rate": 6.566378601250625e-06,
2663
+ "loss": 0.258,
2664
+ "mean_token_accuracy": 0.9201010644435883,
2665
+ "step": 1660
2666
+ },
2667
+ {
2668
+ "epoch": 3.573576799140709,
2669
+ "grad_norm": 0.17097492830249045,
2670
+ "learning_rate": 6.489904031132919e-06,
2671
+ "loss": 0.2573,
2672
+ "mean_token_accuracy": 0.9203424453735352,
2673
+ "step": 1665
2674
+ },
2675
+ {
2676
+ "epoch": 3.5843179377013965,
2677
+ "grad_norm": 0.1708922574820426,
2678
+ "learning_rate": 6.415279384203853e-06,
2679
+ "loss": 0.2573,
2680
+ "mean_token_accuracy": 0.9202109038829803,
2681
+ "step": 1670
2682
+ },
2683
+ {
2684
+ "epoch": 3.5950590762620838,
2685
+ "grad_norm": 0.1772280034240442,
2686
+ "learning_rate": 6.3425112311075965e-06,
2687
+ "loss": 0.2563,
2688
+ "mean_token_accuracy": 0.9204185366630554,
2689
+ "step": 1675
2690
+ },
2691
+ {
2692
+ "epoch": 3.6058002148227715,
2693
+ "grad_norm": 0.17186880847864094,
2694
+ "learning_rate": 6.271605979025448e-06,
2695
+ "loss": 0.2555,
2696
+ "mean_token_accuracy": 0.9206036269664765,
2697
+ "step": 1680
2698
+ },
2699
+ {
2700
+ "epoch": 3.6165413533834587,
2701
+ "grad_norm": 0.16731807378864566,
2702
+ "learning_rate": 6.2025698711116535e-06,
2703
+ "loss": 0.2565,
2704
+ "mean_token_accuracy": 0.9205489337444306,
2705
+ "step": 1685
2706
+ },
2707
+ {
2708
+ "epoch": 3.627282491944146,
2709
+ "grad_norm": 0.17180713091530317,
2710
+ "learning_rate": 6.135408985943734e-06,
2711
+ "loss": 0.2573,
2712
+ "mean_token_accuracy": 0.9204003512859344,
2713
+ "step": 1690
2714
+ },
2715
+ {
2716
+ "epoch": 3.6380236305048337,
2717
+ "grad_norm": 0.1761977177776313,
2718
+ "learning_rate": 6.07012923698724e-06,
2719
+ "loss": 0.2587,
2720
+ "mean_token_accuracy": 0.9196424603462219,
2721
+ "step": 1695
2722
+ },
2723
+ {
2724
+ "epoch": 3.648764769065521,
2725
+ "grad_norm": 0.17221380858566646,
2726
+ "learning_rate": 6.006736372075093e-06,
2727
+ "loss": 0.2579,
2728
+ "mean_token_accuracy": 0.9200917899608612,
2729
+ "step": 1700
2730
+ },
2731
+ {
2732
+ "epoch": 3.659505907626208,
2733
+ "grad_norm": 0.16805608384415285,
2734
+ "learning_rate": 5.9452359729015004e-06,
2735
+ "loss": 0.2573,
2736
+ "mean_token_accuracy": 0.9203401625156402,
2737
+ "step": 1705
2738
+ },
2739
+ {
2740
+ "epoch": 3.670247046186896,
2741
+ "grad_norm": 0.1736765217184823,
2742
+ "learning_rate": 5.8856334545304676e-06,
2743
+ "loss": 0.2574,
2744
+ "mean_token_accuracy": 0.9203644514083862,
2745
+ "step": 1710
2746
+ },
2747
+ {
2748
+ "epoch": 3.680988184747583,
2749
+ "grad_norm": 0.1726788133620247,
2750
+ "learning_rate": 5.8279340649190244e-06,
2751
+ "loss": 0.2611,
2752
+ "mean_token_accuracy": 0.9194235980510712,
2753
+ "step": 1715
2754
+ },
2755
+ {
2756
+ "epoch": 3.6917293233082704,
2757
+ "grad_norm": 0.16707078529197217,
2758
+ "learning_rate": 5.7721428844551425e-06,
2759
+ "loss": 0.2611,
2760
+ "mean_token_accuracy": 0.9193582713603974,
2761
+ "step": 1720
2762
+ },
2763
+ {
2764
+ "epoch": 3.702470461868958,
2765
+ "grad_norm": 0.17182290992101512,
2766
+ "learning_rate": 5.7182648255104065e-06,
2767
+ "loss": 0.2596,
2768
+ "mean_token_accuracy": 0.9196705460548401,
2769
+ "step": 1725
2770
+ },
2771
+ {
2772
+ "epoch": 3.7132116004296454,
2773
+ "grad_norm": 0.17419790279430714,
2774
+ "learning_rate": 5.666304632007487e-06,
2775
+ "loss": 0.2595,
2776
+ "mean_token_accuracy": 0.9197326540946961,
2777
+ "step": 1730
2778
+ },
2779
+ {
2780
+ "epoch": 3.723952738990333,
2781
+ "grad_norm": 0.18041100180688655,
2782
+ "learning_rate": 5.616266879002444e-06,
2783
+ "loss": 0.2575,
2784
+ "mean_token_accuracy": 0.9202880382537841,
2785
+ "step": 1735
2786
+ },
2787
+ {
2788
+ "epoch": 3.7346938775510203,
2789
+ "grad_norm": 0.16636878690891047,
2790
+ "learning_rate": 5.568155972281892e-06,
2791
+ "loss": 0.2582,
2792
+ "mean_token_accuracy": 0.9199542105197906,
2793
+ "step": 1740
2794
+ },
2795
+ {
2796
+ "epoch": 3.745435016111708,
2797
+ "grad_norm": 0.17005943549418737,
2798
+ "learning_rate": 5.521976147975078e-06,
2799
+ "loss": 0.2575,
2800
+ "mean_token_accuracy": 0.9207047700881958,
2801
+ "step": 1745
2802
+ },
2803
+ {
2804
+ "epoch": 3.7561761546723953,
2805
+ "grad_norm": 0.17142683208534373,
2806
+ "learning_rate": 5.477731472180884e-06,
2807
+ "loss": 0.2578,
2808
+ "mean_token_accuracy": 0.9200609147548675,
2809
+ "step": 1750
2810
+ },
2811
+ {
2812
+ "epoch": 3.7669172932330826,
2813
+ "grad_norm": 0.19597039412044637,
2814
+ "learning_rate": 5.4354258406098275e-06,
2815
+ "loss": 0.2605,
2816
+ "mean_token_accuracy": 0.9196163058280945,
2817
+ "step": 1755
2818
+ },
2819
+ {
2820
+ "epoch": 3.7776584317937703,
2821
+ "grad_norm": 0.1891144335762954,
2822
+ "learning_rate": 5.395062978241028e-06,
2823
+ "loss": 0.256,
2824
+ "mean_token_accuracy": 0.9203970789909363,
2825
+ "step": 1760
2826
+ },
2827
+ {
2828
+ "epoch": 3.7883995703544575,
2829
+ "grad_norm": 0.1734382570098929,
2830
+ "learning_rate": 5.356646438994236e-06,
2831
+ "loss": 0.2562,
2832
+ "mean_token_accuracy": 0.9206745564937592,
2833
+ "step": 1765
2834
+ },
2835
+ {
2836
+ "epoch": 3.7991407089151448,
2837
+ "grad_norm": 0.167509733493585,
2838
+ "learning_rate": 5.3201796054169155e-06,
2839
+ "loss": 0.2587,
2840
+ "mean_token_accuracy": 0.919745409488678,
2841
+ "step": 1770
2842
+ },
2843
+ {
2844
+ "epoch": 3.8098818474758325,
2845
+ "grad_norm": 0.1758205628466223,
2846
+ "learning_rate": 5.285665688386408e-06,
2847
+ "loss": 0.2554,
2848
+ "mean_token_accuracy": 0.9208223819732666,
2849
+ "step": 1775
2850
+ },
2851
+ {
2852
+ "epoch": 3.8206229860365197,
2853
+ "grad_norm": 0.16934855068248722,
2854
+ "learning_rate": 5.253107726827213e-06,
2855
+ "loss": 0.2553,
2856
+ "mean_token_accuracy": 0.9208275616168976,
2857
+ "step": 1780
2858
+ },
2859
+ {
2860
+ "epoch": 3.8313641245972074,
2861
+ "grad_norm": 0.17212203700590173,
2862
+ "learning_rate": 5.222508587443419e-06,
2863
+ "loss": 0.2558,
2864
+ "mean_token_accuracy": 0.9208298087120056,
2865
+ "step": 1785
2866
+ },
2867
+ {
2868
+ "epoch": 3.8421052631578947,
2869
+ "grad_norm": 0.17351309384632746,
2870
+ "learning_rate": 5.193870964466299e-06,
2871
+ "loss": 0.2572,
2872
+ "mean_token_accuracy": 0.9206307530403137,
2873
+ "step": 1790
2874
+ },
2875
+ {
2876
+ "epoch": 3.8528464017185824,
2877
+ "grad_norm": 0.17423994454268188,
2878
+ "learning_rate": 5.167197379417072e-06,
2879
+ "loss": 0.2563,
2880
+ "mean_token_accuracy": 0.9204454243183136,
2881
+ "step": 1795
2882
+ },
2883
+ {
2884
+ "epoch": 3.8635875402792696,
2885
+ "grad_norm": 0.17091404042612268,
2886
+ "learning_rate": 5.142490180884889e-06,
2887
+ "loss": 0.2566,
2888
+ "mean_token_accuracy": 0.920625650882721,
2889
+ "step": 1800
2890
+ },
2891
+ {
2892
+ "epoch": 3.874328678839957,
2893
+ "grad_norm": 0.17402338382213903,
2894
+ "learning_rate": 5.119751544320045e-06,
2895
+ "loss": 0.2548,
2896
+ "mean_token_accuracy": 0.9212319254875183,
2897
+ "step": 1805
2898
+ },
2899
+ {
2900
+ "epoch": 3.8850698174006446,
2901
+ "grad_norm": 0.17785847377734187,
2902
+ "learning_rate": 5.098983471842435e-06,
2903
+ "loss": 0.2582,
2904
+ "mean_token_accuracy": 0.9204130828380584,
2905
+ "step": 1810
2906
+ },
2907
+ {
2908
+ "epoch": 3.895810955961332,
2909
+ "grad_norm": 0.17476387276762337,
2910
+ "learning_rate": 5.080187792065258e-06,
2911
+ "loss": 0.2576,
2912
+ "mean_token_accuracy": 0.9203925788402557,
2913
+ "step": 1815
2914
+ },
2915
+ {
2916
+ "epoch": 3.906552094522019,
2917
+ "grad_norm": 0.17401606856867693,
2918
+ "learning_rate": 5.063366159934019e-06,
2919
+ "loss": 0.257,
2920
+ "mean_token_accuracy": 0.9207073092460633,
2921
+ "step": 1820
2922
+ },
2923
+ {
2924
+ "epoch": 3.917293233082707,
2925
+ "grad_norm": 0.1709751716211779,
2926
+ "learning_rate": 5.04852005658081e-06,
2927
+ "loss": 0.2567,
2928
+ "mean_token_accuracy": 0.9206726491451264,
2929
+ "step": 1825
2930
+ },
2931
+ {
2932
+ "epoch": 3.928034371643394,
2933
+ "grad_norm": 0.17944667291264363,
2934
+ "learning_rate": 5.035650789193893e-06,
2935
+ "loss": 0.2583,
2936
+ "mean_token_accuracy": 0.919947350025177,
2937
+ "step": 1830
2938
+ },
2939
+ {
2940
+ "epoch": 3.938775510204082,
2941
+ "grad_norm": 0.17075839857976619,
2942
+ "learning_rate": 5.024759490902604e-06,
2943
+ "loss": 0.2606,
2944
+ "mean_token_accuracy": 0.9192731857299805,
2945
+ "step": 1835
2946
+ },
2947
+ {
2948
+ "epoch": 3.949516648764769,
2949
+ "grad_norm": 0.1725574446830871,
2950
+ "learning_rate": 5.015847120677588e-06,
2951
+ "loss": 0.2585,
2952
+ "mean_token_accuracy": 0.9199050843715668,
2953
+ "step": 1840
2954
+ },
2955
+ {
2956
+ "epoch": 3.9602577873254567,
2957
+ "grad_norm": 0.17546758649223276,
2958
+ "learning_rate": 5.008914463246362e-06,
2959
+ "loss": 0.2586,
2960
+ "mean_token_accuracy": 0.920122253894806,
2961
+ "step": 1845
2962
+ },
2963
+ {
2964
+ "epoch": 3.970998925886144,
2965
+ "grad_norm": 0.16820021081330186,
2966
+ "learning_rate": 5.0039621290242065e-06,
2967
+ "loss": 0.2583,
2968
+ "mean_token_accuracy": 0.9200729191303253,
2969
+ "step": 1850
2970
+ },
2971
+ {
2972
+ "epoch": 3.9817400644468313,
2973
+ "grad_norm": 0.17517771341096255,
2974
+ "learning_rate": 5.000990554060436e-06,
2975
+ "loss": 0.2604,
2976
+ "mean_token_accuracy": 0.9193271338939667,
2977
+ "step": 1855
2978
+ },
2979
+ {
2980
+ "epoch": 3.992481203007519,
2981
+ "grad_norm": 0.17294557291581655,
2982
+ "learning_rate": 5e-06,
2983
+ "loss": 0.2556,
2984
+ "mean_token_accuracy": 0.920825207233429,
2985
+ "step": 1860
2986
+ },
2987
+ {
2988
+ "epoch": 3.992481203007519,
2989
+ "step": 1860,
2990
+ "total_flos": 966947082862592.0,
2991
+ "train_loss": 0.34282420668550717,
2992
+ "train_runtime": 10626.5662,
2993
+ "train_samples_per_second": 2.802,
2994
+ "train_steps_per_second": 0.175
2995
+ }
2996
+ ],
2997
+ "logging_steps": 5,
2998
+ "max_steps": 1860,
2999
+ "num_input_tokens_seen": 0,
3000
+ "num_train_epochs": 4,
3001
+ "save_steps": 1000,
3002
+ "stateful_callbacks": {
3003
+ "TrainerControl": {
3004
+ "args": {
3005
+ "should_epoch_stop": false,
3006
+ "should_evaluate": false,
3007
+ "should_log": false,
3008
+ "should_save": true,
3009
+ "should_training_stop": true
3010
+ },
3011
+ "attributes": {}
3012
+ }
3013
+ },
3014
+ "total_flos": 966947082862592.0,
3015
+ "train_batch_size": 1,
3016
+ "trial_name": null,
3017
+ "trial_params": null
3018
+ }
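
The trainer_state.json added above stores one `log_history` entry every `logging_steps` (5) optimizer steps, each carrying `epoch`, `grad_norm`, `learning_rate`, `loss`, `mean_token_accuracy` and `step`, with a final summary entry (total_flos, train_loss, runtime). Below is a minimal sketch, not part of this commit, of how that history could be loaded and inspected; the local filename `trainer_state.json` and the optional use of matplotlib are assumptions.

```python
# Minimal sketch (assumption: trainer_state.json has been downloaded locally).
import json

with open("trainer_state.json") as f:
    state = json.load(f)

# Keep only the per-step logging entries; the last element of log_history is a
# training summary and has "train_loss" instead of "loss"/"learning_rate".
history = [h for h in state["log_history"] if "loss" in h and "learning_rate" in h]

steps = [h["step"] for h in history]
losses = [h["loss"] for h in history]
lrs = [h["learning_rate"] for h in history]

print(f"logged points: {len(history)}, final logged loss: {losses[-1]:.4f}")

try:
    import matplotlib.pyplot as plt  # plotting is optional

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    ax1.plot(steps, losses)
    ax1.set_xlabel("step")
    ax1.set_ylabel("train loss")
    ax2.plot(steps, lrs)
    ax2.set_xlabel("step")
    ax2.set_ylabel("learning rate")
    fig.tight_layout()
    fig.savefig("training_curves.png")
except ImportError:
    pass  # without matplotlib, the printed summary above still works
```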