Asynchronous RLHF collection: models and datasets for the Asynchronous RLHF paper. Code is available at https://github.com/mnoukhov/async_rlhf

This model is a fine-tuned version of EleutherAI/pythia-2.8b-deduped on an unknown dataset. It achieves the following results on the evaluation set:
More information needed
Training and validation loss at each evaluation step:
| Training Loss | Epoch | Step | Validation Loss | 
|---|---|---|---|
| 2.3714 | 0.2007 | 183 | 2.2949 | 
| 2.2873 | 0.4013 | 366 | 2.2773 | 
| 2.2732 | 0.6020 | 549 | 2.2656 | 
| 2.2562 | 0.8026 | 732 | 2.2578 | 
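
The validation losses above can be easier to compare as perplexities. The following is a minimal sketch, assuming the reported loss is the mean per-token cross-entropy in nats (the usual convention for causal language model training, but not stated in this card). The `rows` list simply copies the table, and the names `perplexity` and `steps_per_epoch` are illustrative helpers, not part of any library.

```python
import math

# (training loss, epoch, step, validation loss), copied from the table above
rows = [
    (2.3714, 0.2007, 183, 2.2949),
    (2.2873, 0.4013, 366, 2.2773),
    (2.2732, 0.6020, 549, 2.2656),
    (2.2562, 0.8026, 732, 2.2578),
]


def perplexity(loss_nats: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(loss_nats)


# Assumption: validation loss is a mean token-level cross-entropy in nats.
perplexities = [perplexity(val_loss) for _, _, _, val_loss in rows]

# Rough estimate of optimizer steps per epoch from the step/epoch columns.
steps_per_epoch = [step / epoch for _, epoch, step, _ in rows]

for (_, epoch, step, val_loss), ppl in zip(rows, perplexities):
    print(f"step {step:4d} (epoch {epoch:.4f}): val loss {val_loss:.4f}, perplexity {ppl:.2f}")

print(f"estimated steps per epoch: {sum(steps_per_epoch) / len(steps_per_epoch):.0f}")
```

Under that assumption, the validation perplexity falls from just under 10 at step 183 to about 9.6 at step 732, and the step and epoch columns imply roughly 912 optimizer steps per epoch.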
Base model: EleutherAI/pythia-2.8b-deduped