Evaluation of DPO Configurations (Collection)
An Empirical Study of DPO Configuration Choices for LLM Alignment
This repo contains a LoRA adapter created by aligning Tülu3 8B using Direct Preference Optimization (DPO) on a mix of the following datasets:
It was trained as part of a series of models for studying DPO alignment.
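For reference, the sketch below shows how such a DPO LoRA adapter could be trained with trl and peft. It is not the exact training script for this adapter: the dataset file, LoRA settings, and hyperparameters are illustrative placeholders, and argument names (e.g. `processing_class`) may differ across trl versions.

```python
# Minimal sketch of DPO training with a LoRA adapter (illustrative, not the
# exact configuration used for this repo). Assumes recent trl and peft.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "allenai/Llama-3.1-Tulu-3-8B"  # assumed Hub id for the Tülu 3 8B base model
model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Preference data needs "prompt", "chosen", and "rejected" fields.
# "preference_mix.jsonl" is a placeholder for the dataset mix.
train_dataset = load_dataset("json", data_files="preference_mix.jsonl", split="train")

peft_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,          # illustrative LoRA settings
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

args = DPOConfig(
    output_dir="tulu3-8b-dpo-lora",
    beta=0.1,                                        # DPO temperature; illustrative value
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    learning_rate=5e-6,
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,
    peft_config=peft_config,   # only the LoRA adapter weights are trained
)
trainer.train()
trainer.save_model("tulu3-8b-dpo-lora")  # saves the adapter weights
```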
See the base model card for usage and chat template details.
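As a quick start, a minimal inference sketch with transformers and peft is shown below; the base-model and adapter repo ids are placeholders (substitute this repo's id for the adapter).

```python
# Minimal inference sketch: load the base model, attach the DPO LoRA adapter,
# and generate with the base model's chat template.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "allenai/Llama-3.1-Tulu-3-8B"   # assumed Hub id for the Tülu 3 8B base model
adapter_id = "your-org/tulu3-8b-dpo-lora" # placeholder: replace with this repo's id

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)  # attach the DPO LoRA adapter

# Chat template comes from the base model's tokenizer (see the base model card).
messages = [{"role": "user", "content": "Explain DPO in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```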
This adapter is released under Meta's Llama 3.1 Community License Agreement. Llama 3.1 is © Meta Platforms, Inc.
If this work was helpful, please cite:
TBA