---
title: Head to Head Evaluations Comparator
emoji: 🦀
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.18.0
app_file: app.py
pinned: false
license: apache-2.0
short_description: Evaluates 2 models or 1 model w/diff configs on a dataset
---

This Space replicates the evaluation of different models on various datasets.

- Dataset: https://huggingface.co/datasets/TIGER-Lab/MMLU-Pro
- GitHub: https://github.com/TIGER-AI-Lab/MMLU-Pro
- Paper: https://arxiv.org/abs/2406.01574 (submitted to NeurIPS 2024)

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
