ahmad21omar/SLR-Bench / benchmark_data.csv

add the tuned models

a9249e6 6 months ago

Model,Logical Power Ranking,Logical Power Score,Accuracy,Syntax Score,Logic Basic Accuracy,Logic Easy Accuracy,Logic Medium Accuracy,Logic Hard Accuracy
o3,1,15.5,0.78,0.8,0.99,0.93,0.74,0.45
o4-mini-high,2,12.8,0.64,0.88,0.98,0.96,0.4,0.21
o4-mini,3,12.3,0.61,0.86,0.93,0.88,0.52,0.13
o1,4,11.9,0.59,0.68,0.92,0.89,0.41,0.15
o3-mini,5,11.6,0.58,0.75,0.97,0.9,0.37,0.07
o4-mini-low,6,10.3,0.52,0.91,0.91,0.81,0.25,0.09
o1-mini,7,10.1,0.5,0.95,0.97,0.82,0.2,0.03
Llama-3.1-8B-Tuned-FFT,8,9.4,0.47,0.96,0.92,0.77,0.17,0.02
gemini-2.0-flash-thinking-exp-01-21,9,8.6,0.43,0.83,0.93,0.65,0.13,0.01
Llama-3.1-8B-Tuned-LoRA,10,8.4,0.42,1.0,0.95,0.57,0.15,0.01
DeepSeek-R1-Distill-Llama-70B,11,8.1,0.4,0.57,0.89,0.61,0.09,0.03
gpt-4.5-preview,12,7.2,0.36,1.0,0.95,0.41,0.06,0.02
gpt-4o,13,6.7,0.34,0.96,0.9,0.37,0.06,0.01
Llama-3.3-70B-Instruct,14,5.1,0.26,0.99,0.84,0.17,0.01,0.01
Llama-3.1-8B-Instruct,15,5.0,0.25,0.87,0.82,0.17,0.01,0.0
QwQ-32B-Preview,16,4.6,0.23,0.84,0.77,0.15,0.0,0.01
Internlm2-20b,17,3.9,0.19,0.82,0.71,0.07,0.0,0.0
Qwen2-57B-A14B-Instruct,18,3.9,0.2,0.81,0.71,0.07,0.0,0.0
CodeLlama-34b-Instruct-hf,19,3.5,0.17,0.78,0.68,0.01,0.0,0.0
Mixtral-8x7B-Instruct-v0.1,20,3.1,0.15,0.93,0.61,0.01,0.0,0.0
Llama-3.2-3B-Instruct,21,1.6,0.08,0.61,0.31,0.01,0.0,0.0