| Model | Logical Power Ranking | Logical Power Score | Accuracy | Syntax Score | Logic Basic Accuracy | Logic Easy Accuracy | Logic Medium Accuracy | Logic Hard Accuracy |
|---|---|---|---|---|---|---|---|---|
| o3 | 1 | 15.5 | 0.78 | 0.8 | 0.99 | 0.93 | 0.74 | 0.45 |
| o4-mini-high | 2 | 12.8 | 0.64 | 0.88 | 0.98 | 0.96 | 0.4 | 0.21 |
| o4-mini | 3 | 12.3 | 0.61 | 0.86 | 0.93 | 0.88 | 0.52 | 0.13 |
| o1 | 4 | 11.9 | 0.59 | 0.68 | 0.92 | 0.89 | 0.41 | 0.15 |
| o3-mini | 5 | 11.6 | 0.58 | 0.75 | 0.97 | 0.9 | 0.37 | 0.07 |
| o4-mini-low | 6 | 10.3 | 0.52 | 0.91 | 0.91 | 0.81 | 0.25 | 0.09 |
| o1-mini | 7 | 10.1 | 0.5 | 0.95 | 0.97 | 0.82 | 0.2 | 0.03 |
| Llama-3.1-8B-Tuned-FFT | 8 | 9.4 | 0.47 | 0.96 | 0.92 | 0.77 | 0.17 | 0.02 |
| gemini-2.0-flash-thinking-exp-01-21 | 9 | 8.6 | 0.43 | 0.83 | 0.93 | 0.65 | 0.13 | 0.01 |
| Llama-3.1-8B-Tuned-LoRA | 10 | 8.4 | 0.42 | 1.0 | 0.95 | 0.57 | 0.15 | 0.01 |
| DeepSeek-R1-Distill-Llama-70B | 11 | 8.1 | 0.4 | 0.57 | 0.89 | 0.61 | 0.09 | 0.03 |
| gpt-4.5-preview | 12 | 7.2 | 0.36 | 1.0 | 0.95 | 0.41 | 0.06 | 0.02 |
| gpt-4o | 13 | 6.7 | 0.34 | 0.96 | 0.9 | 0.37 | 0.06 | 0.01 |
| Llama-3.3-70B-Instruct | 14 | 5.1 | 0.26 | 0.99 | 0.84 | 0.17 | 0.01 | 0.01 |
| Llama-3.1-8B-Instruct | 15 | 5.0 | 0.25 | 0.87 | 0.82 | 0.17 | 0.01 | 0.0 |
| QwQ-32B-Preview | 16 | 4.6 | 0.23 | 0.84 | 0.77 | 0.15 | 0.0 | 0.01 |
| Internlm2-20b | 17 | 3.9 | 0.19 | 0.82 | 0.71 | 0.07 | 0.0 | 0.0 |
| Qwen2-57B-A14B-Instruct | 18 | 3.9 | 0.2 | 0.81 | 0.71 | 0.07 | 0.0 | 0.0 |
| CodeLlama-34b-Instruct-hf | 19 | 3.5 | 0.17 | 0.78 | 0.68 | 0.01 | 0.0 | 0.0 |
| Mixtral-8x7B-Instruct-v0.1 | 20 | 3.1 | 0.15 | 0.93 | 0.61 | 0.01 | 0.0 | 0.0 |
| Llama-3.2-3B-Instruct | 21 | 1.6 | 0.08 | 0.61 | 0.31 | 0.01 | 0.0 | 0.0 |