twinkle-ai's Collections
📋 Eval Logs
Updated 14 days ago
Benchmark logs generated with Twinkle Eval, recording each model's output for every prompt.
Upvotes: 3
twinkle-ai/gpt-oss-eval-logs-and-scores
Viewer • Updated Aug 13 • 2.63k • 41 • 1
twinkle-ai/llama-4-eval-logs-and-scores
Viewer • Updated Apr 9 • 750 • 58 • 2