FlagEval Findings Report: A Preliminary Evaluation of Large Reasoning Models on Automatically Verifiable Textual and Visual Questions • Paper • 2509.17177 • Published Sep 21
Chinese LLM Leaderboard best models • Collection • A daily uploaded list of models with best evaluations on the LLM leaderboard • 24 items • Updated 4 days ago