Stick To Your Role! Leaderboard 🎭: Benchmarking LLMs on the stability of simulated populations
Open LLM Leaderboard 🏆: Track, rank and evaluate open LLMs and chatbots