Update app.py
app.py
CHANGED
@@ -50,6 +50,7 @@ def make_arena_leaderboard_md(arena_df, last_updated_time):
 
     leaderboard_md = f"""
 Total # of models: **{total_models}**.{space} Total # of votes: **{"{:,}".format(total_votes)}**.{space} Last updated: {last_updated_time}.
+
 ***Rank (UB)**: model rating (upper bound), determined as one plus the number of models that are statistically better than the target model.
 Model A is statistically better than Model B when the lower bound of Model A's rating is higher than the upper bound of Model B's rating (with a 95% confidence interval).
 See Figure 1 below for a visualization of the confidence intervals of model ratings.
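Note on the Rank (UB) text touched by this change: the rule it describes (one plus the number of models whose lower bound lies above the target model's upper bound) can be sketched roughly as below. This is an illustration only, not the code in app.py; the function name `rank_upper_bound`, the `(lower, upper)` tuple layout, and the sample ratings are all hypothetical, and the intervals are assumed to be the 95% confidence bounds mentioned in the text.

```python
from typing import Dict, Tuple


def rank_upper_bound(intervals: Dict[str, Tuple[float, float]]) -> Dict[str, int]:
    """Hypothetical helper: Rank (UB) = 1 + number of statistically better models.

    `intervals` maps a model name to its (lower, upper) rating bounds,
    assumed here to come from a 95% confidence interval.
    """
    ranks = {}
    for model, (_, upper) in intervals.items():
        # Another model is "statistically better" when its lower bound
        # is strictly higher than this model's upper bound.
        better = sum(
            1
            for other, (other_lower, _) in intervals.items()
            if other != model and other_lower > upper
        )
        ranks[model] = 1 + better
    return ranks


# Made-up ratings, for illustration only.
example = {
    "model-a": (1240.0, 1260.0),
    "model-b": (1225.0, 1245.0),
    "model-c": (1180.0, 1200.0),
}
ranks = rank_upper_bound(example)
print(ranks)
```

With these made-up numbers, model-a and model-b have overlapping intervals and so share rank 1, while model-c sits below both and gets rank 3. Ties like this are why the rank is described as an upper bound.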