Fix rankings: #1 Optimal Accuracy (88.7%), #2 Optimal Selection (43.0%), #7 overall / #5 open-source
README.md
CHANGED
|
@@ -19,15 +19,18 @@ metrics:
|
|
| 19 |
|
| 20 |
## Performance
|
| 21 |
|
| 22 |
-
🏆
|
| 23 |
|
| 24 |
Official RouterArena Full Dataset Results (8,400 queries):
|
| 25 |
-
- **88.7% Optimal Accuracy Score** - 🥇 SOTA!
|
| 26 |
-
- **
|
| 27 |
-
- **
|
|
|
|
| 28 |
- **$0.60 per 1K queries** - Cost-efficient routing
|
| 29 |
|
| 30 |
-
|
|
|
|
|
|
|
| 31 |
|
| 32 |
Sub_10 Benchmark (809 queries):
|
| 33 |
- **69.05% accuracy**
|
|
@@ -164,22 +167,29 @@ predictions = router.predict(augmented, k=4)
|
|
| 164 |
|
| 165 |
## RouterArena Leaderboard
|
| 166 |
|
| 167 |
-
🏆 **Official Results - #1
|
| 168 |
|
| 169 |

|
| 170 |
|
| 171 |
-
Chayan on the official [RouterArena leaderboard](https://routeworks.github.io/):
|
| 172 |
|
| 173 |
-
| Rank
|
| 174 |
-
|
| 175 |
-
| 1 | **Chayan** |
|
| 176 |
-
| 2 | RouterBench-MLP |
|
| 177 |
-
| 3 | Azure |
|
| 178 |
-
| 4 | vLLM-SR |
|
|
|
|
| 179 |
|
| 180 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
| 181 |
|
| 182 |
-
|
|
|
|
|
|
|
| 183 |
|
| 184 |
View the full leaderboard and PR: [RouterArena PR #24](https://github.com/RouteWorks/RouterArena/pull/24)
|
| 185 |
|
|
|
|
| 19 |
|
| 20 |
## Performance
|
| 21 |
|
| 22 |
+
🏆 **SOTA on RouterArena: #1 Optimal Accuracy Score, #2 Optimal Selection**
|
| 23 |
|
| 24 |
Official RouterArena Full Dataset Results (8,400 queries):
|
| 25 |
+
- **88.7% Optimal Accuracy Score** - 🥇 #1 SOTA! (Best routing decision quality)
|
| 26 |
+
- **43.0% Optimal Selection Score** - 🥈 #2 Silver! (Second-best model selection)
|
| 27 |
+
- **64.9% Overall Accuracy** - #7 overall, #5 among open-source routers
|
| 28 |
+
- **Arena Score: 63.8** - #7 overall, #5 among open-source routers
|
| 29 |
- **$0.60 per 1K queries** - Cost-efficient routing
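
To put the per-1K cost figure in context, here is a minimal sketch of the arithmetic for the full 8,400-query dataset. It assumes cost scales linearly with query count, which the README does not state; the helper name `total_cost` is hypothetical, and the per-1K prices for the other routers are taken from the leaderboard table in this README.

```python
# Hypothetical helper: scale a "cost per 1K queries" figure to a dataset size.
# Assumption: cost is linear in the number of queries.
def total_cost(cost_per_1k: float, n_queries: int) -> float:
    return cost_per_1k * n_queries / 1000


FULL_DATASET_QUERIES = 8_400  # size of the RouterArena full dataset, per this README

# Per-1K costs (USD) as listed in the leaderboard table of this README.
COST_PER_1K = {
    "Chayan": 0.60,
    "RouterBench-MLP": 4.80,
    "Azure": 0.50,
    "vLLM-SR": 1.70,
    "RouterBench-KNN": 4.30,
}

if __name__ == "__main__":
    for router, price in COST_PER_1K.items():
        cost = total_cost(price, FULL_DATASET_QUERIES)
        print(f"{router}: ${cost:.2f} for {FULL_DATASET_QUERIES} queries")
```

Under that linearity assumption, Chayan's $0.60 per 1K queries works out to roughly $5 for the full dataset.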
|
| 30 |
|
| 31 |
+
**Key Metrics Explained:**
|
| 32 |
+
- **Optimal Accuracy Score**: When Chayan routes to a model, that model gives the correct answer 88.7% of the time (highest on leaderboard!)
|
| 33 |
+
- **Optimal Selection Score**: Chayan selects the best available model 43% of the time (second-best on leaderboard)
|
| 34 |
|
| 35 |
Sub_10 Benchmark (809 queries):
|
| 36 |
- **69.05% accuracy**
|
|
|
|
| 167 |
|
| 168 |
## RouterArena Leaderboard
|
| 169 |
|
| 170 |
+
🏆 **Official Results - #1 Optimal Accuracy, #2 Optimal Selection**
|
| 171 |
|
| 172 |

|
| 173 |
|
| 174 |
+
Chayan on the official [RouterArena leaderboard](https://routeworks.github.io/) (sorted by Optimal Accuracy):
|
| 175 |
|
| 176 |
+
| Rank | Router | **Opt. Acc** | **Opt. Select** | Accuracy | Arena Score | Cost/1k | Type |
|
| 177 |
+
|------|--------|--------------|-----------------|----------|-------------|---------|------|
|
| 178 |
+
| **1** | **Chayan** | **88.7%** 🥇 | **43.0%** 🥈 | 64.9% | 63.8 | $0.60 | Open-Source |
|
| 179 |
+
| 2 | RouterBench-MLP | 83.3% | 13.4% | 61.6% | 57.6 | $4.80 | Open-Source |
|
| 180 |
+
| 3 | Azure | 82.0% | 22.5% | 68.1% | 66.7 | $0.50 | Closed-Source |
|
| 181 |
+
| 4 | vLLM-SR | 79.3% | 4.8% | 67.3% | 64.3 | $1.70 | Open-Source |
|
| 182 |
+
| 5 | RouterBench-KNN | 78.8% | 13.1% | 58.7% | 55.5 | $4.30 | Open-Source |
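
Because the table is sorted by Optimal Accuracy, a router's position changes when a different column is used. The sketch below re-sorts only the five rows shown above; it is illustrative, the names `Row` and `rank_by` are hypothetical, and positions within this five-row excerpt are not the same as ranks on the full leaderboard, which has more entries.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    router: str
    opt_acc: float      # Optimal Accuracy, %
    opt_select: float   # Optimal Selection, %
    accuracy: float     # Overall accuracy, %
    arena_score: float
    cost_per_1k: float  # USD per 1K queries


# The five rows shown in the table above (excerpt, not the full leaderboard).
ROWS = [
    Row("Chayan", 88.7, 43.0, 64.9, 63.8, 0.60),
    Row("RouterBench-MLP", 83.3, 13.4, 61.6, 57.6, 4.80),
    Row("Azure", 82.0, 22.5, 68.1, 66.7, 0.50),
    Row("vLLM-SR", 79.3, 4.8, 67.3, 64.3, 1.70),
    Row("RouterBench-KNN", 78.8, 13.1, 58.7, 55.5, 4.30),
]


def rank_by(rows: List[Row], field: str, descending: bool = True) -> List[str]:
    """Return router names ordered by the given column."""
    ordered = sorted(rows, key=lambda r: getattr(r, field), reverse=descending)
    return [r.router for r in ordered]


if __name__ == "__main__":
    print("By Optimal Accuracy:", rank_by(ROWS, "opt_acc"))
    print("By Arena Score:     ", rank_by(ROWS, "arena_score"))
    # Lower is better for cost, so sort ascending.
    print("By cost per 1K:     ", rank_by(ROWS, "cost_per_1k", descending=False))
```

Within this excerpt, Chayan leads on Optimal Accuracy, while Azure and vLLM-SR come ahead of it on Arena Score.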
|
| 183 |
|
| 184 |
+
**Overall Rankings:**
|
| 185 |
+
- **Optimal Accuracy Score**: 🥇 #1 (88.7% - SOTA!)
|
| 186 |
+
- **Optimal Selection Score**: 🥈 #2 (43.0% - Silver!)
|
| 187 |
+
- **Overall Accuracy**: #7 overall, #5 among open-source
|
| 188 |
+
- **Arena Score**: #7 overall, #5 among open-source
|
| 189 |
|
| 190 |
+
**🥇 SOTA Achievement - Optimal Accuracy Score**: Chayan achieves **88.7%**, the highest score among all routers. This measures routing decision quality - when Chayan routes to a model, that model gives the correct answer 88.7% of the time.
|
| 191 |
+
|
| 192 |
+
**🥈 Silver Medal - Optimal Selection Score**: Chayan ranks **#2** with **43.0%**, meaning it selects the optimal model (best among all available models) 43% of the time, second only to Azure's 46.3%.
|
| 193 |
|
| 194 |
View the full leaderboard and PR: [RouterArena PR #24](https://github.com/RouteWorks/RouterArena/pull/24)
|
| 195 |
|