---
language:
- en
library_name: adaptive-classifier
license: apache-2.0
metrics:
- accuracy
pipeline_tag: text-classification
tags:
- llm
- routing
- multi-model
- bert
- router-arena
- model-selection
---

# Chayan: Multi-Model LLM Router

This model is a high-performance LLM router presented in the paper [RouterArena: An Open Platform for Comprehensive Comparison of LLM Routers](https://huggingface.co/papers/2510.00202).

- Paper (Hugging Face): [RouterArena: An Open Platform for Comprehensive Comparison of LLM Routers](https://huggingface.co/papers/2510.00202)
- Paper (arXiv): https://arxiv.org/abs/2510.00202
- Library Code: https://github.com/codelion/adaptive-classifier
- RouterArena Project Page: https://routeworks.github.io/

**Chayan** intelligently routes between 4 models (gpt-4o-mini, gemini-2.5-flash-lite, gemini-2.5-flash, and gpt-4o) to optimize the accuracy-cost tradeoff.

## RouterArena Performance

**Official Leaderboard Results** (8,400 queries):
- **#1 Optimal Accuracy Score: 88.7%** - SOTA (best routing decision quality)
- **#2 Optimal Selection Score: 43.0%** - Silver (second-best model selection)
- **#7 Overall** (#5 open-source): 64.9% accuracy, 63.8 arena score
- **$0.60 per 1K queries** - cost-efficient routing

**What do these metrics mean?** (a toy computation of both follows this list)
- **Optimal Accuracy**: when Chayan routes to a model, that model gives the correct answer 88.7% of the time
- **Optimal Selection**: Chayan selects the best available model 43% of the time
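
The snippet below is a toy computation under my reading of those definitions; it is an illustration only, not RouterArena's evaluation code, and the `records` data is made up.

```python
# Each record: the model the router chose, whether that model answered correctly,
# and the model an oracle would have picked for that query (toy data).
records = [
    {"routed": "openai/gpt-4o-mini",      "routed_correct": True,  "oracle": "openai/gpt-4o-mini"},
    {"routed": "openai/gpt-4o",           "routed_correct": True,  "oracle": "google/gemini-2.5-flash"},
    {"routed": "google/gemini-2.5-flash", "routed_correct": False, "oracle": "openai/gpt-4o"},
]

# Optimal Accuracy: how often the routed-to model answers correctly.
optimal_accuracy = sum(r["routed_correct"] for r in records) / len(records)

# Optimal Selection: how often the router picks the model the oracle would pick.
optimal_selection = sum(r["routed"] == r["oracle"] for r in records) / len(records)

print(f"Optimal Accuracy:  {optimal_accuracy:.1%}")   # 66.7% on this toy data
print(f"Optimal Selection: {optimal_selection:.1%}")  # 33.3% on this toy data
```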

View full leaderboard: [RouterArena](https://routeworks.github.io/) | [PR #24](https://github.com/RouteWorks/RouterArena/pull/24)

## Quick Start

```bash
pip install adaptive-classifier
```

```python
from adaptive_classifier import AdaptiveClassifier

# Load router
router = AdaptiveClassifier.load("adaptive-classifier/chayan")

# Get routing decision
query = "What is the capital of France?"
predictions = router.predict(query, k=4)
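# predictions is a list of (model_name, confidence_score) pairs, top model first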

# Route to top model
selected_model = predictions[0][0]  # e.g., "openai/gpt-4o-mini"
```

### Recommended: Use with Calibration

```python
# Apply calibration factors for best performance
calibration = {
    "openai/gpt-4o-mini": 0.9,
    "google/gemini-2.5-flash-lite": 1.5,
    "google/gemini-2.5-flash": 1.8,
    "openai/gpt-4o": 1.5,
}

predictions = router.predict(query, k=4)
calibrated_scores = {model: score * calibration[model] for model, score in predictions}
selected_model = max(calibrated_scores.items(), key=lambda x: x[1])[0]
```
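
These factors down-weight gpt-4o-mini and boost the three under-represented models; see "The Calibration Breakthrough" section below for why this correction is needed.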

## Architecture

**Core Components:**
- **Base Model**: BERT-base-uncased embeddings
- **Classifier**: Adaptive K-NN with prototype memory (FAISS-backed); an illustrative sketch of this lookup follows the list
- **Innovation**: Calibrated confidence scores that correct for training-data imbalance
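
The sketch below shows, in broad strokes, how such a prototype-memory K-NN lookup can work: embed the query with BERT, retrieve the nearest prototypes from a FAISS index, and turn neighbor similarities into per-model scores with a softmax temperature (0.4 and up to 3000 prototypes, per the Training section). It illustrates the idea only; the `embed`/`route` helpers and the toy prototypes are my own, and the actual adaptive-classifier internals differ.

```python
import numpy as np
import faiss
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def embed(texts):
    """Mean-pooled BERT embeddings, L2-normalized so inner product = cosine similarity."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state          # (B, T, 768)
    mask = batch["attention_mask"].unsqueeze(-1)
    vecs = (hidden * mask).sum(1) / mask.sum(1)              # mean pooling over real tokens
    return torch.nn.functional.normalize(vecs, dim=-1).numpy().astype("float32")

# Prototype memory: embeddings of labeled training queries plus their model labels
# (toy data here; Chayan stores up to 3000 prototypes).
proto_texts  = ["What is 2+2?", "Summarize this legal contract", "Translate 'hello' to French"]
proto_labels = ["openai/gpt-4o-mini", "openai/gpt-4o", "google/gemini-2.5-flash-lite"]
proto_vecs = embed(proto_texts)

index = faiss.IndexFlatIP(proto_vecs.shape[1])  # inner-product index over normalized vectors
index.add(proto_vecs)

def route(query, k=3, temperature=0.4):
    """Score each candidate model from the query's k nearest prototypes."""
    sims, ids = index.search(embed([query]), k)
    weights = np.exp(sims[0] / temperature)
    weights /= weights.sum()                                  # softmax over neighbor similarities
    scores = {}
    for w, i in zip(weights, ids[0]):
        scores[proto_labels[i]] = scores.get(proto_labels[i], 0.0) + w
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(route("What is the capital of France?"))
```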

**Supported Models** (a rough per-query cost calculation follows the table):

| Model | Use Case | Cost per 1M tokens |
|-------|----------|--------------------|
| openai/gpt-4o-mini | Simple queries | $0.15 |
| google/gemini-2.5-flash-lite | Medium complexity | $0.075 |
| google/gemini-2.5-flash | Higher complexity | $0.30 |
| openai/gpt-4o | Complex queries | $2.50 |
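
As a back-of-envelope estimate, per-query cost is roughly the per-1M-token price times ~500 tokens per query (the assumption stated under Limitations), ignoring any input/output price split. The routing mix below is hypothetical, purely to show the calculation:

```python
# Rough cost estimate: price per 1M tokens * ~500 tokens per query, scaled to 1K queries.
PRICE_PER_1M = {
    "openai/gpt-4o-mini": 0.15,
    "google/gemini-2.5-flash-lite": 0.075,
    "google/gemini-2.5-flash": 0.30,
    "openai/gpt-4o": 2.50,
}
TOKENS_PER_QUERY = 500

routing_mix = {  # fraction of queries sent to each model (hypothetical mix)
    "openai/gpt-4o-mini": 0.60,
    "google/gemini-2.5-flash-lite": 0.15,
    "google/gemini-2.5-flash": 0.10,
    "openai/gpt-4o": 0.15,
}

cost_per_1k_queries = sum(
    frac * PRICE_PER_1M[m] * TOKENS_PER_QUERY / 1_000_000 * 1_000
    for m, frac in routing_mix.items()
)
print(f"~${cost_per_1k_queries:.3f} per 1K queries")  # ~$0.25 for this hypothetical mix
```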

## How It Works

### Training
- **Dataset**: RouterArena sub_10 (809 queries)
- **Oracle Labels**: 4-model cascade strategy (select the cheapest successful model; sketched after this list)
- **Training Time**: 19.2 minutes
- **Method**: K-NN classifier with 3000 prototypes, temperature 0.4
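
A minimal sketch of that cascade labeling rule: the oracle label for a training query is the cheapest model (per the cost table above) that answered it correctly. The fallback when no model succeeds is my assumption, not something the card specifies.

```python
# Models ordered from cheapest to most expensive (per the cost table above).
CASCADE = [
    "google/gemini-2.5-flash-lite",
    "openai/gpt-4o-mini",
    "google/gemini-2.5-flash",
    "openai/gpt-4o",
]

def oracle_label(correct_by_model, fallback="openai/gpt-4o"):
    """Cheapest model that got the query right; fall back if none did (assumption)."""
    for model in CASCADE:
        if correct_by_model.get(model, False):
            return model
    return fallback

# Example: only the two strongest models answered this query correctly.
print(oracle_label({
    "google/gemini-2.5-flash-lite": False,
    "openai/gpt-4o-mini": False,
    "google/gemini-2.5-flash": True,
    "openai/gpt-4o": True,
}))  # -> "google/gemini-2.5-flash"
```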

### The Calibration Breakthrough

The uncalibrated router achieved 61.76% accuracy but was biased toward gpt-4o-mini, sending it 83% of queries. The bias stems from class imbalance in the training data:
- 57% gpt-4o-mini examples
- 27% gpt-4o examples
- 12% gemini-flash-lite examples
- 4% gemini-flash examples

**Solution**: Apply post-training calibration factors to correct the bias without retraining (one way to tune such factors is sketched below).

**Result**: +7.29pp improvement (61.76% → 69.05% on the sub_10 benchmark)
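
The card lists the resulting factors in the Quick Start section (0.9 / 1.5 / 1.8 / 1.5) but not how they were chosen. One straightforward way to tune such factors is a coarse grid search on a held-out labeled split; the sketch below is that generic recipe, with an assumed `validation_set` structure, not the authors' actual procedure.

```python
import itertools

MODELS = [
    "openai/gpt-4o-mini",
    "google/gemini-2.5-flash-lite",
    "google/gemini-2.5-flash",
    "openai/gpt-4o",
]

def routed_accuracy(validation_set, factors):
    """Accuracy when each query is routed to the model with the highest calibrated score.

    validation_set: list of dicts with raw router scores per model and a
    per-model correctness flag (assumed structure for this sketch).
    """
    correct = 0
    for item in validation_set:
        scores = {m: item["scores"][m] * factors[m] for m in MODELS}
        chosen = max(scores, key=scores.get)
        correct += item["correct"][chosen]
    return correct / len(validation_set)

def grid_search(validation_set, grid=(0.8, 0.9, 1.0, 1.2, 1.5, 1.8, 2.0)):
    """Coarse grid search over per-model multipliers, keeping the best accuracy."""
    best_factors, best_acc = None, -1.0
    for combo in itertools.product(grid, repeat=len(MODELS)):
        factors = dict(zip(MODELS, combo))
        acc = routed_accuracy(validation_set, factors)
        if acc > best_acc:
            best_factors, best_acc = factors, acc
    return best_factors, best_acc

# Usage (with your own validation data):
# factors, acc = grid_search(validation_set)
```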

## Performance Benchmarks

**Sub_10 Benchmark (809 queries):**

| Router | Accuracy | Cost per 1K queries |
|--------|----------|---------------------|
| All gpt-4o-mini (baseline) | 56.98% | $0.088 |
| 2-model router | 61.43% | $0.217 |
| Chayan (uncalibrated) | 61.76% | $0.269 |
| **Chayan (calibrated)** | **69.05%** | **$0.333** |
| Perfect 2-model oracle | 69.84% | $0.784 |

**Key Insight**: Chayan reaches 99% of the perfect oracle's accuracy (69.05% vs 69.84%) at 57% lower cost ($0.333 vs $0.784 per 1K queries).

**Full Dataset (8,400 queries):**
- **Optimal Accuracy**: 88.7% (#1)
- **Optimal Selection**: 43.0% (#2)
- **Overall Accuracy**: 64.9% (#7 overall, #5 open-source)
- **Cost**: $0.60 per 1K queries

## Advanced Usage

### Feature Augmentation

Chayan was trained with query features prepended as tokens, so apply the same augmentation at inference to match the training inputs:

```python
from adaptive_classifier.complexity_features import augment_query_with_features

query = "What is 2+2?"
augmented = augment_query_with_features(query)
# Returns: "[LEN:12][WORDS:3][MATH:1][SENT:1][MC:0] What is 2+2?"

predictions = router.predict(augmented, k=4)
```

## Limitations

- Calibration factors were optimized on RouterArena sub_10 and may require adjustment for other domains
- Requires the 4 specific models to be available via API
- Performance depends on a query distribution similar to the RouterArena benchmark
- Cost estimates assume ~500 tokens per query

## Citation

```bibtex
@software{adaptive_classifier,
  title = {Adaptive Classifier: Dynamic Text Classification with Continuous Learning},
  author = {Sharma, Asankhaya},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/codelion/adaptive-classifier}
}
```