---
language:
- en
library_name: adaptive-classifier
license: apache-2.0
metrics:
- accuracy
pipeline_tag: text-classification
tags:
- llm
- routing
- multi-model
- bert
- router-arena
- model-selection
---
# Chayan: Multi-Model LLM Router
This model is a high-performance LLM router presented in the paper [RouterArena: An Open Platform for Comprehensive Comparison of LLM Routers](https://huggingface.co/papers/2510.00202).
- 📚 Paper (Hugging Face): [RouterArena: An Open Platform for Comprehensive Comparison of LLM Routers](https://huggingface.co/papers/2510.00202)
- 📚 Paper (arXiv): https://arxiv.org/abs/2510.00202
- 💻 Library Code: https://github.com/codelion/adaptive-classifier
- 🌐 RouterArena Project Page: https://routeworks.github.io/
**Chayan** routes each query to one of four models (gpt-4o-mini, gemini-2.5-flash-lite, gemini-2.5-flash, and gpt-4o) to optimize the accuracy-cost tradeoff.
## πŸ† RouterArena Performance
**Official Leaderboard Results** (8,400 queries):
- 🥇 **#1 Optimal Accuracy Score: 88.7%** - SOTA! (Best routing decision quality)
- 🥈 **#2 Optimal Selection Score: 43.0%** - Silver! (Second-best model selection)
- **#7 Overall** (#5 open-source): 64.9% accuracy, 63.8 arena score
- **$0.60 per 1K queries** - Cost-efficient routing
![RouterArena Leaderboard](routerarena_leaderboard.png)
**What do these metrics mean?**
- **Optimal Accuracy**: When Chayan routes to a model, that model gives the correct answer 88.7% of the time
- **Optimal Selection**: Chayan selects the best available model 43% of the time
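Read as simple ratios over the benchmark queries, the two metrics look roughly like the sketch below (assumed from the descriptions above, not the official RouterArena scoring code):

```python
# Illustrative formulation of the two leaderboard metrics (assumed from the
# prose above, not the official RouterArena implementation).
def optimal_accuracy(routed_model_correct):
    """Fraction of queries where the model Chayan routed to answered correctly."""
    return sum(routed_model_correct) / len(routed_model_correct)

def optimal_selection(chosen_models, best_available_models):
    """Fraction of queries where Chayan picked the best available model."""
    hits = sum(c == b for c, b in zip(chosen_models, best_available_models))
    return hits / len(chosen_models)
```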
View full leaderboard: [RouterArena](https://routeworks.github.io/) | [PR #24](https://github.com/RouteWorks/RouterArena/pull/24)
## Quick Start
```bash
pip install adaptive-classifier
```
```python
from adaptive_classifier import AdaptiveClassifier
# Load router
router = AdaptiveClassifier.load("adaptive-classifier/chayan")
# Get routing decision
query = "What is the capital of France?"
predictions = router.predict(query, k=4)
# Route to top model
selected_model = predictions[0][0] # e.g., "openai/gpt-4o-mini"
```
### Recommended: Use with Calibration
```python
# Apply calibration factors for best performance
calibration = {
    "openai/gpt-4o-mini": 0.9,
    "google/gemini-2.5-flash-lite": 1.5,
    "google/gemini-2.5-flash": 1.8,
    "openai/gpt-4o": 1.5,
}

predictions = router.predict(query, k=4)
calibrated_scores = {model: score * calibration[model] for model, score in predictions}
selected_model = max(calibrated_scores.items(), key=lambda x: x[1])[0]
```
## Architecture
**Core Components:**
- **Base Model**: BERT-base-uncased embeddings
- **Classifier**: Adaptive K-NN over a FAISS-backed prototype memory (sketched below)
- **Innovation**: Calibrated confidence scores to correct training data imbalance
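
As a rough picture of how a prototype memory of this kind turns a query into per-model scores, here is a minimal sketch assuming pre-computed BERT embeddings and one model label per prototype (illustrative only; not the adaptive-classifier internals):

```python
# Illustrative prototype-memory routing (not the adaptive-classifier internals).
# Assumes BERT query/prototype embeddings are already computed.
import faiss
import numpy as np

def route_with_prototypes(query_emb, prototypes, labels, temperature=0.4, k=8):
    """Score candidate models by similarity of the query to its nearest prototypes.

    query_emb:  (d,) query embedding
    prototypes: (n, d) prototype embeddings
    labels:     length-n list of model names, one per prototype
    """
    protos = np.ascontiguousarray(prototypes, dtype="float32")
    faiss.normalize_L2(protos)                      # cosine similarity via inner product
    index = faiss.IndexFlatIP(protos.shape[1])
    index.add(protos)

    q = np.ascontiguousarray(query_emb[np.newaxis, :], dtype="float32")
    faiss.normalize_L2(q)
    sims, ids = index.search(q, k)                  # k nearest prototypes

    # Sum similarity per candidate model, then softmax with temperature.
    per_model = {}
    for sim, i in zip(sims[0], ids[0]):
        per_model[labels[i]] = per_model.get(labels[i], 0.0) + float(sim)
    models = list(per_model)
    logits = np.array([per_model[m] for m in models]) / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return sorted(zip(models, probs.tolist()), key=lambda x: -x[1])
```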
**Supported Models:**
| Model | Use Case | Cost/1M tokens |
|-------|----------|----------------|
| openai/gpt-4o-mini | Simple queries | $0.15 |
| google/gemini-2.5-flash-lite | Medium complexity | $0.075 |
| google/gemini-2.5-flash | Higher complexity | $0.30 |
| openai/gpt-4o | Complex queries | $2.50 |
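
To see how the table translates into per-query spend, the sketch below assumes ~500 tokens per query (see Limitations) and a purely hypothetical routing mix; the real figure depends on Chayan's actual routing distribution and on separate input/output pricing:

```python
# Back-of-the-envelope cost estimate from the price table above.
# The routing mix is hypothetical, not Chayan's measured distribution.
price_per_m_tokens = {
    "openai/gpt-4o-mini": 0.15,
    "google/gemini-2.5-flash-lite": 0.075,
    "google/gemini-2.5-flash": 0.30,
    "openai/gpt-4o": 2.50,
}
routing_mix = {  # fraction of queries sent to each model (hypothetical)
    "openai/gpt-4o-mini": 0.55,
    "google/gemini-2.5-flash-lite": 0.15,
    "google/gemini-2.5-flash": 0.10,
    "openai/gpt-4o": 0.20,
}
tokens_per_query = 500

cost_per_1k_queries = sum(
    share * 1000 * tokens_per_query / 1_000_000 * price_per_m_tokens[model]
    for model, share in routing_mix.items()
)
print(f"~${cost_per_1k_queries:.2f} per 1K queries under this mix")
```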
## How It Works
### Training
- **Dataset**: RouterArena sub_10 (809 queries)
- **Oracle Labels**: 4-model cascade strategy, i.e. each query is labeled with the cheapest model that answers it correctly (sketched below)
- **Training Time**: 19.2 minutes
- **Method**: K-NN classifier with 3000 prototypes, temperature 0.4
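
The cascade labeling rule can be sketched as follows, with the cost order taken from the Supported Models table above (illustrative; not the exact training script):

```python
# Sketch of the cascade oracle labeling: each training query is labeled with
# the cheapest model that answers it correctly (cost order assumed from the
# Supported Models table).
CASCADE_ORDER = [
    "google/gemini-2.5-flash-lite",  # $0.075 / 1M tokens
    "openai/gpt-4o-mini",            # $0.15
    "google/gemini-2.5-flash",       # $0.30
    "openai/gpt-4o",                 # $2.50
]

def oracle_label(correct_by_model):
    """Return the cheapest model that answered the query correctly.

    correct_by_model: dict like {"openai/gpt-4o-mini": True, "openai/gpt-4o": False, ...}
    Falls back to the most capable model if none succeeded.
    """
    for model in CASCADE_ORDER:
        if correct_by_model.get(model, False):
            return model
    return CASCADE_ORDER[-1]
```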
### The Calibration Breakthrough
The uncalibrated router achieved 61.76% accuracy but was biased toward gpt-4o-mini, routing 83% of queries to it. This happened because the training data was heavily imbalanced across classes:
- 57% gpt-4o-mini examples
- 27% gpt-4o examples
- 12% gemini-flash-lite examples
- 4% gemini-flash examples
**Solution**: Apply post-training calibration factors to correct the bias without retraining.
**Result**: +7.29pp improvement (61.76% → 69.05% on the sub_10 benchmark)
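
One way such factors could be tuned, without retraining the classifier, is a small grid search over per-model multipliers on a held-out labeled set. The sketch below is illustrative and is not necessarily how the released factors were derived:

```python
# Sketch: tuning per-model calibration factors on a held-out labeled set
# (illustrative; the released factors were not necessarily obtained this way).
from itertools import product

def selection_rate(router, queries, oracle_labels, calibration):
    """Share of queries where calibrated routing picks the oracle-labeled model."""
    hits = 0
    for query, oracle in zip(queries, oracle_labels):
        preds = router.predict(query, k=4)                 # list of (model, score)
        scored = {m: s * calibration.get(m, 1.0) for m, s in preds}
        if max(scored, key=scored.get) == oracle:
            hits += 1
    return hits / len(queries)

def tune_calibration(router, queries, oracle_labels, models,
                     grid=(0.8, 1.0, 1.2, 1.5, 1.8)):
    """Brute-force search over per-model multipliers; cheap for 4 models and a small grid."""
    best, best_score = None, -1.0
    for factors in product(grid, repeat=len(models)):
        calibration = dict(zip(models, factors))
        score = selection_rate(router, queries, oracle_labels, calibration)
        if score > best_score:
            best, best_score = calibration, score
    return best, best_score
```

Because the search only rescales the router's existing confidence scores, the underlying classifier and its prototype memory stay untouched.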
## Performance Benchmarks
**Sub_10 Benchmark (809 queries):**
| Router | Accuracy | Cost/1K |
|--------|----------|---------|
| All gpt-4o-mini (baseline) | 56.98% | $0.088 |
| 2-model router | 61.43% | $0.217 |
| Chayan (uncalibrated) | 61.76% | $0.269 |
| **Chayan (calibrated)** | **69.05%** | **$0.333** |
| Perfect 2-model oracle | 69.84% | $0.784 |
**Key Insight**: Calibrated Chayan reaches 99% of the perfect 2-model oracle's accuracy (69.05% vs. 69.84%) at 57% lower cost ($0.333 vs. $0.784 per 1K queries).
**Full Dataset (8,400 queries):**
- **Optimal Accuracy**: 88.7% (🥇 #1)
- **Optimal Selection**: 43.0% (🥈 #2)
- **Overall Accuracy**: 64.9% (#7 overall, #5 open-source)
- **Cost**: $0.60/1K queries
## Advanced Usage
### Feature Augmentation
Chayan was trained with query features prepended as tokens:
```python
from adaptive_classifier.complexity_features import augment_query_with_features
query = "What is 2+2?"
augmented = augment_query_with_features(query)
# Returns: "[LEN:12][WORDS:3][MATH:1][SENT:1][MC:0] What is 2+2?"
predictions = router.predict(augmented, k=4)
```
## Limitations
- Calibration factors optimized on RouterArena sub_10; may require adjustment for other domains
- Requires the 4 specific models to be available via API
- Performance assumes a query distribution similar to the RouterArena benchmark
- Cost estimates assume ~500 tokens per query
## Citation
```bibtex
@software{adaptive_classifier,
  title     = {Adaptive Classifier: Dynamic Text Classification with Continuous Learning},
  author    = {Sharma, Asankhaya},
  year      = {2025},
  publisher = {GitHub},
  url       = {https://github.com/codelion/adaptive-classifier}
}
```