# AI Language Monitor - System Architecture

This diagram shows the complete data flow from model discovery through evaluation to frontend visualization.

```mermaid
flowchart TD
    %% Model Sources
    A1["important_models<br/>Static Curated List"] --> D[load_models]
    A2["get_historical_popular_models<br/>Web Scraping - Top 20"] --> D
    A3["get_current_popular_models<br/>Web Scraping - Top 10"] --> D
    A4["blocklist<br/>Exclusions"] --> D

    %% Model Processing
    D --> |"Combine & Dedupe"| E["Dynamic Model List<br/>~40-50 models"]
    E --> |get_or_metadata| F["OpenRouter API<br/>Model Metadata"]
    F --> |get_hf_metadata| G["HuggingFace API<br/>Model Details"]
    G --> H["Enriched Model DataFrame"]
    H --> |Save| I[models.json]

    %% Language Data
    J["languages.py<br/>BCP-47 + Population"] --> K["Top 100 Languages"]

    %% Task Registry
    L["tasks.py<br/>7 Evaluation Tasks"] --> M["Task Functions"]
    M --> M1["translation_from/to<br/>BLEU + ChrF"]
    M --> M2["classification<br/>Accuracy"]
    M --> M3["mmlu<br/>Accuracy"]
    M --> M4["arc<br/>Accuracy"]
    M --> M5["truthfulqa<br/>Accuracy"]
    M --> M6["mgsm<br/>Accuracy"]

    %% Evaluation Pipeline
    H --> |"models ID"| N["main.py evaluate"]
    K --> |"languages bcp_47"| N
    L --> |"tasks.items"| N
    N --> |"Filter by model.tasks"| O["Valid Combinations<br/>Model × Language × Task"]
    O --> |"10 samples each"| P["Evaluation Execution"]

    %% Task Execution
    P --> Q1[translate_and_evaluate]
    P --> Q2[classify_and_evaluate]
    P --> Q3[mmlu_and_evaluate]
    P --> Q4[arc_and_evaluate]
    P --> Q5[truthfulqa_and_evaluate]
    P --> Q6[mgsm_and_evaluate]

    %% API Calls
    Q1 --> |"complete() API"| R["OpenRouter<br/>Model Inference"]
    Q2 --> |"complete() API"| R
    Q3 --> |"complete() API"| R
    Q4 --> |"complete() API"| R
    Q5 --> |"complete() API"| R
    Q6 --> |"complete() API"| R

    %% Results Processing
    R --> |Scores| S["Result Aggregation<br/>Mean by model+lang+task"]
    S --> |Save| T[results.json]

    %% Backend & Frontend
    T --> |Read| U[backend.py]
    I --> |Read| U
    U --> |make_model_table| V["Model Rankings"]
    U --> |make_country_table| W["Country Aggregation"]
    U --> |"API Endpoint"| X["FastAPI /api/data"]
    X --> |"JSON Response"| Y["Frontend React App"]

    %% UI Components
    Y --> Z1["WorldMap.js<br/>Country Visualization"]
    Y --> Z2["ModelTable.js<br/>Model Rankings"]
    Y --> Z3["LanguageTable.js<br/>Language Coverage"]
    Y --> Z4["DatasetTable.js<br/>Task Performance"]

    %% Data Sources
    subgraph DS ["Data Sources"]
        DS1["Flores-200<br/>Translation Sentences"]
        DS2["MMLU/AfriMMLU<br/>Knowledge QA"]
        DS3["ARC<br/>Science Reasoning"]
        DS4["TruthfulQA<br/>Truthfulness"]
        DS5["MGSM<br/>Math Problems"]
    end

    DS1 --> Q1
    DS2 --> Q3
    DS3 --> Q4
    DS4 --> Q5
    DS5 --> Q6

    %% Styling
    classDef modelSource fill:#e1f5fe
    classDef evaluation fill:#f3e5f5
    classDef api fill:#fff3e0
    classDef storage fill:#e8f5e8
    classDef frontend fill:#fce4ec

    class A1,A2,A3,A4 modelSource
    class Q1,Q2,Q3,Q4,Q5,Q6,P evaluation
    class R,F,G,X api
    class T,I storage
    class Y,Z1,Z2,Z3,Z4 frontend
```

## Architecture Components

### 🔵 Model Discovery (Blue)
- **Static Curated Models**: Hand-picked important models for comprehensive evaluation
- **Dynamic Popular Models**: Real-time discovery of trending models via web scraping
- **Quality Control**: Blocklist for problematic or incompatible models
- **Metadata Enrichment**: Rich model information from the OpenRouter and HuggingFace APIs
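
The combine-and-dedupe step performed by `load_models` can be sketched as follows. This is a minimal illustration, not the project's actual implementation: the function signature, the in-memory lists, and the model IDs are all assumptions.

```python
# Hypothetical sketch of load_models: merge curated and scraped model IDs,
# drop blocklisted ones, and deduplicate while preserving first-seen order.
def load_models(important_models, historical_popular, current_popular, blocklist):
    combined = important_models + historical_popular + current_popular
    seen, result = set(), []
    for model_id in combined:
        if model_id in blocklist or model_id in seen:
            continue  # skip exclusions and duplicates
        seen.add(model_id)
        result.append(model_id)
    return result

# Illustrative model IDs only:
models = load_models(
    important_models=["meta-llama/llama-3.1-70b-instruct", "openai/gpt-4o"],
    historical_popular=["openai/gpt-4o", "mistralai/mistral-large"],
    current_popular=["google/gemini-flash-1.5"],
    blocklist={"mistralai/mistral-large"},
)
# models keeps first-seen order, with duplicates and blocklisted IDs removed
```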

### 🟣 Evaluation Pipeline (Purple)
- **7 Active Tasks**: Translation (bidirectional), Classification, MMLU, ARC, TruthfulQA, MGSM
- **Combinatorial Approach**: Systematic evaluation across Model × Language × Task combinations
- **Sample-based**: 10 evaluations per combination for statistical reliability
- **Unified API**: All tasks use OpenRouter's `complete()` function for consistency
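
The combinatorial approach can be sketched with `itertools.product`. The dictionary shapes, model IDs, and task names below are illustrative assumptions, not the pipeline's real data structures.

```python
# Sketch of generating valid Model × Language × Task combinations,
# filtered by each model's supported tasks (the "Filter by model.tasks" edge).
from itertools import product

models = [
    {"id": "model-a", "tasks": {"mmlu", "translation_from"}},
    {"id": "model-b", "tasks": {"mmlu"}},
]
languages = ["en", "sw", "hi"]
tasks = ["translation_from", "mmlu"]

combinations = [
    (m["id"], lang, task)
    for m, lang, task in product(models, languages, tasks)
    if task in m["tasks"]  # skip tasks a model does not support
]
# model-a supports both tasks, model-b only mmlu:
# 2 tasks × 3 languages + 1 task × 3 languages = 9 combinations
```

Each surviving combination would then be evaluated on 10 samples, as the diagram's "10 samples each" edge indicates.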

### 🟠 API Integration (Orange)
- **OpenRouter**: Primary model inference API for all language model tasks
- **HuggingFace**: Model metadata and open-source model information
- **Google Translate**: Specialized translation API for comparison baseline
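
As a rough sketch, a unified `complete()` helper could call OpenRouter's OpenAI-compatible chat-completions endpoint like this. The project's actual helper, error handling, and the `OPENROUTER_API_KEY` variable name are assumptions here.

```python
# Hedged sketch of a single-turn completion call against OpenRouter
# (not the project's actual complete() implementation).
import os
import requests

def complete(model: str, prompt: str) -> str:
    """Request one chat completion from OpenRouter and return the text."""
    response = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
        json={"model": model, "messages": [{"role": "user", "content": prompt}]},
        timeout=60,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]
```

Routing every task through one such function keeps prompting and scoring logic uniform across all six evaluation paths.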

### 🟢 Data Storage (Green)
- **results.json**: Aggregated evaluation scores and metrics
- **models.json**: Dynamic model list with metadata
- **languages.json**: Language information with population data
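
The "Mean by model+lang+task" aggregation that produces results.json could look roughly like this. The column names (`model`, `bcp_47`, `task`, `score`) are assumptions based on the diagram, not the actual schema.

```python
# Sketch of aggregating raw per-sample scores into results.json.
import json
import pandas as pd

raw_scores = pd.DataFrame([  # illustrative rows only
    {"model": "model-a", "bcp_47": "sw", "task": "mmlu", "score": 0.6},
    {"model": "model-a", "bcp_47": "sw", "task": "mmlu", "score": 0.8},
    {"model": "model-a", "bcp_47": "sw", "task": "translation_from", "score": 0.4},
])

# Mean score per (model, language, task) combination.
results = (
    raw_scores
    .groupby(["model", "bcp_47", "task"], as_index=False)["score"]
    .mean()
)

with open("results.json", "w") as f:
    json.dump(results.to_dict(orient="records"), f, indent=2)
```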

### 🩷 Frontend Visualization (Pink)
- **WorldMap**: Interactive country-level language proficiency visualization
- **ModelTable**: Ranked model performance leaderboard
- **LanguageTable**: Language coverage and speaker statistics
- **DatasetTable**: Task-specific performance breakdowns

## Data Flow Summary

1. **Model Discovery**: Combine curated + trending models → enrich with metadata
2. **Evaluation Setup**: Generate all valid Model × Language × Task combinations
3. **Task Execution**: Run evaluations using appropriate datasets and APIs
4. **Result Processing**: Aggregate scores and save to JSON files
5. **Backend Serving**: FastAPI serves processed data via REST API
6. **Frontend Display**: React app visualizes data through interactive components

This architecture enables scalable, automated evaluation of AI language models across diverse languages and tasks while providing real-time insights through an intuitive web interface.