
🧠 ToGMAL Prompt Difficulty Analyzer

Real-time LLM capability boundary detection using vector similarity search.

🎯 What This Does

This system analyzes any prompt and tells you:

  1. How difficult it is for current LLMs (based on real benchmark data)
  2. Why it's difficult (shows similar benchmark questions)
  3. What to do about it (actionable recommendations)

🔥 Key Innovation

Instead of clustering by domain (all math together), we cluster by difficulty: what's actually hard for LLMs, regardless of domain.

📊 Real Data

  • 14,042 MMLU questions with real success rates from top models
  • <50 ms query time for real-time analysis
  • Production-ready vector database

🚀 Demo Links

🧪 Example Results

Hard Questions (Low Success Rates)

Prompt: "Statement 1 | Every field is also a ring..."
Risk: HIGH (23.9% success)
Recommendation: Multi-step reasoning with verification

Prompt: "Find all zeros of polynomial x³ + 2x + 2 in Z₇"
Risk: MODERATE (43.8% success)
Recommendation: Use chain-of-thought prompting

Easy Questions (High Success Rates)

Prompt: "What is 2 + 2?"
Risk: MINIMAL (100% success)
Recommendation: Standard LLM response adequate

Prompt: "What is the capital of France?"
Risk: MINIMAL (100% success)
Recommendation: Standard LLM response adequate
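
The risk labels in these examples can be read as buckets over the success rate. The sketch below is illustrative only: the README does not state the actual cut-offs or the full set of labels, so the thresholds here are hypothetical values picked solely so that the four examples above land in their stated tiers, and `risk_level` is an assumed name.

```python
def risk_level(success_rate: float) -> str:
    """Map a success rate in [0, 1] to one of the risk labels seen in the examples."""
    if not 0.0 <= success_rate <= 1.0:
        raise ValueError("success_rate must be between 0 and 1")
    # Hypothetical cut-offs, chosen only to agree with the examples above.
    if success_rate < 0.30:
        return "HIGH"
    if success_rate < 0.70:
        return "MODERATE"
    return "MINIMAL"


# Success rates taken from the example results above.
for rate in (0.239, 0.438, 1.0):
    print(f"{rate:.1%} -> {risk_level(rate)}")
```

The real analyzer may use different boundaries or additional tiers; only the three labels shown in this README are used here.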

πŸ› οΈ Technical Details

Architecture

User Prompt → Embedding Model → Vector DB → K Nearest Questions → Weighted Score
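
The last two stages of this pipeline can be sketched in plain Python. This is a minimal illustration, not the project's implementation: it assumes the "weighted score" is a cosine-similarity-weighted average of the neighbors' success rates (the README does not give the formula), and it uses toy 2-D vectors in place of real embeddings. The names `cosine` and `weighted_success` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def weighted_success(query_vec, indexed, k=3):
    """Estimate a success rate for the query from its k nearest indexed questions.

    indexed is a list of (vector, success_rate) pairs.
    Assumption: each neighbor is weighted by its cosine similarity to the query.
    """
    if not indexed:
        raise ValueError("indexed must contain at least one question")
    # Rank every indexed question by similarity and keep the k closest.
    scored = sorted(
        ((cosine(query_vec, vec), rate) for vec, rate in indexed),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    # Negative similarities carry no weight.
    weights = [max(sim, 0.0) for sim, _ in scored]
    total = sum(weights)
    if total == 0:
        # No neighbor is similar at all: fall back to a plain mean.
        return sum(rate for _, rate in scored) / len(scored)
    return sum(w * rate for w, (_, rate) in zip(weights, scored)) / total


# Toy index: (embedding, success rate). Values are made up for illustration.
toy_index = [([1.0, 0.0], 0.2), ([0.0, 1.0], 1.0), ([1.0, 1.0], 0.6)]
estimate = weighted_success([1.0, 0.0], toy_index, k=2)
print(f"estimated success rate: {estimate:.3f}")
```

Weighting by similarity lets a near-duplicate benchmark question dominate the estimate, while loosely related neighbors contribute less.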

Components

  1. Sentence Transformers (all-MiniLM-L6-v2) for embeddings
  2. ChromaDB for vector storage
  3. Real MMLU data with success rates from top models
  4. Gradio for web interface
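
A rough sketch of how the first three components might fit together follows. It is written under assumptions rather than taken from the project: the collection name, the `success_rate` metadata key, and the function names are all hypothetical, and the index is held in memory instead of on disk. The third-party imports sit inside `build_index` so the small data-preparation helper can be used on its own.

```python
def prepare_batch(records):
    """Split (question, success_rate) pairs into the parallel lists that
    a ChromaDB collection's add() takes."""
    ids = [f"q{i}" for i in range(len(records))]
    documents = [question for question, _ in records]
    metadatas = [{"success_rate": float(rate)} for _, rate in records]
    return ids, documents, metadatas


def build_index(records, name="benchmark_questions"):
    """Embed the questions and store them in an in-memory ChromaDB collection."""
    # Imported here so prepare_batch works without these packages installed.
    import chromadb
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")
    client = chromadb.Client()  # in-memory client; nothing is persisted
    collection = client.get_or_create_collection(
        name=name,
        metadata={"hnsw:space": "cosine"},  # use cosine distance
    )
    ids, documents, metadatas = prepare_batch(records)
    collection.add(
        ids=ids,
        documents=documents,
        metadatas=metadatas,
        embeddings=model.encode(documents).tolist(),
    )
    return model, collection


def nearest_questions(model, collection, prompt, k=5):
    """Return (question, metadata, distance) for the k closest stored questions."""
    k = min(k, collection.count())
    result = collection.query(
        query_embeddings=model.encode([prompt]).tolist(),
        n_results=k,
        include=["documents", "metadatas", "distances"],
    )
    return list(
        zip(result["documents"][0], result["metadatas"][0], result["distances"][0])
    )
```

Calling `build_index` downloads the embedding model on first use, so it needs network access and both packages installed. The distances returned by `nearest_questions` are cosine distances, so smaller values mean more similar questions.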

📈 Next Steps

  1. Add more benchmark datasets (GPQA, MATH)
  2. Fetch real per-question results from multiple top models
  3. Integrate with ToGMAL MCP server for Claude Desktop
  4. Deploy to HuggingFace Spaces for permanent hosting

🚀 Quick Start

# Install dependencies
uv pip install -r requirements.txt
uv pip install gradio

# Run the demo
python demo_app.py

Visit http://127.0.0.1:7860 to use the web interface.