metadata
title: ToGMAL - AI Difficulty & Safety Analysis
emoji: 🧠
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 5.42.0
app_file: app_combined.py
pinned: false
license: apache-2.0
short_description: LLM difficulty analyzer with chat assistant & MCP tools
🧠 ToGMAL - Intelligent LLM Difficulty & Safety Analysis
ToGMAL (Taxonomy of Generative Model Apparent Limitations) provides real-time difficulty assessment and a chat interface with MCP tool integration.
🎯 Unified Tabbed Interface
Switch seamlessly between two powerful tools:
Tab 1: Difficulty Analyzer
- Direct analysis using 32K+ benchmark questions
- Instant difficulty ratings and success rates
- Vector similarity search
- Perfect for quick assessments
🤖 Tab 2: Chat Assistant
Interactive chat where a free LLM can call MCP tools!
- 🤖 Chat with Mistral-7B (free via HuggingFace)
- 🛠️ LLM calls tools dynamically based on context
- Transparent tool execution (see what's happening)
- 💬 Natural language responses using tool data
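One way to picture the tool-calling loop is sketched below. It is an illustration under stated assumptions, not the Space's implementation: the JSON tool-call format, the `TOOLS` registry, and the two stub tools (`check_difficulty`, `check_safety`) are hypothetical stand-ins for the real MCP tools, and the LLM's output is passed in as a plain string rather than fetched from Mistral-7B.

```python
import json


def check_difficulty(prompt: str) -> dict:
    """Hypothetical stub; a real tool would query the benchmark database."""
    return {"tool": "check_difficulty", "prompt": prompt, "status": "stub"}


def check_safety(prompt: str) -> dict:
    """Hypothetical stub; a real tool would run a safety analysis."""
    return {"tool": "check_safety", "prompt": prompt, "status": "stub"}


# Hypothetical registry mapping tool names to callables.
TOOLS = {"check_difficulty": check_difficulty, "check_safety": check_safety}


def handle_llm_output(llm_output: str) -> dict:
    """Run a tool if the LLM asked for one; otherwise pass the text through."""
    plain_text = {"type": "text", "content": llm_output}
    try:
        call = json.loads(llm_output)
    except json.JSONDecodeError:
        return plain_text  # ordinary chat reply, no tool requested

    # Assumed call shape: {"tool": "<name>", "arguments": {...}}
    if not isinstance(call, dict):
        return plain_text
    name = call.get("tool")
    arguments = call.get("arguments", {})
    if not isinstance(name, str) or name not in TOOLS:
        return plain_text
    if not isinstance(arguments, dict):
        return plain_text

    result = TOOLS[name](**arguments)
    # Returning the call alongside the result keeps execution transparent.
    return {"type": "tool_result", "tool": name,
            "arguments": arguments, "result": result}


print(handle_llm_output('{"tool": "check_safety", "arguments": {"prompt": "hi"}}'))
print(handle_llm_output("Hello! How can I help?"))
```

Returning the tool name and arguments together with the result is what would let a UI show which tool ran and with what input before the LLM turns the data into a natural-language answer.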
Features
- Real Benchmark Data: Analyzes prompts against 14,042 questions from MMLU, MMLU-Pro, GPQA, and MATH datasets
- 🎯 Vector Similarity Search: Uses semantic embeddings to find similar benchmark questions
- Success Rate Prediction: Shows weighted success rates from top LLMs (Claude, GPT-4, Gemini)
- 💡 Smart Recommendations: Provides actionable suggestions based on difficulty level
How It Works
- Enter any prompt or question.
- The system finds the 5 most similar benchmark questions using vector embeddings.
- It calculates a weighted difficulty score based on how well LLMs perform on those similar questions.
- It returns a risk assessment and recommendations.
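The steps above can be sketched in plain Python. This is a minimal illustration, not the app's actual code: the toy 3-dimensional vectors stand in for sentence-transformer embeddings, the in-memory list stands in for the ChromaDB collection, and the names `BenchmarkItem`, `top_k_similar`, `weighted_success_rate`, and `risk_level` are hypothetical. Weighting each neighbor by its similarity is one plausible reading of "weighted", and the risk thresholds are invented for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class BenchmarkItem:
    question: str
    embedding: list       # toy vector; the real app would use model embeddings
    success_rate: float   # fraction of correct LLM answers, between 0 and 1


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_similar(query_embedding, items, k=5):
    """Return up to k (similarity, item) pairs, most similar first."""
    scored = [(cosine_similarity(query_embedding, it.embedding), it) for it in items]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def weighted_success_rate(neighbors):
    """Average the neighbors' success rates, weighted by similarity."""
    # Assumption: negative similarities contribute no weight.
    weights = [max(sim, 0.0) for sim, _ in neighbors]
    total = sum(weights)
    if total == 0:
        return None  # nothing similar enough to estimate from
    return sum(w * it.success_rate for w, (_, it) in zip(weights, neighbors)) / total


def risk_level(success_rate):
    # Hypothetical thresholds, chosen only for illustration.
    if success_rate is None:
        return "UNKNOWN"
    if success_rate < 0.3:
        return "HIGH"
    if success_rate < 0.7:
        return "MODERATE"
    return "LOW"


items = [
    BenchmarkItem("hard physics question", [1.0, 0.0, 0.0], 0.2),
    BenchmarkItem("easy arithmetic question", [0.0, 1.0, 0.0], 0.9),
]
neighbors = top_k_similar([1.0, 0.0, 0.0], items, k=5)
rate = weighted_success_rate(neighbors)
print(rate, risk_level(rate))
```

Here a low weighted success rate means LLMs tend to fail on similar benchmark questions, so the prompt is treated as difficult and the risk level rises.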
Example Prompts
- "Calculate the quantum correction to the partition function for a 3D harmonic oscillator"
- "Prove that there are infinitely many prime numbers"
- "Diagnose a patient with acute chest pain and shortness of breath"
- "Implement a binary search tree with insert and search operations"
🎯 Quick Start
Run Combined Demo (Recommended)
python app_combined.py
Or run individual demos:
Run Difficulty Analyzer Only
python app.py
Run Chat Demo Only
python chat_app.py
# Or use the launcher:
./launch_chat.sh
Try in the Chat tab:
- "How difficult is this: [your prompt]?"
- "Is this safe: [your prompt]?"
- "Analyze the difficulty of: Calculate quantum corrections..."
See CHAT_DEMO_README.md for full documentation.
Technology
- Vector Database: ChromaDB with persistent storage
- Embeddings: sentence-transformers (all-MiniLM-L6-v2)
- Frontend: Gradio
- Data: Real benchmark questions with ground-truth success rates
Repository
Full source code: github.com/HeTalksInMaths/togmal-mcp