
Togmal-demo / GITHUB_INSTRUCTIONS.md

HeTalksInMaths

Initial commit: ToGMAL Prompt Difficulty Analyzer with real MMLU data

f9b1ad5 about 2 months ago


	# 🚀 GitHub Setup Instructions

	## Steps to Publish Your Repository

	1. Create a new repository on GitHub:
	   - Go to https://github.com/new
	   - Repository name: `togmal-prompt-analyzer` (or any name you prefer)
	   - Description: "Real-time LLM capability boundary detection using vector similarity"
	   - Visibility: Public
	   - Do NOT initialize with a README. The local repository already has commits, and a README created on GitHub would cause the first push to be rejected.
	   - Click "Create repository"

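	2. Push your local repository to GitHub, replacing `YOUR_USERNAME` and `YOUR_REPO_NAME` with your actual GitHub username and repository name:
	```bash
	# Move into the local repository (adjust the path for your machine)
	cd /Users/hetalksinmaths/togmal
	# Point the local repository at the new GitHub repository
	git remote add origin https://github.com/YOUR_USERNAME/YOUR_REPO_NAME.git
	# Rename the current branch to main
	git branch -M main
	# Push and set origin/main as the upstream branch
	git push -u origin main
	```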

	## What's Included in This Commit

	- `benchmark_vector_db.py`: Core vector database implementation
	- `demo_app.py`: Gradio web interface for prompt analysis
	- `COMPLETE_DEMO_ANALYSIS.md`: Comprehensive analysis of the system
	- `DEMO_README.md`: Documentation with results and instructions
	- `requirements.txt`: Python dependencies
	- `.gitignore`: Excludes large data files and the virtual environment
	- `test_vector_db.py`: Test script with real data examples

	## Live Demo

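	At the time of this commit, the demo was running at:
	- Local: http://127.0.0.1:7861
	- Public: https://db11ee71660c8a3319.gradio.live

	Gradio share links (`*.gradio.live`) are temporary, so the public URL is likely to have expired. The local URL only works while `demo_app.py` is running on your machine.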

	## Key Features

	- 14,042 real MMLU questions with actual success rates
	- Real-time difficulty assessment (queries complete in under 50 ms)
	- Production-ready vector database
	- Explainable results (shows similar benchmark questions)
	- Actionable recommendations based on difficulty
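
The difficulty assessment listed above can be illustrated with a minimal sketch. It assumes the approach is a nearest-neighbor lookup: embed the prompt, find the most similar benchmark questions, and average their success rates weighted by similarity. The function names, the toy 3-dimensional vectors, and the choice of `k` are hypothetical and only for illustration; the actual logic lives in `benchmark_vector_db.py` and may differ.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def estimate_success_rate(query_vec, benchmark, k=2):
    """Estimate success rate from the k most similar benchmark questions.

    benchmark is a list of (embedding, success_rate, question_text) tuples.
    Returns (estimate, neighbors); estimate is None if no neighbor has
    positive similarity.
    """
    scored = [
        (cosine_similarity(query_vec, emb), rate, text)
        for emb, rate, text in benchmark
    ]
    # Most similar questions first
    scored.sort(key=lambda item: item[0], reverse=True)
    neighbors = scored[:k]
    # Ignore negative similarities so they cannot flip the weighting
    weights = [max(sim, 0.0) for sim, _, _ in neighbors]
    total = sum(weights)
    if total == 0.0:
        return None, neighbors
    estimate = sum(w * rate for w, (_, rate, _) in zip(weights, neighbors)) / total
    return estimate, neighbors

# Toy data: the vectors are made up; the two success rates are the
# examples quoted in this document.
benchmark = [
    ([1.0, 0.0, 0.0], 1.0, "What is 2 + 2?"),
    ([0.0, 1.0, 0.0], 0.239, "Statement 1 | Every field is also a ring..."),
]

estimate, neighbors = estimate_success_rate([1.0, 0.0, 0.0], benchmark, k=1)
```

Returning the neighbors alongside the estimate is what makes the result explainable: the caller can show which benchmark questions drove the score.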

	## Analysis of Test Questions

	The system correctly differentiates between:
	- Hard prompts (23.9% success rate) like "Statement 1 \| Every field is also a ring..."
	- Easy prompts (100% success rate) like "What is 2 + 2?"
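
As a rough sketch of how a success rate could be turned into a difficulty label, the function below buckets rates into three tiers. The thresholds (0.3 and 0.7) and the tier names are assumptions chosen for illustration; they are not taken from the project's code.

```python
def difficulty_tier(success_rate):
    """Map a success rate in [0, 1] to a coarse difficulty label.

    The 0.3 and 0.7 cut-offs are hypothetical, not the project's values.
    """
    if not 0.0 <= success_rate <= 1.0:
        raise ValueError("success_rate must be between 0 and 1")
    if success_rate < 0.3:
        return "hard"
    if success_rate < 0.7:
        return "moderate"
    return "easy"

# The two examples quoted above, expressed as fractions
hard_label = difficulty_tier(0.239)
easy_label = difficulty_tier(1.0)
```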

	## Next Steps After Pushing

	1. Add more benchmark datasets (GPQA Diamond, MATH)
	2. Fetch real per-question results from multiple top models
	3. Integrate with ToGMAL MCP server for Claude Desktop
	4. Deploy to HuggingFace Spaces for permanent hosting