# HuggingFace Spaces Deployment Guide - ToGMAL Demo

## 🚀 Quick Deployment Steps

### 1. Prepare Repository
```bash
cd /Users/hetalksinmaths/togmal/Togmal-demo

# Ensure all files are up to date
ls -la
# Should see: app.py, benchmark_vector_db.py, requirements.txt, README.md
```
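Before pushing, it helps to confirm that the expected files are present. HF Spaces also reads Space settings (such as `sdk: gradio` and `app_file: app.py`) from a YAML block at the top of `README.md`, so a missing block is worth catching early. Below is a minimal sketch. The script and its function names are hypothetical helpers and are not part of the repo.

```python
"""Pre-push sanity check for the Togmal-demo folder (hypothetical helper, not part of the repo)."""
import sys
from pathlib import Path

# Files the deployment steps expect to find in the Space repo.
REQUIRED_FILES = ["app.py", "benchmark_vector_db.py", "requirements.txt", "README.md"]


def missing_files(repo_dir: str) -> list:
    """Return the required files that are absent from repo_dir."""
    root = Path(repo_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


def readme_has_front_matter(repo_dir: str) -> bool:
    """Check that README.md opens with a '---' delimited YAML block,
    which HF Spaces reads for settings such as sdk and app_file."""
    readme = Path(repo_dir) / "README.md"
    if not readme.is_file():
        return False
    lines = readme.read_text(encoding="utf-8").splitlines()
    # Needs an opening '---' on the first line and a closing '---' later on.
    return bool(lines) and lines[0].strip() == "---" and any(
        line.strip() == "---" for line in lines[1:]
    )


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    print("missing:", missing_files(target) or "none")
    print("README front matter:", "ok" if readme_has_front_matter(target) else "missing")
```

To use it, run the script from the demo folder with no arguments, or pass the folder path as the first argument.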

### 2. Push to HuggingFace Spaces

```bash
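# If not already done, initialize the repo and add the Space as a remote
git init
git remote add hf https://huggingface.co/spaces/YOUR_USERNAME/togmal-demo

# Stage and commit the app files
git add app.py benchmark_vector_db.py requirements.txt README.md
git commit -m "Update: 32K+ questions across 20 domains with progressive loading"

# Spaces build from the main branch; rename the local branch if it is still "master"
git branch -M main

# Push to HuggingFace (when prompted for a password, use an access token with write access)
git push hf main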
```

### 3. Monitor Initial Build

The demo will:
1. **Index 5K questions** on first launch (~5-10 min, keeping startup short)
2. **Allow progressive expansion** via a UI button (+5K per click)
3. **Reach the full 32,719 questions** in 6 clicks (user-controlled)
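The click count is simple arithmetic. After the initial 5K, 27,719 questions remain, and at 5K per click that is ⌈27,719 / 5,000⌉ = 6 clicks. The quick check below assumes exactly 5,000 questions per batch. The function name is illustrative.

```python
import math

TOTAL_QUESTIONS = 32_719  # full corpus across the 7 benchmarks
INITIAL_BUILD = 5_000     # indexed on first launch
BATCH = 5_000             # added per "Expand Database" click


def clicks_to_full(total: int = TOTAL_QUESTIONS, initial: int = INITIAL_BUILD, batch: int = BATCH) -> int:
    """Expansion clicks needed after the initial build to index every question."""
    remaining = max(total - initial, 0)
    return math.ceil(remaining / batch)


print(clicks_to_full())  # 6: five full batches plus a final 2,719-question batch
```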

---

## 📦 File Structure

```
Togmal-demo/
├── app.py                          # Main Gradio app with progressive loading
├── benchmark_vector_db.py          # Vector DB engine
├── requirements.txt                # Dependencies
├── README.md                       # User-facing documentation
├── DEPLOYMENT_GUIDE.md             # This file
└── data/                           # Created on first run
    └── benchmark_vector_db/        # ChromaDB persistence
```

---

## 🎯 Demo Features

### Initial State (5K Questions)
- Fast build (<10 min on HF Spaces)
- All 20 domains represented (stratified sampling)
- Immediate functionality for demo
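This guide does not show how `app.py` stratifies the initial 5K. The sketch below assumes proportional allocation per domain with a floor of one question, so small domains still appear. It also assumes a hypothetical list-of-dicts record format with a `domain` field.

```python
import random
from collections import defaultdict


def stratified_sample(questions, target_size, key="domain", seed=0):
    """Pick roughly target_size questions so that every domain is represented.

    Each domain gets a share proportional to its size, but at least one
    question. Because of rounding and the one-question floor, the result can
    differ from target_size by a few questions.
    """
    by_domain = defaultdict(list)
    for q in questions:
        by_domain[q[key]].append(q)

    rng = random.Random(seed)  # fixed seed -> reproducible initial build
    total = len(questions)
    sample = []
    for _, items in sorted(by_domain.items()):
        quota = max(1, round(target_size * len(items) / total))
        sample.extend(rng.sample(items, min(quota, len(items))))
    return sample
```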

### Progressive Expansion
- **Button:** "🚀 Expand Database (+5K questions)"
- **Sources Loaded:** MMLU, MMLU-Pro, ARC-Challenge, HellaSwag, GSM8K, TruthfulQA, Winogrande
- **Progress Display:** Shows % complete and remaining questions
- **Final Size:** 32,719 questions
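Each click has to pick the next not-yet-indexed questions and report progress. A minimal sketch of that bookkeeping follows. The function name, the ID-based tracking, and the progress fields are assumptions for illustration and do not come from `app.py`.

```python
def next_batch(all_ids, indexed_ids, batch_size=5_000):
    """Return the next batch of not-yet-indexed question IDs and a progress summary.

    Progress figures describe the state *after* this batch has been indexed.
    """
    indexed = set(indexed_ids)
    pending = [qid for qid in all_ids if qid not in indexed]
    batch = pending[:batch_size]
    done_after = len(indexed) + len(batch)
    progress = {
        "percent_complete": round(100 * done_after / len(all_ids), 1),
        "remaining": len(all_ids) - done_after,
    }
    return batch, progress
```

With the full 32,719 IDs and the initial 5K already indexed, the first click returns 5,000 IDs, 30.6% complete, and 22,719 remaining.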

### Assessment Features
- Real-time prompt difficulty scoring
- k-nearest benchmark questions (adjustable 1-10)
- Risk level: MINIMAL → LOW → MODERATE → HIGH → CRITICAL
- Success rate estimation
- Actionable recommendations
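The list above names success-rate estimation and five risk levels but gives neither the formula nor the thresholds. One plausible approach works in two steps. First, weight each neighbor's success rate by its similarity to the prompt. Then bucket the result into a risk level. The cut-offs and the `(similarity, success_rate)` input format below are hypothetical, and the source of each neighbor's success rate is not specified here.

```python
RISK_LEVELS = ["MINIMAL", "LOW", "MODERATE", "HIGH", "CRITICAL"]
# Hypothetical cut-offs on the estimated success rate; app.py may use different ones.
SUCCESS_CUTOFFS = [0.8, 0.6, 0.4, 0.2]


def estimate_success_rate(neighbors):
    """Similarity-weighted mean success rate of the k nearest benchmark questions.

    neighbors: list of (similarity, success_rate) pairs; similarities are assumed
    to be non-negative (e.g. cosine similarity clipped at 0).
    """
    if not neighbors:
        raise ValueError("need at least one neighbor")
    total_weight = sum(sim for sim, _ in neighbors)
    if total_weight == 0:
        # All weights zero: fall back to an unweighted mean.
        return sum(rate for _, rate in neighbors) / len(neighbors)
    return sum(sim * rate for sim, rate in neighbors) / total_weight


def risk_level(success_rate):
    """Map an estimated success rate in [0, 1] to a risk label (lower success, higher risk)."""
    for level, cutoff in zip(RISK_LEVELS, SUCCESS_CUTOFFS):
        if success_rate >= cutoff:
            return level
    return RISK_LEVELS[-1]


print(risk_level(estimate_success_rate([(0.9, 0.3), (0.7, 0.5)])))  # HIGH (estimate = 0.3875)
```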

---

## 📊 Data Sources (7 Benchmarks)

| Source | Questions | Domain Focus |
|--------|-----------|--------------|
| MMLU | 14,042 | General knowledge |
| MMLU-Pro | 12,102 | Advanced knowledge |
| ARC-Challenge | 1,172 | Science reasoning |
| HellaSwag | 2,000 | Commonsense NLI |
| GSM8K | 1,319 | Math word problems |
| TruthfulQA | 817 | Truthfulness |
| Winogrande | 1,267 | Commonsense reasoning |

**Total:** 32,719 questions across 20 domains
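The per-source counts in the table add up to the stated total. The quick check below also prints each benchmark's share. MMLU and MMLU-Pro together account for about 80% of the corpus.

```python
BENCHMARKS = {
    "MMLU": 14_042,
    "MMLU-Pro": 12_102,
    "ARC-Challenge": 1_172,
    "HellaSwag": 2_000,
    "GSM8K": 1_319,
    "TruthfulQA": 817,
    "Winogrande": 1_267,
}

total = sum(BENCHMARKS.values())
print(f"Total: {total:,}")  # Total: 32,719
for name, count in BENCHMARKS.items():
    # Share of the full corpus contributed by each benchmark.
    print(f"{name:<14} {count:>6,} ({100 * count / total:4.1f}%)")
```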

---

## 🎬 User Journey

### First Visit
1. User lands on demo page
2. Database auto-builds with 5K questions (~5-10 min)
3. Tests prompts as soon as the initial build finishes
4. Sees "📊 Database Management" accordion

### Expansion (Optional)
1. Click "🚀 Expand Database (+5K questions)"
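2. Watch progress (2-3 min per batch)
3. Repeat as needed (six clicks reach the full 32,719)
4. Database persists across user sessions while the Space is running; a restart or rebuild clears it unless it is written to the Space's persistent storage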

### Assessment
1. Enter any prompt in text box
2. Adjust k (number of similar questions, 1-10)
3. Click "Analyze Difficulty"
4. See risk level, success rate, similar questions
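In the app, ChromaDB handles the nearest-neighbor search. With ChromaDB's Python client, the equivalent request is roughly `collection.query(query_texts=[prompt], n_results=k)`. The brute-force sketch below only illustrates what "k nearest" means. It assumes a hypothetical in-memory index that maps question IDs to embedding vectors, and it clamps k to the UI's 1-10 range.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, index, k=5):
    """Return the k most similar (question_id, similarity) pairs, best first.

    index: dict mapping question_id -> embedding vector.
    k is clamped to the 1-10 range exposed by the UI.
    """
    k = max(1, min(k, 10))
    scored = [(qid, cosine(query_vec, vec)) for qid, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```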

---

## 🔧 Technical Details

### Performance
- **Query Time:** Sub-50ms for similarity search
- **Embedding Model:** all-MiniLM-L6-v2 (small sentence-transformers model, 384-dimensional embeddings)
- **Vector DB:** ChromaDB (persistent)
- **Batch Size:** 1000 questions/batch during indexing
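A 1,000-question batch size means each write to the vector store handles a bounded slice of the data. The sketch below is generic. `add_fn` is a hypothetical stand-in for the code that embeds and stores a batch, for example a wrapper around a ChromaDB collection's `add` method.

```python
from typing import Callable, Iterator, Sequence


def batches(items: Sequence, size: int = 1000) -> Iterator[Sequence]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def index_in_batches(questions: Sequence[dict], add_fn: Callable[[Sequence[dict]], None], size: int = 1000) -> int:
    """Feed questions to add_fn one batch at a time; return how many were indexed."""
    count = 0
    for chunk in batches(questions, size):
        add_fn(chunk)  # only this chunk is embedded/written per call
        count += len(chunk)
    return count
```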

### Memory Management
- **Initial Build:** ~2GB RAM (5K questions)
- **Full Database:** ~4GB RAM (32K questions)
- **HF Spaces:** 16GB RAM on the free CPU Basic hardware (plenty of headroom)
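For scale, the raw embeddings are small. all-MiniLM-L6-v2 produces 384-dimensional vectors, and at 4 bytes per float32 the full corpus needs about 50 MB. That suggests most of the ~2-4GB figures above goes to the model, the Python runtime, and dataset loading. This is an inference from the arithmetic, not a measurement. The calculation ignores index overhead, stored documents, and metadata.

```python
EMBEDDING_DIM = 384   # all-MiniLM-L6-v2 output size
BYTES_PER_FLOAT = 4   # float32


def raw_vector_mb(num_questions: int) -> float:
    """Raw embedding storage in megabytes (10**6 bytes), excluding index
    overhead, documents, metadata, and the model itself."""
    return num_questions * EMBEDDING_DIM * BYTES_PER_FLOAT / 1e6


print(f"{raw_vector_mb(5_000):.1f} MB")   # 7.7 MB
print(f"{raw_vector_mb(32_719):.1f} MB")  # 50.3 MB
```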

### Error Handling
- Graceful fallback if datasets fail to load
- Per-source try/except blocks
- Detailed logging for debugging
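The per-source pattern keeps one failing benchmark from aborting the whole build. The sketch below assumes each source is wrapped in a zero-argument loader function, for example a wrapper around `datasets.load_dataset`. The loader dictionary and logger name are hypothetical.

```python
import logging

logger = logging.getLogger("togmal.loader")


def load_all_sources(loaders):
    """Run each source loader independently so one failure doesn't abort the build.

    loaders: dict mapping source name -> zero-argument callable returning a list of questions.
    Returns (questions, failed_sources).
    """
    questions, failed = [], []
    for name, load in loaders.items():
        try:
            batch = load()
        except Exception:  # broad on purpose: network, auth, and schema errors all mean "unavailable"
            logger.warning("Skipping %s: failed to load", name, exc_info=True)
            failed.append(name)
            continue
        logger.info("Loaded %d questions from %s", len(batch), name)
        questions.extend(batch)
    return questions, failed
```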

---

## 🎤 VC Pitch Talking Points

### Demo Flow for VCs
1. **Show Initial Capability** (5K database)
   - "Already functional with 5K questions across 20 domains"
   - Run 2-3 example prompts

2. **Demonstrate Scalability** (expand live)
   - "Click to expand: adds 5K more in 2-3 minutes"
   - Show progress indicator
   - Highlight: "Production system has 32K+ questions"

3. **Highlight Domains** (20-domain coverage)
   - Point out new domains: truthfulness, commonsense, math word problems
   - Emphasize AI safety focus

4. **Show Technical Excellence**
   - Sub-50ms query performance
   - Real benchmark data (not synthetic)
   - 7 industry-standard sources

### Key Messages
- ✅ **Production-ready** (32,719 questions available)
- ✅ **Scalable architecture** (progressive loading)
- ✅ **AI safety focused** (truthfulness and commonsense benchmarks)
- ✅ **Comprehensive coverage** (20 domains, 7 benchmarks)
- ✅ **Real-time assessment** (vector similarity search)

---

## 🐛 Troubleshooting

### Build Timeout on HF Spaces
**Problem:** Initial build exceeds 10-minute limit  
**Solution:** Already handled! Initial build only loads 5K questions

### Memory Issues During Expansion
**Problem:** OOM errors when adding large batches  
**Solution:** Batched indexing (1K questions per batch) keeps peak memory bounded

### Dataset Loading Failures
**Problem:** Some datasets require authentication  
**Solution:** Graceful fallback: loads the sources that are available and logs a warning for the rest

### Slow Query Performance
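**Problem:** Similarity search takes >100ms  
**Solution:** Check the database size; queries against the full 32K-question index should complete in under 50ms. The first query after startup may be slower while the embedding model loads.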
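To tell a consistently slow query from a one-off, time several queries and take the median. In the sketch below, `query_fn` is a hypothetical wrapper around whatever the app calls to search the database. The first call runs untimed because it may include one-time model loading.

```python
import statistics
import time


def median_query_ms(query_fn, prompts, repeats=3):
    """Median wall-clock latency of query_fn over the prompts, in milliseconds."""
    if not prompts:
        raise ValueError("need at least one prompt")
    query_fn(prompts[0])  # untimed warm-up call
    samples = []
    for _ in range(repeats):
        for prompt in prompts:
            start = time.perf_counter()
            query_fn(prompt)
            samples.append((time.perf_counter() - start) * 1000)
    return statistics.median(samples)
```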

---

## 📈 Future Enhancements

### Short-term (Next Sprint)
- [ ] Add GPQA Diamond for expert-level questions
- [ ] Include MATH dataset for advanced mathematics
- [ ] Show domain distribution chart in UI
- [ ] Add example prompts per domain

### Medium-term (Next Quarter)
- [ ] Integrate per-question model results (real success rates)
- [ ] Add filtering by domain in UI
- [ ] Export difficulty reports
- [ ] A/B testing different embedding models

### Long-term (6+ Months)
- [ ] Multi-language support
- [ ] Custom dataset upload
- [ ] API endpoint for programmatic access
- [ ] Integration with Aqumen adversarial testing

---

## ✅ Pre-Deployment Checklist

- [x] app.py updated with 7-source loading
- [x] benchmark_vector_db.py supports all sources
- [x] requirements.txt includes all dependencies
- [x] README.md explains the demo
- [x] Initial build optimized (<10 min)
- [x] Progressive loading implemented
- [x] Error handling for all datasets
- [x] Logging configured
- [x] Example prompts included
- [x] All 20 domains verified

---

## 🎉 Ready to Deploy!

Your demo is production-ready with:
- **32K+ questions** available
- **20 domains** covered
- **7 benchmark sources** integrated
- **Progressive loading** for fast startup
- **AI safety focus** (truthfulness, commonsense)

Just push to HuggingFace Spaces and you're ready to impress VCs! 🚀