# Policy Analysis Application - Model Pre-loading Setup
This application has been enhanced with model pre-loading capabilities to significantly reduce inference time during deployment.
## 🚀 Quick Start

### Option 1: Docker Deployment (Recommended)
```bash
# Clone the repository
git clone <your-repo-url>
cd policy-analysis

# Build and run with Docker
docker-compose up --build
```
### Option 2: Manual Setup
```bash
# Install dependencies
pip install -r requirements.txt

# Download all models (one-time setup)
python download_models.py

# Test models are working
python test_models.py

# Start the application
python app.py
```
## 📦 What's New

**Files Added:**
- `download_models.py` - Downloads all required ML models (sketched below)
- `test_models.py` - Verifies all models are working correctly
- `startup.py` - Startup script with automatic model downloading
- `Dockerfile` - Docker configuration with model pre-caching
- `docker-compose.yml` - Docker Compose setup
- `MODEL_SETUP.md` - Detailed setup documentation
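The role of `download_models.py` is simply to instantiate each model once so its weights land in the local Hugging Face cache. A minimal sketch of that pattern, assuming the four models listed under Models Used (the variable and function names here are illustrative, not the script's actual structure):

```python
# download_models.py -- illustrative sketch; the actual script may differ.
from sentence_transformers import SentenceTransformer, CrossEncoder
from transformers import pipeline

EMBEDDING_MODELS = [
    "sentence-transformers/all-MiniLM-L6-v2",
    "BAAI/bge-m3",
]
CROSS_ENCODER = "cross-encoder/ms-marco-MiniLM-L-6-v2"
CLASSIFIER = "MoritzLaurer/deberta-v3-base-zeroshot-v2.0"

def download_all() -> None:
    """Instantiate each model once so its weights are stored in the local HF cache."""
    for name in EMBEDDING_MODELS:
        print(f"Downloading embedding model: {name}")
        SentenceTransformer(name)
    print(f"Downloading cross-encoder: {CROSS_ENCODER}")
    CrossEncoder(CROSS_ENCODER)
    print(f"Downloading zero-shot classifier: {CLASSIFIER}")
    pipeline("zero-shot-classification", model=CLASSIFIER)
    print("All models cached.")

if __name__ == "__main__":
    download_all()
```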
**Files Modified:**
- `app.py` - Added model pre-loading functionality (see the sketch below)
- `requirements.txt` - Added missing dependencies (numpy, requests)
- `utils/coherence_bbscore.py` - Fixed default embedder parameter
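The pre-loading change in `app.py` amounts to loading the models once at startup, gated by `PRELOAD_MODELS`, instead of lazily on the first request. A hedged sketch of that hook; the module-level `_MODELS` dict and the `preload_models` name are assumptions, not the app's actual structure:

```python
# Illustrative pre-loading hook; the real app.py may organize this differently.
import os
from sentence_transformers import SentenceTransformer

_MODELS = {}  # module-level cache shared by all requests

def preload_models() -> None:
    """Load models once at startup so the first request pays no download/load cost."""
    _MODELS["embedder"] = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
    # ... load the remaining models the same way ...

if os.getenv("PRELOAD_MODELS", "true").lower() == "true":
    preload_models()
```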
## 🤖 Models Used
The application uses these ML models:
| Model | Type | Size | Purpose |
|---|---|---|---|
| `sentence-transformers/all-MiniLM-L6-v2` | Embedding | ~90MB | Text encoding |
| `BAAI/bge-m3` | Embedding | ~2.3GB | Advanced text encoding |
| `cross-encoder/ms-marco-MiniLM-L-6-v2` | Cross-Encoder | ~130MB | Document reranking |
| `MoritzLaurer/deberta-v3-base-zeroshot-v2.0` | Classification | ~1.5GB | Sentiment analysis |

**Total download size: ~4GB**
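To make the Purpose column concrete, here is an illustrative example of how each model type is typically called; the input texts and candidate labels are made up, and running it locally requires the full ~4GB download:

```python
from sentence_transformers import SentenceTransformer, CrossEncoder
from transformers import pipeline

# Text encoding: sentences -> dense vectors
embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
vectors = embedder.encode(["Carbon tax proposal", "Renewable energy subsidy"])

# Document reranking: score (query, document) pairs
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = reranker.predict([("carbon tax", "A levy on fossil fuel emissions ...")])

# Zero-shot classification (used here for sentiment-style analysis)
classifier = pipeline("zero-shot-classification",
                      model="MoritzLaurer/deberta-v3-base-zeroshot-v2.0")
result = classifier("The policy will hurt small businesses.",
                    candidate_labels=["positive", "negative", "neutral"])

print(vectors.shape, scores, result["labels"][0])
```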
## ⚡ Performance Benefits

**Before (without pre-loading):**
- First request: 30-60 seconds (model download + inference)
- Subsequent requests: 2-5 seconds

**After (with pre-loading):**
- First request: 2-5 seconds
- Subsequent requests: 2-5 seconds
## 🔧 Configuration

**Environment Variables:**
- `PRELOAD_MODELS=true` (default) - Pre-load models on app startup
- `PRELOAD_MODELS=false` - Skip pre-loading (useful when models are cached)
**Model Cache Location:**
- Linux/Mac: `~/.cache/huggingface/`
- Windows: `%USERPROFILE%\.cache\huggingface\`
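To check what is actually in the cache (and how much disk it uses), `huggingface_hub` provides a cache scanner; a small sketch:

```python
# Sketch: list cached models and their on-disk size using huggingface_hub.
from huggingface_hub import scan_cache_dir

cache = scan_cache_dir()  # scans the default HF cache location shown above
print(f"Total cache size: {cache.size_on_disk / 1e9:.2f} GB")
for repo in cache.repos:
    print(f"{repo.repo_id}: {repo.size_on_disk / 1e6:.0f} MB")
```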
## 🐳 Docker Deployment

The Dockerfile automatically downloads models during the build process:

```dockerfile
# Downloads models and caches them in the image
RUN python download_models.py
```
This means:
- ✅ No download time during container startup
- ✅ Consistent performance across deployments
- ✅ Offline inference capability
## 🧪 Testing

Verify everything is working:

```bash
# Test all models
python test_models.py

# Expected output:
# 🧪 Model Verification Test Suite
# ✅ All tests passed! The application is ready to deploy.
```
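`test_models.py` presumably loads each model and runs a tiny inference, reporting pass/fail. A minimal sketch of that smoke-test pattern (the real script's checks may be more thorough):

```python
# Sketch of a model smoke test: load each model and run a trivial inference.
import sys
from sentence_transformers import SentenceTransformer, CrossEncoder

def run_tests() -> bool:
    ok = True
    try:
        emb = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
        assert emb.encode(["smoke test"]).shape[0] == 1
        print("PASS embedding model")
    except Exception as exc:
        ok = False
        print(f"FAIL embedding model: {exc}")
    try:
        ce = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
        ce.predict([("query", "document")])
        print("PASS cross-encoder")
    except Exception as exc:
        ok = False
        print(f"FAIL cross-encoder: {exc}")
    # ... repeat for the remaining models ...
    return ok

if __name__ == "__main__":
    sys.exit(0 if run_tests() else 1)
```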
## 📊 Resource Requirements

**Minimum:**
- RAM: 8GB
- Storage: 6GB (models + dependencies)
- CPU: 2+ cores
**Recommended:**
- RAM: 16GB
- Storage: 10GB
- CPU: 4+ cores
- GPU: Optional (NVIDIA with CUDA support)
## 🚨 Troubleshooting

**Model Download Issues:**
```bash
# Check connectivity
curl -I https://huggingface.co

# Check disk space
df -h

# Manual model test
python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')"
```
**Memory Issues:**
- Reduce model batch sizes
- Use CPU-only inference: `device=-1` (see the sketch below)
- Consider model quantization
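For the `device=-1` tip: the `transformers` pipeline accepts `device=-1` to force CPU, while `sentence-transformers` models take a device string instead:

```python
from transformers import pipeline
from sentence_transformers import SentenceTransformer

# Force CPU for the zero-shot classifier (device=-1 means "no GPU" in transformers)
classifier = pipeline("zero-shot-classification",
                      model="MoritzLaurer/deberta-v3-base-zeroshot-v2.0",
                      device=-1)

# sentence-transformers uses a device string rather than an integer
embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2", device="cpu")
```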
**Slow Performance:**
- Verify models are cached locally
- Check that `PRELOAD_MODELS=true` is set
- Monitor CPU/GPU usage
## 📈 Monitoring
Monitor these metrics in production:
- Model loading time
- Inference latency
- Memory usage
- Cache hit ratio
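A lightweight way to collect the first three metrics in-process, sketched with `psutil` (an extra dependency used here only for illustration):

```python
# Sketch: measure model load time, inference latency, and process memory.
import time
import psutil
from sentence_transformers import SentenceTransformer

process = psutil.Process()

t0 = time.perf_counter()
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
load_s = time.perf_counter() - t0

t0 = time.perf_counter()
model.encode(["sample policy text"])
infer_s = time.perf_counter() - t0

rss_mb = process.memory_info().rss / 1e6
print(f"load={load_s:.2f}s  inference={infer_s:.3f}s  rss={rss_mb:.0f}MB")
```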
## 🔄 Updates
To update models:
```bash
# Clear cache
rm -rf ~/.cache/huggingface/

# Re-download
python download_models.py

# Test
python test_models.py
```
## 💡 Tips for Production
- **Use Docker**: Models are cached in the image
- **Persistent Volumes**: Mount the model cache for faster rebuilds
- **Health Checks**: Monitor model availability (a sketch follows below)
- **Resource Limits**: Set appropriate memory/CPU limits
- **Load Balancing**: Use multiple instances for high traffic
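For the health-check tip, one option is a tiny script that a Docker `HEALTHCHECK` or an orchestrator probe can run, exiting non-zero if a cached model fails to load. The file name `healthcheck.py` is hypothetical:

```python
# healthcheck.py (hypothetical): exit 0 if the smallest model loads and encodes.
import sys
from sentence_transformers import SentenceTransformer

try:
    SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2").encode(["ping"])
except Exception as exc:
    print(f"unhealthy: {exc}")
    sys.exit(1)
print("healthy")
```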
## 🤝 Contributing

When adding new models:
- Add the model name to `download_models.py` (see the sketch below)
- Add a test case to `test_models.py`
- Update the documentation
- Test thoroughly
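Assuming `download_models.py` keeps its model names in a single registry (an assumption about its structure, not a documented fact), adding a model is then a one-line change plus a matching test case:

```python
# Hypothetical registry inside download_models.py; the real structure may differ.
MODELS = {
    "sentence-transformers/all-MiniLM-L6-v2": "embedding",
    "BAAI/bge-m3": "embedding",
    "cross-encoder/ms-marco-MiniLM-L-6-v2": "cross-encoder",
    "MoritzLaurer/deberta-v3-base-zeroshot-v2.0": "zero-shot-classification",
    # New entry: add the model id and its type here, then cover it in test_models.py
    # "your-org/your-new-model": "embedding",
}
```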
For detailed setup instructions, see `MODEL_SETUP.md`.