# Policy Analysis Application - Model Pre-loading Setup

This application has been enhanced with model pre-loading capabilities to significantly reduce inference time during deployment.

## 🚀 Quick Start

### Option 1: Docker Deployment (Recommended)

```bash
# Clone the repository
git clone <your-repo-url>
cd policy-analysis

# Build and run with Docker
docker-compose up --build
```

### Option 2: Manual Setup

```bash
# Install dependencies
pip install -r requirements.txt

# Download all models (one-time setup)
python download_models.py

# Test that the models are working
python test_models.py

# Start the application
python app.py
```

## 📦 What's New

### Files Added

- `download_models.py` - Downloads all required ML models
- `test_models.py` - Verifies all models are working correctly
- `startup.py` - Startup script with automatic model downloading
- `Dockerfile` - Docker configuration with model pre-caching
- `docker-compose.yml` - Docker Compose setup
- `MODEL_SETUP.md` - Detailed setup documentation

### Files Modified

- `app.py` - Added model pre-loading functionality
- `requirements.txt` - Added missing dependencies (`numpy`, `requests`)
- `utils/coherence_bbscore.py` - Fixed the default embedder parameter

## 🤖 Models Used

The application uses these ML models:

| Model | Type | Size | Purpose |
|-------|------|------|---------|
| `sentence-transformers/all-MiniLM-L6-v2` | Embedding | ~90MB | Text encoding |
| `BAAI/bge-m3` | Embedding | ~2.3GB | Advanced text encoding |
| `cross-encoder/ms-marco-MiniLM-L-6-v2` | Cross-encoder | ~130MB | Document reranking |
| `MoritzLaurer/deberta-v3-base-zeroshot-v2.0` | Classification | ~1.5GB | Sentiment analysis |

Total download size: ~4GB
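Pre-downloading is simply a matter of instantiating each model once so its weights land in the local Hugging Face cache. A minimal sketch of such a step, using the model names from the table above (the repo's actual `download_models.py` may be structured differently):

```python
"""Sketch of a one-time model pre-download step."""

# Model names taken from the table above.
EMBEDDING_MODELS = [
    "sentence-transformers/all-MiniLM-L6-v2",
    "BAAI/bge-m3",
]
CROSS_ENCODER = "cross-encoder/ms-marco-MiniLM-L-6-v2"
ZERO_SHOT_MODEL = "MoritzLaurer/deberta-v3-base-zeroshot-v2.0"


def download_all() -> None:
    """Instantiate each model once; construction downloads and caches
    the weights under the Hugging Face cache directory."""
    # Heavy imports are kept local so the module can be inspected
    # without the ML dependencies installed.
    from sentence_transformers import CrossEncoder, SentenceTransformer
    from transformers import pipeline

    for name in EMBEDDING_MODELS:
        SentenceTransformer(name)
    CrossEncoder(CROSS_ENCODER)
    pipeline("zero-shot-classification", model=ZERO_SHOT_MODEL)
```

Calling `download_all()` once (for example from a setup script or a Dockerfile `RUN` step) is enough; later constructions of the same models hit the cache instead of the network.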

## ⚡ Performance Benefits

**Before (without pre-loading):**

- First request: 30-60 seconds (model download + inference)
- Subsequent requests: 2-5 seconds

**After (with pre-loading):**

- First request: 2-5 seconds
- Subsequent requests: 2-5 seconds

## 🔧 Configuration

**Environment variables:**

- `PRELOAD_MODELS=true` (default) - pre-load models on app startup
- `PRELOAD_MODELS=false` - skip pre-loading (useful when models are already cached)
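A startup script can read this flag with a small helper along these lines (a hypothetical parser; the app's actual handling may differ):

```python
import os


def preload_enabled(default: bool = True) -> bool:
    """Interpret the PRELOAD_MODELS environment variable as a boolean,
    falling back to the given default when it is unset."""
    raw = os.environ.get("PRELOAD_MODELS")
    if raw is None:
        return default
    return raw.strip().lower() in ("1", "true", "yes", "on")
```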

**Model cache location:**

- Linux/macOS: `~/.cache/huggingface/`
- Windows: `%USERPROFILE%\.cache\huggingface\`
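Hugging Face libraries honour the `HF_HOME` environment variable when resolving this location, and cached repositories live in a `hub/` subdirectory named `models--{org}--{name}`. A small helper to check whether a model is already cached (an illustrative sketch, not part of this repo):

```python
import os


def model_cache_dir() -> str:
    """Resolve the Hugging Face hub cache directory, honouring HF_HOME
    the same way the huggingface_hub defaults do."""
    hf_home = os.environ.get(
        "HF_HOME",
        os.path.join(os.path.expanduser("~"), ".cache", "huggingface"),
    )
    return os.path.join(hf_home, "hub")


def is_model_cached(repo_id: str) -> bool:
    """Cached repos are stored as models--{org}--{name} directories."""
    folder = "models--" + repo_id.replace("/", "--")
    return os.path.isdir(os.path.join(model_cache_dir(), folder))
```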

## 🐳 Docker Deployment

The Dockerfile automatically downloads models during the build process:

```dockerfile
# Downloads models and caches them in the image
RUN python download_models.py
```

This means:

- ✅ No download time during container startup
- ✅ Consistent performance across deployments
- ✅ Offline inference capability

## 🧪 Testing

Verify everything is working:

```bash
# Test all models
python test_models.py

# Expected output:
# 🧪 Model Verification Test Suite
# ✅ All tests passed! The application is ready to deploy.
```
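An individual check in `test_models.py` typically amounts to loading a model and verifying a known property of its output. A sketch of one such check (illustrative; the repo's actual tests may assert more):

```python
def smoke_test_embedder(
    model_name: str = "sentence-transformers/all-MiniLM-L6-v2",
) -> bool:
    """Load the embedder and verify it produces the expected
    384-dimensional vector for a single sentence."""
    from sentence_transformers import SentenceTransformer  # heavy import kept local

    model = SentenceTransformer(model_name)
    embedding = model.encode("A short policy sentence.")
    # all-MiniLM-L6-v2 embeds text into 384 dimensions.
    return embedding.shape == (384,)
```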

## 📊 Resource Requirements

**Minimum:**

- RAM: 8GB
- Storage: 6GB (models + dependencies)
- CPU: 2+ cores

**Recommended:**

- RAM: 16GB
- Storage: 10GB
- CPU: 4+ cores
- GPU: optional (NVIDIA with CUDA support)

## 🚨 Troubleshooting

**Model download issues:**

```bash
# Check connectivity
curl -I https://huggingface.co

# Check disk space
df -h

# Manually test one model
python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')"
```

**Memory issues:**

- Reduce model batch sizes
- Use CPU-only inference (`device=-1`)
- Consider model quantization
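For the `device=-1` option, the `transformers` pipeline API accepts a device index directly; a hedged sketch using the zero-shot model from the table above:

```python
def build_cpu_classifier():
    """Construct the zero-shot classifier pinned to CPU."""
    from transformers import pipeline  # heavy import kept local

    return pipeline(
        "zero-shot-classification",
        model="MoritzLaurer/deberta-v3-base-zeroshot-v2.0",
        device=-1,  # -1 selects CPU; 0, 1, ... select CUDA devices
    )
```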

**Slow performance:**

- Verify models are cached locally
- Check that `PRELOAD_MODELS=true`
- Monitor CPU/GPU usage

## 📈 Monitoring

Monitor these metrics in production:

- Model loading time
- Inference latency
- Memory usage
- Cache hit ratio

## 🔄 Updates

To update models:

```bash
# Clear cache
rm -rf ~/.cache/huggingface/

# Re-download
python download_models.py

# Test
python test_models.py
```

## 💡 Tips for Production

1. **Use Docker:** models are cached in the image
2. **Persistent volumes:** mount the model cache for faster rebuilds
3. **Health checks:** monitor model availability
4. **Resource limits:** set appropriate memory/CPU limits
5. **Load balancing:** run multiple instances for high traffic

## 🤝 Contributing

When adding new models:

1. Add the model name to `download_models.py`
2. Add a test case to `test_models.py`
3. Update the documentation
4. Test thoroughly

For detailed setup instructions, see `MODEL_SETUP.md`.