# Policy Analysis Application - Model Pre-loading Setup
This application has been enhanced with model pre-loading capabilities to significantly reduce inference time during deployment.
## 🚀 Quick Start

### Option 1: Docker Deployment (Recommended)
```bash
# Clone the repository
git clone <your-repo-url>
cd policy-analysis

# Build and run with Docker
docker-compose up --build
```
### Option 2: Manual Setup
```bash
# Install dependencies
pip install -r requirements.txt

# Download all models (one-time setup)
python download_models.py

# Test models are working
python test_models.py

# Start the application
python app.py
```
## 📦 What's New

**Files Added:**
- `download_models.py` - Downloads all required ML models (sketched below)
- `test_models.py` - Verifies all models are working correctly
- `startup.py` - Startup script with automatic model downloading
- `Dockerfile` - Docker configuration with model pre-caching
- `docker-compose.yml` - Docker Compose setup
- `MODEL_SETUP.md` - Detailed setup documentation
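The role of `download_models.py` is simply to instantiate each model once so its weights land in the local Hugging Face cache. A minimal sketch of that pattern, assuming the four models listed under Models Used (the variable and function names here are illustrative, not the script's actual structure):

```python
# download_models.py -- illustrative sketch; the actual script may differ.
from sentence_transformers import SentenceTransformer, CrossEncoder
from transformers import pipeline

EMBEDDING_MODELS = [
    "sentence-transformers/all-MiniLM-L6-v2",
    "BAAI/bge-m3",
]
CROSS_ENCODER = "cross-encoder/ms-marco-MiniLM-L-6-v2"
CLASSIFIER = "MoritzLaurer/deberta-v3-base-zeroshot-v2.0"

def download_all() -> None:
    """Instantiate each model once so its weights are stored in the local HF cache."""
    for name in EMBEDDING_MODELS:
        print(f"Downloading embedding model: {name}")
        SentenceTransformer(name)
    print(f"Downloading cross-encoder: {CROSS_ENCODER}")
    CrossEncoder(CROSS_ENCODER)
    print(f"Downloading zero-shot classifier: {CLASSIFIER}")
    pipeline("zero-shot-classification", model=CLASSIFIER)
    print("All models cached.")

if __name__ == "__main__":
    download_all()
```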
**Files Modified:**
- `app.py` - Added model pre-loading functionality (see the sketch below)
- `requirements.txt` - Added missing dependencies (numpy, requests)
- `utils/coherence_bbscore.py` - Fixed default embedder parameter
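The pre-loading change in `app.py` amounts to loading the models once at startup, gated by `PRELOAD_MODELS`, instead of lazily on the first request. A hedged sketch of that hook; the module-level `_MODELS` dict and the `preload_models` name are assumptions, not the app's actual structure:

```python
# Illustrative pre-loading hook; the real app.py may organize this differently.
import os
from sentence_transformers import SentenceTransformer

_MODELS = {}  # module-level cache shared by all requests

def preload_models() -> None:
    """Load models once at startup so the first request pays no download/load cost."""
    _MODELS["embedder"] = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
    # ... load the remaining models the same way ...

if os.getenv("PRELOAD_MODELS", "true").lower() == "true":
    preload_models()
```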
## 🤖 Models Used
The application uses these ML models:
| Model | Type | Size | Purpose |
|---|---|---|---|
| `sentence-transformers/all-MiniLM-L6-v2` | Embedding | ~90MB | Text encoding |
| `BAAI/bge-m3` | Embedding | ~2.3GB | Advanced text encoding |
| `cross-encoder/ms-marco-MiniLM-L-6-v2` | Cross-Encoder | ~130MB | Document reranking |
| `MoritzLaurer/deberta-v3-base-zeroshot-v2.0` | Classification | ~1.5GB | Sentiment analysis |

**Total download size: ~4GB**
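To make the Purpose column concrete, here is an illustrative example of how each model type is typically called; the input texts and candidate labels are made up, and running it locally requires the full ~4GB download:

```python
from sentence_transformers import SentenceTransformer, CrossEncoder
from transformers import pipeline

# Text encoding: sentences -> dense vectors
embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
vectors = embedder.encode(["Carbon tax proposal", "Renewable energy subsidy"])

# Document reranking: score (query, document) pairs
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = reranker.predict([("carbon tax", "A levy on fossil fuel emissions ...")])

# Zero-shot classification (used here for sentiment-style analysis)
classifier = pipeline("zero-shot-classification",
                      model="MoritzLaurer/deberta-v3-base-zeroshot-v2.0")
result = classifier("The policy will hurt small businesses.",
                    candidate_labels=["positive", "negative", "neutral"])

print(vectors.shape, scores, result["labels"][0])
```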
## ⚡ Performance Benefits

**Before (without pre-loading):**
- First request: 30-60 seconds (model download + inference)
- Subsequent requests: 2-5 seconds

**After (with pre-loading):**
- First request: 2-5 seconds
- Subsequent requests: 2-5 seconds
## 🔧 Configuration

**Environment Variables:**
- `PRELOAD_MODELS=true` (default) - Pre-load models on app startup
- `PRELOAD_MODELS=false` - Skip pre-loading (useful when models are cached)
**Model Cache Location:**
- Linux/Mac: `~/.cache/huggingface/`
- Windows: `%USERPROFILE%\.cache\huggingface\`
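To check what is actually in the cache (and how much disk it uses), `huggingface_hub` provides a cache scanner; a small sketch:

```python
# Sketch: list cached models and their on-disk size using huggingface_hub.
from huggingface_hub import scan_cache_dir

cache = scan_cache_dir()  # scans the default HF cache location shown above
print(f"Total cache size: {cache.size_on_disk / 1e9:.2f} GB")
for repo in cache.repos:
    print(f"{repo.repo_id}: {repo.size_on_disk / 1e6:.0f} MB")
```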
## 🐳 Docker Deployment

The Dockerfile automatically downloads models during the build process:

```dockerfile
# Downloads models and caches them in the image
RUN python download_models.py
```
This means:
- ✅ No download time during container startup
- ✅ Consistent performance across deployments
- ✅ Offline inference capability
## 🧪 Testing

Verify everything is working:

```bash
# Test all models
python test_models.py

# Expected output:
# 🧪 Model Verification Test Suite
# ✅ All tests passed! The application is ready to deploy.
```
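`test_models.py` presumably loads each model and runs a tiny inference, reporting pass/fail. A minimal sketch of that smoke-test pattern (the real script's checks may be more thorough):

```python
# Sketch of a model smoke test: load each model and run a trivial inference.
import sys
from sentence_transformers import SentenceTransformer, CrossEncoder

def run_tests() -> bool:
    ok = True
    try:
        emb = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
        assert emb.encode(["smoke test"]).shape[0] == 1
        print("PASS embedding model")
    except Exception as exc:
        ok = False
        print(f"FAIL embedding model: {exc}")
    try:
        ce = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
        ce.predict([("query", "document")])
        print("PASS cross-encoder")
    except Exception as exc:
        ok = False
        print(f"FAIL cross-encoder: {exc}")
    # ... repeat for the remaining models ...
    return ok

if __name__ == "__main__":
    sys.exit(0 if run_tests() else 1)
```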
## 📊 Resource Requirements

**Minimum:**
- RAM: 8GB
- Storage: 6GB (models + dependencies)
- CPU: 2+ cores
**Recommended:**
- RAM: 16GB
- Storage: 10GB
- CPU: 4+ cores
- GPU: Optional (NVIDIA with CUDA support)
## 🚨 Troubleshooting

**Model Download Issues:**
```bash
# Check connectivity
curl -I https://huggingface.co

# Check disk space
df -h

# Manual model test
python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')"
```
**Memory Issues:**
- Reduce model batch sizes
- Use CPU-only inference: `device=-1` (see the sketch below)
- Consider model quantization
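For the `device=-1` tip: the `transformers` pipeline accepts `device=-1` to force CPU, while `sentence-transformers` models take a device string instead:

```python
from transformers import pipeline
from sentence_transformers import SentenceTransformer

# Force CPU for the zero-shot classifier (device=-1 means "no GPU" in transformers)
classifier = pipeline("zero-shot-classification",
                      model="MoritzLaurer/deberta-v3-base-zeroshot-v2.0",
                      device=-1)

# sentence-transformers uses a device string rather than an integer
embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2", device="cpu")
```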
**Slow Performance:**
- Verify models are cached locally
- Check that `PRELOAD_MODELS=true` is set
- Monitor CPU/GPU usage
## 📈 Monitoring
Monitor these metrics in production:
- Model loading time
- Inference latency
- Memory usage
- Cache hit ratio
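A lightweight way to collect the first three metrics in-process, sketched with `psutil` (an extra dependency used here only for illustration):

```python
# Sketch: measure model load time, inference latency, and process memory.
import time
import psutil
from sentence_transformers import SentenceTransformer

process = psutil.Process()

t0 = time.perf_counter()
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
load_s = time.perf_counter() - t0

t0 = time.perf_counter()
model.encode(["sample policy text"])
infer_s = time.perf_counter() - t0

rss_mb = process.memory_info().rss / 1e6
print(f"load={load_s:.2f}s  inference={infer_s:.3f}s  rss={rss_mb:.0f}MB")
```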
## 🔄 Updates
To update models:
```bash
# Clear cache
rm -rf ~/.cache/huggingface/

# Re-download
python download_models.py

# Test
python test_models.py
```
## 💡 Tips for Production
- **Use Docker**: Models are cached in the image
- **Persistent Volumes**: Mount the model cache for faster rebuilds
- **Health Checks**: Monitor model availability (a sketch follows below)
- **Resource Limits**: Set appropriate memory/CPU limits
- **Load Balancing**: Use multiple instances for high traffic
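For the health-check tip, one option is a tiny script that a Docker `HEALTHCHECK` or an orchestrator probe can run, exiting non-zero if a cached model fails to load. The file name `healthcheck.py` is hypothetical:

```python
# healthcheck.py (hypothetical): exit 0 if the smallest model loads and encodes.
import sys
from sentence_transformers import SentenceTransformer

try:
    SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2").encode(["ping"])
except Exception as exc:
    print(f"unhealthy: {exc}")
    sys.exit(1)
print("healthy")
```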
## 🤝 Contributing

When adding new models:
- Add the model name to `download_models.py` (see the sketch below)
- Add a test case to `test_models.py`
- Update the documentation
- Test thoroughly
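Assuming `download_models.py` keeps its model names in a single registry (an assumption about its structure, not a documented fact), adding a model is then a one-line change plus a matching test case:

```python
# Hypothetical registry inside download_models.py; the real structure may differ.
MODELS = {
    "sentence-transformers/all-MiniLM-L6-v2": "embedding",
    "BAAI/bge-m3": "embedding",
    "cross-encoder/ms-marco-MiniLM-L-6-v2": "cross-encoder",
    "MoritzLaurer/deberta-v3-base-zeroshot-v2.0": "zero-shot-classification",
    # New entry: add the model id and its type here, then cover it in test_models.py
    # "your-org/your-new-model": "embedding",
}
```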
For detailed setup instructions, see `MODEL_SETUP.md`.