---
title: Text Summarizer API
emoji: 📝
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: mit
app_port: 7860
---

# Text Summarizer API

A FastAPI-based text summarization service powered by Ollama and the Mistral 7B model.

## 🚀 Features

- Fast text summarization using local LLM inference
- RESTful API with FastAPI
- Health monitoring and logging
- Docker containerized for easy deployment
- Free deployment on Hugging Face Spaces

## 📑 API Endpoints

### Health Check

```http
GET /health
```

### Summarize Text

```http
POST /api/v1/summarize
Content-Type: application/json

{
  "text": "Your long text to summarize here...",
  "max_tokens": 256,
  "temperature": 0.7
}
```
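On success the endpoint returns JSON. Based on the Python usage example in this README (`result["summary"]`), the body contains at least a `summary` field; any additional fields are implementation-specific:

```json
{
  "summary": "A concise summary of the submitted text."
}
```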

### API Documentation

- Swagger UI: `/docs`
- ReDoc: `/redoc`

## 🔧 Configuration

The service uses the following environment variables:

- `OLLAMA_MODEL`: Model to use (default: `mistral:7b`)
- `OLLAMA_HOST`: Ollama service host (default: `http://localhost:11434`)
- `OLLAMA_TIMEOUT`: Request timeout in seconds (default: `30`)
- `SERVER_HOST`: Server host (default: `0.0.0.0`)
- `SERVER_PORT`: Server port (default: `7860`)
- `LOG_LEVEL`: Logging level (default: `INFO`)
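As a minimal sketch (not the service's actual configuration module), these variables can be read with the standard library's `os.environ`, falling back to the documented defaults:

```python
import os

# Illustrative sketch of reading the documented defaults; the real
# service's config code may be organized differently.
config = {
    "model": os.environ.get("OLLAMA_MODEL", "mistral:7b"),
    "ollama_host": os.environ.get("OLLAMA_HOST", "http://localhost:11434"),
    "timeout": int(os.environ.get("OLLAMA_TIMEOUT", "30")),
    "server_host": os.environ.get("SERVER_HOST", "0.0.0.0"),
    "server_port": int(os.environ.get("SERVER_PORT", "7860")),
    "log_level": os.environ.get("LOG_LEVEL", "INFO"),
}

print(config["model"])
```

In Docker, any of these can be overridden at run time, e.g. `docker run -e OLLAMA_TIMEOUT=60 -p 7860:7860 summarizer-app`.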

## 🐳 Docker Deployment

### Local Development

```bash
# Build and run with docker-compose
docker-compose up --build

# Or run directly
docker build -f Dockerfile.hf -t summarizer-app .
docker run -p 7860:7860 summarizer-app
```

### Hugging Face Spaces

This app is configured for deployment on Hugging Face Spaces using the Docker SDK.

## 📊 Performance

- Model: Mistral 7B (7GB RAM requirement)
- Startup time: ~2-3 minutes (includes model download)
- Inference speed: ~2-5 seconds per request
- Memory usage: ~8GB RAM

πŸ› οΈ Development

Setup

# Install dependencies
pip install -r requirements.txt

# Run locally
uvicorn app.main:app --host 0.0.0.0 --port 7860

Testing

# Run tests
pytest

# Run with coverage
pytest --cov=app

πŸ“ Usage Examples

Python

import requests

# Summarize text
response = requests.post(
    "https://your-space.hf.space/api/v1/summarize",
    json={
        "text": "Your long article or text here...",
        "max_tokens": 256
    }
)

result = response.json()
print(result["summary"])

cURL

curl -X POST "https://your-space.hf.space/api/v1/summarize" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Your text to summarize...",
    "max_tokens": 256
  }'

## 🔒 Security

- Non-root user execution
- Input validation and sanitization
- Rate limiting (configurable)
- API key authentication (optional)
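If API key authentication is enabled, a client would typically pass the key as a request header. The header name `X-API-Key` below is an assumption for illustration, not something this README specifies; check the deployment's configuration for the actual scheme:

```python
import requests

# Hypothetical sketch: attach an API key header to a summarize request.
# "X-API-Key" is an assumed header name, not confirmed by this service.
request = requests.Request(
    "POST",
    "https://your-space.hf.space/api/v1/summarize",
    headers={"X-API-Key": "your-api-key"},
    json={"text": "Your text to summarize...", "max_tokens": 256},
).prepare()

# The prepared request carries both the key and the JSON content type
print(request.headers["X-API-Key"])
print(request.headers["Content-Type"])
```

Using `requests.Request(...).prepare()` builds the request without sending it, which is handy for inspecting exactly what headers a client will transmit.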

## 📈 Monitoring

The service includes:

- Health check endpoint
- Request logging
- Error tracking
- Performance metrics

## 🆘 Troubleshooting

### Common Issues

1. **Model not loading**: Check that Ollama is running and that the model has been pulled
2. **Out of memory**: Ensure sufficient RAM (8GB+) for Mistral 7B
3. **Slow startup**: Normal on first run due to the model download
4. **API errors**: Check the application logs; the interactive Swagger UI at `/docs` can help reproduce failing requests
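For the first issue, the standard Ollama CLI and HTTP API can confirm the service and model state (run these inside the container, or wherever Ollama is installed; the model tag matches the `OLLAMA_MODEL` default):

```shell
# Check that the Ollama HTTP API is reachable (returns locally available models)
curl http://localhost:11434/api/tags

# List models known to the Ollama CLI
ollama list

# Pull the model the service expects
ollama pull mistral:7b
```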

### Logs

View application logs in the Hugging Face Spaces interface, or check the health endpoint for service status.

## 📄 License

MIT License - see LICENSE file for details.

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests
5. Submit a pull request

Deployed on Hugging Face Spaces 🚀