---
title: Text Summarizer API
emoji: 📝
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: mit
app_port: 7860
---

# Text Summarizer API

A FastAPI-based text summarization service powered by Ollama and the Mistral 7B model.

## 🚀 Features

- Fast text summarization using local LLM inference
- RESTful API with FastAPI
- Health monitoring and logging
- Docker containerized for easy deployment
- Free deployment on Hugging Face Spaces

## 📑 API Endpoints

### Health Check

```http
GET /health
```

### Summarize Text

```http
POST /api/v1/summarize
Content-Type: application/json

{
  "text": "Your long text to summarize here...",
  "max_tokens": 256,
  "temperature": 0.7
}
```
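On success the endpoint returns JSON. Based on the Python usage example in this README (`result["summary"]`), the body contains at least a `summary` field; any additional fields are implementation-specific:

```json
{
  "summary": "A concise summary of the submitted text."
}
```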

### API Documentation

- Swagger UI: `/docs`
- ReDoc: `/redoc`

## 🔧 Configuration

The service uses the following environment variables:

- `OLLAMA_MODEL`: Model to use (default: `mistral:7b`)
- `OLLAMA_HOST`: Ollama service host (default: `http://localhost:11434`)
- `OLLAMA_TIMEOUT`: Request timeout in seconds (default: `30`)
- `SERVER_HOST`: Server host (default: `0.0.0.0`)
- `SERVER_PORT`: Server port (default: `7860`)
- `LOG_LEVEL`: Logging level (default: `INFO`)
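As a minimal sketch (not the service's actual configuration module), these variables can be read with the standard library's `os.environ`, falling back to the documented defaults:

```python
import os

# Illustrative sketch of reading the documented defaults; the real
# service's config code may be organized differently.
config = {
    "model": os.environ.get("OLLAMA_MODEL", "mistral:7b"),
    "ollama_host": os.environ.get("OLLAMA_HOST", "http://localhost:11434"),
    "timeout": int(os.environ.get("OLLAMA_TIMEOUT", "30")),
    "server_host": os.environ.get("SERVER_HOST", "0.0.0.0"),
    "server_port": int(os.environ.get("SERVER_PORT", "7860")),
    "log_level": os.environ.get("LOG_LEVEL", "INFO"),
}

print(config["model"])
```

In Docker, any of these can be overridden at run time, e.g. `docker run -e OLLAMA_TIMEOUT=60 -p 7860:7860 summarizer-app`.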

## 🐳 Docker Deployment

### Local Development

```bash
# Build and run with docker-compose
docker-compose up --build

# Or run directly
docker build -f Dockerfile.hf -t summarizer-app .
docker run -p 7860:7860 summarizer-app
```

### Hugging Face Spaces

This app is configured for deployment on Hugging Face Spaces using the Docker SDK.

## 📊 Performance

- Model: Mistral 7B (7GB RAM requirement)
- Startup time: ~2-3 minutes (includes model download)
- Inference speed: ~2-5 seconds per request
- Memory usage: ~8GB RAM

πŸ› οΈ Development

Setup

# Install dependencies
pip install -r requirements.txt

# Run locally
uvicorn app.main:app --host 0.0.0.0 --port 7860

Testing

# Run tests
pytest

# Run with coverage
pytest --cov=app

πŸ“ Usage Examples

Python

import requests

# Summarize text
response = requests.post(
    "https://your-space.hf.space/api/v1/summarize",
    json={
        "text": "Your long article or text here...",
        "max_tokens": 256
    }
)

result = response.json()
print(result["summary"])

cURL

curl -X POST "https://your-space.hf.space/api/v1/summarize" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Your text to summarize...",
    "max_tokens": 256
  }'

## 🔒 Security

- Non-root user execution
- Input validation and sanitization
- Rate limiting (configurable)
- API key authentication (optional)
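If API key authentication is enabled, a client would typically pass the key as a request header. The header name `X-API-Key` below is an assumption for illustration, not something this README specifies; check the deployment's configuration for the actual scheme:

```python
import requests

# Hypothetical sketch: attach an API key header to a summarize request.
# "X-API-Key" is an assumed header name, not confirmed by this service.
request = requests.Request(
    "POST",
    "https://your-space.hf.space/api/v1/summarize",
    headers={"X-API-Key": "your-api-key"},
    json={"text": "Your text to summarize...", "max_tokens": 256},
).prepare()

# The prepared request carries both the key and the JSON content type
print(request.headers["X-API-Key"])
print(request.headers["Content-Type"])
```

Using `requests.Request(...).prepare()` builds the request without sending it, which is handy for inspecting exactly what headers a client will transmit.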

## 📈 Monitoring

The service includes:

- Health check endpoint
- Request logging
- Error tracking
- Performance metrics

## 🆘 Troubleshooting

### Common Issues

1. **Model not loading**: Check that Ollama is running and that the model has been pulled
2. **Out of memory**: Ensure sufficient RAM (8GB+) for Mistral 7B
3. **Slow startup**: Normal on first run due to the model download
4. **API errors**: Check the application logs; the interactive Swagger UI at `/docs` can help reproduce failing requests
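For the first issue, the standard Ollama CLI and HTTP API can confirm the service and model state (run these inside the container, or wherever Ollama is installed; the model tag matches the `OLLAMA_MODEL` default):

```shell
# Check that the Ollama HTTP API is reachable (returns locally available models)
curl http://localhost:11434/api/tags

# List models known to the Ollama CLI
ollama list

# Pull the model the service expects
ollama pull mistral:7b
```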

### Logs

View application logs in the Hugging Face Spaces interface, or check the health endpoint for service status.

## 📄 License

MIT License - see LICENSE file for details.

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests
5. Submit a pull request

Deployed on Hugging Face Spaces 🚀