---
title: Text Summarizer API
emoji: πŸ“
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: mit
app_port: 7860
---

Text Summarizer API

A FastAPI-based text summarization service with two backends: Ollama-powered V1 endpoints and a Hugging Face Transformers streaming V2 endpoint (default model: sshleifer/distilbart-cnn-6-6).

πŸš€ Features

  • Fast text summarization using local LLM inference
  • RESTful API with FastAPI
  • Health monitoring and logging
  • Docker containerized for easy deployment
  • Free deployment on Hugging Face Spaces

πŸ“‘ API Endpoints

Health Check

GET /health
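
A successful check returns HTTP 200. The exact response body is not specified in this README; a typical shape (field names assumed) might be:

{"status": "ok"}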

V1 API (Ollama + Transformers Pipeline)

POST /api/v1/summarize
POST /api/v1/summarize/stream
POST /api/v1/summarize/pipeline/stream

V2 API (HuggingFace Streaming)

POST /api/v2/summarize/stream

🌐 Live Deployment

βœ… Successfully deployed and tested on Hugging Face Spaces!

Quick Test

# Test the live deployment
curl https://colin730-SummarizerApp.hf.space/health
curl -X POST https://colin730-SummarizerApp.hf.space/api/v2/summarize/stream \
  -H "Content-Type: application/json" \
  -d '{"text":"This is a test of the live API.","max_tokens":50}'

Request Format (V1 and V2 compatible):

{
  "text": "Your long text to summarize here...",
  "max_tokens": 256,
  "prompt": "Summarize the following text concisely:"
}
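
Streaming Response Format (SSE): both streaming endpoints emit Server-Sent Events whose data payload carries a content chunk and a done flag. The field names are taken from the client examples below; the exact framing shown here is an assumption:

data: {"content": "The", "done": false}
data: {"content": " summary continues...", "done": false}
data: {"content": "", "done": true}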

API Documentation

  • Swagger UI: /docs
  • ReDoc: /redoc

πŸ”§ Configuration

The service uses the following environment variables (a sketch of how they might be loaded follows the lists):

V1 Configuration (Ollama)

  • OLLAMA_MODEL: Model to use (default: llama3.2:1b)
  • OLLAMA_HOST: Ollama service host (default: http://localhost:11434)
  • OLLAMA_TIMEOUT: Request timeout in seconds (default: 60)
  • ENABLE_V1_WARMUP: Enable V1 warmup (default: false)

V2 Configuration (HuggingFace)

  • HF_MODEL_ID: HuggingFace model ID (default: sshleifer/distilbart-cnn-6-6)
  • HF_DEVICE_MAP: Device mapping (default: auto; uses GPU when available, otherwise falls back to CPU)
  • HF_TORCH_DTYPE: Torch dtype (default: auto)
  • HF_HOME: HuggingFace cache directory (default: /tmp/huggingface)
  • HF_MAX_NEW_TOKENS: Max new tokens (default: 128)
  • HF_TEMPERATURE: Sampling temperature (default: 0.7)
  • HF_TOP_P: Nucleus sampling (default: 0.95)
  • ENABLE_V2_WARMUP: Enable V2 warmup (default: true)

Server Configuration

  • SERVER_HOST: Server host (default: 127.0.0.1)
  • SERVER_PORT: Server port (default: 8000)
  • LOG_LEVEL: Logging level (default: INFO)
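
For illustration, here is a minimal sketch of how these variables might be read at startup. The app's real settings module is not shown in this README, so treat the structure as an assumption; names and defaults mirror the tables above:

import os

# Hypothetical config loader; the actual app may organize settings differently
OLLAMA_MODEL = os.getenv("OLLAMA_MODEL", "llama3.2:1b")
OLLAMA_HOST = os.getenv("OLLAMA_HOST", "http://localhost:11434")
OLLAMA_TIMEOUT = int(os.getenv("OLLAMA_TIMEOUT", "60"))

HF_MODEL_ID = os.getenv("HF_MODEL_ID", "sshleifer/distilbart-cnn-6-6")
HF_MAX_NEW_TOKENS = int(os.getenv("HF_MAX_NEW_TOKENS", "128"))
HF_TEMPERATURE = float(os.getenv("HF_TEMPERATURE", "0.7"))
HF_TOP_P = float(os.getenv("HF_TOP_P", "0.95"))

ENABLE_V1_WARMUP = os.getenv("ENABLE_V1_WARMUP", "false").lower() == "true"
ENABLE_V2_WARMUP = os.getenv("ENABLE_V2_WARMUP", "true").lower() == "true"

SERVER_HOST = os.getenv("SERVER_HOST", "127.0.0.1")
SERVER_PORT = int(os.getenv("SERVER_PORT", "8000"))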

🐳 Docker Deployment

Local Development

# Build and run with docker-compose
docker-compose up --build

# Or run directly
docker build -f Dockerfile.hf -t summarizer-app .
docker run -p 7860:7860 summarizer-app

Hugging Face Spaces

This app is optimized for deployment on Hugging Face Spaces using Docker SDK.

V2-Only Deployment on HF Spaces:

  • Uses the sshleifer/distilbart-cnn-6-6 model (~300MB) for fast startup
  • No Ollama dependency (saves memory and disk space)
  • Model downloads during warmup for instant first request
  • Optimized for free tier resource limits

Environment Variables for HF Spaces:

ENABLE_V1_WARMUP=false
ENABLE_V2_WARMUP=true
HF_MODEL_ID=sshleifer/distilbart-cnn-6-6
HF_HOME=/tmp/huggingface

πŸ“Š Performance

V1 (Ollama + Transformers Pipeline)

  • V1 Models: llama3.2:1b (Ollama) + distilbart-cnn-6-6 (Transformers)
  • Memory usage: ~2-4GB RAM (when V1 warmup enabled)
  • Inference speed: ~2-5 seconds per request
  • Startup time: ~30-60 seconds (when V1 warmup enabled)

V2 (HuggingFace Streaming) - Primary on HF Spaces

  • V2 Model: sshleifer/distilbart-cnn-6-6 (~300MB download)
  • Memory usage: ~500MB RAM (when V2 warmup enabled)
  • Inference speed: Real-time token streaming (see the sketch after this list)
  • Startup time: ~30-60 seconds (includes model download when V2 warmup enabled)
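
As referenced above, here is a minimal, self-contained sketch of token streaming with this model using transformers' TextIteratorStreamer and the documented defaults. The app's actual V2 implementation is not shown in this README and may differ:

from threading import Thread

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, TextIteratorStreamer

model_id = "sshleifer/distilbart-cnn-6-6"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

inputs = tokenizer("Your long text to summarize here...", return_tensors="pt", truncation=True)
streamer = TextIteratorStreamer(tokenizer, skip_special_tokens=True)

# generate() blocks, so run it in a background thread and consume tokens as they arrive
thread = Thread(target=model.generate, kwargs=dict(
    **inputs, streamer=streamer,
    max_new_tokens=128, do_sample=True, temperature=0.7, top_p=0.95,
))
thread.start()
for chunk in streamer:
    print(chunk, end="", flush=True)
thread.join()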

Memory Optimization

  • V1 warmup disabled by default (ENABLE_V1_WARMUP=false)
  • V2 warmup enabled by default (ENABLE_V2_WARMUP=true)
  • HuggingFace Spaces: V2-only deployment (no Ollama)
  • Local development: V1 endpoints work if Ollama is running externally
  • distilbart-cnn-6-6 model: fine-tuned on CNN/DailyMail and small enough for the HuggingFace Spaces free tier

πŸ› οΈ Development

Setup

# Install dependencies
pip install -r requirements.txt

# Run locally
uvicorn app.main:app --host 0.0.0.0 --port 7860

Testing

# Run tests
pytest

# Run with coverage
pytest --cov=app

πŸ“ Usage Examples

V1 API (Ollama)

import requests
import json

# V1 streaming summarization
response = requests.post(
    "https://colin730-SummarizerApp.hf.space/api/v1/summarize/stream",
    json={
        "text": "Your long article or text here...",
        "max_tokens": 256
    },
    stream=True
)

# Each SSE event arrives as a "data: {...}" line; stop once "done" is true
for line in response.iter_lines():
    if line.startswith(b'data: '):
        data = json.loads(line[6:])
        print(data["content"], end="", flush=True)  # flush so tokens appear immediately
        if data["done"]:
            break

V2 API (HuggingFace Streaming) - Recommended

import requests
import json

# V2 streaming summarization (same request format as V1)
response = requests.post(
    "https://colin730-SummarizerApp.hf.space/api/v2/summarize/stream",
    json={
        "text": "Your long article or text here...",
        "max_tokens": 128  # V2 uses max_new_tokens
    },
    stream=True
)

for line in response.iter_lines():
    if line.startswith(b'data: '):
        data = json.loads(line[6:])
        print(data["content"], end="", flush=True)
        if data["done"]:
            break

Android Client (SSE)

// Android SSE client example (OkHttp 3.x + Gson; add the matching imports)
val client = OkHttpClient()
val request = Request.Builder()
    .url("https://colin730-SummarizerApp.hf.space/api/v2/summarize/stream")
    .post(RequestBody.create(
        MediaType.parse("application/json"),
        """{"text": "Your text...", "max_tokens": 128}"""
    ))
    .build()

client.newCall(request).enqueue(object : Callback {
    override fun onFailure(call: Call, e: IOException) {
        // Surface network errors to the UI
    }

    override fun onResponse(call: Call, response: Response) {
        val source = response.body()?.source()
        source?.use { bufferedSource ->
            while (true) {
                // readUtf8Line() returns null at end of stream
                val line = bufferedSource.readUtf8Line() ?: break
                if (line.startsWith("data: ")) {
                    val json = line.substring(6)
                    val data = Gson().fromJson(json, Map::class.java)
                    // Update UI with data["content"]
                    if (data["done"] == true) break
                }
            }
        }
    }
})

cURL Examples

# Test live deployment
curl https://colin730-SummarizerApp.hf.space/health

# V1 API (if Ollama is available)
curl -X POST "https://colin730-SummarizerApp.hf.space/api/v1/summarize/stream" \
  -H "Content-Type: application/json" \
  -d '{"text": "Your text...", "max_tokens": 256}'

# V2 API (HuggingFace streaming - recommended)
curl -X POST "https://colin730-SummarizerApp.hf.space/api/v2/summarize/stream" \
  -H "Content-Type: application/json" \
  -d '{"text": "Your text...", "max_tokens": 128}'

Test Script

# Use the included test script
./scripts/test_endpoints.sh https://colin730-SummarizerApp.hf.space

πŸ”’ Security

  • Non-root user execution
  • Input validation and sanitization
  • Rate limiting (configurable)
  • API key authentication (optional; a hedged sketch follows this list)
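
Since API key authentication is listed as optional, here is a hedged sketch of how such a guard could look as a FastAPI dependency. The API_KEY variable and the X-API-Key header name are hypothetical, not confirmed by this README:

import os
from typing import Optional

from fastapi import Depends, Header, HTTPException

API_KEY = os.getenv("API_KEY")  # hypothetical variable; unset disables the check

async def require_api_key(x_api_key: Optional[str] = Header(default=None)):
    # Reject requests whose X-API-Key header does not match the configured key
    if API_KEY and x_api_key != API_KEY:
        raise HTTPException(status_code=401, detail="Invalid API key")

An endpoint can then opt in with dependencies=[Depends(require_api_key)].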

πŸ“ˆ Monitoring

The service includes:

  • Health check endpoint
  • Request logging
  • Error tracking
  • Performance metrics

πŸ†˜ Troubleshooting

Common Issues

  1. Model not loading: Check if Ollama is running and model is pulled
  2. Out of memory: V2 (distilbart) needs only ~500MB RAM; enabling the V1 Ollama stack requires roughly 2-4GB
  3. Slow startup: Normal on first run due to model download
  4. API errors: Check the application logs, and use the /docs Swagger UI to exercise endpoints interactively

Logs

View application logs in the Hugging Face Spaces interface or check the health endpoint for service status.

πŸ“„ License

MIT License - see LICENSE file for details.

🀝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Submit a pull request

βœ… Deployment Status

Successfully deployed and tested on Hugging Face Spaces! πŸš€

  • βœ… Proxy-aware FastAPI with root_path support
  • βœ… All endpoints working (health, docs, V2 API)
  • βœ… Real-time streaming summarization
  • βœ… No 404 errors - all paths correctly configured
  • βœ… Test script included for easy verification

Recent Fixes Applied

  • Added root_path=os.getenv("HF_SPACE_ROOT_PATH", "") for HF Spaces proxy awareness (sketched below)
  • Ensured binding to 0.0.0.0:7860 as required by HF Spaces
  • Verified V2 router paths (/api/v2/summarize/stream) with no double prefixes
  • Created test script for external endpoint verification
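
Putting the first two fixes together, here is a minimal sketch of the proxy-aware entrypoint; the module layout is an assumption, while the root_path line and bind address come straight from the fixes above:

import os

import uvicorn
from fastapi import FastAPI

# HF Spaces fronts the app with a proxy; root_path keeps generated URLs correct
app = FastAPI(root_path=os.getenv("HF_SPACE_ROOT_PATH", ""))

if __name__ == "__main__":
    # HF Spaces requires binding to 0.0.0.0:7860
    uvicorn.run(app, host="0.0.0.0", port=7860)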

Live Space: https://colin730-SummarizerApp.hf.space 🎯