---
title: Text Summarizer API
emoji: 📝
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: mit
app_port: 7860
---

# Text Summarizer API

A FastAPI-based text summarization service powered by Ollama and Mistral 7B model.

## 🚀 Features

- **Fast text summarization** using local LLM inference
- **RESTful API** with FastAPI
- **Health monitoring** and logging
- **Docker containerized** for easy deployment
- **Free deployment** on Hugging Face Spaces

## 📡 API Endpoints

### Health Check
```
GET /health
```

### V1 API (Ollama + Transformers Pipeline)
```
POST /api/v1/summarize
POST /api/v1/summarize/stream
POST /api/v1/summarize/pipeline/stream
```

### V2 API (HuggingFace Streaming)
```
POST /api/v2/summarize/stream
```

## 🌐 Live Deployment

**✅ Successfully deployed and tested on Hugging Face Spaces!**

- **Live Space:** https://colin730-SummarizerApp.hf.space
- **API Documentation:** https://colin730-SummarizerApp.hf.space/docs
- **Health Check:** https://colin730-SummarizerApp.hf.space/health
- **V2 Streaming API:** https://colin730-SummarizerApp.hf.space/api/v2/summarize/stream

### Quick Test
```bash
# Test the live deployment
curl https://colin730-SummarizerApp.hf.space/health
curl -X POST https://colin730-SummarizerApp.hf.space/api/v2/summarize/stream \
  -H "Content-Type: application/json" \
  -d '{"text":"This is a test of the live API.","max_tokens":50}'
```

**Request Format (V1 and V2 compatible):**
```json
{
  "text": "Your long text to summarize here...",
  "max_tokens": 256,
  "prompt": "Summarize the following text concisely:"
}
```

### API Documentation
- **Swagger UI**: `/docs`
- **ReDoc**: `/redoc`

## 🔧 Configuration

The service uses the following environment variables:

### V1 Configuration (Ollama)
- `OLLAMA_MODEL`: Model to use (default: `llama3.2:1b`)
- `OLLAMA_HOST`: Ollama service host (default: `http://localhost:11434`)
- `OLLAMA_TIMEOUT`: Request timeout in seconds (default: `60`)
- `ENABLE_V1_WARMUP`: Enable V1 warmup (default: `false`)

### V2 Configuration (HuggingFace)
- `HF_MODEL_ID`: HuggingFace model ID (default: `sshleifer/distilbart-cnn-6-6`)
- `HF_DEVICE_MAP`: Device mapping (default: `auto` for GPU fallback to CPU)
- `HF_TORCH_DTYPE`: Torch dtype (default: `auto`)
- `HF_HOME`: HuggingFace cache directory (default: `/tmp/huggingface`)
- `HF_MAX_NEW_TOKENS`: Max new tokens (default: `128`)
- `HF_TEMPERATURE`: Sampling temperature (default: `0.7`)
- `HF_TOP_P`: Nucleus sampling (default: `0.95`)
- `ENABLE_V2_WARMUP`: Enable V2 warmup (default: `true`)

### Server Configuration
- `SERVER_HOST`: Server host (default: `127.0.0.1`)
- `SERVER_PORT`: Server port (default: `8000`)
- `LOG_LEVEL`: Logging level (default: `INFO`)

## 🐳 Docker Deployment

### Local Development
```bash
# Build and run with docker-compose
docker-compose up --build

# Or run directly
docker build -f Dockerfile.hf -t summarizer-app .
docker run -p 7860:7860 summarizer-app
```

### Hugging Face Spaces
This app is optimized for deployment on Hugging Face Spaces using Docker SDK.

**V2-Only Deployment on HF Spaces:**
- Uses `t5-small` model (~250MB) for fast startup
- No Ollama dependency (saves memory and disk space)
- Model downloads during warmup for instant first request
- Optimized for free tier resource limits

**Environment Variables for HF Spaces:**
```bash
ENABLE_V1_WARMUP=false
ENABLE_V2_WARMUP=true
HF_MODEL_ID=sshleifer/distilbart-cnn-6-6
HF_HOME=/tmp/huggingface
```

## 📊 Performance

### V1 (Ollama + Transformers Pipeline)
- **V1 Models**: llama3.2:1b (Ollama) + distilbart-cnn-6-6 (Transformers)
- **Memory usage**: ~2-4GB RAM (when V1 warmup enabled)
- **Inference speed**: ~2-5 seconds per request
- **Startup time**: ~30-60 seconds (when V1 warmup enabled)

### V2 (HuggingFace Streaming) - Primary on HF Spaces
- **V2 Model**: sshleifer/distilbart-cnn-6-6 (~300MB download)
- **Memory usage**: ~500MB RAM (when V2 warmup enabled)
- **Inference speed**: Real-time token streaming
- **Startup time**: ~30-60 seconds (includes model download when V2 warmup enabled)

### Memory Optimization
- **V1 warmup disabled by default** (`ENABLE_V1_WARMUP=false`)
- **V2 warmup enabled by default** (`ENABLE_V2_WARMUP=true`)
- **HuggingFace Spaces**: V2-only deployment (no Ollama)
- **Local development**: V1 endpoints work if Ollama is running externally
- **distilbart-cnn-6-6 model**: Optimized for HuggingFace Spaces free tier with CNN/DailyMail fine-tuning

## 🛠️ Development

### Setup
```bash
# Install dependencies
pip install -r requirements.txt

# Run locally
uvicorn app.main:app --host 0.0.0.0 --port 7860
```

### Testing
```bash
# Run tests
pytest

# Run with coverage
pytest --cov=app
```

## 📝 Usage Examples

### V1 API (Ollama)
```python
import requests
import json

# V1 streaming summarization
response = requests.post(
    "https://colin730-SummarizerApp.hf.space/api/v1/summarize/stream",
    json={
        "text": "Your long article or text here...",
        "max_tokens": 256
    },
    stream=True
)

for line in response.iter_lines():
    if line.startswith(b'data: '):
        data = json.loads(line[6:])
        print(data["content"], end="")
        if data["done"]:
            break
```

### V2 API (HuggingFace Streaming) - Recommended
```python
import requests
import json

# V2 streaming summarization (same request format as V1)
response = requests.post(
    "https://colin730-SummarizerApp.hf.space/api/v2/summarize/stream",
    json={
        "text": "Your long article or text here...",
        "max_tokens": 128  # V2 uses max_new_tokens
    },
    stream=True
)

for line in response.iter_lines():
    if line.startswith(b'data: '):
        data = json.loads(line[6:])
        print(data["content"], end="")
        if data["done"]:
            break
```

### Android Client (SSE)
```kotlin
// Android SSE client example
val client = OkHttpClient()
val request = Request.Builder()
    .url("https://colin730-SummarizerApp.hf.space/api/v2/summarize/stream")
    .post(RequestBody.create(
        MediaType.parse("application/json"),
        """{"text": "Your text...", "max_tokens": 128}"""
    ))
    .build()

client.newCall(request).enqueue(object : Callback {
    override fun onResponse(call: Call, response: Response) {
        val source = response.body()?.source()
        source?.use { bufferedSource ->
            while (true) {
                val line = bufferedSource.readUtf8Line()
                if (line?.startsWith("data: ") == true) {
                    val json = line.substring(6)
                    val data = Gson().fromJson(json, Map::class.java)
                    // Update UI with data["content"]
                    if (data["done"] == true) break
                }
            }
        }
    }
})
```

### cURL Examples
```bash
# Test live deployment
curl https://colin730-SummarizerApp.hf.space/health

# V1 API (if Ollama is available)
curl -X POST "https://colin730-SummarizerApp.hf.space/api/v1/summarize/stream" \
  -H "Content-Type: application/json" \
  -d '{"text": "Your text...", "max_tokens": 256}'

# V2 API (HuggingFace streaming - recommended)
curl -X POST "https://colin730-SummarizerApp.hf.space/api/v2/summarize/stream" \
  -H "Content-Type: application/json" \
  -d '{"text": "Your text...", "max_tokens": 128}'
```

### Test Script
```bash
# Use the included test script
./scripts/test_endpoints.sh https://colin730-SummarizerApp.hf.space
```

## 🔒 Security

- Non-root user execution
- Input validation and sanitization
- Rate limiting (configurable)
- API key authentication (optional)

## 📈 Monitoring

The service includes:
- Health check endpoint
- Request logging
- Error tracking
- Performance metrics

## 🆘 Troubleshooting

### Common Issues

1. **Model not loading**: Check if Ollama is running and model is pulled
2. **Out of memory**: Ensure sufficient RAM (8GB+) for Mistral 7B
3. **Slow startup**: Normal on first run due to model download
4. **API errors**: Check logs via `/docs` endpoint

### Logs
View application logs in the Hugging Face Spaces interface or check the health endpoint for service status.

## 📄 License

MIT License - see LICENSE file for details.

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests
5. Submit a pull request

---

## ✅ Deployment Status

**Successfully deployed and tested on Hugging Face Spaces!** 🚀

- ✅ **Proxy-aware FastAPI** with `root_path` support
- ✅ **All endpoints working** (health, docs, V2 API)
- ✅ **Real-time streaming** summarization
- ✅ **No 404 errors** - all paths correctly configured
- ✅ **Test script included** for easy verification

### Recent Fixes Applied
- Added `root_path=os.getenv("HF_SPACE_ROOT_PATH", "")` for HF Spaces proxy awareness
- Ensured binding to `0.0.0.0:7860` as required by HF Spaces
- Verified V2 router paths (`/api/v2/summarize/stream`) with no double prefixes
- Created test script for external endpoint verification

**Live Space:** https://colin730-SummarizerApp.hf.space 🎯