Text Summarizer Backend API
A FastAPI-based backend service for text summarization using Ollama's local language models. Designed for Android app integration with cloud deployment capabilities.
Features
- FastAPI - Modern, fast web framework for building APIs
- Ollama Integration - Local LLM inference with a privacy-first approach
- Android Ready - RESTful API optimized for mobile consumption
- Request Tracking - Unique request IDs and structured logging
- Comprehensive Testing - 30+ tests with >90% coverage
- Docker Ready - Containerized deployment support
- Cloud Extensible - Easy migration to cloud hosting
Quick Start
Prerequisites
- Python 3.7+
- Ollama installed and running
- A compatible language model (e.g., llama3.1:8b)
Installation
1. Clone the repository
git clone https://github.com/MingLu0/SummarizerBackend.git
cd SummarizerBackend
2. Set up Ollama
# Install Ollama (macOS)
brew install ollama
# Start Ollama service
ollama serve
# Pull a model (in another terminal)
ollama pull llama3.1:8b
3. Set up the Python environment
# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
4. Run the API
uvicorn app.main:app --host 127.0.0.1 --port 8000 --reload
5. Test the API
# Health check
curl http://127.0.0.1:8000/health
# Summarize text
curl -X POST http://127.0.0.1:8000/api/v1/summarize/ \
  -H "Content-Type: application/json" \
  -d '{"text": "Your long text to summarize here..."}'
API Documentation
Interactive Docs
- Swagger UI: http://127.0.0.1:8000/docs
- ReDoc: http://127.0.0.1:8000/redoc
Endpoints
GET /health
Health check endpoint.
Response:
{
"status": "ok",
"service": "text-summarizer-api",
"version": "1.0.0"
}
POST /api/v1/summarize/
Summarize text using Ollama.
Request:
{
"text": "Your text to summarize...",
"max_tokens": 256,
"prompt": "Summarize the following text concisely:"
}
Response:
{
"summary": "Generated summary text",
"model": "llama3.1:8b",
"tokens_used": 150,
"latency_ms": 1234.5
}
Error Response:
{
"detail": "Summarization failed: Connection error",
"code": "OLLAMA_ERROR",
"request_id": "req-12345"
}
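Outside Android, the endpoint is just as easy to call from a script. A minimal Python client sketch using only the standard library (the URL and payload mirror the examples above; run it against a locally started server):

```python
import json
import urllib.request

API_URL = "http://127.0.0.1:8000/api/v1/summarize/"

def build_payload(text, max_tokens=256):
    """Request body matching the schema above."""
    return {"text": text, "max_tokens": max_tokens}

def summarize(text, max_tokens=256):
    """POST to the summarize endpoint and return the summary string."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(text, max_tokens)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())["summary"]
```
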
Configuration
Configure the API using environment variables:
# Ollama Configuration
export OLLAMA_MODEL=llama3.1:8b
export OLLAMA_HOST=http://127.0.0.1:11434
export OLLAMA_TIMEOUT=30
# Server Configuration
export SERVER_HOST=127.0.0.1
export SERVER_PORT=8000
export LOG_LEVEL=INFO
# Optional: API Security
export API_KEY_ENABLED=false
export API_KEY=your-secret-key
# Optional: Rate Limiting
export RATE_LIMIT_ENABLED=false
export RATE_LIMIT_REQUESTS=60
export RATE_LIMIT_WINDOW=60
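As a rough sketch of how these variables map to typed settings at startup (the project's actual `app/core/config.py` may differ; this is an illustrative stand-in, not the real module):

```python
import os
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Settings:
    """Settings resolved from the environment variables above."""
    ollama_model: str = field(default_factory=lambda: os.getenv("OLLAMA_MODEL", "llama3.1:8b"))
    ollama_host: str = field(default_factory=lambda: os.getenv("OLLAMA_HOST", "http://127.0.0.1:11434"))
    ollama_timeout: int = field(default_factory=lambda: int(os.getenv("OLLAMA_TIMEOUT", "30")))
    server_host: str = field(default_factory=lambda: os.getenv("SERVER_HOST", "127.0.0.1"))
    server_port: int = field(default_factory=lambda: int(os.getenv("SERVER_PORT", "8000")))
    log_level: str = field(default_factory=lambda: os.getenv("LOG_LEVEL", "INFO"))
    api_key_enabled: bool = field(default_factory=lambda: os.getenv("API_KEY_ENABLED", "false").lower() == "true")

settings = Settings()
```
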
Android Integration
Retrofit Example
// API Interface
interface SummarizerApi {
@POST("api/v1/summarize/")
suspend fun summarize(@Body request: SummarizeRequest): SummarizeResponse
}
// Data Classes
data class SummarizeRequest(
val text: String,
val max_tokens: Int = 256,
val prompt: String = "Summarize the following text concisely:"
)
data class SummarizeResponse(
val summary: String,
val model: String,
val tokens_used: Int?,
val latency_ms: Double?
)
// Usage
val retrofit = Retrofit.Builder()
.baseUrl("http://127.0.0.1:8000/")
.addConverterFactory(GsonConverterFactory.create())
.build()
val api = retrofit.create(SummarizerApi::class.java)
val response = api.summarize(SummarizeRequest(text = "Your text here"))
OkHttp Example
val client = OkHttpClient()
val json = JSONObject().apply {
put("text", "Your text to summarize")
put("max_tokens", 256)
}
val request = Request.Builder()
.url("http://127.0.0.1:8000/api/v1/summarize/")
.post(json.toString().toRequestBody("application/json".toMediaType()))
.build()
client.newCall(request).execute().use { response ->
val result = response.body?.string()
// Handle response
}
Development
Running Tests
# Run all tests locally
pytest
# Run with coverage
pytest --cov=app --cov-report=html --cov-report=term
# Run tests in Docker
./scripts/run-tests.sh
# Run specific test file
pytest tests/test_api.py -v
# Run tests and stop on first failure
pytest -x
Code Quality
# Format code
black app/ tests/
# Sort imports
isort app/ tests/
# Lint code
flake8 app/ tests/
Project Structure
app/
├── main.py              # FastAPI app entry point
├── api/
│   └── v1/
│       ├── routes.py    # API route definitions
│       ├── schemas.py   # Pydantic models
│       └── summarize.py # Summarization endpoint
├── services/
│   └── summarizer.py    # Ollama integration
└── core/
    ├── config.py        # Configuration management
    ├── logging.py       # Logging setup
    ├── middleware.py    # Request middleware
    └── errors.py        # Error handling
tests/
├── test_api.py          # API endpoint tests
├── test_services.py     # Service layer tests
├── test_schemas.py      # Pydantic model tests
├── test_config.py       # Configuration tests
└── conftest.py          # Test configuration
Docker Deployment
Quick Start with Docker
# 1. Start Ollama service
docker-compose up ollama -d
# 2. Download a model (first time only)
./scripts/setup-ollama.sh llama3.1:8b
# 3. Start the API
docker-compose up api -d
# 4. Test the setup
curl http://localhost:8000/health
Development with Hot Reload
# Use development compose file
docker-compose -f docker-compose.dev.yml up --build
Production with Nginx
# Start with Nginx reverse proxy
docker-compose --profile production up --build
Manual Build
# Build the image
docker build -t summarizer-backend .
# Run with Ollama
docker run -p 8000:8000 \
-e OLLAMA_HOST=http://host.docker.internal:11434 \
summarizer-backend
Production Deployment
1. Build the image
docker build -t your-registry/summarizer-backend:latest .
2. Deploy to cloud
# Push to registry
docker push your-registry/summarizer-backend:latest
# Deploy to your cloud provider
# (AWS ECS, Google Cloud Run, Azure Container Instances, etc.)
Cloud Deployment Options
Quick Deploy with Railway (Recommended)
# 1. Install Railway CLI
npm install -g @railway/cli
# 2. Login and deploy
railway login
railway init
railway up
Railway Advantages:
- Supports Docker Compose with Ollama
- Persistent volumes for models
- Automatic HTTPS
- Easy environment management
Other Options
- Google Cloud Run: Serverless with auto-scaling
- AWS ECS: Full container orchestration
- DigitalOcean App Platform: Simple deployment
- Render: GitHub integration
Detailed Deployment Guide
See DEPLOYMENT.md for comprehensive deployment instructions for all platforms.
Important Notes
- Memory Requirements: llama3.1:8b needs ~8GB RAM
- Model Download: Models are downloaded after deployment
- Cost Optimization: Start with smaller models (mistral:7b)
- Security: Enable API keys for production use
Monitoring and Logging
Request Tracking
Every request gets a unique ID for tracking:
curl -X POST http://127.0.0.1:8000/api/v1/summarize/ \
  -H "Content-Type: application/json" \
  -H "X-Request-ID: my-custom-id" \
  -d '{"text": "test"}'
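The same header can be set from Python; a small standard-library sketch (the header name matches the curl example above):

```python
import json
import urllib.request

def tracked_request(url, payload, request_id):
    """Build a POST request carrying a custom X-Request-ID header."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Request-ID": request_id,  # echoed in server logs for correlation
        },
        method="POST",
    )

# Usage (requires the API to be running locally):
# req = tracked_request("http://127.0.0.1:8000/api/v1/summarize/", {"text": "test"}, "my-custom-id")
# with urllib.request.urlopen(req, timeout=60) as resp:
#     print(json.loads(resp.read())["summary"])
```
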
Log Format
2025-09-29 20:47:46,949 - app.core.middleware - INFO - Request abc123: POST /api/v1/summarize/
2025-09-29 20:47:46,987 - app.core.middleware - INFO - Response abc123: 200 (38.48ms)
Performance Considerations
Model Selection
- llama3.1:8b - Good balance of speed and quality
- mistral:7b - Faster, good for real-time apps
- llama3.1:70b - Higher quality, slower inference
Optimization Tips
- Batch requests when possible
- Cache summaries for repeated content
- Use appropriate max_tokens (256-512 for most use cases)
- Monitor latency and adjust timeout settings
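Caching by content hash is straightforward. A minimal sketch (the `summarize` callable passed in is whatever client function you use to hit the API):

```python
import hashlib

# In-memory cache; swap for Redis or similar in production
_cache = {}

def cached_summary(text, summarize):
    """Return a cached summary for repeated content, computing it only once."""
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if key not in _cache:
        _cache[key] = summarize(text)
    return _cache[key]
```
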
Troubleshooting
Common Issues
Ollama connection failed
# Check if Ollama is running
curl http://127.0.0.1:11434/api/tags
# Restart Ollama
ollama serve
Model not found
# List available models
ollama list
# Pull the required model
ollama pull llama3.1:8b
Port already in use
# Use a different port
uvicorn app.main:app --port 8001
Debug Mode
# Enable debug logging
export LOG_LEVEL=DEBUG
uvicorn app.main:app --reload
Contributing
1. Fork the repository
2. Create a feature branch (git checkout -b feature/amazing-feature)
3. Commit your changes (git commit -m 'Add amazing feature')
4. Push to the branch (git push origin feature/amazing-feature)
5. Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Support
- Email: [email protected]
- Issues: GitHub Issues
- Documentation: API Docs
Built with ❤️ for privacy-first text summarization