
Text Summarizer Backend - Development Plan

Overview

A minimal FastAPI backend that summarizes text with a locally running Ollama instance. It is designed to be called from an Android app and to be extensible to cloud hosting later.

Architecture Goals

  • Local-first: Use Ollama running locally for privacy and cost control
  • Cloud-ready: Structure the code so it can be deployed to the cloud later with minimal changes
  • Minimal v1: Focus on core summarization functionality
  • Android-friendly: RESTful API optimized for mobile app consumption

Technology Stack

  • Backend: FastAPI + Python
  • LLM: Ollama (local)
  • Server: Uvicorn
  • Validation: Pydantic
  • Testing: Pytest + pytest-asyncio + httpx (for async testing)
  • Containerization: Docker (for cloud deployment)

Project Structure

app/
├── main.py                 # FastAPI app entry point
├── api/
│   └── v1/
│       ├── routes.py       # API route definitions
│       └── schemas.py      # Pydantic models
├── services/
│   └── summarizer.py       # Ollama integration
└── core/
    ├── config.py           # Configuration management
    └── logging.py          # Logging setup
tests/
├── test_api.py             # API endpoint tests
├── test_services.py        # Service layer tests
├── test_schemas.py         # Pydantic model tests
├── test_config.py          # Configuration tests
└── conftest.py             # Test configuration and fixtures
requirements.txt
Dockerfile
docker-compose.yml
README.md

API Contract (v1)

POST /api/v1/summarize

Request (text is required; max_tokens and prompt are optional and default to the values shown):

{
  "text": "string (required)",
  "max_tokens": 256,
  "prompt": "Summarize concisely."
}

Response:

{
  "summary": "string",
  "model": "llama3.1:8b",
  "tokens_used": 512,
  "latency_ms": 1234
}
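
The request contract above could be enforced with a small validation layer. The following is a minimal sketch that uses a standard-library dataclass as a dependency-free stand-in for the planned Pydantic schema. The MAX_TEXT_CHARS cap, the parse_request helper, and the error messages are assumptions made for illustration; the plan itself does not specify them.

```python
from dataclasses import dataclass

# Hypothetical cap: the plan mentions input size limits but sets no number.
MAX_TEXT_CHARS = 32_000


@dataclass(frozen=True)
class SummarizeRequest:
    """Mirrors the request body of POST /api/v1/summarize."""

    text: str
    max_tokens: int = 256
    prompt: str = "Summarize concisely."

    def __post_init__(self) -> None:
        # Reject missing, non-string, or whitespace-only input early.
        if not isinstance(self.text, str) or not self.text.strip():
            raise ValueError("text is required and must not be empty")
        if len(self.text) > MAX_TEXT_CHARS:
            raise ValueError(f"text exceeds {MAX_TEXT_CHARS} characters")
        # bool is a subclass of int, so exclude it explicitly.
        is_int = isinstance(self.max_tokens, int) and not isinstance(self.max_tokens, bool)
        if not is_int or self.max_tokens <= 0:
            raise ValueError("max_tokens must be a positive integer")


def parse_request(body: dict) -> SummarizeRequest:
    """Build a validated request from a decoded JSON body."""
    return SummarizeRequest(
        text=body.get("text", ""),
        max_tokens=body.get("max_tokens", 256),
        prompt=body.get("prompt", "Summarize concisely."),
    )
```

In the real service a Pydantic model would likely replace this class, and FastAPI would turn validation failures into 422 responses automatically.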

GET /health

Response:

{
  "status": "ok",
  "ollama": "reachable"
}
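
One way the health response could be produced is to probe Ollama and report the result. The sketch below uses only the standard library; the function names, the 2-second timeout, and the "unreachable" value are assumptions, since the plan only shows the "reachable" case. It probes GET /api/tags, the Ollama endpoint that lists local models.

```python
import http.client
import urllib.error
import urllib.request
from typing import Callable


def probe_ollama(host: str = "http://127.0.0.1:11434", timeout: float = 2.0) -> bool:
    """Return True if the Ollama server answers an HTTP request."""
    try:
        with urllib.request.urlopen(f"{host}/api/tags", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeout, malformed URL, or a broken reply.
        return False


def health_payload(probe: Callable[[], bool] = probe_ollama) -> dict:
    """Build the GET /health response body."""
    # "unreachable" is an assumed value for the failure case.
    return {"status": "ok", "ollama": "reachable" if probe() else "unreachable"}
```

Passing the probe as a parameter keeps the handler testable without a running Ollama. Whether "status" should also change when Ollama is down is a design decision this sketch leaves open.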

Development Phases

Phase 1: Foundation

  • Project scaffold and directory structure
  • Core dependencies and requirements.txt (including test dependencies)
  • Basic FastAPI app setup
  • Configuration management with environment variables
  • Logging setup
  • Health check endpoint
  • Basic test setup and configuration

Phase 2: Core Feature

  • Pydantic schemas for request/response
  • Unit tests for schemas (validation, serialization)
  • Ollama service integration
  • Unit tests for Ollama service (mocked)
  • Summarization endpoint implementation
  • Integration tests for API endpoints
  • Input validation and error handling
  • Basic request/response logging
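
The Ollama integration and endpoint steps in this phase could look roughly like the sketch below. It calls Ollama's POST /api/generate with streaming disabled and maps the reply onto the v1 response contract. It uses urllib as a dependency-free stand-in for an HTTP client such as httpx. The function names, the prompt layout, and the rule that tokens_used is the sum of prompt_eval_count and eval_count are assumptions, not decisions taken in this plan.

```python
import json
import time
import urllib.request
from typing import Callable

OLLAMA_HOST = "http://127.0.0.1:11434"
OLLAMA_MODEL = "llama3.1:8b"


def build_payload(text: str, max_tokens: int = 256,
                  prompt: str = "Summarize concisely.") -> dict:
    """Build the JSON body for Ollama's POST /api/generate."""
    return {
        "model": OLLAMA_MODEL,
        # Assumed layout: instruction first, then the text to summarize.
        "prompt": f"{prompt}\n\n{text}",
        "stream": False,  # request a single JSON object, not a stream
        "options": {"num_predict": max_tokens},  # cap on generated tokens
    }


def post_json(url: str, payload: dict, timeout: float = 30.0) -> dict:
    """POST a JSON body and decode the JSON reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def summarize(text: str, max_tokens: int = 256,
              prompt: str = "Summarize concisely.",
              post: Callable[[str, dict], dict] = post_json) -> dict:
    """Call Ollama and map its reply onto the v1 response contract."""
    start = time.perf_counter()
    reply = post(f"{OLLAMA_HOST}/api/generate", build_payload(text, max_tokens, prompt))
    latency_ms = int((time.perf_counter() - start) * 1000)
    return {
        "summary": reply["response"].strip(),
        "model": reply.get("model", OLLAMA_MODEL),
        # Assumption: count prompt tokens plus generated tokens.
        "tokens_used": reply.get("prompt_eval_count", 0) + reply.get("eval_count", 0),
        "latency_ms": latency_ms,
    }
```

Injecting the post callable is what lets the unit tests in this phase run against a fake transport instead of a live model.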

Phase 3: Quality & DX

  • Error handling middleware
  • Request ID middleware
  • Input size limits and validation
  • Rate limiting (optional for v1)
  • Test coverage analysis and improvement
  • Performance tests for summarization endpoint
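
For the optional rate limiting item, a fixed-window counter is probably the simplest starting point. The sketch below is an in-memory, single-process illustration only; the class name is hypothetical, and it assumes the window is measured in seconds. A multi-replica deployment would need shared state instead.

```python
import time
from typing import Callable, Dict, List


class FixedWindowRateLimiter:
    """Allow at most max_requests per client in each window."""

    def __init__(self, max_requests: int = 60, window_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock  # injectable so tests can control time
        # client id -> [window start time, requests seen in that window]
        self._windows: Dict[str, List[float]] = {}

    def allow(self, client_id: str) -> bool:
        """Record one request and report whether it is within the limit."""
        now = self._clock()
        window = self._windows.get(client_id)
        if window is None or now - window[0] >= self.window_seconds:
            # First request, or the previous window has expired.
            self._windows[client_id] = [now, 1]
            return True
        if window[1] < self.max_requests:
            window[1] += 1
            return True
        return False
```

A middleware could call allow() with the client IP or API key and return HTTP 429 when it yields False.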

Phase 4: Cloud-Ready Structure

  • Dockerfile for containerization
  • docker-compose.yml for local development
  • Environment-based configuration
  • CORS configuration for Android app
  • Security headers and API key support (optional)
  • Metrics endpoint (optional)
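
The optional API key support could be reduced to one small check, sketched below. The function name and its parameters are hypothetical, and how the key reaches the server (for example, a request header) is left open. The comparison uses hmac.compare_digest so that response timing does not reveal how much of a guessed key was correct.

```python
import hmac
from typing import Optional


def is_authorized(provided_key: Optional[str], api_key_enabled: bool,
                  expected_key: str) -> bool:
    """Decide whether a request may proceed under the optional API key scheme."""
    if not api_key_enabled:
        return True  # key checking is switched off
    if not provided_key or not expected_key:
        return False  # never accept a missing key or an empty secret
    return hmac.compare_digest(
        provided_key.encode("utf-8"), expected_key.encode("utf-8")
    )
```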

Phase 5: Documentation & Examples

  • Comprehensive README with setup instructions
  • API documentation (FastAPI auto-docs)
  • Example curl commands
  • Android client integration examples
  • Deployment guide for cloud hosting

Configuration

Environment Variables

# Ollama Configuration
OLLAMA_MODEL=llama3.1:8b
OLLAMA_HOST=http://127.0.0.1:11434
OLLAMA_TIMEOUT=30

# Server Configuration
SERVER_HOST=127.0.0.1
SERVER_PORT=8000
LOG_LEVEL=INFO

# Optional: API Security
API_KEY_ENABLED=false
API_KEY=your-secret-key

# Optional: Rate Limiting
RATE_LIMIT_ENABLED=false
RATE_LIMIT_REQUESTS=60
RATE_LIMIT_WINDOW=60
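
These variables could be loaded into a typed settings object at startup. The sketch below uses a plain dataclass rather than a settings library so that it runs without extra dependencies. The Settings and load_settings names, the empty default for API_KEY, the accepted boolean spellings, and the reading of the timeout and window values as seconds are all assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def _as_bool(value: str) -> bool:
    """Interpret common truthy spellings; anything else is False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    ollama_model: str = "llama3.1:8b"
    ollama_host: str = "http://127.0.0.1:11434"
    ollama_timeout: int = 30  # assumed to be seconds
    server_host: str = "127.0.0.1"
    server_port: int = 8000
    log_level: str = "INFO"
    api_key_enabled: bool = False
    api_key: str = ""
    rate_limit_enabled: bool = False
    rate_limit_requests: int = 60
    rate_limit_window: int = 60  # assumed to be seconds


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from the environment, falling back to the defaults."""
    d = Settings()
    return Settings(
        ollama_model=env.get("OLLAMA_MODEL", d.ollama_model),
        ollama_host=env.get("OLLAMA_HOST", d.ollama_host),
        ollama_timeout=int(env.get("OLLAMA_TIMEOUT", d.ollama_timeout)),
        server_host=env.get("SERVER_HOST", d.server_host),
        server_port=int(env.get("SERVER_PORT", d.server_port)),
        log_level=env.get("LOG_LEVEL", d.log_level),
        api_key_enabled=_as_bool(env.get("API_KEY_ENABLED", str(d.api_key_enabled))),
        api_key=env.get("API_KEY", d.api_key),
        rate_limit_enabled=_as_bool(env.get("RATE_LIMIT_ENABLED", str(d.rate_limit_enabled))),
        rate_limit_requests=int(env.get("RATE_LIMIT_REQUESTS", d.rate_limit_requests)),
        rate_limit_window=int(env.get("RATE_LIMIT_WINDOW", d.rate_limit_window)),
    )
```

Accepting the environment as a parameter makes the configuration tests straightforward: pass a plain dict instead of patching os.environ.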

Local Development Setup

Prerequisites

  1. Install Ollama:

    # macOS
    brew install ollama
    
    # Or download from https://ollama.ai
    
  2. Start Ollama service:

    ollama serve
    
  3. Pull a model:

    ollama pull llama3.1:8b
    # or
    ollama pull mistral
    

Running the API

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Set environment variables
export OLLAMA_MODEL=llama3.1:8b

# Run the server
uvicorn app.main:app --host 127.0.0.1 --port 8000 --reload

Testing the API

# Health check
curl http://127.0.0.1:8000/health

# Summarize text
curl -X POST http://127.0.0.1:8000/api/v1/summarize \
  -H "Content-Type: application/json" \
  -d '{"text": "Your long text to summarize here..."}'

Running Tests

# Run all tests
pytest

# Run tests with coverage
pytest --cov=app --cov-report=html --cov-report=term

# Run specific test file
pytest tests/test_api.py

# Run tests with verbose output
pytest -v

# Run tests and stop on first failure
pytest -x

Testing Strategy

Test Types

  1. Unit Tests

    • Pydantic model validation
    • Service layer logic (with mocked Ollama)
    • Configuration loading
    • Utility functions
  2. Integration Tests

    • API endpoint testing with TestClient
    • End-to-end summarization flow
    • Error handling scenarios
    • Health check functionality
  3. Mock Strategy

    • Mock Ollama HTTP calls with a tool that matches the HTTP client in use: httpx.MockTransport or respx for httpx, responses for requests
    • Mock external dependencies
    • Use fixtures for common test data

Test Coverage Goals

  • Minimum 90% code coverage
  • 100% coverage for critical paths (API endpoints, error handling)
  • All edge cases tested (empty input, large input, network failures)

Test Data

# Example test fixtures
SAMPLE_TEXT = "This is a long text that needs to be summarized..."
SAMPLE_SUMMARY = "This text discusses summarization."
MOCK_OLLAMA_RESPONSE = {
    "model": "llama3.1:8b",
    "response": SAMPLE_SUMMARY,
    "done": True
}
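
A fixture like the one above might be used as follows. This sketch patches urllib.request.urlopen with unittest.mock so that it runs with the standard library alone; the generate function is a hypothetical stand-in for the real service call, and the actual suite would more likely mock the project's own HTTP client under pytest.

```python
import io
import json
import urllib.request
from unittest import mock

SAMPLE_SUMMARY = "This text discusses summarization."
MOCK_OLLAMA_RESPONSE = {"model": "llama3.1:8b", "response": SAMPLE_SUMMARY, "done": True}


def generate(prompt: str) -> str:
    """Hypothetical service function: ask Ollama for a completion."""
    body = json.dumps(
        {"model": "llama3.1:8b", "prompt": prompt, "stream": False}
    ).encode("utf-8")
    req = urllib.request.Request(
        "http://127.0.0.1:11434/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["response"]


def test_generate_returns_mocked_summary() -> None:
    # BytesIO offers read() and the context-manager protocol,
    # which is all generate() needs from the response object.
    fake_response = io.BytesIO(json.dumps(MOCK_OLLAMA_RESPONSE).encode("utf-8"))
    with mock.patch("urllib.request.urlopen", return_value=fake_response) as fake_urlopen:
        assert generate("Summarize: ...") == SAMPLE_SUMMARY
    # Exactly one HTTP call should have been attempted.
    fake_urlopen.assert_called_once()
```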

Continuous Testing

  • Tests run on every code change
  • Pre-commit hooks for test execution
  • CI/CD pipeline integration ready

Android Integration

Example Android HTTP Client

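// Using Retrofit with a JSON converter (for example Gson or Moshi)
import retrofit2.http.Body
import retrofit2.http.POST

// Field names match the JSON keys of the API contract
data class SummarizeRequest(
    val text: String,
    val max_tokens: Int = 256,
    val prompt: String = "Summarize concisely."
)

data class SummarizeResponse(
    val summary: String,
    val model: String,
    val tokens_used: Int,
    val latency_ms: Int
)

// Retrofit annotations must live on an interface method;
// the interface name SummarizerApi is illustrative
interface SummarizerApi {
    @POST("api/v1/summarize")
    suspend fun summarize(@Body request: SummarizeRequest): SummarizeResponse
}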

Cloud Deployment Considerations

Future Extensions

  • Authentication: API key or OAuth2
  • Rate Limiting: Redis-based distributed rate limiting
  • Monitoring: Prometheus metrics, health checks
  • Scaling: Multiple replicas, load balancing
  • Database: Usage tracking, user management
  • Caching: Redis for response caching
  • Security: HTTPS, input sanitization, CORS policies

Deployment Options

  • Docker: Containerized deployment
  • Cloud Platforms: AWS, GCP, Azure, Railway, Render
  • Serverless: AWS Lambda, Vercel Functions (would need to call a separately hosted Ollama API rather than a local instance)
  • VPS: DigitalOcean, Linode with Docker

Success Criteria

  • API responds to health checks
  • Successfully summarizes text via Ollama
  • Handles errors gracefully
  • Works with Android app
  • Can be containerized
  • All tests pass with at least 90% coverage
  • Documentation is complete

Future Enhancements (Post-v1)

  • Streaming responses
  • Batch summarization
  • Multiple model support
  • Prompt templates and presets
  • Usage analytics
  • Multi-language support
  • Advanced rate limiting
  • User authentication and authorization