Cloud Deployment Guide
This guide covers multiple options for deploying your text summarizer backend to the cloud.
Option 1: Railway (Recommended - Easiest)
Railway is perfect for this project because it supports Docker Compose and persistent volumes.
Steps:
Create Railway Account
# Install Railway CLI
npm install -g @railway/cli

# Login
railway login

Deploy from GitHub
- Go to railway.app
- Connect your GitHub repository
- Select your SummerizerApp repository
- Railway will automatically detect docker-compose.yml
Set Environment Variables
In the Railway dashboard, add these environment variables:
OLLAMA_MODEL=llama3.1:8b
OLLAMA_HOST=http://ollama:11434
OLLAMA_TIMEOUT=30
SERVER_HOST=0.0.0.0
SERVER_PORT=8000
LOG_LEVEL=INFO

Deploy
# Or deploy via CLI
railway up
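Once the deployment is live, it's worth a quick smoke test from your machine. A minimal sketch, assuming the FastAPI app still serves its default interactive docs at /docs (adjust the path if you expose a dedicated health route instead):

# Replace the hostname with the URL Railway assigned to your app
curl -i https://your-app.railway.app/docs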
Railway Advantages:
- Supports Docker Compose
- Persistent volumes for Ollama models
- Automatic HTTPS
- Easy environment variable management
- Built-in monitoring
Option 2: Google Cloud Run
Steps:
Build and Push to Google Container Registry
# Set up gcloud CLI
gcloud auth login
gcloud config set project YOUR_PROJECT_ID

# Build and push
docker build -t gcr.io/YOUR_PROJECT_ID/summarizer-backend .
docker push gcr.io/YOUR_PROJECT_ID/summarizer-backend

Deploy with Cloud Run
gcloud run deploy summarizer-backend \
  --image gcr.io/YOUR_PROJECT_ID/summarizer-backend \
  --platform managed \
  --region us-central1 \
  --allow-unauthenticated \
  --memory 4Gi \
  --cpu 2 \
  --timeout 300 \
  --set-env-vars OLLAMA_MODEL=llama3.1:8b,SERVER_HOST=0.0.0.0,SERVER_PORT=8000
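Once deployed, you can print the public URL Cloud Run assigned to the service:

# Fetch the service URL
gcloud run services describe summarizer-backend \
  --region us-central1 \
  --format 'value(status.url)'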
Cloud Run Advantages:
- Serverless scaling
- Pay per request
- Global CDN
- Integrated with Google Cloud
Option 3: AWS ECS with Fargate
Steps:
Create ECR Repository
aws ecr create-repository --repository-name summarizer-backend

Build and Push
# Get login token
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin YOUR_ACCOUNT.dkr.ecr.us-east-1.amazonaws.com

# Build and push
docker build -t summarizer-backend .
docker tag summarizer-backend:latest YOUR_ACCOUNT.dkr.ecr.us-east-1.amazonaws.com/summarizer-backend:latest
docker push YOUR_ACCOUNT.dkr.ecr.us-east-1.amazonaws.com/summarizer-backend:latest

Create ECS Task Definition
{ "family": "summarizer-backend", "networkMode": "awsvpc", "requiresCompatibilities": ["FARGATE"], "cpu": "2048", "memory": "4096", "executionRoleArn": "arn:aws:iam::YOUR_ACCOUNT:role/ecsTaskExecutionRole", "containerDefinitions": [ { "name": "summarizer-backend", "image": "YOUR_ACCOUNT.dkr.ecr.us-east-1.amazonaws.com/summarizer-backend:latest", "portMappings": [ { "containerPort": 8000, "protocol": "tcp" } ], "environment": [ { "name": "OLLAMA_MODEL", "value": "llama3.1:8b" }, { "name": "SERVER_HOST", "value": "0.0.0.0" }, { "name": "SERVER_PORT", "value": "8000" } ], "logConfiguration": { "logDriver": "awslogs", "options": { "awslogs-group": "/ecs/summarizer-backend", "awslogs-region": "us-east-1", "awslogs-stream-prefix": "ecs" } } } ] }
Option 4: DigitalOcean App Platform
Steps:
Create App Spec
# .do/app.yaml
name: summarizer-backend
services:
  - name: api
    source_dir: /
    github:
      repo: MingLu0/SummarizerBackend
      branch: main
    run_command: uvicorn app.main:app --host 0.0.0.0 --port 8080
    environment_slug: python
    instance_count: 1
    instance_size_slug: basic-xxl
    http_port: 8080
    envs:
      - key: OLLAMA_MODEL
        value: llama3.1:8b
      - key: SERVER_HOST
        value: 0.0.0.0
      - key: SERVER_PORT
        value: 8080

Deploy
doctl apps create --spec .do/app.yaml
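doctl prints the app ID on creation; you can then follow the build and deployment from the CLI:

# Check app status and tail the deployment logs
doctl apps list
doctl apps logs YOUR_APP_ID --type deploy --follow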
Option 5: Render (Simple)
Steps:
Connect GitHub Repository
- Go to render.com
- Connect your GitHub account
- Select your repository
Create Web Service
- Choose "Web Service"
- Select your repository
- Use these settings:
Build Command: docker-compose build
Start Command: docker-compose up
Environment: Docker
Set Environment Variables
OLLAMA_MODEL=llama3.1:8b
OLLAMA_HOST=http://ollama:11434
SERVER_HOST=0.0.0.0
SERVER_PORT=8000
Important Considerations
Model Download in Cloud
Your Ollama models need to be downloaded after deployment. Add this to your deployment:
# Add to docker-compose.yml or startup script
ollama pull llama3.1:8b
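In a single-container setup, a common pattern is a small entrypoint script that starts the Ollama server, pulls the model on first boot, and then launches the API. A minimal sketch, assuming the backend starts via uvicorn as in the other examples (the script name and wait loop are illustrative):

#!/bin/sh
# entrypoint.sh - start Ollama, pull the model, then run the API

# Start the Ollama server in the background
ollama serve &

# Wait until the Ollama API responds before pulling the model
until curl -sf http://localhost:11434/api/tags > /dev/null; do
  sleep 2
done

# Skips the download if the model is already in the volume
ollama pull "${OLLAMA_MODEL:-llama3.1:8b}"

# Start the FastAPI backend in the foreground
exec uvicorn app.main:app --host "${SERVER_HOST:-0.0.0.0}" --port "${SERVER_PORT:-8000}"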
Memory Requirements
- llama3.1:8b needs ~8GB RAM
- mistral:7b needs ~7GB RAM (note: the smallest Llama 3.1 variant is 8B, so there is no llama3.1:7b tag)
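To see a model's on-disk footprint before picking an instance size, pull it locally and list it; runtime RAM use will be somewhat higher than the download size:

# Inspect the size of pulled models
ollama pull mistral:7b
ollama list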
Cost Optimization
- Use smaller models for production, e.g. mistral:7b
- Consider using spot instances for development
- Monitor usage and scale accordingly
Security
- Enable API key authentication for production (a client-side sketch follows this list)
- Use HTTPS (most platforms provide this automatically)
- Set up rate limiting
- Monitor logs for abuse
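For example, if you put the API behind a key check, clients would send the key on every request. A sketch only; the header name, endpoint path, and request shape below are hypothetical and depend on how you wire auth into the app:

# Hypothetical endpoint and header - adjust to your actual API
curl -X POST https://your-app.railway.app/api/v1/summarize \
  -H "X-API-Key: $SUMMARIZER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "Long article text to summarize..."}'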
Recommended Deployment Flow
- Start with Railway (easiest setup)
- Test with a smaller model (mistral:7b)
- Monitor performance and costs
- Scale up model size if needed
- Add security features
Quick Railway Deploy:
# 1. Install Railway CLI
npm install -g @railway/cli
# 2. Login and deploy
railway login
railway init
railway up
Your backend will be live at https://your-app.railway.app!