Product Development Plan: Backend V4 (Structured + Streaming)
Objective
Create a new API version (V4) that builds upon the V3 scraping logic.
Crucial Change: Instead of using outlines (whose constrained JSON decoding blocks token streaming in our setup), we will use standard Hugging Face streaming with a strict system prompt. This ensures the Android app receives the result token-by-token in real time via Server-Sent Events (SSE).
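For reference, this is roughly what the Android client will see on the wire once the Step 4 endpoint is live (illustrative chunk boundaries; each SSE event is a "data:" line followed by a blank line):
```
data: {"title":

data:  "Example

data:  Title", "main_summary": ...
```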
Constraints & Environment
- Platform: Hugging Face Spaces (Docker)
- Hardware: CPU Only (Free Tier: 2 vCPU, 16GB RAM)
- Memory Management:
- Warning: Phi-3 Mini can spike memory. We will use torch_dtype=torch.float32 on CPU for numerical stability, but note that full-precision weights for a ~3.8B-parameter model come to roughly 15GB, leaving little headroom in 16GB RAM; if the Space hits OOM, torch.bfloat16 (~7.6GB of weights) is the fallback. See the estimate below.
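A quick back-of-envelope check makes the tradeoff concrete (the ~3.8B parameter count is Microsoft's published figure; activations and the KV cache add to this at inference time):
```python
# Rough weight-only memory footprint for Phi-3 Mini (~3.8B parameters).
params = 3.8e9
print(f"float32:  {params * 4 / 1e9:.1f} GB")   # 4 bytes/param -> ~15.2 GB
print(f"bfloat16: {params * 2 / 1e9:.1f} GB")   # 2 bytes/param -> ~7.6 GB
```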
Step 1: Update Dependencies
File: requirements.txt
Action: Ensure these libraries are present.
- einops (Required for Phi-3)
- accelerate
- transformers>=4.41.0
- scipy (Often needed for unquantized models)
- pytest-asyncio
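Taken together, the additions might look like this in requirements.txt (a sketch; fastapi, uvicorn, and torch are assumed to be pinned already from V3, since the code in Steps 3-4 imports them):
```
transformers>=4.41.0
accelerate
einops
scipy
pytest-asyncio
```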
Step 2: Define Output Schemas
File: app/schemas/summary_v4.py (New File)
Action: Define the structure we expect from the model (used for documentation and validation).
```python
from pydantic import BaseModel, Field
from typing import List
from enum import Enum


class Sentiment(str, Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"


class StructuredSummary(BaseModel):
    title: str = Field(..., description="A click-worthy, engaging title")
    main_summary: str = Field(..., description="The main summary content")
    key_points: List[str] = Field(..., description="List of key facts")
    category: str = Field(..., description="Topic category")
    sentiment: Sentiment = Field(..., description="Overall sentiment")
    read_time_min: int = Field(..., description="Estimated reading time in minutes")
```
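As a sanity check, the schema can validate a fully assembled stream result once all chunks have arrived (a sketch assuming Pydantic v2; on v1, use parse_raw instead of model_validate_json):
```python
raw = (
    '{"title": "Test", "main_summary": "A summary.", "key_points": ["a", "b"], '
    '"category": "tech", "sentiment": "neutral", "read_time_min": 3}'
)
summary = StructuredSummary.model_validate_json(raw)
assert summary.sentiment is Sentiment.NEUTRAL
```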
Step 3: Implement V4 Model Loader (Standard Transformers)
File: app/services/model_loader_v4.py (New File)
Action: Create a service to load the model and tokenizer directly.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer
import torch
import threading


class ModelServiceV4:
    _model = None
    _tokenizer = None

    @classmethod
    def get_model(cls):
        if cls._model is None:
            print("Loading V4 Model (Phi-3)...")
            model_id = "microsoft/Phi-3-mini-4k-instruct"
            cls._tokenizer = AutoTokenizer.from_pretrained(model_id)
            cls._model = AutoModelForCausalLM.from_pretrained(
                model_id,
                torch_dtype=torch.float32,  # CPU friendly; see memory note above
                device_map="cpu",
                trust_remote_code=True,
            )
        return cls._model, cls._tokenizer

    @classmethod
    def stream_generation(cls, prompt: str):
        model, tokenizer = cls.get_model()
        inputs = tokenizer(prompt, return_tensors="pt", return_attention_mask=False)
        streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
        generation_kwargs = dict(
            inputs,
            streamer=streamer,
            max_new_tokens=1024,
            do_sample=True,
            temperature=0.2,  # Low temp for stable JSON
        )
        # Run generation in a separate thread so we can consume the streamer here
        thread = threading.Thread(target=model.generate, kwargs=generation_kwargs)
        thread.start()
        for new_text in streamer:
            yield new_text
```
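For a quick local smoke test of the loader (a hypothetical snippet, not part of the service; the first run is slow while weights download):
```python
if __name__ == "__main__":
    prompt = "<|user|>\nSay hello in JSON.\n<|end|>\n<|assistant|>"
    for token in ModelServiceV4.stream_generation(prompt):
        print(token, end="", flush=True)
```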
Step 4: Create V4 Router (SSE Endpoint)
File: app/api/v4/endpoints.py (New Path)
Action: Implement the router using StreamingResponse with text/event-stream.
```python
from fastapi import APIRouter, HTTPException
from fastapi.responses import StreamingResponse

from app.services.model_loader_v4 import ModelServiceV4
# CORRECTED IMPORT PATH:
from app.services.article_scraper import article_scraper_service

router = APIRouter()

JSON_SYSTEM_PROMPT = """You are a helpful AI assistant.
You MUST reply with valid JSON only. Do not add markdown blocks.
The JSON format must exactly match this structure:
{
"title": "string",
"main_summary": "string",
"key_points": ["string", "string"],
"category": "string",
"sentiment": "positive" | "negative" | "neutral",
"read_time_min": int
}
"""

PROMPTS = {
    "skimmer": "Summarize concisely. Focus on hard facts.",
    "executive": "Summarize for a CEO. Focus on business impact.",
    "eli5": "Explain like I'm 5 years old.",
}


@router.post("/scrape-and-summarize/stream")
async def scrape_and_summarize_stream(url: str, style: str = "executive"):
    # 1. Scrape
    try:
        # Verify this method name matches your actual service
        scrape_result = await article_scraper_service.scrape_url(url)
        text = scrape_result.get("content", "")[:10000]  # Truncate for memory safety
    except Exception as e:
        raise HTTPException(status_code=400, detail=f"Scraping failed: {str(e)}")

    # 2. Construct Prompt (Phi-3 chat template)
    user_instruction = PROMPTS.get(style, PROMPTS["executive"])
    full_prompt = (
        f"<|system|>\n{JSON_SYSTEM_PROMPT}\n<|end|>\n"
        f"<|user|>\n{user_instruction}\n\nArticle:\n{text}\n<|end|>\n"
        f"<|assistant|>"
    )

    # 3. Stream
    async def event_generator():
        # The synchronous generator is iterated inside this async wrapper;
        # acceptable for a single-user Space, though it blocks the event loop per chunk.
        for chunk in ModelServiceV4.stream_generation(full_prompt):
            # SSE framing: "data: <payload>\n\n". Chunks containing raw
            # newlines would need escaping for strict SSE compliance.
            yield f"data: {chunk}\n\n"

    return StreamingResponse(event_generator(), media_type="text/event-stream")
```
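To verify the stream end-to-end, a minimal Python client can reassemble the SSE payload (a sketch using requests; the host/port are assumptions, and the Android app would do the equivalent with its own SSE client):
```python
import requests

resp = requests.post(
    "http://localhost:8000/api/v4/scrape-and-summarize/stream",
    params={"url": "https://example.com/article", "style": "executive"},
    stream=True,
)
buffer = ""
for line in resp.iter_lines(decode_unicode=True):
    if line and line.startswith("data: "):
        buffer += line[len("data: "):]
print(buffer)  # should parse with StructuredSummary.model_validate_json
```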
Step 5: Register Router
File: app/main.py
Action: Update the main app file to include the new router path.
```python
# ... existing imports
from app.api.v4 import endpoints as v4_endpoints

# ... inside create_app()
app.include_router(v4_endpoints.router, prefix="/api/v4", tags=["V4 Structured Summarizer"])
```
Step 6: Update Environment Config
File: env.hf
Action: Add the V4 feature flag.
- ENABLE_V4_STRUCTURED=true
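The plan does not say where the flag is read; one option is to gate the Step 5 registration on it (a sketch that would replace the unconditional include_router call):
```python
import os

if os.getenv("ENABLE_V4_STRUCTURED", "false").lower() == "true":
    app.include_router(v4_endpoints.router, prefix="/api/v4", tags=["V4 Structured Summarizer"])
```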
Step 7: Unit Testing (Success Verification)
File: tests/test_v4_stream.py (New File)
Action: Verify the SSE stream works without loading the heavy model.
```python
from unittest.mock import patch, AsyncMock
from fastapi.testclient import TestClient

from app.main import app

client = TestClient(app)


@patch("app.api.v4.endpoints.article_scraper_service")
@patch("app.services.model_loader_v4.ModelServiceV4.stream_generation")
def test_v4_sse_stream(mock_stream, mock_scraper):
    # 1. Mock Scraper (the endpoint awaits scrape_url, so it must be an AsyncMock)
    mock_scraper.scrape_url = AsyncMock(return_value={"content": "Mock article content"})

    # 2. Mock Streamer (yields JSON chunks)
    def fake_stream(prompt):
        yield '{"title":'
        yield ' "Test Title"}'

    mock_stream.side_effect = fake_stream

    # 3. Request
    response = client.post("/api/v4/scrape-and-summarize/stream?url=http://test.com")

    # 4. Verify SSE (Starlette appends a charset to text/* media types)
    assert response.status_code == 200
    assert response.headers["content-type"].startswith("text/event-stream")
    assert b'{"title":' in response.content
```
Task: Run pytest tests/test_v4_stream.py and ensure it passes.