Product Development Plan: Backend V4 (Structured + Streaming)
Objective
Create a new API version (V4) that builds upon the V3 scraping logic.
Crucial Change: Instead of using outlines (whose constrained JSON decoding blocks token streaming in our setup), we will use standard Hugging Face streaming with a strict system prompt. This ensures the Android app receives the result token-by-token in real time via Server-Sent Events (SSE).
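For reference, this is roughly what the Android client will see on the wire once the Step 4 endpoint is live (illustrative chunk boundaries; each SSE event is a "data:" line followed by a blank line):
```
data: {"title":

data:  "Example

data:  Title", "main_summary": ...
```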
Constraints & Environment
- Platform: Hugging Face Spaces (Docker)
- Hardware: CPU Only (Free Tier: 2 vCPU, 16GB RAM)
- Memory Management:
- Warning: Phi-3 Mini can spike memory. We will use torch_dtype=torch.float32 on CPU for numerical stability, but note that full-precision weights for a ~3.8B-parameter model come to roughly 15GB, leaving little headroom in 16GB RAM; if the Space hits OOM, torch.bfloat16 (~7.6GB of weights) is the fallback. See the estimate below.
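A quick back-of-envelope check makes the tradeoff concrete (the ~3.8B parameter count is Microsoft's published figure; activations and the KV cache add to this at inference time):
```python
# Rough weight-only memory footprint for Phi-3 Mini (~3.8B parameters).
params = 3.8e9
print(f"float32:  {params * 4 / 1e9:.1f} GB")   # 4 bytes/param -> ~15.2 GB
print(f"bfloat16: {params * 2 / 1e9:.1f} GB")   # 2 bytes/param -> ~7.6 GB
```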
Step 1: Update Dependencies
File: requirements.txt
Action: Ensure these libraries are present.
- einops (Required for Phi-3)
- accelerate
- transformers>=4.41.0
- scipy (Often needed for unquantized models)
- pytest-asyncio
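Taken together, the additions might look like this in requirements.txt (a sketch; fastapi, uvicorn, and torch are assumed to be pinned already from V3, since the code in Steps 3-4 imports them):
```
transformers>=4.41.0
accelerate
einops
scipy
pytest-asyncio
```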
Step 2: Define Output Schemas
File: app/schemas/summary_v4.py (New File)
Action: Define the structure we expect from the model (used for documentation and validation).
```python
from pydantic import BaseModel, Field
from typing import List
from enum import Enum


class Sentiment(str, Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"


class StructuredSummary(BaseModel):
    title: str = Field(..., description="A click-worthy, engaging title")
    main_summary: str = Field(..., description="The main summary content")
    key_points: List[str] = Field(..., description="List of key facts")
    category: str = Field(..., description="Topic category")
    sentiment: Sentiment = Field(..., description="Overall sentiment")
    read_time_min: int = Field(..., description="Estimated reading time in minutes")
```
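As a sanity check, the schema can validate a fully assembled stream result once all chunks have arrived (a sketch assuming Pydantic v2; on v1, use parse_raw instead of model_validate_json):
```python
raw = (
    '{"title": "Test", "main_summary": "A summary.", "key_points": ["a", "b"], '
    '"category": "tech", "sentiment": "neutral", "read_time_min": 3}'
)
summary = StructuredSummary.model_validate_json(raw)
assert summary.sentiment is Sentiment.NEUTRAL
```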
Step 3: Implement V4 Model Loader (Standard Transformers)
File: app/services/model_loader_v4.py (New File)
Action: Create a service to load the model and tokenizer directly.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer
import torch
import threading


class ModelServiceV4:
    _model = None
    _tokenizer = None

    @classmethod
    def get_model(cls):
        if cls._model is None:
            print("Loading V4 Model (Phi-3)...")
            model_id = "microsoft/Phi-3-mini-4k-instruct"
            cls._tokenizer = AutoTokenizer.from_pretrained(model_id)
            cls._model = AutoModelForCausalLM.from_pretrained(
                model_id,
                torch_dtype=torch.float32,  # CPU friendly; see memory note above
                device_map="cpu",
                trust_remote_code=True,
            )
        return cls._model, cls._tokenizer

    @classmethod
    def stream_generation(cls, prompt: str):
        model, tokenizer = cls.get_model()
        inputs = tokenizer(prompt, return_tensors="pt", return_attention_mask=False)
        streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
        generation_kwargs = dict(
            inputs,
            streamer=streamer,
            max_new_tokens=1024,
            do_sample=True,
            temperature=0.2,  # Low temp for stable JSON
        )
        # Run generation in a separate thread so we can consume the streamer here
        thread = threading.Thread(target=model.generate, kwargs=generation_kwargs)
        thread.start()
        for new_text in streamer:
            yield new_text
```
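For a quick local smoke test of the loader (a hypothetical snippet, not part of the service; the first run is slow while weights download):
```python
if __name__ == "__main__":
    prompt = "<|user|>\nSay hello in JSON.\n<|end|>\n<|assistant|>"
    for token in ModelServiceV4.stream_generation(prompt):
        print(token, end="", flush=True)
```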
Step 4: Create V4 Router (SSE Endpoint)
File: app/api/v4/endpoints.py (New Path)
Action: Implement the router using StreamingResponse with text/event-stream.
```python
from fastapi import APIRouter, HTTPException
from fastapi.responses import StreamingResponse

from app.services.model_loader_v4 import ModelServiceV4
# CORRECTED IMPORT PATH:
from app.services.article_scraper import article_scraper_service

router = APIRouter()

JSON_SYSTEM_PROMPT = """You are a helpful AI assistant.
You MUST reply with valid JSON only. Do not add markdown blocks.
The JSON format must exactly match this structure:
{
"title": "string",
"main_summary": "string",
"key_points": ["string", "string"],
"category": "string",
"sentiment": "positive" | "negative" | "neutral",
"read_time_min": int
}
"""

PROMPTS = {
    "skimmer": "Summarize concisely. Focus on hard facts.",
    "executive": "Summarize for a CEO. Focus on business impact.",
    "eli5": "Explain like I'm 5 years old.",
}


@router.post("/scrape-and-summarize/stream")
async def scrape_and_summarize_stream(url: str, style: str = "executive"):
    # 1. Scrape
    try:
        # Verify this method name matches your actual service
        scrape_result = await article_scraper_service.scrape_url(url)
        text = scrape_result.get("content", "")[:10000]  # Truncate for memory safety
    except Exception as e:
        raise HTTPException(status_code=400, detail=f"Scraping failed: {str(e)}")

    # 2. Construct Prompt (Phi-3 chat template)
    user_instruction = PROMPTS.get(style, PROMPTS["executive"])
    full_prompt = (
        f"<|system|>\n{JSON_SYSTEM_PROMPT}\n<|end|>\n"
        f"<|user|>\n{user_instruction}\n\nArticle:\n{text}\n<|end|>\n"
        f"<|assistant|>"
    )

    # 3. Stream
    async def event_generator():
        # The synchronous generator is iterated inside this async wrapper;
        # acceptable for a single-user Space, though it blocks the event loop per chunk.
        for chunk in ModelServiceV4.stream_generation(full_prompt):
            # SSE framing: "data: <payload>\n\n". Chunks containing raw
            # newlines would need escaping for strict SSE compliance.
            yield f"data: {chunk}\n\n"

    return StreamingResponse(event_generator(), media_type="text/event-stream")
```
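To verify the stream end-to-end, a minimal Python client can reassemble the SSE payload (a sketch using requests; the host/port are assumptions, and the Android app would do the equivalent with its own SSE client):
```python
import requests

resp = requests.post(
    "http://localhost:8000/api/v4/scrape-and-summarize/stream",
    params={"url": "https://example.com/article", "style": "executive"},
    stream=True,
)
buffer = ""
for line in resp.iter_lines(decode_unicode=True):
    if line and line.startswith("data: "):
        buffer += line[len("data: "):]
print(buffer)  # should parse with StructuredSummary.model_validate_json
```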
Step 5: Register Router
File: app/main.py
Action: Update the main app file to include the new router path.
```python
# ... existing imports
from app.api.v4 import endpoints as v4_endpoints

# ... inside create_app()
app.include_router(v4_endpoints.router, prefix="/api/v4", tags=["V4 Structured Summarizer"])
```
Step 6: Update Environment Config
File: env.hf
Action: Add the V4 feature flag.
- ENABLE_V4_STRUCTURED=true
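The plan does not say where the flag is read; one option is to gate the Step 5 registration on it (a sketch that would replace the unconditional include_router call):
```python
import os

if os.getenv("ENABLE_V4_STRUCTURED", "false").lower() == "true":
    app.include_router(v4_endpoints.router, prefix="/api/v4", tags=["V4 Structured Summarizer"])
```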
Step 7: Unit Testing (Success Verification)
File: tests/test_v4_stream.py (New File)
Action: Verify the SSE stream works without loading the heavy model.
```python
from unittest.mock import patch, AsyncMock
from fastapi.testclient import TestClient

from app.main import app

client = TestClient(app)


@patch("app.api.v4.endpoints.article_scraper_service")
@patch("app.services.model_loader_v4.ModelServiceV4.stream_generation")
def test_v4_sse_stream(mock_stream, mock_scraper):
    # 1. Mock Scraper (the endpoint awaits scrape_url, so it must be an AsyncMock)
    mock_scraper.scrape_url = AsyncMock(return_value={"content": "Mock article content"})

    # 2. Mock Streamer (yields JSON chunks)
    def fake_stream(prompt):
        yield '{"title":'
        yield ' "Test Title"}'

    mock_stream.side_effect = fake_stream

    # 3. Request
    response = client.post("/api/v4/scrape-and-summarize/stream?url=http://test.com")

    # 4. Verify SSE (Starlette appends a charset to text/* media types)
    assert response.status_code == 200
    assert response.headers["content-type"].startswith("text/event-stream")
    assert b'{"title":' in response.content
```
Task: Run pytest tests/test_v4_stream.py and ensure it passes.