Spaces: colin730 / SummarizerApp

ming committed 22 days ago

Commit 01d5d83 (1 parent: 7019b66)

chore: Add test scripts and update local configuration

- Add HF V4 NDJSON endpoint test script
- Add HF old endpoint test for comparison
- Add V3 live test script
- Update documentation
- Update local Claude configuration

Files changed (6)

.claude/commands/commit-code.md +9 -0
.claude/settings.local.json +24 -1
Updated V4 PDP.md +198 -0
test_hf_v4_ndjson.py +214 -0
test_hf_v4_old.py +114 -0
test_v3_live.py +78 -0

.claude/commands/commit-code.md ADDED

	@@ -0,0 +1,9 @@

+# Commit code
+Review the files that have changed, and create a commit with a commit message summarizing the changes made.
+Always try to give short and concise messages that convey the business logic.
+Always push the code to GitHub and also Hugging Face.
+Use the user's hints as the main subject of the message: $ARGUMENTS

.claude/settings.local.json CHANGED

@@ -1,7 +1,30 @@
 {
   "permissions": {
     "allow": [
-      "WebSearch"
+      "WebSearch",
+      "Bash(git add:*)",
+      "Bash(git commit:*)",
+      "Bash(git log:*)",
+      "Bash(pytest:*)",
+      "Bash(git push:*)",
+      "Bash(python3:*)",
+      "Bash(tree:*)",
+      "Bash(python -m pytest:*)",
+      "Bash(lsof:*)",
+      "Bash(python test_v3_live.py:*)",
+      "Bash(pkill:*)",
+      "Bash(pip install:*)",
+      "Bash(pip --version:*)",
+      "Bash(python:*)",
+      "Bash(conda install:*)",
+      "Bash(conda env:*)",
+      "Bash(conda run:*)",
+      "Bash(cat:*)",
+      "Bash(curl:*)",
+      "Bash(timeout 15 conda run --no-capture-output -n summarizer python -m uvicorn:*)",
+      "Bash(/opt/anaconda3/envs/summarizer/bin/python:*)",
+      "Bash(ENABLE_V4_WARMUP=true timeout 15 /opt/anaconda3/envs/summarizer/bin/python:*)",
+      "Bash(ENABLE_V4_WARMUP=true /opt/anaconda3/envs/summarizer/bin/python:*)"
     ],
     "deny": [],
     "ask": []

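The allow list in `.claude/settings.local.json` mixes broad rules such as `Bash(python:*)` with narrower ones such as `Bash(python test_v3_live.py:*)`. The following is a minimal sketch for spotting entries that a broader rule may already cover. It assumes that a `Bash(<prefix>:*)` rule matches any command that equals the prefix or starts with the prefix followed by a space; that matching rule is an assumption and is not confirmed by this commit. The helper names `bash_prefixes` and `redundant_rules` are hypothetical.

```python
import re

# A subset of the "allow" entries from .claude/settings.local.json.
ALLOW = [
    "WebSearch",
    "Bash(pytest:*)",
    "Bash(python:*)",
    "Bash(python3:*)",
    "Bash(python -m pytest:*)",
    "Bash(python test_v3_live.py:*)",
    "Bash(conda run:*)",
]

RULE = re.compile(r"^Bash\((?P<prefix>.+):\*\)$")


def bash_prefixes(rules):
    """Return the command prefix of every Bash(<prefix>:*) rule."""
    return [m.group("prefix") for r in rules if (m := RULE.match(r))]


def redundant_rules(rules):
    """Map each narrower prefix to a broader prefix that already covers it.

    ASSUMPTION: a rule covers a command that starts with its prefix
    followed by a space. This is a guess at the matching semantics.
    """
    prefixes = bash_prefixes(rules)
    covered = {}
    for narrow in prefixes:
        for broad in prefixes:
            if narrow != broad and narrow.startswith(broad + " "):
                covered[narrow] = broad
    return covered


if __name__ == "__main__":
    for narrow, broad in redundant_rules(ALLOW).items():
        print(f"Bash({narrow}:*) may already be covered by Bash({broad}:*)")
```

Under that assumption, the two `python ...` entries add nothing beyond `Bash(python:*)`, while `Bash(python3:*)` stays distinct because it is a different command name.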
Updated V4 PDP.md ADDED

	@@ -0,0 +1,198 @@

+# **Product Development Plan: Backend V4 (Structured \+ Streaming)**
+## **Objective**
+Create a new API version (V4) that builds upon the V3 scraping logic.
+Crucial Change: Instead of using outlines (which blocks streaming for JSON), we will use Standard Hugging Face Streaming with a strict System Prompt. This ensures the Android app receives the result token-by-token in real-time via Server-Sent Events (SSE).
+## **Constraints & Environment**
+* **Platform:** Hugging Face Spaces (Docker)
+* **Hardware:** CPU Only (Free Tier: 2 vCPU, 16GB RAM)
+* **Memory Management:**
+  * **Warning:** Phi-3 Mini can spike memory. We will use torch\_dtype=torch.float32 on CPU to ensure stability. At 3.8B parameters, the float32 weights alone need roughly 15GB (3.8B × 4 bytes), which leaves little headroom in 16GB RAM.
+## **Step 1: Update Dependencies**
+File: requirements.txt
+Action: Ensure these libraries are present.
+* einops (Required for Phi-3)
+* accelerate
+* transformers\>=4.41.0
+* scipy (Often needed for unquantized models)
+* pytest-asyncio
+## **Step 2: Define Output Schemas**
+File: app/schemas/summary\_v4.py (New File)
+Action: Define the structure we expect from the model (used for documentation and validation).
+from pydantic import BaseModel, Field
+from typing import List
+from enum import Enum
+class Sentiment(str, Enum):
+    POSITIVE \= "positive"
+    NEGATIVE \= "negative"
+    NEUTRAL \= "neutral"
+class StructuredSummary(BaseModel):
+    title: str \= Field(..., description="A click-worthy, engaging title")
+    main\_summary: str \= Field(..., description="The main summary content")
+    key\_points: List\[str\] \= Field(..., description="List of key facts")
+    category: str \= Field(..., description="Topic category")
+    sentiment: Sentiment \= Field(..., description="Overall sentiment")
+    read\_time\_min: int \= Field(..., description="Estimated reading time")
+## **Step 3: Implement V4 Model Loader (Standard Transformers)**
+File: app/services/model\_loader\_v4.py (New File)
+Action: Create a service to load the model and tokenizer directly.
+from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer
+import torch
+import threading
+class ModelServiceV4:
+    \_model \= None
+    \_tokenizer \= None
+    @classmethod
+    def get\_model(cls):
+        if cls.\_model is None:
+            print("Loading V4 Model (Phi-3)...")
+            model\_id \= "microsoft/Phi-3-mini-4k-instruct"
+            cls.\_tokenizer \= AutoTokenizer.from\_pretrained(model\_id)
+            cls.\_model \= AutoModelForCausalLM.from\_pretrained(
+                model\_id,
+                torch\_dtype=torch.float32, \# CPU friendly
+                device\_map="cpu",
+                trust\_remote\_code=True
+            )
+        return cls.\_model, cls.\_tokenizer
+    @classmethod
+    def stream\_generation(cls, prompt: str):
+        model, tokenizer \= cls.get\_model()
+        inputs \= tokenizer(prompt, return\_tensors="pt", return\_attention\_mask=False)
+        streamer \= TextIteratorStreamer(tokenizer, skip\_prompt=True, skip\_special\_tokens=True)
+        generation\_kwargs \= dict(
+            inputs,
+            streamer=streamer,
+            max\_new\_tokens=1024,
+            do\_sample=True,
+            temperature=0.2, \# Low temp for stable JSON
+        )
+        \# Run generation in a separate thread to unblock the stream
+        thread \= threading.Thread(target=model.generate, kwargs=generation\_kwargs)
+        thread.start()
+        for new\_text in streamer:
+            yield new\_text
+## **Step 4: Create V4 Router (SSE Endpoint)**
+File: app/api/v4/endpoints.py (New Path)
+Action: Implement the router using StreamingResponse with text/event-stream.
+from fastapi import APIRouter, HTTPException
+from fastapi.responses import StreamingResponse
+from app.services.model\_loader\_v4 import ModelServiceV4
+\# CORRECTED IMPORT PATH:
+from app.services.article\_scraper import article\_scraper\_service
+router \= APIRouter()
+JSON\_SYSTEM\_PROMPT \= """You are a helpful AI assistant.
+You MUST reply with valid JSON only. Do not add markdown blocks.
+The JSON format must exactly match this structure:
+{
+    "title": "string",
+    "main\_summary": "string",
+    "key\_points": \["string", "string"\],
+    "category": "string",
+    "sentiment": "positive" | "negative" | "neutral",
+    "read\_time\_min": int
+}
+"""
+PROMPTS \= {
+    "skimmer": "Summarize concisely. Focus on hard facts.",
+    "executive": "Summarize for a CEO. Focus on business impact.",
+    "eli5": "Explain like I'm 5 years old."
+}
+@router.post("/scrape-and-summarize/stream")
+async def scrape\_and\_summarize\_stream(url: str, style: str \= "executive"):
+    \# 1\. Scrape
+    try:
+        \# Verify this method name matches your actual service
+        scrape\_result \= await article\_scraper\_service.scrape\_url(url)
+        text \= scrape\_result.get("content", "")\[:10000\] \# Truncate for memory safety
+    except Exception as e:
+        raise HTTPException(status\_code=400, detail=f"Scraping failed: {str(e)}")
+    \# 2\. Construct Prompt
+    user\_instruction \= PROMPTS.get(style, PROMPTS\["executive"\])
+    \# Phi-3 Chat Template
+    full\_prompt \= f"\<|system|\>\\n{JSON\_SYSTEM\_PROMPT}\\n\<|end|\>\\n\<|user|\>\\n{user\_instruction}\\n\\nArticle:\\n{text}\\n\<|end|\>\\n\<|assistant|\>"
+    \# 3\. Stream
+    \# A plain (sync) generator: Starlette iterates it in a threadpool, so the
+    \# blocking model loop does not stall the event loop.
+    def event\_generator():
+        for chunk in ModelServiceV4.stream\_generation(full\_prompt):
+            \# SSE format: each message is "data: {content}" followed by a blank line
+            yield f"data: {chunk}\\n\\n"
+    return StreamingResponse(event\_generator(), media\_type="text/event-stream")
+## **Step 5: Register Router**
+File: app/main.py
+Action: Update the main app file to include the new router path.
+\# ... existing imports
+from app.api.v4 import endpoints as v4\_endpoints
+\# ... inside create\_app()
+app.include\_router(v4\_endpoints.router, prefix="/api/v4", tags=\["V4 Structured Summarizer"\])
+## **Step 6: Update Environment Config**
+File: env.hf
+Action:
+* ENABLE\_V4\_STRUCTURED=true
+## **Step 7: Unit Testing (Success Verification)**
+File: tests/test\_v4\_stream.py (New File)
+Action: Verify the SSE stream works without loading the heavy model.
+from unittest.mock import AsyncMock, patch
+from fastapi.testclient import TestClient
+from app.main import app
+client \= TestClient(app)
+@patch("app.api.v4.endpoints.article\_scraper\_service")
+@patch("app.services.model\_loader\_v4.ModelServiceV4.stream\_generation")
+def test\_v4\_sse\_stream(mock\_stream, mock\_scraper):
+    \# 1\. Mock Scraper (the endpoint awaits scrape\_url, so it must be an AsyncMock)
+    mock\_scraper.scrape\_url \= AsyncMock(return\_value={"content": "Mock article content"})
+    \# 2\. Mock Streamer (Yields JSON chunks)
+    def fake\_stream(prompt):
+        yield '{"title":'
+        yield ' "Test Title"}'
+    mock\_stream.side\_effect \= fake\_stream
+    \# 3\. Request
+    response \= client.post("/api/v4/scrape-and-summarize/stream?url=http://test.com")
+    \# 4\. Verify SSE
+    assert response.status\_code \== 200
+    \# Starlette appends "; charset=utf-8" to text/\* media types, so compare the prefix
+    assert response.headers\["content-type"\].startswith("text/event-stream")
+    assert b'{"title":' in response.content
+**Task:** Run pytest tests/test\_v4\_stream.py and ensure it passes.
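
Step 4 of the plan describes the SSE wire format as `data: {content}` followed by a blank line. The following is a minimal sketch of that framing, assuming each chunk is wrapped in a small JSON envelope so that newlines in the model output cannot split a message. The `content` and `done` keys mirror what the test scripts in this commit read, but they are not a confirmed server contract, and the helper names `sse_event`, `frame_chunks`, and `parse_sse` are hypothetical.

```python
import json


def sse_event(payload: dict) -> str:
    """Frame one JSON payload as a Server-Sent Events message."""
    # json.dumps escapes newlines inside strings, so the payload always
    # fits on a single "data:" line.
    return f"data: {json.dumps(payload)}\n\n"


def frame_chunks(chunks):
    """Wrap raw text chunks as SSE messages, then emit a final done marker."""
    for chunk in chunks:
        yield sse_event({"content": chunk, "done": False})
    yield sse_event({"content": "", "done": True})


def parse_sse(body: str):
    """Recover the JSON payloads from a complete SSE response body."""
    return [
        json.loads(line[len("data: "):])
        for line in body.splitlines()
        if line.startswith("data: ")
    ]


if __name__ == "__main__":
    # The first chunk ends with a real newline, as model output often does.
    body = "".join(frame_chunks(['{"title":\n', ' "Test Title"}']))
    events = parse_sse(body)
    text = "".join(e["content"] for e in events if not e["done"])
    print(json.loads(text))
```

The JSON envelope costs a few bytes per message, but it keeps the stream parseable when a chunk contains a newline, which a bare `data: {chunk}` line would not.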

test_hf_v4_ndjson.py ADDED

	@@ -0,0 +1,214 @@

+"""
+Test the Hugging Face V4 NDJSON endpoint with a real URL.
+"""
+import asyncio
+import json
+import httpx
+async def test_hf_ndjson_endpoint():
+    """Test HF V4 NDJSON endpoint with URL scraping."""
+    # Hugging Face Space URL
+    hf_space_url = "https://colin730-summarizerapp.hf.space"
+    url = "https://www.nzherald.co.nz/nz/auckland/mt-wellington-homicide-jury-find-couple-not-guilty-of-murder-after-soldier-stormed-their-house-with-knife/B56S6KBHRVFCZMLDI56AZES6KY/"
+    print("=" * 80)
+    print("Hugging Face V4 NDJSON Endpoint Test")
+    print("=" * 80)
+    print(f"\nHF Space: {hf_space_url}")
+    print(f"Endpoint: {hf_space_url}/api/v4/scrape-and-summarize/stream-ndjson")
+    print(f"Article URL: {url[:80]}...")
+    print(f"Style: executive\n")
+    payload = {
+        "url": url,
+        "style": "executive",
+        "max_tokens": 512,
+        "include_metadata": True,
+        "use_cache": True,
+    }
+    # Longer timeout for HF (first request can be slow if cold start)
+    async with httpx.AsyncClient(timeout=600.0) as client:
+        try:
+            print("🔄 Sending request to Hugging Face...")
+            print("⏱️  Note: First request may take 30-60s if instance is cold\n")
+            # Make streaming request
+            async with client.stream(
+                "POST",
+                f"{hf_space_url}/api/v4/scrape-and-summarize/stream-ndjson",
+                json=payload,
+            ) as response:
+                print(f"Status: {response.status_code}")
+                if response.status_code != 200:
+                    error_text = await response.aread()
+                    error_str = error_text.decode()
+                    print(f"\n❌ Error Response:")
+                    print(error_str)
+                    # Check if it's a 404 (endpoint not found)
+                    if response.status_code == 404:
+                        print("\n💡 The endpoint might not be deployed yet.")
+                        print("   The HF Space may still be building (~5-10 minutes).")
+                        print(f"   Check status at: https://huggingface.co/spaces/colin730/SummarizerApp")
+                    return
+                print("\n" + "=" * 80)
+                print("STREAMING EVENTS")
+                print("=" * 80)
+                event_count = 0
+                final_state = None
+                total_tokens = 0
+                metadata = None
+                # Parse SSE stream
+                async for line in response.aiter_lines():
+                    if line.startswith("data: "):
+                        try:
+                            event = json.loads(line[6:])
+                            # Handle metadata event
+                            if event.get("type") == "metadata":
+                                metadata = event["data"]
+                                print("\n--- Metadata Event ---")
+                                print(json.dumps(metadata, indent=2))
+                                print("\n" + "-" * 80)
+                                continue
+                            event_count += 1
+                            # Check for error
+                            if "error" in event:
+                                print(f"\n❌ ERROR: {event['error']}")
+                                if "model not available" in event['error'].lower():
+                                    print("\n💡 This means:")
+                                    print("   - The endpoint is working ✅")
+                                    print("   - Scraping is working ✅")
+                                    print("   - But the model isn't loaded on HF")
+                                    print("   - This is expected if PyTorch/transformers aren't installed")
+                                return
+                            # Extract event data
+                            delta = event.get("delta")
+                            state = event.get("state")
+                            done = event.get("done", False)
+                            tokens_used = event.get("tokens_used", 0)
+                            latency_ms = event.get("latency_ms")
+                            total_tokens = tokens_used
+                            # Print event details (compact format)
+                            if delta and "op" in delta:
+                                op = delta.get("op")
+                                if op == "set":
+                                    field = delta.get("field")
+                                    value = delta.get("value")
+                                    value_str = str(value)[:60] + "..." if len(str(value)) > 60 else str(value)
+                                    print(f"Event #{event_count}: Set {field} = {value_str}")
+                                elif op == "append":
+                                    field = delta.get("field")
+                                    value = delta.get("value")
+                                    value_str = str(value)[:60] + "..." if len(str(value)) > 60 else str(value)
+                                    print(f"Event #{event_count}: Append to {field}: {value_str}")
+                                elif op == "done":
+                                    print(f"Event #{event_count}: ✅ Done signal received")
+                            elif delta is None and done:
+                                print(f"Event #{event_count}: 🏁 Final event (latency: {latency_ms}ms)")
+                            # Store final state
+                            if state:
+                                final_state = state
+                        except json.JSONDecodeError as e:
+                            print(f"Failed to parse JSON: {e}")
+                            print(f"Raw line: {line}")
+                # Print final results
+                print("\n" + "=" * 80)
+                print("FINAL RESULTS")
+                print("=" * 80)
+                if metadata:
+                    print(f"\n--- Scraping Info ---")
+                    print(f"Input type: {metadata.get('input_type')}")
+                    print(f"Article title: {metadata.get('title')}")
+                    print(f"Site: {metadata.get('site_name')}")
+                    print(f"Scrape method: {metadata.get('scrape_method')}")
+                    print(f"Scrape latency: {metadata.get('scrape_latency_ms', 0):.2f}ms")
+                    print(f"Text extracted: {metadata.get('extracted_text_length', 0)} chars")
+                print(f"\nTotal events: {event_count}")
+                print(f"Total tokens: {total_tokens}")
+                if final_state:
+                    print("\n--- Final Structured State ---")
+                    print(json.dumps(final_state, indent=2, ensure_ascii=False))
+                    # Validate structure
+                    print("\n--- Validation ---")
+                    required_fields = ["title", "main_summary", "key_points", "category", "sentiment", "read_time_min"]
+                    all_valid = True
+                    for field in required_fields:
+                        value = final_state.get(field)
+                        if field == "key_points":
+                            if isinstance(value, list) and len(value) > 0:
+                                print(f"✅ {field}: {len(value)} items")
+                            else:
+                                print(f"⚠️  {field}: empty or not a list")
+                                all_valid = False
+                        else:
+                            if value is not None:
+                                value_str = str(value)[:50] + "..." if len(str(value)) > 50 else str(value)
+                                print(f"✅ {field}: {value_str}")
+                            else:
+                                print(f"⚠️  {field}: None")
+                                all_valid = False
+                    # Check sentiment is valid
+                    sentiment = final_state.get("sentiment")
+                    valid_sentiments = ["positive", "negative", "neutral"]
+                    if sentiment in valid_sentiments:
+                        print(f"✅ sentiment value is valid: {sentiment}")
+                    else:
+                        print(f"⚠️  sentiment value is invalid: {sentiment}")
+                        all_valid = False
+                    print("\n" + "=" * 80)
+                    if all_valid:
+                        print("✅ ALL VALIDATIONS PASSED - HF ENDPOINT WORKING!")
+                    else:
+                        print("⚠️  Some validations failed")
+                    print("=" * 80)
+                else:
+                    print("\n⚠️  No final state received")
+        except httpx.ConnectError:
+            print(f"\n❌ Could not connect to {hf_space_url}")
+            print("\n💡 Possible reasons:")
+            print("   1. HF Space is still building/deploying")
+            print("   2. HF Space is sleeping (free tier)")
+            print("   3. Network connectivity issue")
+            print(f"\n🔗 Check space status: https://huggingface.co/spaces/colin730/SummarizerApp")
+        except httpx.ReadTimeout:
+            print("\n⏱️  Request timed out")
+            print("   This might mean the HF Space is cold-starting")
+            print("   Try again in a few moments")
+        except Exception as e:
+            print(f"\n❌ Error: {e}")
+            import traceback
+            traceback.print_exc()
+if __name__ == "__main__":
+    print("\n🚀 Testing Hugging Face V4 NDJSON Endpoint\n")
+    asyncio.run(test_hf_ndjson_endpoint())
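
The script above prints `set`, `append`, and `done` deltas but takes the final result from the server's `state` snapshot. A client that wants to rebuild the state from the deltas alone could fold them as sketched below. Only the op names and the `field`/`value` keys come from the script; the semantics (`set` overwrites a field, `append` pushes onto a list) are an assumption, and `apply_delta` and `replay` are hypothetical helpers.

```python
import copy


def apply_delta(state: dict, delta: dict) -> dict:
    """Return a new state with one delta applied; the input is not mutated.

    ASSUMPTION: "set" overwrites a field and "append" pushes onto a list
    field. The real server semantics are not shown in this commit.
    """
    new_state = copy.deepcopy(state)
    op = delta.get("op")
    if op == "set":
        new_state[delta["field"]] = delta["value"]
    elif op == "append":
        new_state.setdefault(delta["field"], []).append(delta["value"])
    elif op == "done":
        pass  # terminal marker, nothing to change
    else:
        raise ValueError(f"unknown op: {op!r}")
    return new_state


def replay(deltas):
    """Fold a sequence of deltas into the final structured state."""
    state = {}
    for delta in deltas:
        state = apply_delta(state, delta)
    return state


if __name__ == "__main__":
    deltas = [
        {"op": "set", "field": "title", "value": "Example title"},
        {"op": "append", "field": "key_points", "value": "first point"},
        {"op": "append", "field": "key_points", "value": "second point"},
        {"op": "done"},
    ]
    print(replay(deltas))
```

Comparing the replayed state with the last `state` snapshot from the stream would give the test a consistency check that it currently lacks.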

test_hf_v4_old.py ADDED

	@@ -0,0 +1,114 @@

+"""
+Test the old HF V4 endpoint to see what the model generates.
+"""
+import asyncio
+import json
+import httpx
+async def test_hf_old_endpoint():
+    """Test HF V4 old (non-NDJSON) endpoint."""
+    hf_space_url = "https://colin730-summarizerapp.hf.space"
+    url = "https://www.nzherald.co.nz/nz/auckland/mt-wellington-homicide-jury-find-couple-not-guilty-of-murder-after-soldier-stormed-their-house-with-knife/B56S6KBHRVFCZMLDI56AZES6KY/"
+    print("=" * 80)
+    print("Hugging Face V4 OLD Endpoint Test (for comparison)")
+    print("=" * 80)
+    print(f"\nEndpoint: {hf_space_url}/api/v4/scrape-and-summarize/stream")
+    print(f"Article URL: {url[:80]}...")
+    print(f"Style: executive\n")
+    payload = {
+        "url": url,
+        "style": "executive",
+        "max_tokens": 512,
+        "include_metadata": True,
+        "use_cache": True,
+    }
+    async with httpx.AsyncClient(timeout=600.0) as client:
+        try:
+            print("🔄 Sending request to old V4 endpoint...\n")
+            async with client.stream(
+                "POST",
+                f"{hf_space_url}/api/v4/scrape-and-summarize/stream",
+                json=payload,
+            ) as response:
+                print(f"Status: {response.status_code}\n")
+                if response.status_code != 200:
+                    error_text = await response.aread()
+                    print(f"❌ Error: {error_text.decode()}")
+                    return
+                print("=" * 80)
+                print("MODEL OUTPUT (Raw)")
+                print("=" * 80)
+                print()
+                full_content = []
+                token_count = 0
+                async for line in response.aiter_lines():
+                    if line.startswith("data: "):
+                        try:
+                            event = json.loads(line[6:])
+                            # Metadata
+                            if event.get("type") == "metadata":
+                                print("--- Metadata ---")
+                                print(json.dumps(event["data"], indent=2))
+                                print("\n" + "-" * 80 + "\n")
+                                continue
+                            # Error
+                            if "error" in event:
+                                print(f"\n❌ ERROR: {event['error']}")
+                                return
+                            # Content
+                            if "content" in event and not event.get("done"):
+                                content = event["content"]
+                                full_content.append(content)
+                                print(content, end="", flush=True)
+                                token_count = event.get("tokens_used", token_count)
+                            # Done
+                            elif event.get("done"):
+                                latency = event.get("latency_ms", 0)
+                                token_count = event.get("tokens_used", token_count)
+                                print(f"\n\n{'=' * 80}")
+                                print(f"✅ Done | Tokens: {token_count} | Latency: {latency:.2f}ms")
+                                print("=" * 80)
+                        except json.JSONDecodeError as e:
+                            print(f"\nJSON Error: {e}")
+                            print(f"Raw: {line}")
+                # Try to parse as JSON
+                full_text = "".join(full_content)
+                if full_text:
+                    print("\n--- Attempting JSON Parse ---")
+                    try:
+                        parsed = json.loads(full_text)
+                        print("✅ Valid JSON!")
+                        print(json.dumps(parsed, indent=2))
+                    except json.JSONDecodeError:
+                        print("❌ Not valid JSON")
+                        print("This is the raw model output (not JSON-formatted)")
+        except Exception as e:
+            print(f"\n❌ Error: {e}")
+            import traceback
+            traceback.print_exc()
+if __name__ == "__main__":
+    print("\n🧪 Testing Old V4 Endpoint\n")
+    asyncio.run(test_hf_old_endpoint())
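
The last step of this script calls `json.loads` on the concatenated output, which fails as soon as the model adds any text or a markdown fence around the JSON object. The following is a hedged sketch of a more tolerant check. The field list and sentiment values are taken from the validation in `test_hf_v4_ndjson.py`; the helper names `extract_json` and `find_problems` are hypothetical, and the approach assumes the first `{` in the output starts the JSON object.

```python
import json

REQUIRED = ("title", "main_summary", "key_points",
            "category", "sentiment", "read_time_min")
SENTIMENTS = {"positive", "negative", "neutral"}


def extract_json(raw: str) -> dict:
    """Parse the first JSON object found in raw model output.

    ASSUMPTION: the first "{" in the text starts the object. Preamble text
    that itself contains a brace would defeat this simple approach.
    """
    start = raw.find("{")
    if start == -1:
        raise ValueError("no JSON object found")
    # raw_decode stops at the end of the first complete JSON value, so
    # trailing text or a closing markdown fence is ignored.
    obj, _end = json.JSONDecoder().raw_decode(raw[start:])
    return obj


def find_problems(summary: dict) -> list:
    """List required fields that are missing, plus an invalid sentiment."""
    problems = [name for name in REQUIRED if summary.get(name) is None]
    sentiment = summary.get("sentiment")
    if sentiment is not None and sentiment not in SENTIMENTS:
        problems.append("sentiment")
    return problems


if __name__ == "__main__":
    fence = "`" * 3  # built at runtime to keep this example fence-safe
    raw = (
        f"Here is the summary:\n{fence}json\n"
        '{"title": "T", "sentiment": "mixed"}\n'
        f"{fence}"
    )
    summary = extract_json(raw)
    print(summary)
    print(find_problems(summary))
```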

test_v3_live.py ADDED

	@@ -0,0 +1,78 @@

+"""
+Live test of V3 API endpoint with real URL.
+"""
+import asyncio
+import json
+import httpx
+async def test_v3_streaming():
+    """Test V3 scraping and summarization with streaming."""
+    url = "https://www.nzherald.co.nz/nz/prominent-executive-who-admitted-receiving-commercial-sex-services-from-girl-bought-her-uber-eats-200-gift-card-1000-cash/RWWAZCPM4BDHNPKLGGAPUKVQ7M/"
+    async with httpx.AsyncClient(timeout=300.0) as client:
+        # Make streaming request
+        async with client.stream(
+            "POST",
+            "http://localhost:7860/api/v3/scrape-and-summarize/stream",
+            json={
+                "url": url,
+                "max_tokens": 256,
+                "include_metadata": True,
+            },
+        ) as response:
+            print(f"Status: {response.status_code}")
+            print(f"Headers: {dict(response.headers)}\n")
+            if response.status_code != 200:
+                error_text = await response.aread()
+                print(f"Error: {error_text.decode()}")
+                return
+            # Parse SSE stream
+            full_summary = []
+            async for line in response.aiter_lines():
+                if line.startswith("data: "):
+                    try:
+                        event = json.loads(line[6:])
+                        # Print metadata event
+                        if event.get("type") == "metadata":
+                            print("=== ARTICLE METADATA ===")
+                            metadata = event["data"]
+                            print(f"Title: {metadata.get('title', 'N/A')}")
+                            print(f"Author: {metadata.get('author', 'N/A')}")
+                            print(f"Site: {metadata.get('site_name', 'N/A')}")
+                            print(f"Scrape latency: {metadata.get('scrape_latency_ms', 0):.2f}ms")
+                            print(f"Extracted text length: {metadata.get('extracted_text_length', 0)} chars")
+                            print()
+                        # Collect content chunks
+                        elif "content" in event:
+                            if not event.get("done", False):
+                                content = event["content"]
+                                full_summary.append(content)
+                                print(content, end="", flush=True)
+                            else:
+                                # Done event
+                                print(f"\n\n=== SUMMARY STATS ===")
+                                print(f"Tokens used: {event.get('tokens_used', 0)}")
+                                print(f"Latency: {event.get('latency_ms', 0):.2f}ms")
+                        # Error event
+                        elif "error" in event:
+                            print(f"\n\nERROR: {event['error']}")
+                    except json.JSONDecodeError as e:
+                        print(f"Failed to parse JSON: {e}")
+                        print(f"Raw line: {line}")
+            print("\n\n=== FULL SUMMARY ===")
+            print("".join(full_summary))
+if __name__ == "__main__":
+    print("Testing V3 API with NZ Herald article...\n")
+    asyncio.run(test_v3_streaming())
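
This script prints each event but asserts nothing, so it cannot fail on a malformed stream. One option is to fold the `data:` lines into a result dictionary first and assert on that. The sketch below uses the event shapes the script already reads (`type`/`data` for metadata, `content`, `done`, `tokens_used`, `latency_ms`, `error`); the `collect_stream` helper and the keys of its result are hypothetical.

```python
import json


def collect_stream(lines):
    """Fold SSE "data:" lines into metadata, summary text, stats and error."""
    result = {"metadata": None, "text": "", "stats": None, "error": None}
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank separators and other SSE fields
        event = json.loads(line[len("data: "):])
        if event.get("type") == "metadata":
            result["metadata"] = event["data"]
        elif "error" in event:
            result["error"] = event["error"]
            break  # nothing useful follows an error event
        elif event.get("done"):
            result["stats"] = {
                "tokens_used": event.get("tokens_used", 0),
                "latency_ms": event.get("latency_ms", 0),
            }
        elif "content" in event:
            result["text"] += event["content"]
    return result


if __name__ == "__main__":
    sample = [
        'data: {"type": "metadata", "data": {"title": "Example"}}',
        "",
        'data: {"content": "Hello", "done": false}',
        'data: {"content": " world", "done": false}',
        'data: {"content": "", "done": true, "tokens_used": 2, "latency_ms": 12.5}',
    ]
    outcome = collect_stream(sample)
    assert outcome["error"] is None
    assert outcome["text"] == "Hello world"
    print(outcome)
```

In the live test, the list passed to `collect_stream` would be gathered from `response.aiter_lines()`, and the printing could then be driven from the returned dictionary.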