# V3 API Fix: Support Both URL and Text Input

## Problem Statement

The V3 endpoint `/api/v3/scrape-and-summarize/stream` currently only accepts URLs in the request body. When the Android app sends plain text instead of a URL, the request fails with **422 Unprocessable Entity** due to URL validation failure.

### Error Symptoms

```
INFO: 10.16.17.219:29372 - "POST /api/v3/scrape-and-summarize/stream HTTP/1.1" 422 Unprocessable Entity
2025-11-11 05:39:49,140 - app.core.middleware - INFO - Request lXqCov: POST /api/v3/scrape-and-summarize/stream
2025-11-11 05:39:49,143 - app.core.middleware - INFO - Response lXqCov: 422 (2.64ms)
```

**Key Indicator:** Response time < 3ms means the request is failing at **schema validation** before any scraping logic runs.
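
For reference, the failure is easy to reproduce. A minimal sketch, assuming the service runs locally on port 8000 (host and port are illustrative):

```python
import requests

# Plain text in the body of the URL-only endpoint reproduces the 422.
resp = requests.post(
    "http://localhost:8000/api/v3/scrape-and-summarize/stream",
    json={"text": "Some article text pasted from the Android app...", "max_tokens": 128},
)
print(resp.status_code)  # 422: the required `url` field is missing / fails validation
print(resp.json())       # FastAPI's validation detail pinpoints the offending field
```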
### Root Cause

**Current Schema** (`app/api/v3/schemas.py`):

```python
class ScrapeAndSummarizeRequest(BaseModel):
    url: str = Field(..., description="URL of article to scrape and summarize")
    # ... other fields

    @validator('url')
    def validate_url(cls, v):
        # URL validation regex that rejects plain text
        if not url_pattern.match(v):
            raise ValueError('Invalid URL format')
        return v
```
**Problem:** The `url` field is **required** and must match the URL pattern. When the Android app sends plain text (a non-URL), validation fails → 422 error.
---

## Solution Overview
Make the V3 endpoint **intelligent**: it should handle both input types.

1. **URL Input** → Scrape article from URL + Summarize
2. **Text Input** → Skip scraping + Summarize directly

This provides a single, unified endpoint for the Android app without needing to choose between multiple endpoints.
---

## Design Approach

### Option 1: Flexible Input Field (Recommended)

**Schema Design:**
```python
class ScrapeAndSummarizeRequest(BaseModel):
    url: Optional[str] = None
    text: Optional[str] = None
    # ... other fields (max_tokens, temperature, etc.)

    @model_validator(mode='after')
    def check_url_or_text(self):
        """Ensure exactly one of url or text is provided."""
        if not self.url and not self.text:
            raise ValueError('Either url or text must be provided')
        if self.url and self.text:
            raise ValueError('Provide either url OR text, not both')
        return self

    @field_validator('url')
    @classmethod
    def validate_url(cls, v):
        """Validate URL format if provided."""
        if v is None:
            return v
        # URL validation logic
        return v

    @field_validator('text')
    @classmethod
    def validate_text(cls, v):
        """Validate text if provided."""
        if v is None:
            return v
        if len(v) < 50:
            raise ValueError('Text too short (minimum 50 characters)')
        if len(v) > 50000:
            raise ValueError('Text too long (maximum 50,000 characters)')
        return v
```
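
The mutual-exclusivity check can be exercised directly, without the endpoint. A quick sketch, assuming the remaining fields on the model all have defaults:

```python
from pydantic import ValidationError

# Exactly one of url/text validates; both (or neither) raises.
ScrapeAndSummarizeRequest(url="https://example.com/article")
ScrapeAndSummarizeRequest(text="A sufficiently long piece of article text. " * 3)

try:
    ScrapeAndSummarizeRequest(url="https://example.com/a", text="Also some text. " * 10)
except ValidationError as exc:
    # Pydantic v2 prefixes validator ValueErrors with "Value error, "
    print(exc.errors()[0]["msg"])
```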
**Request Examples:**

```json
// URL-based request (scraping enabled)
{
  "url": "https://example.com/article",
  "max_tokens": 256,
  "temperature": 0.3
}

// Text-based request (direct summarization)
{
  "text": "Your article text here...",
  "max_tokens": 256,
  "temperature": 0.3
}
```
**Endpoint Logic:**

```python
@router.post("/scrape-and-summarize/stream")
async def scrape_and_summarize_stream(
    request: Request,
    payload: ScrapeAndSummarizeRequest
):
    """Handle both URL scraping and direct text summarization."""
    request_id = getattr(request.state, 'request_id', 'unknown')

    # Determine input type
    if payload.url:
        # URL input → Scrape + Summarize
        article_data = await article_scraper_service.scrape_article(payload.url)
        text_to_summarize = article_data['text']
        metadata = {
            'title': article_data.get('title'),
            'author': article_data.get('author'),
            'source': 'scraped',
            'scrape_latency_ms': article_data.get('scrape_time_ms')
        }
    else:
        # Text input → Direct Summarization
        text_to_summarize = payload.text
        metadata = {
            'source': 'direct_text',
            'text_length': len(payload.text)
        }

    # Stream summarization (same for both paths)
    return StreamingResponse(
        _stream_generator(text_to_summarize, payload, metadata, request_id),
        media_type="text/event-stream",
        headers={"Cache-Control": "no-cache"}  # ... other SSE headers (see Step 2)
    )
```
---

### Option 2: Auto-Detection (Alternative)

**Schema Design:**

```python
class ScrapeAndSummarizeRequest(BaseModel):
    input: str = Field(..., description="URL to scrape OR text to summarize")
    # ... other fields
```

**Endpoint Logic:**

```python
# Auto-detect if input is URL or text
if _is_valid_url(payload.input):
    # URL detected → Scrape + Summarize
    article_data = await article_scraper_service.scrape_article(payload.input)
    text_to_summarize = article_data['text']
else:
    # Plain text detected → Direct Summarization
    text_to_summarize = payload.input
```
**Pros:**
- Single input field (simpler API surface)
- Clients never have to classify their own input

**Cons:**
- Ambiguous: what if the text happens to look like a URL?
- Harder to debug issues
- Less explicit intent

**Verdict:** Option 1 is clearer and more explicit.
---

## Implementation Plan

### Step 1: Update Request Schema

**File:** `app/api/v3/schemas.py`

**Changes:**
1. Make `url` field Optional (change from required to `Optional[str] = None`)
2. Add `text` field as Optional (`Optional[str] = None`)
3. Add `@model_validator` to ensure exactly one is provided
4. Update `url` validator to handle None
5. Add `text` validator for length constraints

**Code:**
```python
from pydantic import BaseModel, Field, field_validator, model_validator
from typing import Optional
import re


class ScrapeAndSummarizeRequest(BaseModel):
    """Request schema supporting both URL scraping and direct text summarization."""

    url: Optional[str] = Field(
        None,
        description="URL of article to scrape and summarize",
        example="https://example.com/article"
    )
    text: Optional[str] = Field(
        None,
        description="Direct text to summarize (alternative to URL)",
        example="Your article text here..."
    )
    max_tokens: Optional[int] = Field(
        default=256,
        ge=1,
        le=2048,
        description="Maximum tokens in summary"
    )
    temperature: Optional[float] = Field(
        default=0.3,
        ge=0.0,
        le=2.0,
        description="Sampling temperature"
    )
    top_p: Optional[float] = Field(
        default=0.9,
        ge=0.0,
        le=1.0,
        description="Nucleus sampling"
    )
    prompt: Optional[str] = Field(
        default="Summarize this article concisely:",
        description="Custom summarization prompt"
    )
    include_metadata: Optional[bool] = Field(
        default=True,
        description="Include article metadata in response"
    )
    use_cache: Optional[bool] = Field(
        default=True,
        description="Use cached content if available (URL mode only)"
    )

    @model_validator(mode='after')
    def check_url_or_text(self):
        """Ensure exactly one of url or text is provided."""
        if not self.url and not self.text:
            raise ValueError('Either "url" or "text" must be provided')
        if self.url and self.text:
            raise ValueError('Provide either "url" OR "text", not both')
        return self

    @field_validator('url')
    @classmethod
    def validate_url(cls, v: Optional[str]) -> Optional[str]:
        """Validate URL format if provided."""
        if v is None:
            return v
        # URL validation regex
        url_pattern = re.compile(
            r'^https?://'  # http:// or https://
            r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+[A-Z]{2,6}\.?|'  # domain
            r'localhost|'  # localhost
            r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'  # or IP
            r'(?::\d+)?'  # optional port
            r'(?:/?|[/?]\S+)$', re.IGNORECASE
        )
        if not url_pattern.match(v):
            raise ValueError('Invalid URL format. Must start with http:// or https://')
        # SSRF protection. These substring checks are deliberately simple;
        # anchoring on '//' avoids falsely rejecting hosts like example10.com.
        # A more robust approach would parse the host with urllib.parse and
        # test it with the ipaddress module (172.16.0.0/12 is wider than the
        # three prefixes listed here).
        v_lower = v.lower()
        if 'localhost' in v_lower or '127.0.0.1' in v_lower:
            raise ValueError('Cannot scrape localhost URLs')
        if any(private in v_lower for private in ['//192.168.', '//10.', '//172.16.', '//172.17.', '//172.18.']):
            raise ValueError('Cannot scrape private IP addresses')
        if len(v) > 2000:
            raise ValueError('URL too long (maximum 2000 characters)')
        return v

    @field_validator('text')
    @classmethod
    def validate_text(cls, v: Optional[str]) -> Optional[str]:
        """Validate text content if provided."""
        if v is None:
            return v
        if len(v) < 50:
            raise ValueError('Text too short (minimum 50 characters)')
        if len(v) > 50000:
            raise ValueError('Text too long (maximum 50,000 characters)')
        # Check for mostly whitespace
        non_whitespace = len(v.replace(' ', '').replace('\n', '').replace('\t', ''))
        if non_whitespace < 30:
            raise ValueError('Text contains mostly whitespace')
        return v
```
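
One subtlety that matters for the tests in Step 3: errors raised in a `@model_validator` attach to the model as a whole, not to a named field, and Pydantic v2 prefixes the message with "Value error, ". A quick check against the schema above illustrates the shape:

```python
from pydantic import ValidationError

try:
    ScrapeAndSummarizeRequest()  # neither url nor text provided
except ValidationError as exc:
    err = exc.errors()[0]
    print(err["loc"])  # empty tuple at the model level; FastAPI reports it as ["body"]
    print(err["msg"])  # 'Value error, Either "url" or "text" must be provided'
```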
---

### Step 2: Update Endpoint Logic

**File:** `app/api/v3/scrape_summarize.py`

**Changes:**
1. Detect input type (URL vs text)
2. Branch logic accordingly
3. Adjust metadata based on input type
4. Keep streaming logic the same

**Code:**
```python
@router.post("/scrape-and-summarize/stream")
async def scrape_and_summarize_stream(
    request: Request,
    payload: ScrapeAndSummarizeRequest
):
    """
    Scrape article from URL OR summarize provided text.

    Supports two modes:
    1. URL mode: Scrape article from URL then summarize
    2. Text mode: Summarize provided text directly

    Returns:
        Server-Sent Events stream with metadata and content chunks
    """
    request_id = getattr(request.state, 'request_id', 'unknown')

    # Determine input mode
    if payload.url:
        # URL Mode: Scrape + Summarize
        logger.info(f"[{request_id}] V3 URL mode: {payload.url}")
        scrape_start = time.time()
        try:
            article_data = await article_scraper_service.scrape_article(
                url=payload.url,
                use_cache=payload.use_cache
            )
        except Exception as e:
            logger.error(f"[{request_id}] Scraping failed: {e}")
            raise HTTPException(
                status_code=502,
                detail=f"Failed to scrape article: {str(e)}"
            )
        scrape_latency_ms = (time.time() - scrape_start) * 1000
        logger.info(f"[{request_id}] Scraped in {scrape_latency_ms:.2f}ms, "
                    f"extracted {len(article_data['text'])} chars")

        # Validate scraped content
        if len(article_data['text']) < 100:
            raise HTTPException(
                status_code=422,
                detail="Insufficient content extracted from URL. "
                       "Article may be behind paywall or site may block scrapers."
            )

        text_to_summarize = article_data['text']
        metadata = {
            'input_type': 'url',
            'url': payload.url,
            'title': article_data.get('title'),
            'author': article_data.get('author'),
            'date': article_data.get('date'),
            'site_name': article_data.get('site_name'),
            'scrape_method': article_data.get('method', 'static'),
            'scrape_latency_ms': scrape_latency_ms,
            'extracted_text_length': len(article_data['text']),
        }
    else:
        # Text Mode: Direct Summarization
        logger.info(f"[{request_id}] V3 text mode: {len(payload.text)} chars")
        text_to_summarize = payload.text
        metadata = {
            'input_type': 'text',
            'text_length': len(payload.text),
        }

    # Stream summarization (same for both modes)
    return StreamingResponse(
        _stream_generator(text_to_summarize, payload, metadata, request_id),
        media_type="text/event-stream",
        headers={
            "Cache-Control": "no-cache",
            "Connection": "keep-alive",
            "X-Accel-Buffering": "no",
            "X-Request-ID": request_id,
        }
    )

async def _stream_generator(text: str, payload, metadata: dict, request_id: str):
    """Generate SSE stream for summarization."""
    # Send metadata event first
    if payload.include_metadata:
        metadata_event = {
            "type": "metadata",
            "data": metadata
        }
        yield f"data: {json.dumps(metadata_event)}\n\n"

    # Stream summarization chunks
    summarization_start = time.time()
    tokens_used = 0
    try:
        async for chunk in hf_streaming_service.summarize_text_stream(
            text=text,
            max_new_tokens=payload.max_tokens,
            temperature=payload.temperature,
            top_p=payload.top_p,
            prompt=payload.prompt,
        ):
            if not chunk.get('done', False):
                tokens_used = chunk.get('tokens_used', tokens_used)
            yield f"data: {json.dumps(chunk)}\n\n"
    except Exception as e:
        logger.error(f"[{request_id}] Summarization failed: {e}")
        error_event = {
            "type": "error",
            "error": str(e),
            "done": True
        }
        yield f"data: {json.dumps(error_event)}\n\n"
        return

    summarization_latency_ms = (time.time() - summarization_start) * 1000

    # Calculate total latency
    total_latency_ms = summarization_latency_ms
    if metadata.get('input_type') == 'url':
        total_latency_ms += metadata.get('scrape_latency_ms', 0)

    logger.info(f"[{request_id}] V3 request completed in {total_latency_ms:.2f}ms")
```
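
Note that the generator above logs the total latency but never emits the terminal event shown in the documented response format (`{"content":"","done":true,"latency_ms":...}`). Assuming those field names, a closing yield at the very end of `_stream_generator`, after the latency calculation, would complete the stream:

```python
    # Terminal SSE event matching the documented response format.
    final_event = {
        "content": "",
        "done": True,
        "tokens_used": tokens_used,
        "latency_ms": round(total_latency_ms, 2),
    }
    yield f"data: {json.dumps(final_event)}\n\n"
```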
---

### Step 3: Update Tests

**File:** `tests/test_v3_api.py`

**New Test Cases:**
```python
import json
from unittest.mock import patch

import pytest


@pytest.mark.asyncio
async def test_v3_text_mode_success(client):
    """Test V3 endpoint with text input (no scraping)."""
    response = await client.post(
        "/api/v3/scrape-and-summarize/stream",
        json={
            "text": "This is a test article with enough content to summarize properly. "
                    "It has multiple sentences and provides meaningful information.",
            "max_tokens": 128,
            "include_metadata": True
        }
    )
    assert response.status_code == 200
    assert response.headers['content-type'] == 'text/event-stream'

    # Parse SSE stream
    events = []
    for line in response.text.split('\n'):
        if line.startswith('data: '):
            events.append(json.loads(line[6:]))

    # Check metadata event
    metadata_event = next(e for e in events if e.get('type') == 'metadata')
    assert metadata_event['data']['input_type'] == 'text'
    assert metadata_event['data']['text_length'] > 0
    assert 'scrape_latency_ms' not in metadata_event['data']  # No scraping in text mode

    # Check content events exist
    content_events = [e for e in events if 'content' in e]
    assert len(content_events) > 0


@pytest.mark.asyncio
async def test_v3_url_mode_success(client):
    """Test V3 endpoint with URL input (with scraping)."""
    with patch('app.services.article_scraper.article_scraper_service.scrape_article') as mock_scrape:
        # Mocked text must clear the endpoint's 100-character minimum for scraped content
        mock_scrape.return_value = {
            'text': 'Scraped article content here. ' * 10,
            'title': 'Test Article',
            'url': 'https://example.com/test',
            'method': 'static'
        }
        response = await client.post(
            "/api/v3/scrape-and-summarize/stream",
            json={
                "url": "https://example.com/test",
                "max_tokens": 128
            }
        )
    assert response.status_code == 200

    # Parse events
    events = []
    for line in response.text.split('\n'):
        if line.startswith('data: '):
            events.append(json.loads(line[6:]))

    # Check metadata shows URL mode
    metadata_event = next(e for e in events if e.get('type') == 'metadata')
    assert metadata_event['data']['input_type'] == 'url'
    assert 'scrape_latency_ms' in metadata_event['data']


@pytest.mark.asyncio
async def test_v3_missing_both_url_and_text(client):
    """Test validation error when neither url nor text provided."""
    response = await client.post(
        "/api/v3/scrape-and-summarize/stream",
        json={
            "max_tokens": 128
        }
    )
    assert response.status_code == 422
    error_detail = response.json()['detail']
    # Model-level validator errors attach to ["body"], so check the message instead
    assert 'url' in error_detail[0]['msg'] and 'text' in error_detail[0]['msg']


@pytest.mark.asyncio
async def test_v3_both_url_and_text_provided(client):
    """Test validation error when both url and text provided."""
    response = await client.post(
        "/api/v3/scrape-and-summarize/stream",
        json={
            "url": "https://example.com/test",
            "text": "Some text here",
            "max_tokens": 128
        }
    )
    assert response.status_code == 422


@pytest.mark.asyncio
async def test_v3_text_too_short(client):
    """Test validation error for text that's too short."""
    response = await client.post(
        "/api/v3/scrape-and-summarize/stream",
        json={
            "text": "Too short",  # Less than 50 chars
            "max_tokens": 128
        }
    )
    assert response.status_code == 422
    assert 'too short' in response.json()['detail'][0]['msg'].lower()
```
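
The tests above assume a `client` fixture that is not shown here. A minimal sketch using httpx's ASGI transport, assuming the FastAPI app object lives at `app.main:app`:

```python
import httpx
import pytest_asyncio

from app.main import app  # assumed location of the FastAPI instance


@pytest_asyncio.fixture
async def client():
    """Async client that runs requests in-process against the ASGI app."""
    transport = httpx.ASGITransport(app=app)
    async with httpx.AsyncClient(transport=transport, base_url="http://test") as c:
        yield c
```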
---

### Step 4: Update Documentation

**File:** `CLAUDE.md`

**Update V3 API section:**
````markdown
### V3 API (/api/v3/*): Web Scraping + Summarization

**Endpoint:** POST `/api/v3/scrape-and-summarize/stream`

**Supports two modes:**

1. **URL Mode** (scraping enabled):
   ```json
   {
     "url": "https://example.com/article",
     "max_tokens": 256
   }
   ```
   - Scrapes article from URL
   - Caches result for 1 hour
   - Streams summarization

2. **Text Mode** (direct summarization):
   ```json
   {
     "text": "Your article text here...",
     "max_tokens": 256
   }
   ```
   - Skips scraping
   - Summarizes text directly
   - Useful when scraping fails or text is already extracted

**Features:**
- Dual input modes: URL scraping or direct text
- Backend web scraping with trafilatura
- In-memory caching (URL mode only)
- User-agent rotation
- Metadata extraction (URL mode: title, author, date)
- SSRF protection
- Rate limiting

**Response Format:**
Same Server-Sent Events format for both modes:
```
data: {"type":"metadata","data":{"input_type":"url|text",...}}
data: {"content":"token","done":false,"tokens_used":N}
data: {"content":"","done":true,"latency_ms":MS}
```
````
**File:** `README.md`

**Add usage examples:**
````markdown
### V3 API Examples

**Scrape and Summarize from URL:**
```bash
curl -X POST "https://your-space.hf.space/api/v3/scrape-and-summarize/stream" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/article",
    "max_tokens": 256,
    "temperature": 0.3
  }'
```

**Summarize Direct Text:**
```bash
curl -X POST "https://your-space.hf.space/api/v3/scrape-and-summarize/stream" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Your article text here...",
    "max_tokens": 256,
    "temperature": 0.3
  }'
```

**Python Example:**
```python
import json

import requests

# URL mode
response = requests.post(
    "https://your-space.hf.space/api/v3/scrape-and-summarize/stream",
    json={"url": "https://example.com/article", "max_tokens": 256},
    stream=True
)

# Text mode
response = requests.post(
    "https://your-space.hf.space/api/v3/scrape-and-summarize/stream",
    json={"text": "Article content here...", "max_tokens": 256},
    stream=True
)

for line in response.iter_lines():
    if line.startswith(b'data: '):
        data = json.loads(line[6:])
        if data.get('content'):
            print(data['content'], end='')
```
````
---

## Benefits of This Approach

### 1. Single Unified Endpoint
- Android app uses one endpoint for everything
- No need to choose between `/api/v2/` and `/api/v3/`
- Simpler client-side logic

### 2. Graceful Fallback
- If scraping fails (paywall, blocked), user can paste text manually
- App can catch 502 errors and prompt user to provide text directly

### 3. Backward Compatible
- Existing URL-based requests still work
- No breaking changes for current users
### 4. Better Error Messages

```json
// Missing both
{
  "detail": [
    {
      "type": "value_error",
      "msg": "Either 'url' or 'text' must be provided"
    }
  ]
}

// Both provided
{
  "detail": [
    {
      "type": "value_error",
      "msg": "Provide either 'url' OR 'text', not both"
    }
  ]
}

// Text too short
{
  "detail": [
    {
      "loc": ["body", "text"],
      "msg": "Text too short (minimum 50 characters)"
    }
  ]
}
```
### 5. Clear Metadata

```json
// URL mode metadata
{
  "type": "metadata",
  "data": {
    "input_type": "url",
    "url": "https://...",
    "title": "Article Title",
    "scrape_latency_ms": 450.2
  }
}

// Text mode metadata
{
  "type": "metadata",
  "data": {
    "input_type": "text",
    "text_length": 1234
  }
}
```
---

## Testing Checklist

- [ ] Test URL mode with valid URL
- [ ] Test text mode with valid text
- [ ] Test validation: missing both url and text (expect 422)
- [ ] Test validation: both url and text provided (expect 422)
- [ ] Test validation: text too short (< 50 chars, expect 422)
- [ ] Test validation: text too long (> 50k chars, expect 422)
- [ ] Test validation: invalid URL format (expect 422)
- [ ] Test SSRF protection: localhost URL (expect 422)
- [ ] Test SSRF protection: private IP (expect 422)
- [ ] Test metadata event in URL mode (includes scrape_latency_ms)
- [ ] Test metadata event in text mode (no scrape_latency_ms)
- [ ] Test streaming format same for both modes
- [ ] Test cache works in URL mode
- [ ] Test cache not used in text mode
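
For a quick manual pass over the 422 rows above, a small script against a local instance (the base URL is illustrative) can sweep the validation cases:

```python
import requests

BASE = "http://localhost:8000/api/v3/scrape-and-summarize/stream"

# (payload, expected status) pairs for the validation rows of the checklist
cases = [
    ({"max_tokens": 128}, 422),                                 # neither url nor text
    ({"url": "https://example.com/a", "text": "x" * 60}, 422),  # both provided
    ({"text": "too short"}, 422),                               # under 50 chars
    ({"text": "x" * 50001}, 422),                               # over 50,000 chars
    ({"url": "not-a-url"}, 422),                                # invalid URL format
    ({"url": "http://localhost/x"}, 422),                       # SSRF: localhost
    ({"url": "http://192.168.1.1/x"}, 422),                     # SSRF: private IP
]

for payload, expected in cases:
    status = requests.post(BASE, json=payload).status_code
    marker = "PASS" if status == expected else "FAIL"
    print(f"{marker}: expected {expected}, got {status} for {list(payload)}")
```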
---

## Deployment Steps

1. **Update Schema** (`app/api/v3/schemas.py`)
   - Make url Optional
   - Add text Optional
   - Add model_validator for mutual exclusivity
   - Update validators

2. **Update Endpoint** (`app/api/v3/scrape_summarize.py`)
   - Add input type detection
   - Branch logic for URL vs text mode
   - Adjust metadata

3. **Update Tests** (`tests/test_v3_api.py`)
   - Add text mode tests
   - Add validation tests
   - Ensure 90% coverage

4. **Update Docs** (`CLAUDE.md`, `README.md`)
   - Document both modes
   - Add examples

5. **Test Locally**
   ```bash
   pytest tests/test_v3_api.py -v
   ```

6. **Deploy to HF Spaces**
   - Push changes
   - Monitor logs
   - Test both modes on live deployment

7. **Update Android App**
   - App can now send either URL or text to same endpoint
   - Graceful fallback: if scraping fails, prompt user for text
---

## Success Criteria

- ✅ URL mode works (scraping + summarization)
- ✅ Text mode works (direct summarization)
- ✅ Validation errors are clear and helpful
- ✅ No 422 errors when text is sent
- ✅ Metadata correctly indicates input type
- ✅ Tests pass with 90%+ coverage
- ✅ Documentation updated
- ✅ Android app can use single endpoint for both scenarios
---

## Estimated Impact

- **Code Changes:** ~100 lines modified
- **New Tests:** ~8 test cases
- **Breaking Changes:** None (backward compatible)
- **Performance:** No impact (same logic, just more flexible input)
- **Memory:** No impact
- **Deployment Time:** ~30 minutes
---

## Conclusion

This fix transforms the V3 API from a URL-only endpoint to a **smart, dual-mode endpoint** that gracefully handles both URLs and plain text. The Android app gains flexibility without added complexity, and users get better error messages when validation fails.

**Ready to implement!** 🚀