ming commited on
Commit
29ed661
Β·
1 Parent(s): d5d96b7

Migrate to Ruff for linting/formatting and add comprehensive import tests

Browse files

- Replace black, isort, and flake8 with ruff (Rust-based, 10-100x faster)
- Add ruff.toml configuration with appropriate rules and ignores
- Update pre-commit hook to run ruff check --fix and ruff format before tests
- Update CLAUDE.md documentation with new ruff commands
- Fix code style issues (nested if statements, formatting)
- Add comprehensive import tests (test_imports.py) to catch import errors early
- Fix config.py to ignore extra environment variables (prevents build failures)
- Format all code with ruff (18 files reformatted)
- Auto-fix 32 linting issues
- Fix remaining test linting issues (B017, SIM117, B023)

Benefits:
- Single tool instead of three (black/isort/flake8)
- Faster pre-commit hooks
- Better CI/CD performance
- Automatic import validation before deployment

ANDROID_V4_INTEGRATION.md ADDED
@@ -0,0 +1,1877 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Android App Integration Guide for V4 Stream JSON API
2
+
3
+ > **Last Updated:** December 2024
4
+ > **API Version:** V4 (Stream JSON with Outlines)
5
+ > **Target Platform:** Android (Kotlin + Jetpack Compose)
6
+
7
+ ---
8
+
9
+ ## Table of Contents
10
+
11
+ - [Overview](#overview)
12
+ - [API Specifications](#api-specifications)
13
+ - [Data Models](#data-models)
14
+ - [Network Layer Implementation](#network-layer-implementation)
15
+ - [State Management](#state-management)
16
+ - [UI Components](#ui-components)
17
+ - [UI/UX Patterns](#uiux-patterns)
18
+ - [Error Handling](#error-handling)
19
+ - [Performance Optimization](#performance-optimization)
20
+ - [Testing Strategy](#testing-strategy)
21
+ - [Complete Example Flow](#complete-example-flow)
22
+ - [Appendix](#appendix)
23
+
24
+ ---
25
+
26
+ ## Overview
27
+
28
+ ### What is V4 Stream JSON API?
29
+
30
+ The V4 API provides **structured article summarization** with guaranteed JSON schema output. It combines:
31
+
32
+ - **Backend web scraping** (no client-side overhead)
33
+ - **Structured JSON output** (title, summary, key points, category, sentiment, read time)
34
+ - **Real-time streaming** (Server-Sent Events for progressive display)
35
+ - **Three summarization styles** (Skimmer, Executive, ELI5)
36
+
37
+ ### Key Benefits vs Client-Side Scraping
38
+
39
+ | Metric | Server-Side (V4) | Client-Side |
40
+ |--------|------------------|-------------|
41
+ | **Latency** | 2-5 seconds | 5-15 seconds |
42
+ | **Success Rate** | 95%+ | 60-70% |
43
+ | **Battery Impact** | Zero (no scraping) | High (WebView + JS) |
44
+ | **Data Usage** | ~10KB (summary only) | 500KB+ (full page) |
45
+ | **Caching** | Shared across users | Per-device only |
46
+ | **Updates** | Instant server-side | Requires app update |
47
+
48
+ ### Response Flow
49
+
50
+ ```mermaid
51
+ sequenceDiagram
52
+ participant Android as Android App
53
+ participant API as V4 API
54
+ participant Scraper as Article Scraper
55
+ participant AI as AI Model
56
+
57
+ Android->>API: POST /api/v4/scrape-and-summarize/stream-json
58
+ Note over Android,API: {"url": "...", "style": "executive"}
59
+
60
+ API->>Scraper: Scrape article
61
+ Scraper-->>API: Article text + metadata
62
+
63
+ API->>Android: SSE Event 1: Metadata
64
+ Note over Android: Display article title, author, source immediately
65
+
66
+ API->>AI: Generate structured summary
67
+
68
+ loop Streaming Tokens
69
+ AI-->>API: JSON token
70
+ API->>Android: SSE Event N: Token chunk
71
+ Note over Android: Accumulate JSON buffer
72
+ end
73
+
74
+ Android->>Android: Parse complete JSON
75
+ Note over Android: Display structured summary
76
+ ```
77
+
78
+ ---
79
+
80
+ ## API Specifications
81
+
82
+ ### Endpoint
83
+
84
+ ```
85
+ POST /api/v4/scrape-and-summarize/stream-json
86
+ ```
87
+
88
+ **Base URL:** `https://your-api.hf.space` (replace with your Hugging Face Space URL)
89
+
90
+ ### Request Schema
91
+
92
+ ```kotlin
93
+ {
94
+ "url": "https://example.com/article", // Optional: article URL (mutually exclusive with text)
95
+ "text": "Article text content...", // Optional: direct text input (mutually exclusive with url)
96
+ "style": "executive", // Required: "skimmer" | "executive" | "eli5"
97
+ "max_tokens": 1024, // Optional: 128-2048, default 1024
98
+ "include_metadata": true, // Optional: bool, default true
99
+ "use_cache": true // Optional: bool, default true
100
+ }
101
+ ```
102
+
103
+ **Validation Rules:**
104
+ - **Exactly ONE** of `url` or `text` must be provided
105
+ - `url`: Must be http/https, no localhost/private IPs, max 2000 chars
106
+ - `text`: 50-50,000 characters
107
+ - `style`: Must be one of three enum values
108
+ - `max_tokens`: Range 128-2048
109
+
110
+ ### Response Format (Server-Sent Events)
111
+
112
+ #### Event 1: Metadata (Optional)
113
+
114
+ ```json
115
+ data: {"type":"metadata","data":{"input_type":"url","url":"https://...","title":"Article Title","author":"John Doe","date":"2024-11-30","site_name":"Tech Insights","scrape_method":"static","scrape_latency_ms":425.8,"extracted_text_length":5420,"style":"executive"}}
116
+
117
+ ```
118
+
119
+ #### Events 2-N: Raw JSON Tokens
120
+
121
+ ```
122
+ data: {"title": "
123
+
124
+ data: AI Revolution 2024
125
+
126
+ data: ", "main_summary": "
127
+
128
+ data: Artificial intelligence is rapidly evolving...
129
+
130
+ data: ", "key_points": [
131
+
132
+ data: "AI is transforming technology"
133
+
134
+ data: , "ML algorithms are improving"
135
+
136
+ data: ], "category": "
137
+
138
+ data: Technology
139
+
140
+ data: ", "sentiment": "
141
+
142
+ data: positive
143
+
144
+ data: ", "read_time_min": 3}
145
+
146
+ ```
147
+
148
+ **Important:** Each line is a raw string token. Concatenate all tokens to form complete JSON.
149
+
150
+ #### Final JSON Structure
151
+
152
+ ```json
153
+ {
154
+ "title": "AI Revolution Transforms Tech Industry in 2024",
155
+ "main_summary": "Artificial intelligence is rapidly transforming technology industries with new breakthroughs in machine learning and deep learning. The latest models show unprecedented capabilities in natural language processing and computer vision.",
156
+ "key_points": [
157
+ "AI is transforming technology across industries",
158
+ "Machine learning algorithms continue improving",
159
+ "Deep learning processes massive data efficiently"
160
+ ],
161
+ "category": "Technology",
162
+ "sentiment": "positive",
163
+ "read_time_min": 3
164
+ }
165
+ ```
166
+
167
+ ### Summary Styles
168
+
169
+ | Style | Description | Tone | Use Case |
170
+ |-------|-------------|------|----------|
171
+ | **skimmer** | Quick 30-second read | Casual, concise | News browsing, quick updates |
172
+ | **executive** | Professional analysis | Formal, bullet points | Business articles, reports |
173
+ | **eli5** | Simple explanations | Friendly, easy | Complex topics, learning |
174
+
175
+ ---
176
+
177
+ ## Data Models
178
+
179
+ ### Request Models
180
+
181
+ ```kotlin
182
+ package com.example.summarizer.data.model
183
+
184
+ import kotlinx.serialization.SerialName
185
+ import kotlinx.serialization.Serializable
186
+
187
+ /**
188
+ * Request model for V4 structured summarization
189
+ *
190
+ * @property url Optional article URL (mutually exclusive with text)
191
+ * @property text Optional direct text input (mutually exclusive with url)
192
+ * @property style Summarization style: skimmer, executive, or eli5
193
+ * @property max_tokens Maximum tokens to generate (128-2048)
194
+ * @property include_metadata Include scraping metadata in response
195
+ * @property use_cache Use cached content for URLs
196
+ */
197
+ @Serializable
198
+ data class SummaryRequest(
199
+ val url: String? = null,
200
+ val text: String? = null,
201
+ val style: SummaryStyle,
202
+ @SerialName("max_tokens")
203
+ val maxTokens: Int = 1024,
204
+ @SerialName("include_metadata")
205
+ val includeMetadata: Boolean = true,
206
+ @SerialName("use_cache")
207
+ val useCache: Boolean = true
208
+ ) {
209
+ init {
210
+ require((url != null) xor (text != null)) {
211
+ "Exactly one of url or text must be provided"
212
+ }
213
+ require(maxTokens in 128..2048) {
214
+ "max_tokens must be between 128 and 2048"
215
+ }
216
+ }
217
+ }
218
+
219
+ /**
220
+ * Summarization style options
221
+ */
222
+ @Serializable
223
+ enum class SummaryStyle {
224
+ @SerialName("skimmer")
225
+ SKIMMER, // 30-second read, casual tone
226
+
227
+ @SerialName("executive")
228
+ EXECUTIVE, // Professional, bullet points
229
+
230
+ @SerialName("eli5")
231
+ ELI5 // Simple, easy-to-understand
232
+ }
233
+ ```
234
+
235
+ ### Response Models
236
+
237
+ ```kotlin
238
+ /**
239
+ * Metadata event sent as first SSE event
240
+ */
241
+ @Serializable
242
+ data class MetadataEvent(
243
+ val type: String, // Always "metadata"
244
+ val data: ScrapingMetadata
245
+ )
246
+
247
+ /**
248
+ * Scraping metadata from article extraction
249
+ */
250
+ @Serializable
251
+ data class ScrapingMetadata(
252
+ @SerialName("input_type")
253
+ val inputType: String, // "url" or "text"
254
+
255
+ val url: String? = null,
256
+ val title: String? = null,
257
+ val author: String? = null,
258
+ val date: String? = null,
259
+
260
+ @SerialName("site_name")
261
+ val siteName: String? = null,
262
+
263
+ @SerialName("scrape_method")
264
+ val scrapeMethod: String? = null, // "static"
265
+
266
+ @SerialName("scrape_latency_ms")
267
+ val scrapeLatencyMs: Double? = null,
268
+
269
+ @SerialName("extracted_text_length")
270
+ val extractedTextLength: Int? = null,
271
+
272
+ val style: String
273
+ )
274
+
275
+ /**
276
+ * Final structured summary output
277
+ */
278
+ @Serializable
279
+ data class StructuredSummary(
280
+ val title: String, // 6-10 words, click-worthy title
281
+
282
+ @SerialName("main_summary")
283
+ val mainSummary: String, // 2-4 sentences
284
+
285
+ @SerialName("key_points")
286
+ val keyPoints: List<String>, // 3-5 bullet points, 8-12 words each
287
+
288
+ val category: String, // 1-2 words (e.g., "Tech", "Politics")
289
+
290
+ val sentiment: String, // "positive", "negative", or "neutral"
291
+
292
+ @SerialName("read_time_min")
293
+ val readTimeMin: Int // Estimated reading time (minutes)
294
+ )
295
+ ```
296
+
297
+ ### UI State Models
298
+
299
+ ```kotlin
300
+ /**
301
+ * UI state for summary screen
302
+ */
303
+ sealed class SummaryState {
304
+ /**
305
+ * Initial state, no request made
306
+ */
307
+ object Idle : SummaryState()
308
+
309
+ /**
310
+ * Loading state with progress message
311
+ */
312
+ data class Loading(val progress: String) : SummaryState()
313
+
314
+ /**
315
+ * Metadata received from first SSE event
316
+ */
317
+ data class MetadataReceived(val metadata: ScrapingMetadata) : SummaryState()
318
+
319
+ /**
320
+ * Streaming JSON tokens in progress
321
+ */
322
+ data class Streaming(
323
+ val metadata: ScrapingMetadata?,
324
+ val tokensReceived: Int
325
+ ) : SummaryState()
326
+
327
+ /**
328
+ * Summary generation complete
329
+ */
330
+ data class Success(
331
+ val metadata: ScrapingMetadata?,
332
+ val summary: StructuredSummary
333
+ ) : SummaryState()
334
+
335
+ /**
336
+ * Error occurred during processing
337
+ */
338
+ data class Error(val message: String) : SummaryState()
339
+ }
340
+
341
+ /**
342
+ * Events emitted during streaming
343
+ */
344
+ sealed class SummaryEvent {
345
+ data class Metadata(val metadata: ScrapingMetadata) : SummaryEvent()
346
+ data class TokensReceived(val totalChars: Int) : SummaryEvent()
347
+ data class Complete(val summary: StructuredSummary) : SummaryEvent()
348
+ data class Error(val message: String) : SummaryEvent()
349
+ }
350
+ ```
351
+
352
+ ---
353
+
354
+ ## Network Layer Implementation
355
+
356
+ ### Dependencies (build.gradle.kts)
357
+
358
+ ```kotlin
359
+ dependencies {
360
+ // OkHttp for SSE streaming
361
+ implementation("com.squareup.okhttp3:okhttp:4.12.0")
362
+
363
+ // Kotlin serialization
364
+ implementation("org.jetbrains.kotlinx:kotlinx-serialization-json:1.6.0")
365
+
366
+ // Coroutines
367
+ implementation("org.jetbrains.kotlinx:kotlinx-coroutines-android:1.7.3")
368
+
369
+ // Hilt for dependency injection
370
+ implementation("com.google.dagger:hilt-android:2.48")
371
+ kapt("com.google.dagger:hilt-compiler:2.48")
372
+ }
373
+ ```
374
+
375
+ ### Repository Implementation
376
+
377
+ ```kotlin
378
+ package com.example.summarizer.data.repository
379
+
380
+ import kotlinx.coroutines.channels.awaitClose
381
+ import kotlinx.coroutines.flow.Flow
382
+ import kotlinx.coroutines.flow.callbackFlow
383
+ import kotlinx.serialization.json.Json
384
+ import kotlinx.serialization.encodeToString
385
+ import kotlinx.serialization.decodeFromString
386
+ import okhttp3.Call
387
+ import okhttp3.Callback
388
+ import okhttp3.MediaType.Companion.toMediaType
389
+ import okhttp3.OkHttpClient
390
+ import okhttp3.Request
391
+ import okhttp3.RequestBody.Companion.toRequestBody
392
+ import okhttp3.Response
393
+ import java.io.IOException
394
+ import java.net.SocketTimeoutException
395
+ import java.net.UnknownHostException
396
+ import java.util.concurrent.TimeUnit
397
+ import javax.inject.Inject
398
+ import javax.inject.Singleton
399
+
400
+ /**
401
+ * Repository for V4 structured summarization API
402
+ */
403
+ @Singleton
404
+ class SummarizeRepository @Inject constructor(
405
+ private val okHttpClient: OkHttpClient,
406
+ private val json: Json,
407
+ private val baseUrl: String = "https://your-api.hf.space" // Inject via Hilt
408
+ ) {
409
+
410
+ /**
411
+ * Stream structured summary from URL or text
412
+ *
413
+ * @param request Summary request with URL or text
414
+ * @return Flow of SummaryEvent (Metadata, TokensReceived, Complete, Error)
415
+ */
416
+ fun streamSummary(request: SummaryRequest): Flow<SummaryEvent> = callbackFlow {
417
+ // Serialize request to JSON
418
+ val requestBody = json.encodeToString(request).toRequestBody(
419
+ "application/json".toMediaType()
420
+ )
421
+
422
+ // Build HTTP request
423
+ val httpRequest = Request.Builder()
424
+ .url("$baseUrl/api/v4/scrape-and-summarize/stream-json")
425
+ .post(requestBody)
426
+ .build()
427
+
428
+ val call = okHttpClient.newCall(httpRequest)
429
+
430
+ try {
431
+ // Execute synchronous request (blocking)
432
+ val response = call.execute()
433
+
434
+ // Check for HTTP errors
435
+ if (!response.isSuccessful) {
436
+ trySend(SummaryEvent.Error("HTTP ${response.code}: ${response.message}"))
437
+ close()
438
+ return@callbackFlow
439
+ }
440
+
441
+ // Get response body source
442
+ val source = response.body?.source() ?: run {
443
+ trySend(SummaryEvent.Error("Empty response body"))
444
+ close()
445
+ return@callbackFlow
446
+ }
447
+
448
+ // SSE parsing state
449
+ val jsonBuffer = StringBuilder()
450
+ var metadataSent = false
451
+
452
+ // Read SSE stream line by line
453
+ while (!source.exhausted()) {
454
+ val line = source.readUtf8Line() ?: break
455
+
456
+ // Parse SSE format: "data: <content>"
457
+ if (line.startsWith("data: ")) {
458
+ val data = line.substring(6) // Remove "data: " prefix
459
+
460
+ // Try parsing as metadata event (first event only)
461
+ if (!metadataSent) {
462
+ try {
463
+ val metadataEvent = json.decodeFromString<MetadataEvent>(data)
464
+ if (metadataEvent.type == "metadata") {
465
+ trySend(SummaryEvent.Metadata(metadataEvent.data))
466
+ metadataSent = true
467
+ continue
468
+ }
469
+ } catch (e: Exception) {
470
+ // Not metadata, treat as JSON token
471
+ }
472
+ }
473
+
474
+ // Accumulate JSON tokens
475
+ jsonBuffer.append(data)
476
+ trySend(SummaryEvent.TokensReceived(jsonBuffer.length))
477
+ }
478
+ }
479
+
480
+ // Parse complete JSON
481
+ val completeJson = jsonBuffer.toString()
482
+ if (completeJson.isNotBlank()) {
483
+ try {
484
+ val summary = json.decodeFromString<StructuredSummary>(completeJson)
485
+ trySend(SummaryEvent.Complete(summary))
486
+ } catch (e: Exception) {
487
+ trySend(SummaryEvent.Error("JSON parsing failed: ${e.message}"))
488
+ }
489
+ } else {
490
+ trySend(SummaryEvent.Error("No JSON received"))
491
+ }
492
+
493
+ } catch (e: SocketTimeoutException) {
494
+ trySend(SummaryEvent.Error("Request timed out. Try a shorter article."))
495
+ } catch (e: UnknownHostException) {
496
+ trySend(SummaryEvent.Error("No internet connection"))
497
+ } catch (e: IOException) {
498
+ trySend(SummaryEvent.Error("Network error: ${e.message}"))
499
+ } catch (e: Exception) {
500
+ trySend(SummaryEvent.Error(e.message ?: "Unknown error"))
501
+ } finally {
502
+ call.cancel()
503
+ }
504
+
505
+ awaitClose { call.cancel() }
506
+ }
507
+ }
508
+ ```
509
+
510
+ ### OkHttp Configuration (Hilt Module)
511
+
512
+ ```kotlin
513
+ package com.example.summarizer.di
514
+
515
+ import dagger.Module
516
+ import dagger.Provides
517
+ import dagger.hilt.InstallIn
518
+ import dagger.hilt.components.SingletonComponent
519
+ import kotlinx.serialization.json.Json
520
+ import okhttp3.ConnectionPool
521
+ import okhttp3.OkHttpClient
522
+ import java.util.concurrent.TimeUnit
523
+ import javax.inject.Singleton
524
+
525
+ @Module
526
+ @InstallIn(SingletonComponent::class)
527
+ object NetworkModule {
528
+
529
+ @Provides
530
+ @Singleton
531
+ fun provideOkHttpClient(): OkHttpClient {
532
+ return OkHttpClient.Builder()
533
+ .connectionPool(
534
+ ConnectionPool(
535
+ maxIdleConnections = 5,
536
+ keepAliveDuration = 5,
537
+ TimeUnit.MINUTES
538
+ )
539
+ )
540
+ .readTimeout(600, TimeUnit.SECONDS) // Long timeout for streaming
541
+ .connectTimeout(30, TimeUnit.SECONDS)
542
+ .writeTimeout(30, TimeUnit.SECONDS)
543
+ .build()
544
+ }
545
+
546
+ @Provides
547
+ @Singleton
548
+ fun provideJson(): Json {
549
+ return Json {
550
+ ignoreUnknownKeys = true
551
+ isLenient = true
552
+ prettyPrint = false
553
+ }
554
+ }
555
+
556
+ @Provides
557
+ @Singleton
558
+ fun provideBaseUrl(): String {
559
+ return "https://your-api.hf.space" // Replace with your API URL
560
+ }
561
+ }
562
+ ```
563
+
564
+ ---
565
+
566
+ ## State Management
567
+
568
+ ### ViewModel Implementation
569
+
570
+ ```kotlin
571
+ package com.example.summarizer.ui.viewmodel
572
+
573
+ import androidx.lifecycle.ViewModel
574
+ import androidx.lifecycle.viewModelScope
575
+ import com.example.summarizer.data.model.*
576
+ import com.example.summarizer.data.repository.SummarizeRepository
577
+ import dagger.hilt.android.lifecycle.HiltViewModel
578
+ import kotlinx.coroutines.flow.MutableStateFlow
579
+ import kotlinx.coroutines.flow.StateFlow
580
+ import kotlinx.coroutines.flow.asStateFlow
581
+ import kotlinx.coroutines.launch
582
+ import javax.inject.Inject
583
+
584
+ /**
585
+ * ViewModel for summary screen
586
+ */
587
+ @HiltViewModel
588
+ class SummaryViewModel @Inject constructor(
589
+ private val repository: SummarizeRepository
590
+ ) : ViewModel() {
591
+
592
+ private val _state = MutableStateFlow<SummaryState>(SummaryState.Idle)
593
+ val state: StateFlow<SummaryState> = _state.asStateFlow()
594
+
595
+ /**
596
+ * Summarize article from URL
597
+ *
598
+ * @param url Article URL to summarize
599
+ * @param style Summarization style
600
+ */
601
+ fun summarizeUrl(url: String, style: SummaryStyle) {
602
+ viewModelScope.launch {
603
+ _state.value = SummaryState.Loading("Fetching article...")
604
+
605
+ repository.streamSummary(
606
+ SummaryRequest(
607
+ url = url,
608
+ style = style,
609
+ includeMetadata = true
610
+ )
611
+ ).collect { event ->
612
+ handleEvent(event)
613
+ }
614
+ }
615
+ }
616
+
617
+ /**
618
+ * Summarize text directly
619
+ *
620
+ * @param text Text content to summarize
621
+ * @param style Summarization style
622
+ */
623
+ fun summarizeText(text: String, style: SummaryStyle) {
624
+ viewModelScope.launch {
625
+ _state.value = SummaryState.Loading("Generating summary...")
626
+
627
+ repository.streamSummary(
628
+ SummaryRequest(
629
+ text = text,
630
+ style = style,
631
+ includeMetadata = false
632
+ )
633
+ ).collect { event ->
634
+ handleEvent(event)
635
+ }
636
+ }
637
+ }
638
+
639
+ /**
640
+ * Handle streaming events and update state
641
+ */
642
+ private fun handleEvent(event: SummaryEvent) {
643
+ when (event) {
644
+ is SummaryEvent.Metadata -> {
645
+ _state.value = SummaryState.MetadataReceived(event.metadata)
646
+ }
647
+
648
+ is SummaryEvent.TokensReceived -> {
649
+ val currentState = _state.value
650
+ val metadata = when (currentState) {
651
+ is SummaryState.MetadataReceived -> currentState.metadata
652
+ is SummaryState.Streaming -> currentState.metadata
653
+ else -> null
654
+ }
655
+ _state.value = SummaryState.Streaming(
656
+ metadata = metadata,
657
+ tokensReceived = event.totalChars
658
+ )
659
+ }
660
+
661
+ is SummaryEvent.Complete -> {
662
+ val metadata = when (val currentState = _state.value) {
663
+ is SummaryState.MetadataReceived -> currentState.metadata
664
+ is SummaryState.Streaming -> currentState.metadata
665
+ else -> null
666
+ }
667
+ _state.value = SummaryState.Success(
668
+ metadata = metadata,
669
+ summary = event.summary
670
+ )
671
+ }
672
+
673
+ is SummaryEvent.Error -> {
674
+ _state.value = SummaryState.Error(event.message)
675
+ }
676
+ }
677
+ }
678
+
679
+ /**
680
+ * Reset state to idle
681
+ */
682
+ fun reset() {
683
+ _state.value = SummaryState.Idle
684
+ }
685
+ }
686
+ ```
687
+
688
+ ---
689
+
690
+ ## UI Components
691
+
692
+ ### Main Summary Screen
693
+
694
+ ```kotlin
695
+ package com.example.summarizer.ui.screen
696
+
697
+ import androidx.compose.foundation.layout.*
698
+ import androidx.compose.foundation.lazy.LazyColumn
699
+ import androidx.compose.material3.*
700
+ import androidx.compose.runtime.*
701
+ import androidx.compose.ui.Modifier
702
+ import androidx.compose.ui.unit.dp
703
+ import androidx.hilt.navigation.compose.hiltViewModel
704
+ import com.example.summarizer.data.model.SummaryStyle
705
+ import com.example.summarizer.ui.viewmodel.SummaryViewModel
706
+
707
+ @Composable
708
+ fun SummaryScreen(
709
+ viewModel: SummaryViewModel = hiltViewModel()
710
+ ) {
711
+ val state by viewModel.state.collectAsState()
712
+
713
+ Column(
714
+ modifier = Modifier
715
+ .fillMaxSize()
716
+ .padding(16.dp)
717
+ ) {
718
+ // URL Input Section
719
+ UrlInputSection(
720
+ onSummarize = { url, style ->
721
+ viewModel.summarizeUrl(url, style)
722
+ }
723
+ )
724
+
725
+ Spacer(modifier = Modifier.height(16.dp))
726
+
727
+ // Summary Content
728
+ when (val currentState = state) {
729
+ SummaryState.Idle -> {
730
+ EmptyStateView()
731
+ }
732
+
733
+ is SummaryState.Loading -> {
734
+ LoadingView(message = currentState.progress)
735
+ }
736
+
737
+ is SummaryState.MetadataReceived -> {
738
+ MetadataCard(metadata = currentState.metadata)
739
+ Spacer(modifier = Modifier.height(8.dp))
740
+ LoadingView(message = "Generating summary...")
741
+ }
742
+
743
+ is SummaryState.Streaming -> {
744
+ currentState.metadata?.let {
745
+ MetadataCard(it)
746
+ Spacer(modifier = Modifier.height(8.dp))
747
+ }
748
+ StreamingIndicator(tokensReceived = currentState.tokensReceived)
749
+ }
750
+
751
+ is SummaryState.Success -> {
752
+ SummaryContent(
753
+ metadata = currentState.metadata,
754
+ summary = currentState.summary
755
+ )
756
+ }
757
+
758
+ is SummaryState.Error -> {
759
+ ErrorView(
760
+ message = currentState.message,
761
+ onRetry = { viewModel.reset() }
762
+ )
763
+ }
764
+ }
765
+ }
766
+ }
767
+ ```
768
+
769
+ ### URL Input Section
770
+
771
+ ```kotlin
772
+ @Composable
773
+ fun UrlInputSection(
774
+ onSummarize: (String, SummaryStyle) -> Unit
775
+ ) {
776
+ var url by remember { mutableStateOf("") }
777
+ var selectedStyle by remember { mutableStateOf(SummaryStyle.EXECUTIVE) }
778
+
779
+ Column(
780
+ modifier = Modifier.fillMaxWidth(),
781
+ verticalArrangement = Arrangement.spacedBy(12.dp)
782
+ ) {
783
+ Text(
784
+ text = "Summarize Article",
785
+ style = MaterialTheme.typography.headlineMedium
786
+ )
787
+
788
+ OutlinedTextField(
789
+ value = url,
790
+ onValueChange = { url = it },
791
+ label = { Text("Article URL") },
792
+ placeholder = { Text("https://example.com/article") },
793
+ modifier = Modifier.fillMaxWidth(),
794
+ singleLine = true
795
+ )
796
+
797
+ StyleSelector(
798
+ selectedStyle = selectedStyle,
799
+ onStyleSelected = { selectedStyle = it }
800
+ )
801
+
802
+ Button(
803
+ onClick = { onSummarize(url, selectedStyle) },
804
+ modifier = Modifier.fillMaxWidth(),
805
+ enabled = url.isNotBlank()
806
+ ) {
807
+ Text("Summarize")
808
+ }
809
+ }
810
+ }
811
+
812
+ @Composable
813
+ fun StyleSelector(
814
+ selectedStyle: SummaryStyle,
815
+ onStyleSelected: (SummaryStyle) -> Unit
816
+ ) {
817
+ Column(
818
+ verticalArrangement = Arrangement.spacedBy(8.dp)
819
+ ) {
820
+ Text(
821
+ text = "Summary Style",
822
+ style = MaterialTheme.typography.labelLarge
823
+ )
824
+
825
+ Row(
826
+ modifier = Modifier.fillMaxWidth(),
827
+ horizontalArrangement = Arrangement.spacedBy(8.dp)
828
+ ) {
829
+ StyleChip(
830
+ label = "Quick (30s)",
831
+ description = "Skimmer",
832
+ isSelected = selectedStyle == SummaryStyle.SKIMMER,
833
+ onClick = { onStyleSelected(SummaryStyle.SKIMMER) },
834
+ modifier = Modifier.weight(1f)
835
+ )
836
+
837
+ StyleChip(
838
+ label = "Professional",
839
+ description = "Executive",
840
+ isSelected = selectedStyle == SummaryStyle.EXECUTIVE,
841
+ onClick = { onStyleSelected(SummaryStyle.EXECUTIVE) },
842
+ modifier = Modifier.weight(1f)
843
+ )
844
+
845
+ StyleChip(
846
+ label = "Simple",
847
+ description = "ELI5",
848
+ isSelected = selectedStyle == SummaryStyle.ELI5,
849
+ onClick = { onStyleSelected(SummaryStyle.ELI5) },
850
+ modifier = Modifier.weight(1f)
851
+ )
852
+ }
853
+ }
854
+ }
855
+
856
+ @Composable
857
+ fun StyleChip(
858
+ label: String,
859
+ description: String,
860
+ isSelected: Boolean,
861
+ onClick: () -> Unit,
862
+ modifier: Modifier = Modifier
863
+ ) {
864
+ FilterChip(
865
+ selected = isSelected,
866
+ onClick = onClick,
867
+ label = {
868
+ Column {
869
+ Text(
870
+ text = label,
871
+ style = MaterialTheme.typography.labelMedium
872
+ )
873
+ Text(
874
+ text = description,
875
+ style = MaterialTheme.typography.bodySmall
876
+ )
877
+ }
878
+ },
879
+ modifier = modifier
880
+ )
881
+ }
882
+ ```
883
+
884
+ ### Metadata Card
885
+
886
+ ```kotlin
887
+ @Composable
888
+ fun MetadataCard(metadata: ScrapingMetadata) {
889
+ Card(
890
+ modifier = Modifier.fillMaxWidth(),
891
+ colors = CardDefaults.cardColors(
892
+ containerColor = MaterialTheme.colorScheme.surfaceVariant
893
+ )
894
+ ) {
895
+ Column(
896
+ modifier = Modifier.padding(16.dp),
897
+ verticalArrangement = Arrangement.spacedBy(8.dp)
898
+ ) {
899
+ // Article Title
900
+ metadata.title?.let {
901
+ Text(
902
+ text = it,
903
+ style = MaterialTheme.typography.titleMedium,
904
+ fontWeight = FontWeight.Bold
905
+ )
906
+ }
907
+
908
+ // Metadata Row
909
+ Row(
910
+ modifier = Modifier.fillMaxWidth(),
911
+ horizontalArrangement = Arrangement.SpaceBetween
912
+ ) {
913
+ // Author & Date
914
+ Column {
915
+ metadata.author?.let {
916
+ Text(
917
+ text = "By $it",
918
+ style = MaterialTheme.typography.bodySmall
919
+ )
920
+ }
921
+ metadata.date?.let {
922
+ Text(
923
+ text = it,
924
+ style = MaterialTheme.typography.bodySmall,
925
+ color = MaterialTheme.colorScheme.onSurfaceVariant
926
+ )
927
+ }
928
+ }
929
+
930
+ // Source & Length
931
+ Column(horizontalAlignment = Alignment.End) {
932
+ metadata.siteName?.let {
933
+ Text(
934
+ text = it,
935
+ style = MaterialTheme.typography.bodySmall
936
+ )
937
+ }
938
+ metadata.extractedTextLength?.let {
939
+ Text(
940
+ text = "${it / 1000}K chars",
941
+ style = MaterialTheme.typography.bodySmall,
942
+ color = MaterialTheme.colorScheme.onSurfaceVariant
943
+ )
944
+ }
945
+ }
946
+ }
947
+ }
948
+ }
949
+ }
950
+ ```
951
+
952
+ ### Summary Content (Final Result)
953
+
954
+ ```kotlin
955
+ @Composable
956
+ fun SummaryContent(
957
+ metadata: ScrapingMetadata?,
958
+ summary: StructuredSummary
959
+ ) {
960
+ LazyColumn(
961
+ modifier = Modifier.fillMaxSize(),
962
+ verticalArrangement = Arrangement.spacedBy(16.dp)
963
+ ) {
964
+ // Metadata
965
+ metadata?.let {
966
+ item { MetadataCard(it) }
967
+ }
968
+
969
+ // Summary Header with Category, Sentiment, Read Time
970
+ item {
971
+ Row(
972
+ modifier = Modifier.fillMaxWidth(),
973
+ horizontalArrangement = Arrangement.SpaceBetween,
974
+ verticalAlignment = Alignment.CenterVertically
975
+ ) {
976
+ // Category Chip
977
+ AssistChip(
978
+ onClick = { },
979
+ label = { Text(summary.category) },
980
+ leadingIcon = {
981
+ Icon(
982
+ imageVector = getCategoryIcon(summary.category),
983
+ contentDescription = null
984
+ )
985
+ }
986
+ )
987
+
988
+ // Sentiment Badge
989
+ SentimentBadge(sentiment = summary.sentiment)
990
+
991
+ // Read Time
992
+ Row(verticalAlignment = Alignment.CenterVertically) {
993
+ Icon(
994
+ imageVector = Icons.Default.Schedule,
995
+ contentDescription = null,
996
+ modifier = Modifier.size(16.dp)
997
+ )
998
+ Spacer(modifier = Modifier.width(4.dp))
999
+ Text(
1000
+ text = "${summary.readTimeMin} min read",
1001
+ style = MaterialTheme.typography.bodySmall
1002
+ )
1003
+ }
1004
+ }
1005
+ }
1006
+
1007
+ // Title
1008
+ item {
1009
+ Text(
1010
+ text = summary.title,
1011
+ style = MaterialTheme.typography.headlineSmall,
1012
+ fontWeight = FontWeight.Bold
1013
+ )
1014
+ }
1015
+
1016
+ // Main Summary
1017
+ item {
1018
+ Card(
1019
+ modifier = Modifier.fillMaxWidth(),
1020
+ colors = CardDefaults.cardColors(
1021
+ containerColor = MaterialTheme.colorScheme.primaryContainer
1022
+ )
1023
+ ) {
1024
+ Text(
1025
+ text = summary.mainSummary,
1026
+ style = MaterialTheme.typography.bodyLarge,
1027
+ modifier = Modifier.padding(16.dp)
1028
+ )
1029
+ }
1030
+ }
1031
+
1032
+ // Key Points Section
1033
+ item {
1034
+ Text(
1035
+ text = "Key Points",
1036
+ style = MaterialTheme.typography.titleMedium,
1037
+ fontWeight = FontWeight.Bold
1038
+ )
1039
+ }
1040
+
1041
+ itemsIndexed(summary.keyPoints) { index, point ->
1042
+ KeyPointItem(index = index + 1, point = point)
1043
+ }
1044
+
1045
+ // Action Buttons
1046
+ item {
1047
+ Row(
1048
+ modifier = Modifier.fillMaxWidth(),
1049
+ horizontalArrangement = Arrangement.spacedBy(8.dp)
1050
+ ) {
1051
+ OutlinedButton(
1052
+ onClick = { /* Share */ },
1053
+ modifier = Modifier.weight(1f)
1054
+ ) {
1055
+ Icon(Icons.Default.Share, contentDescription = null)
1056
+ Spacer(modifier = Modifier.width(8.dp))
1057
+ Text("Share")
1058
+ }
1059
+
1060
+ Button(
1061
+ onClick = { /* Save */ },
1062
+ modifier = Modifier.weight(1f)
1063
+ ) {
1064
+ Icon(Icons.Default.BookmarkBorder, contentDescription = null)
1065
+ Spacer(modifier = Modifier.width(8.dp))
1066
+ Text("Save")
1067
+ }
1068
+ }
1069
+ }
1070
+ }
1071
+ }
1072
+
1073
+ @Composable
1074
+ fun KeyPointItem(index: Int, point: String) {
1075
+ Row(
1076
+ modifier = Modifier
1077
+ .fillMaxWidth()
1078
+ .padding(vertical = 8.dp)
1079
+ ) {
1080
+ // Numbered Badge
1081
+ Surface(
1082
+ shape = CircleShape,
1083
+ color = MaterialTheme.colorScheme.primary,
1084
+ modifier = Modifier.size(24.dp)
1085
+ ) {
1086
+ Box(contentAlignment = Alignment.Center) {
1087
+ Text(
1088
+ text = "$index",
1089
+ style = MaterialTheme.typography.labelSmall,
1090
+ color = MaterialTheme.colorScheme.onPrimary
1091
+ )
1092
+ }
1093
+ }
1094
+
1095
+ Spacer(modifier = Modifier.width(12.dp))
1096
+
1097
+ Text(
1098
+ text = point,
1099
+ style = MaterialTheme.typography.bodyMedium,
1100
+ modifier = Modifier.weight(1f)
1101
+ )
1102
+ }
1103
+ }
1104
+
1105
+ @Composable
1106
+ fun SentimentBadge(sentiment: String) {
1107
+ val (color, icon) = when (sentiment.lowercase()) {
1108
+ "positive" -> MaterialTheme.colorScheme.primary to Icons.Default.TrendingUp
1109
+ "negative" -> MaterialTheme.colorScheme.error to Icons.Default.TrendingDown
1110
+ else -> MaterialTheme.colorScheme.outline to Icons.Default.TrendingFlat
1111
+ }
1112
+
1113
+ AssistChip(
1114
+ onClick = { },
1115
+ label = { Text(sentiment.replaceFirstChar { it.uppercase() }) },
1116
+ leadingIcon = {
1117
+ Icon(
1118
+ imageVector = icon,
1119
+ contentDescription = null,
1120
+ tint = color
1121
+ )
1122
+ },
1123
+ colors = AssistChipDefaults.assistChipColors(
1124
+ leadingIconContentColor = color
1125
+ )
1126
+ )
1127
+ }
1128
+ ```
1129
+
1130
+ ### Loading and Error Views
1131
+
1132
+ ```kotlin
1133
+ @Composable
1134
+ fun LoadingView(message: String) {
1135
+ Column(
1136
+ modifier = Modifier
1137
+ .fillMaxWidth()
1138
+ .padding(32.dp),
1139
+ horizontalAlignment = Alignment.CenterHorizontally,
1140
+ verticalArrangement = Arrangement.spacedBy(16.dp)
1141
+ ) {
1142
+ CircularProgressIndicator()
1143
+ Text(
1144
+ text = message,
1145
+ style = MaterialTheme.typography.bodyMedium,
1146
+ color = MaterialTheme.colorScheme.onSurfaceVariant
1147
+ )
1148
+ }
1149
+ }
1150
+
1151
+ @Composable
1152
+ fun StreamingIndicator(tokensReceived: Int) {
1153
+ Column(
1154
+ modifier = Modifier
1155
+ .fillMaxWidth()
1156
+ .padding(16.dp),
1157
+ horizontalAlignment = Alignment.CenterHorizontally,
1158
+ verticalArrangement = Arrangement.spacedBy(12.dp)
1159
+ ) {
1160
+ LinearProgressIndicator(modifier = Modifier.fillMaxWidth())
1161
+ Text(
1162
+ text = "Generating summary... ($tokensReceived characters)",
1163
+ style = MaterialTheme.typography.bodyMedium,
1164
+ color = MaterialTheme.colorScheme.onSurfaceVariant
1165
+ )
1166
+ }
1167
+ }
1168
+
1169
+ @Composable
1170
+ fun ErrorView(message: String, onRetry: () -> Unit) {
1171
+ Column(
1172
+ modifier = Modifier
1173
+ .fillMaxWidth()
1174
+ .padding(16.dp),
1175
+ horizontalAlignment = Alignment.CenterHorizontally,
1176
+ verticalArrangement = Arrangement.spacedBy(16.dp)
1177
+ ) {
1178
+ Icon(
1179
+ imageVector = Icons.Default.ErrorOutline,
1180
+ contentDescription = null,
1181
+ tint = MaterialTheme.colorScheme.error,
1182
+ modifier = Modifier.size(48.dp)
1183
+ )
1184
+
1185
+ Text(
1186
+ text = "Unable to generate summary",
1187
+ style = MaterialTheme.typography.titleMedium,
1188
+ fontWeight = FontWeight.Bold
1189
+ )
1190
+
1191
+ Text(
1192
+ text = message,
1193
+ style = MaterialTheme.typography.bodyMedium,
1194
+ color = MaterialTheme.colorScheme.onSurfaceVariant,
1195
+ textAlign = TextAlign.Center
1196
+ )
1197
+
1198
+ Button(onClick = onRetry) {
1199
+ Icon(Icons.Default.Refresh, contentDescription = null)
1200
+ Spacer(modifier = Modifier.width(8.dp))
1201
+ Text("Try Again")
1202
+ }
1203
+ }
1204
+ }
1205
+
1206
+ @Composable
1207
+ fun EmptyStateView() {
1208
+ Column(
1209
+ modifier = Modifier
1210
+ .fillMaxWidth()
1211
+ .padding(32.dp),
1212
+ horizontalAlignment = Alignment.CenterHorizontally,
1213
+ verticalArrangement = Arrangement.spacedBy(16.dp)
1214
+ ) {
1215
+ Icon(
1216
+ imageVector = Icons.Default.Article,
1217
+ contentDescription = null,
1218
+ modifier = Modifier.size(64.dp),
1219
+ tint = MaterialTheme.colorScheme.primary
1220
+ )
1221
+
1222
+ Text(
1223
+ text = "Enter a URL to get started",
1224
+ style = MaterialTheme.typography.titleMedium
1225
+ )
1226
+
1227
+ Text(
1228
+ text = "Paste any article URL and choose your preferred summary style",
1229
+ style = MaterialTheme.typography.bodyMedium,
1230
+ color = MaterialTheme.colorScheme.onSurfaceVariant,
1231
+ textAlign = TextAlign.Center
1232
+ )
1233
+ }
1234
+ }
1235
+ ```
1236
+
1237
+ ---
1238
+
1239
+ ## UI/UX Patterns
1240
+
1241
+ ### Progressive Loading Flow
1242
+
1243
+ ```mermaid
1244
+ stateDiagram-v2
1245
+ [*] --> Idle: Initial State
1246
+ Idle --> Loading: User taps "Summarize"
1247
+ Loading --> MetadataReceived: 2-3 seconds
1248
+ MetadataReceived --> Streaming: Start receiving JSON
1249
+ Streaming --> Success: Complete
1250
+
1251
+ Loading --> Error: Network/API Error
1252
+ MetadataReceived --> Error: Timeout
1253
+ Streaming --> Error: Parse Error
1254
+
1255
+ Error --> Idle: Reset/Retry
1256
+ Success --> Idle: New Request
1257
+ ```
1258
+
1259
+ ### Recommended UX Timeline
1260
+
1261
+ | Time | State | UI Display |
1262
+ |------|-------|------------|
1263
+ | 0s | Loading | Show spinner: "Fetching article..." |
1264
+ | 2s | MetadataReceived | Display article title, author, source |
1265
+ | 2-5s | Streaming | Show progress: "Generating summary... (150 chars)" |
1266
+ | 5s | Success | Fade in complete structured summary |
1267
+
1268
+ ### Animation Recommendations
1269
+
1270
+ ```kotlin
1271
+ // Fade in summary content
1272
+ LaunchedEffect(key1 = state) {
1273
+ if (state is SummaryState.Success) {
1274
+ // Animate key points appearing one by one
1275
+ state.summary.keyPoints.forEachIndexed { index, _ ->
1276
+ delay(100 * index.toLong())
1277
+ // Trigger recomposition to show next point
1278
+ }
1279
+ }
1280
+ }
1281
+
1282
+ // Shimmer effect for metadata card while loading
1283
+ @Composable
1284
+ fun ShimmerMetadataCard() {
1285
+ val infiniteTransition = rememberInfiniteTransition()
1286
+ val alpha by infiniteTransition.animateFloat(
1287
+ initialValue = 0.3f,
1288
+ targetValue = 0.7f,
1289
+ animationSpec = infiniteRepeatable(
1290
+ animation = tween(1000),
1291
+ repeatMode = RepeatMode.Reverse
1292
+ )
1293
+ )
1294
+
1295
+ Card(
1296
+ modifier = Modifier.fillMaxWidth(),
1297
+ colors = CardDefaults.cardColors(
1298
+ containerColor = MaterialTheme.colorScheme.surfaceVariant.copy(alpha = alpha)
1299
+ )
1300
+ ) {
1301
+ // Placeholder content
1302
+ }
1303
+ }
1304
+ ```
1305
+
1306
+ ---
1307
+
1308
+ ## Error Handling
1309
+
1310
+ ### HTTP Error Mapping
1311
+
1312
+ | HTTP Code | Meaning | User-Friendly Message |
1313
+ |-----------|---------|----------------------|
1314
+ | 400 | Bad Request | "Invalid request. Please check your input." |
1315
+ | 422 | Validation Error | "Invalid URL or text format. Please try again." |
1316
+ | 429 | Rate Limited | "Too many requests. Please wait a moment and try again." |
1317
+ | 500 | Server Error | "Service temporarily unavailable. Please try again later." |
1318
+ | 502 | Bad Gateway | "Unable to access article. Try a different URL." |
1319
+ | 504 | Gateway Timeout | "Request took too long. Try a shorter article or different URL." |
1320
+
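For reference, the table above can be expressed as a simple lookup. This is a minimal sketch (the dictionary name and the fallback string are illustrative, not part of the app code):

```python
# Maps HTTP status codes from the table above to user-facing messages.
HTTP_ERROR_MESSAGES = {
    400: "Invalid request. Please check your input.",
    422: "Invalid URL or text format. Please try again.",
    429: "Too many requests. Please wait a moment and try again.",
    500: "Service temporarily unavailable. Please try again later.",
    502: "Unable to access article. Try a different URL.",
    504: "Request took too long. Try a shorter article or different URL.",
}

def user_message(status_code: int) -> str:
    # Fall back to a generic message for codes not in the table
    return HTTP_ERROR_MESSAGES.get(status_code, "An unexpected error occurred.")
```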
1321
+ ### Network Error Handling
1322
+
1323
+ ```kotlin
1324
+ sealed class NetworkError {
1325
+ data class HttpError(val code: Int, val message: String) : NetworkError()
1326
+ data class ConnectionError(val message: String) : NetworkError()
1327
+ data class TimeoutError(val message: String) : NetworkError()
1328
+ data class ParseError(val message: String) : NetworkError()
1329
+ data class UnknownError(val message: String) : NetworkError()
1330
+ }
1331
+
1332
+ fun Throwable.toUserFriendlyMessage(): String {
1333
+ return when (this) {
1334
+ is SocketTimeoutException -> "Request timed out. Try a shorter article."
1335
+ is UnknownHostException -> "No internet connection. Please check your network."
1336
+ is IOException -> "Network error. Please check your connection."
1337
+ is kotlinx.serialization.SerializationException -> "Invalid response from server. Please try again."
1338
+ else -> message ?: "An unexpected error occurred."
1339
+ }
1340
+ }
1341
+ ```
1342
+
1343
+ ### Error Retry Logic
1344
+
1345
+ ```kotlin
1346
+ class SummarizeRepositoryWithRetry(
1347
+ private val baseRepository: SummarizeRepository,
1348
+ private val maxRetries: Int = 3,
1349
+ private val retryDelayMs: Long = 1000
1350
+ ) {
1351
+ fun streamSummaryWithRetry(request: SummaryRequest): Flow<SummaryEvent> = flow {
1352
+ var currentAttempt = 0
1353
+ var lastError: Throwable? = null
1354
+
1355
+ while (currentAttempt < maxRetries) {
1356
+ try {
1357
+ baseRepository.streamSummary(request).collect { event ->
1358
+ emit(event)
1359
+ }
1360
+ return@flow // Stream completed without error
1364
+ } catch (e: Exception) {
1365
+ lastError = e
1366
+ currentAttempt++
1367
+
1368
+ if (currentAttempt < maxRetries) {
1369
+ delay(retryDelayMs * (1L shl (currentAttempt - 1))) // Exponential backoff: 1s, 2s, 4s...
1370
+ }
1371
+ }
1372
+ }
1373
+
1374
+ // All retries failed
1375
+ emit(SummaryEvent.Error(lastError?.toUserFriendlyMessage() ?: "Unknown error"))
1376
+ }
1377
+ }
1378
+ ```
1379
+
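To make the backoff schedule concrete, this small sketch (Python, purely illustrative; `backoff_delays` is not part of the app) computes the waits an exponential policy produces between attempts, given the defaults `retryDelayMs = 1000` and `maxRetries = 3`:

```python
def backoff_delays(base_ms: int, max_retries: int) -> list:
    # One delay between each pair of consecutive attempts: base, 2*base, 4*base, ...
    return [base_ms * (2 ** attempt) for attempt in range(max_retries - 1)]

print(backoff_delays(1000, 3))  # → [1000, 2000]
```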
1380
+ ---
1381
+
1382
+ ## Performance Optimization
1383
+
1384
+ ### 1. Response Caching
1385
+
1386
+ ```kotlin
1387
+ package com.example.summarizer.data.cache
1388
+
1389
+ import android.util.LruCache
1390
+ import com.example.summarizer.data.model.StructuredSummary
1391
+ import javax.inject.Inject
1392
+ import javax.inject.Singleton
1393
+
1394
+ /**
1395
+ * In-memory cache for summaries
1396
+ */
1397
+ @Singleton
1398
+ class SummaryCache @Inject constructor() {
1399
+ private val cache = LruCache<String, CachedSummary>(50) // Cache up to 50 summaries
1400
+
1401
+ fun get(key: String): StructuredSummary? {
1402
+ return cache.get(key)?.takeIf { it.isValid() }?.summary
1403
+ }
1404
+
1405
+ fun put(key: String, summary: StructuredSummary) {
1406
+ cache.put(key, CachedSummary(summary, System.currentTimeMillis()))
1407
+ }
1408
+
1409
+ fun clear() {
1410
+ cache.evictAll()
1411
+ }
1412
+ }
1413
+
1414
+ data class CachedSummary(
1415
+ val summary: StructuredSummary,
1416
+ val timestamp: Long,
1417
+ val ttlMs: Long = 3600_000 // 1 hour TTL
1418
+ ) {
1419
+ fun isValid(): Boolean {
1420
+ return System.currentTimeMillis() - timestamp < ttlMs
1421
+ }
1422
+ }
1423
+ ```
1424
+
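The same LRU + TTL semantics can be sketched in Python for clarity (the class name and defaults are illustrative, mirroring the 50-entry, 1-hour cache above):

```python
import time
from collections import OrderedDict

class TTLCache:
    """Minimal LRU cache with per-entry TTL, mirroring SummaryCache above."""

    def __init__(self, max_size=50, ttl_s=3600):
        self.max_size = max_size
        self.ttl_s = ttl_s
        self._data = OrderedDict()  # key -> (value, insertion timestamp)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, ts = entry
        if time.monotonic() - ts >= self.ttl_s:
            del self._data[key]  # Expired: evict lazily on read
            return None
        self._data.move_to_end(key)  # Mark as most recently used
        return value

    def put(self, key, value):
        self._data[key] = (value, time.monotonic())
        self._data.move_to_end(key)
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)  # Evict least recently used
```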
1425
+ ### 2. Repository with Caching
1426
+
1427
+ ```kotlin
1428
+ @Singleton
1429
+ class CachedSummarizeRepository @Inject constructor(
1430
+ private val baseRepository: SummarizeRepository,
1431
+ private val cache: SummaryCache
1432
+ ) {
1433
+ fun streamSummary(request: SummaryRequest): Flow<SummaryEvent> = flow {
1434
+ // Generate cache key
1435
+ val cacheKey = request.url ?: request.text?.take(100)
1436
+
1437
+ // Check cache first (for URLs only)
1438
+ if (request.url != null && cacheKey != null) {
1439
+ val cached = cache.get(cacheKey)
1440
+ if (cached != null) {
1441
+ emit(SummaryEvent.Complete(cached))
1442
+ return@flow
1443
+ }
1444
+ }
1445
+
1446
+ // Cache miss, stream from API
1447
+ baseRepository.streamSummary(request).collect { event ->
1448
+ emit(event)
1449
+
1450
+ // Cache successful results
1451
+ if (event is SummaryEvent.Complete && cacheKey != null) {
1452
+ cache.put(cacheKey, event.summary)
1453
+ }
1454
+ }
1455
+ }
1456
+ }
1457
+ ```
1458
+
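One caveat with the key above: `text?.take(100)` can collide for articles that share the same opening characters. Hashing the full text avoids this; here is a sketch in Python (the function and the inclusion of `style` in the key are suggestions, not part of the repository code):

```python
import hashlib

def cache_key(url=None, text=None, style="executive"):
    """Derive a stable cache key: the URL when present, else a hash of the
    full text. Hashing (rather than a 100-char prefix) avoids collisions
    between articles with identical openings; including the style keeps
    summaries of different styles from overwriting each other."""
    source = url if url is not None else hashlib.sha256(text.encode()).hexdigest()
    return f"{style}:{source}"
```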
1459
+ ### 3. Connection Pooling
1460
+
1461
+ Already configured in `NetworkModule.provideOkHttpClient()`:
1462
+
1463
+ ```kotlin
1464
+ ConnectionPool(
1465
+ maxIdleConnections = 5,
1466
+ keepAliveDuration = 5,
1467
+ timeUnit = TimeUnit.MINUTES
1468
+ )
1469
+ ```
1470
+
1471
+ ### 4. Lazy Loading
1472
+
1473
+ Displaying metadata immediately while the summary streams makes the app feel 2-3x faster:
1474
+
1475
+ ```kotlin
1476
+ // In ViewModel
1477
+ when (event) {
1478
+ is SummaryEvent.Metadata -> {
1479
+ // Show metadata card immediately (2s latency)
1480
+ _state.value = SummaryState.MetadataReceived(event.metadata)
1481
+ }
1482
+ is SummaryEvent.Complete -> {
1483
+ // Show summary after streaming complete (5s total latency)
1484
+ _state.value = SummaryState.Success(...)
1485
+ }
1486
+ }
1487
+ ```
1488
+
1489
+ ---
1490
+
1491
+ ## Testing Strategy
1492
+
1493
+ ### Unit Tests
1494
+
1495
+ ```kotlin
1496
+ package com.example.summarizer.ui.viewmodel
1497
+
1498
+ import app.cash.turbine.test
1499
+ import com.example.summarizer.data.model.*
1500
+ import com.example.summarizer.data.repository.SummarizeRepository
1501
+ import io.mockk.*
1502
+ import kotlinx.coroutines.flow.flowOf
1503
+ import kotlinx.coroutines.test.runTest
1504
+ import org.junit.Before
1505
+ import org.junit.Test
1506
+ import kotlin.test.assertEquals
1507
+ import kotlin.test.assertTrue
1508
+
1509
+ class SummaryViewModelTest {
1510
+
1511
+ private lateinit var repository: SummarizeRepository
1512
+ private lateinit var viewModel: SummaryViewModel
1513
+
1514
+ @Before
1515
+ fun setup() {
1516
+ repository = mockk()
1517
+ viewModel = SummaryViewModel(repository)
1518
+ }
1519
+
1520
+ @Test
1521
+ fun `metadata received before summary completes`() = runTest {
1522
+ // Given
1523
+ val metadata = ScrapingMetadata(
1524
+ inputType = "url",
1525
+ title = "Test Article",
1526
+ style = "executive"
1527
+ )
1528
+ val summary = StructuredSummary(
1529
+ title = "Test",
1530
+ mainSummary = "Summary",
1531
+ keyPoints = listOf("Point 1"),
1532
+ category = "Tech",
1533
+ sentiment = "positive",
1534
+ readTimeMin = 3
1535
+ )
1536
+
1537
+ coEvery { repository.streamSummary(any()) } returns flowOf(
1538
+ SummaryEvent.Metadata(metadata),
1539
+ SummaryEvent.TokensReceived(50),
1540
+ SummaryEvent.Complete(summary)
1541
+ )
1542
+
1543
+ // When
1544
+ viewModel.summarizeUrl("https://test.com", SummaryStyle.EXECUTIVE)
1545
+
1546
+ // Then
1547
+ viewModel.state.test {
1548
+ assertEquals(SummaryState.Loading::class, awaitItem()::class)
1549
+ assertEquals(SummaryState.MetadataReceived::class, awaitItem()::class)
1550
+ assertEquals(SummaryState.Streaming::class, awaitItem()::class)
1551
+
1552
+ val successState = awaitItem()
1553
+ assertTrue(successState is SummaryState.Success)
1554
+ assertEquals(metadata, successState.metadata)
1555
+ assertEquals(summary, successState.summary)
1556
+ }
1557
+ }
1558
+
1559
+ @Test
1560
+ fun `error handling displays error message`() = runTest {
1561
+ // Given
1562
+ coEvery { repository.streamSummary(any()) } returns flowOf(
1563
+ SummaryEvent.Error("Network error")
1564
+ )
1565
+
1566
+ // When
1567
+ viewModel.summarizeUrl("https://test.com", SummaryStyle.EXECUTIVE)
1568
+
1569
+ // Then
1570
+ viewModel.state.test {
1571
+ assertEquals(SummaryState.Loading::class, awaitItem()::class)
1572
+
1573
+ val errorState = awaitItem()
1574
+ assertTrue(errorState is SummaryState.Error)
1575
+ assertEquals("Network error", errorState.message)
1576
+ }
1577
+ }
1578
+ }
1579
+ ```
1580
+
1581
+ ### Integration Tests
1582
+
1583
+ ```kotlin
1584
+ package com.example.summarizer.data.repository
1585
+
1586
+ import kotlinx.coroutines.flow.toList
1587
+ import kotlinx.coroutines.test.runTest
1588
+ import kotlinx.serialization.json.Json
1589
+ import okhttp3.OkHttpClient
1590
+ import okhttp3.mockwebserver.MockResponse
1591
+ import okhttp3.mockwebserver.MockWebServer
1592
+ import org.junit.After
1593
+ import org.junit.Before
1594
+ import org.junit.Test
1595
+ import kotlin.test.assertEquals
1596
+ import kotlin.test.assertTrue
1597
+
1598
+ class SummarizeRepositoryIntegrationTest {
1599
+
1600
+ private lateinit var mockWebServer: MockWebServer
1601
+ private lateinit var repository: SummarizeRepository
1602
+
1603
+ @Before
1604
+ fun setup() {
1605
+ mockWebServer = MockWebServer()
1606
+ mockWebServer.start()
1607
+
1608
+ repository = SummarizeRepository(
1609
+ okHttpClient = OkHttpClient(),
1610
+ json = Json { ignoreUnknownKeys = true },
1611
+ baseUrl = mockWebServer.url("/").toString()
1612
+ )
1613
+ }
1614
+
1615
+ @After
1616
+ fun tearDown() {
1617
+ mockWebServer.shutdown()
1618
+ }
1619
+
1620
+ @Test
1621
+ fun `streaming JSON is parsed correctly`() = runTest {
1622
+ // Given
1623
+ val mockResponse = MockResponse()
1624
+ .setResponseCode(200)
1625
+ .setBody("""
1626
+ data: {"type":"metadata","data":{"input_type":"url","title":"Test","style":"executive"}}
1627
+
1628
+ data: {"title":"
1629
+
1630
+ data: Test Article
1631
+
1632
+ data: ","main_summary":"
1633
+
1634
+ data: This is a test
1635
+
1636
+ data: ","key_points":["Point 1"],"category":"Tech","sentiment":"positive","read_time_min":3}
1637
+
1638
+ """.trimIndent())
1639
+ mockWebServer.enqueue(mockResponse)
1640
+
1641
+ // When
1642
+ val request = SummaryRequest(
1643
+ url = "https://test.com",
1644
+ style = SummaryStyle.EXECUTIVE
1645
+ )
1646
+ val events = repository.streamSummary(request).toList()
1647
+
1648
+ // Then
1649
+ assertEquals(3, events.size)
1650
+
1651
+ assertTrue(events[0] is SummaryEvent.Metadata)
1652
+ assertTrue(events[1] is SummaryEvent.TokensReceived)
1653
+ assertTrue(events[2] is SummaryEvent.Complete)
1654
+
1655
+ val completeEvent = events[2] as SummaryEvent.Complete
1656
+ assertEquals("Test Article", completeEvent.summary.title)
1657
+ assertEquals("Tech", completeEvent.summary.category)
1658
+ }
1659
+ }
1660
+ ```
1661
+
1662
+ ---
1663
+
1664
+ ## Complete Example Flow
1665
+
1666
+ ### User Journey
1667
+
1668
+ ```
1669
+ 1. User opens app
1670
+ └─> Display EmptyStateView with instructions
1671
+
1672
+ 2. User enters URL: "https://example.com/ai-revolution"
1673
+ └─> Enable "Summarize" button
1674
+
1675
+ 3. User selects style: "Executive"
1676
+ └─> Highlight selected chip
1677
+
1678
+ 4. User taps "Summarize"
1679
+ β”œβ”€> [0-2s] Show LoadingView: "Fetching article..."
1680
+ β”‚ └─> Display CircularProgressIndicator
1681
+ β”‚
1682
+ β”œβ”€> [2s] Receive metadata event
1683
+ β”‚ └─> Show MetadataCard with:
1684
+ β”‚ - Title: "AI Revolution Transforms Tech Industry"
1685
+ β”‚ - Author: "John Doe"
1686
+ β”‚ - Source: "Tech Insights"
1687
+ β”‚ - Date: "2024-11-30"
1688
+ β”‚ - Length: "5.4K chars"
1689
+ β”‚ └─> Show LoadingView: "Generating summary..."
1690
+ β”‚
1691
+ β”œβ”€> [2-5s] Stream JSON tokens
1692
+ β”‚ └─> Update StreamingIndicator: "Generating summary... (150 chars)"
1693
+ β”‚ └─> Increment progress as tokens arrive
1694
+ β”‚
1695
+ └─> [5s] Summary complete
1696
+ └─> Fade in SummaryContent:
1697
+ β”œβ”€> Category chip: "Technology" (with icon)
1698
+ β”œβ”€> Sentiment badge: "Positive" (green, trending up icon)
1699
+ β”œβ”€> Read time: "3 min read"
1700
+ β”œβ”€> Title: "AI Revolution Transforms Tech Industry in 2024"
1701
+ β”œβ”€> Main summary card (blue background):
1702
+ β”‚ "Artificial intelligence is rapidly transforming..."
1703
+ β”œβ”€> Key points section:
1704
+ β”‚ 1. AI is transforming technology across industries
1705
+ β”‚ 2. Machine learning algorithms continue improving
1706
+ β”‚ 3. Deep learning processes massive data efficiently
1707
+ └─> Action buttons: [Share] [Save]
1708
+
1709
+ 5. User taps "Share"
1710
+ └─> Open share sheet with formatted summary text
1711
+
1712
+ 6. User taps "Save"
1713
+ └─> Save to local database for offline access
1714
+ ```
1715
+
1716
+ ---
1717
+
1718
+ ## Appendix
1719
+
1720
+ ### A. Icon Mapping Helper
1721
+
1722
+ ```kotlin
1723
+ import androidx.compose.material.icons.Icons
1724
+ import androidx.compose.material.icons.filled.*
1725
+ import androidx.compose.ui.graphics.vector.ImageVector
1726
+
1727
+ fun getCategoryIcon(category: String): ImageVector {
1728
+ return when (category.lowercase()) {
1729
+ "tech", "technology" -> Icons.Default.Computer
1730
+ "business", "finance" -> Icons.Default.Business
1731
+ "politics", "government" -> Icons.Default.Gavel
1732
+ "sports" -> Icons.Default.Sports
1733
+ "health", "medical" -> Icons.Default.LocalHospital
1734
+ "science" -> Icons.Default.Science
1735
+ "entertainment" -> Icons.Default.Theaters
1736
+ "education" -> Icons.Default.School
1737
+ else -> Icons.Default.Article
1738
+ }
1739
+ }
1740
+ ```
1741
+
1742
+ ### B. Share Functionality
1743
+
1744
+ ```kotlin
1745
+ fun shareSummary(context: Context, summary: StructuredSummary, metadata: ScrapingMetadata?) {
1746
+ val shareText = buildString {
1747
+ appendLine(summary.title)
1748
+ appendLine()
1749
+ appendLine(summary.mainSummary)
1750
+ appendLine()
1751
+ appendLine("Key Points:")
1752
+ summary.keyPoints.forEachIndexed { index, point ->
1753
+ appendLine("${index + 1}. $point")
1754
+ }
1755
+ appendLine()
1756
+ appendLine("Category: ${summary.category}")
1757
+ appendLine("Read time: ${summary.readTimeMin} min")
1758
+ metadata?.url?.let {
1759
+ appendLine()
1760
+ appendLine("Source: $it")
1761
+ }
1762
+ appendLine()
1763
+ appendLine("Summarized with [App Name]")
1764
+ }
1765
+
1766
+ val sendIntent = Intent().apply {
1767
+ action = Intent.ACTION_SEND
1768
+ putExtra(Intent.EXTRA_TEXT, shareText)
1769
+ type = "text/plain"
1770
+ }
1771
+
1772
+ val shareIntent = Intent.createChooser(sendIntent, "Share Summary")
1773
+ context.startActivity(shareIntent)
1774
+ }
1775
+ ```
1776
+
1777
+ ### C. Environment Configuration
1778
+
1779
+ ```kotlin
1780
+ // local.properties (not committed to git)
1781
+ BASE_URL=https://your-api.hf.space
1782
+
1783
+ // build.gradle.kts
1784
+ android {
1785
+ defaultConfig {
1786
+ val properties = Properties()
1787
+ properties.load(project.rootProject.file("local.properties").inputStream())
1788
+
1789
+ buildConfigField(
1790
+ "String",
1791
+ "BASE_URL",
1792
+ "\"${properties.getProperty("BASE_URL")}\""
1793
+ )
1794
+ }
1795
+ }
1796
+
1797
+ // Usage in NetworkModule
1798
+ @Provides
1799
+ @Singleton
1800
+ fun provideBaseUrl(): String {
1801
+ return BuildConfig.BASE_URL
1802
+ }
1803
+ ```
1804
+
1805
+ ### D. Proguard Rules
1806
+
1807
+ ```proguard
1808
+ # OkHttp
1809
+ -dontwarn okhttp3.**
1810
+ -keep class okhttp3.** { *; }
1811
+
1812
+ # Kotlinx Serialization
1813
+ -keepattributes *Annotation*, InnerClasses
1814
+ -dontnote kotlinx.serialization.AnnotationsKt
1815
+ -keepclassmembers class kotlinx.serialization.json.** {
1816
+ *** Companion;
1817
+ }
1818
+ -keepclasseswithmembers class kotlinx.serialization.json.** {
1819
+ kotlinx.serialization.KSerializer serializer(...);
1820
+ }
1821
+ -keep,includedescriptorclasses class com.example.summarizer.**$$serializer { *; }
1822
+ -keepclassmembers class com.example.summarizer.** {
1823
+ *** Companion;
1824
+ }
1825
+ -keepclasseswithmembers class com.example.summarizer.** {
1826
+ kotlinx.serialization.KSerializer serializer(...);
1827
+ }
1828
+ ```
1829
+
1830
+ ### E. Performance Monitoring
1831
+
1832
+ ```kotlin
1833
+ // Add timing metrics to track performance
1834
+ class MetricsRepository @Inject constructor(@ApplicationContext private val context: Context) {
1835
+ fun trackSummaryLatency(
1836
+ url: String,
1837
+ scrapeLatencyMs: Double?,
1838
+ totalLatencyMs: Long
1839
+ ) {
1840
+ // Send to analytics (Firebase, etc.)
1841
+ FirebaseAnalytics.getInstance(context).logEvent("summary_completed") {
1842
+ param("url_domain", Uri.parse(url).host ?: "unknown")
1843
+ param("scrape_latency_ms", scrapeLatencyMs ?: 0.0)
1844
+ param("total_latency_ms", totalLatencyMs.toDouble())
1845
+ }
1846
+ }
1847
+ }
1848
+ ```
1849
+
1850
+ ---
1851
+
1852
+ ## Summary
1853
+
1854
+ This guide provides everything needed to integrate the V4 Stream JSON API into your Android app:
1855
+
1856
+ **Key Takeaways:**
1857
+ 1. **Use OkHttp** for SSE streaming with long timeouts (600s)
1858
+ 2. **Parse in two phases**: Metadata first β†’ accumulate JSON tokens β†’ parse complete JSON
1859
+ 3. **Progressive UI**: Show metadata immediately (2s), summary follows (5s total)
1860
+ 4. **Structured display**: Leverage category, sentiment, read time for rich UI
1861
+ 5. **Error resilience**: Handle network errors, timeouts, malformed JSON gracefully
1862
+ 6. **Performance**: Cache summaries locally, reuse connections, lazy load UI
1863
+
1864
+ **Performance Gains:**
1865
+ - 2-5s server-side vs 5-15s client-side
1866
+ - 95%+ success rate vs 60-70% on mobile
1867
+ - Zero battery drain from scraping
1868
+ - ~10KB data usage vs 500KB+ full article
1869
+
1870
+ **Next Steps:**
1871
+ 1. Replace `https://your-api.hf.space` with your actual API URL
1872
+ 2. Implement share and save functionality
1873
+ 3. Add analytics tracking
1874
+ 4. Test with real articles
1875
+ 5. Optimize UI animations and transitions
1876
+
1877
+ For questions or issues, refer to the [main API documentation](CLAUDE.md) or contact the backend team.
CLAUDE.md CHANGED
@@ -25,12 +25,14 @@ pytest --cov=app --cov-report=html:htmlcov
25
 
26
  ### Code Quality
27
  ```bash
 
 
 
28
  # Format code
29
- black app/
30
- isort app/
31
 
32
- # Lint code
33
- flake8 app/
34
  ```
35
 
36
  ### Running Locally
 
25
 
26
  ### Code Quality
27
  ```bash
28
+ # Lint code (with auto-fix)
29
+ ruff check --fix app/
30
+
31
  # Format code
32
+ ruff format app/
 
33
 
34
+ # Run both linting and formatting
35
+ ruff check --fix app/ && ruff format app/
36
  ```
37
 
38
  ### Running Locally
V4_LOCAL_SETUP.md ADDED
@@ -0,0 +1,270 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # V4 Local Setup for M4 MacBook Pro
2
+
3
+ ## Summary
4
+
5
+ V4 is successfully configured and running on your M4 MacBook Pro with **MPS (Metal Performance Shaders)** acceleration!
6
+
7
+ ## Hardware Configuration
8
+
9
+ - **Model**: M4 MacBook Pro (Mac16,7)
10
+ - **CPU**: 14 cores (10 performance + 4 efficiency)
11
+ - **Memory**: 24GB unified memory
12
+ - **GPU**: Apple Silicon with MPS support
13
+ - **OS**: macOS 26.1
14
+
15
+ ## V4 Configuration (.env)
16
+
17
+ ```bash
18
+ # V4 Structured JSON API Configuration (Outlines)
19
+ ENABLE_V4_STRUCTURED=true
20
+ ENABLE_V4_WARMUP=true
21
+
22
+ # V4 Model Configuration
23
+ V4_MODEL_ID=Qwen/Qwen2.5-1.5B-Instruct
24
+ V4_MAX_TOKENS=256
25
+ V4_TEMPERATURE=0.2
26
+
27
+ # V4 Performance Optimization (M4 MacBook Pro with MPS)
28
+ V4_USE_FP16_FOR_SPEED=true
29
+ V4_ENABLE_QUANTIZATION=false
30
+ ```
31
+
32
+ ## Performance Metrics
33
+
34
+ ### Model Loading
35
+ - **Device**: `mps:0` (Metal Performance Shaders)
36
+ - **Dtype**: `torch.float16` (FP16 for speed)
37
+ - **Quantization**: FP16 (MPS, fast mode)
38
+ - **Load time**: ~5 seconds
39
+ - **Warmup time**: ~22 seconds
40
+ - **Memory usage**: ~2-3GB unified memory
41
+
42
+ ### Inference Performance
43
+ - **Expected speed**: 2-5 seconds per request
44
+ - **Token generation**: ~10-20 tokens/sec
45
+ - **Device utilization**: GPU accelerated via MPS
46
+
47
+ ## Starting the Server
48
+
49
+ ```bash
50
+ # Start V4-enabled server
51
+ conda run -n summarizer uvicorn app.main:app --host 0.0.0.0 --port 7860
52
+
53
+ # Server will warm up V4 on startup (takes 20-30s)
54
+ # Look for these log messages:
55
+ # βœ… V4 model initialized successfully
56
+ # Model device: mps:0
57
+ # Torch dtype: torch.float16
58
+ ```
59
+
60
+ ## Testing V4
61
+
62
+ ### Via curl
63
+
64
+ ```bash
65
+ # Test V4 stream-json endpoint (Outlines-constrained)
66
+ curl -X POST http://localhost:7860/api/v4/scrape-and-summarize/stream-json \
67
+ -H "Content-Type: application/json" \
68
+ -d '{
69
+ "text": "Your article text here...",
70
+ "style": "executive",
71
+ "max_tokens": 256
72
+ }'
73
+ ```
74
+
75
+ ### Via Python
76
+
77
+ ```python
78
+ import requests
79
+
80
+ url = "http://localhost:7860/api/v4/scrape-and-summarize/stream-json"
81
+ payload = {
82
+ "text": "Your article text here...",
83
+ "style": "executive", # Options: skimmer, executive, eli5
84
+ "max_tokens": 256
85
+ }
86
+
87
+ response = requests.post(url, json=payload, stream=True)
88
+
89
+ for line in response.iter_lines():
90
+ if line:
91
+ print(line.decode('utf-8'))
92
+ ```
93
+
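The raw stream interleaves one metadata event with JSON token fragments. A minimal two-phase parser sketch (assuming each event arrives as a `data: <payload>` line, with the first event wrapping metadata in a `{"type": "metadata", "data": {...}}` envelope and later events carrying raw JSON fragments):

```python
import json

def parse_sse_lines(lines):
    """Phase 1: the first `data:` event carries scraping metadata.
    Phase 2: remaining events are JSON fragments, accumulated and
    parsed as one object once the stream ends."""
    metadata = None
    buffer = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # Skip blank keep-alive lines
        payload = line[len("data: "):]
        if metadata is None:
            metadata = json.loads(payload)["data"]  # First event: metadata envelope
        else:
            buffer.append(payload)  # Later events: summary JSON fragments
    summary = json.loads("".join(buffer)) if buffer else None
    return metadata, summary
```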
94
+ ## V4 Endpoints
95
+
96
+ 1. **`/api/v4/scrape-and-summarize/stream`** - Raw JSON token streaming
97
+ 2. **`/api/v4/scrape-and-summarize/stream-ndjson`** - NDJSON patch streaming (best for Android)
98
+ 3. **`/api/v4/scrape-and-summarize/stream-json`** - Outlines-constrained JSON (most reliable schema)
99
+
100
+ ## Structured Output Format
101
+
102
+ V4 guarantees the following JSON structure:
103
+
104
+ ```json
105
+ {
106
+ "title": "6-10 word headline",
107
+ "main_summary": "2-4 sentence summary",
108
+ "key_points": [
109
+ "Key point 1",
110
+ "Key point 2",
111
+ "Key point 3"
112
+ ],
113
+ "category": "1-2 word topic label",
114
+ "sentiment": "positive|negative|neutral",
115
+ "read_time_min": 3
116
+ }
117
+ ```
118
+
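Since the structure above is guaranteed by the server, a client can defend against drift with a small validator that mirrors it. A minimal stdlib sketch (field names and types taken from the example above; the checks and error messages are illustrative):

```python
import json

# Field names and types from the documented V4 response schema
REQUIRED_FIELDS = {
    "title": str,
    "main_summary": str,
    "key_points": list,
    "category": str,
    "sentiment": str,
    "read_time_min": int,
}
ALLOWED_SENTIMENTS = {"positive", "negative", "neutral"}


def validate_summary(raw: str) -> dict:
    """Parse a V4 response body and check it matches the documented schema."""
    data = json.loads(raw)
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected_type):
            raise ValueError(f"missing or mistyped field: {field}")
    if data["sentiment"] not in ALLOWED_SENTIMENTS:
        raise ValueError(f"unexpected sentiment: {data['sentiment']}")
    if data["read_time_min"] < 1:
        raise ValueError("read_time_min must be >= 1")
    return data
```

On Android the same shape would typically live in a serializable data class; the point is simply to fail fast if a response ever deviates from the schema.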
119
+ ## Summarization Styles
120
+
121
+ 1. **`skimmer`** - Quick facts and highlights for fast reading
122
+ 2. **`executive`** - Business-focused summary with key takeaways (recommended)
123
+ 3. **`eli5`** - "Explain Like I'm 5" - simple, accessible explanations
124
+
125
+ ## Code Changes Made
126
+
127
+ ### 1. Added MPS Detection (`app/services/structured_summarizer.py`)
128
+
129
+ ```python
130
+ # Detect both CUDA and MPS
131
+ use_cuda = torch.cuda.is_available()
132
+ use_mps = torch.backends.mps.is_available() and torch.backends.mps.is_built()
133
+
134
+ if use_cuda:
135
+ logger.info("CUDA is available. Using GPU for V4 model.")
136
+ elif use_mps:
137
+ logger.info("MPS (Metal Performance Shaders) is available. Using Apple Silicon GPU for V4 model.")
138
+ else:
139
+ logger.info("No GPU available. V4 model will run on CPU.")
140
+ ```
141
+
142
+ ### 2. Fixed Model Loading for MPS
143
+
144
+ ```python
145
+ # MPS requires explicit device setting, not device_map
146
+ if use_mps:
147
+ self.model = AutoModelForCausalLM.from_pretrained(
148
+ settings.v4_model_id,
149
+ torch_dtype=torch.float16, # Fixed: was `dtype=` (incorrect)
150
+ cache_dir=settings.hf_cache_dir,
151
+ trust_remote_code=True,
152
+ ).to("mps") # Explicit MPS device
153
+ ```
154
+
155
+ ### 3. Added FP16 Support for MPS
156
+
157
+ ```python
158
+ elif (use_cuda or use_mps) and use_fp16_for_speed:
159
+ device_str = "CUDA GPU" if use_cuda else "MPS (Apple Silicon)"
160
+ logger.info(f"Loading V4 model in FP16 for maximum speed on {device_str}...")
161
+ # ... FP16 loading logic
162
+ ```
163
+
164
+ ## Known Issues
165
+
166
+ ### Outlines JSON Generation Reliability
167
+
168
+ The Outlines library (0.1.1) with Qwen 1.5B sometimes generates malformed JSON with extra characters. This is a known limitation of constrained decoding with smaller models.
169
+
170
+ **Symptoms**:
171
+ ```
172
+ ValidationError: Extra data: line 1 column 278 (char 277)
173
+ input_value='{"title":"Apple Announce...":5}#RRR!!##R!R!R##!#!!'
174
+ ```
175
+
176
+ **Workarounds**:
177
+ 1. Use the `/stream` or `/stream-ndjson` endpoints instead (more reliable)
178
+ 2. Retry failed requests (Outlines generation is non-deterministic)
179
+ 3. Consider using a larger model (Qwen 3B) for better JSON reliability
180
+ 4. Use lower temperature (already set to 0.2 for stability)
181
+
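Because generation is non-deterministic, workaround 2 can be as simple as a client-side retry loop. A hedged sketch (the injected `request_fn` and retry count are illustrative, not part of the API; in practice `request_fn` would POST to `/api/v4/scrape-and-summarize/stream-json` with `requests` and return `response.text`):

```python
import json
import time


def summarize_with_retry(request_fn, retries: int = 3) -> dict:
    """Retry a summarization call until it yields parseable JSON.

    request_fn is an injected callable returning the raw response body,
    so the retry policy is testable without a running server.
    """
    last_error = None
    for attempt in range(retries):
        try:
            return json.loads(request_fn())
        except json.JSONDecodeError as exc:  # malformed/garbage output
            last_error = exc
            time.sleep(2**attempt)  # brief backoff before retrying
    raise RuntimeError(f"all {retries} attempts returned malformed JSON: {last_error}")
```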
182
+ ### Memory Considerations
183
+
184
+ - **Current usage**: ~2-3GB unified memory for V4
185
+ - **Total with all services**: ~4-5GB (V2 + V3 + V4)
186
+ - **Your 24GB Mac**: Plenty of headroom ✅
187
+
188
+ ## Performance Comparison
189
+
190
+ | Version | Device | Memory | Inference Time | Use Case |
191
+ |---------|--------|---------|----------------|----------|
192
+ | V1 | Ollama | ~2-4GB | 2-5s | Local custom models |
193
+ | V2 | CPU/GPU | ~500MB | Streaming | Fast free-form summaries |
194
+ | V3 | CPU/GPU | ~550MB | 2-5s | Web scraping + summarization |
195
+ | **V4** | **MPS** | **~2-3GB** | **2-5s** | **Structured JSON output** |
196
+
197
+ ## Next Steps
198
+
199
+ ### For Production Use
200
+
201
+ 1. **Test with real articles**: Feed V4 actual articles from your Android app
202
+ 2. **Monitor memory**: Use Activity Monitor to track memory usage
203
+ 3. **Benchmark performance**: Measure actual inference times under load
204
+ 4. **Consider alternatives if Outlines is unreliable**:
205
+ - Switch to `/stream-ndjson` endpoint (more reliable, progressive updates)
206
+ - Use post-processing to clean JSON output
207
+ - Upgrade to a larger model (Qwen 3B or Phi-3-Mini 3.8B)
208
+
209
+ ### For Development
210
+
211
+ 1. **Disable V4 warmup when not testing**:
212
+ ```bash
213
+ ENABLE_V4_WARMUP=false # Saves 20-30s startup time
214
+ ```
215
+
216
+ 2. **Run only V4** (disable V1/V2/V3 to save memory):
217
+ ```bash
218
+ ENABLE_V1_WARMUP=false
219
+ ENABLE_V2_WARMUP=false
220
+ ENABLE_V3_SCRAPING=false
221
+ ```
222
+
223
+ 3. **Experiment with temperature**:
224
+ ```bash
225
+ V4_TEMPERATURE=0.1 # Even more deterministic (may be too rigid)
226
+ V4_TEMPERATURE=0.3 # More creative (may reduce schema compliance)
227
+ ```
228
+
229
+ ## Troubleshooting
230
+
231
+ ### Model not loading on MPS
232
+
233
+ Check PyTorch MPS support:
234
+ ```bash
235
+ conda run -n summarizer python -c "import torch; print(f'MPS: {torch.backends.mps.is_available()}')"
236
+ ```
237
+
238
+ ### Server startup fails
239
+
240
+ Check the logs:
241
+ ```bash
242
+ conda run -n summarizer uvicorn app.main:app --host 0.0.0.0 --port 7860
243
+ # Look for "✅ V4 model initialized successfully"
244
+ ```
245
+
246
+ ### JSON validation errors
247
+
248
+ This is expected with Qwen 1.5B + Outlines. Consider:
249
+ - Using `/stream-ndjson` endpoint
250
+ - Implementing retry logic
251
+ - Using a larger model
252
+
253
+ ## Resources
254
+
255
+ - **Model**: [Qwen/Qwen2.5-1.5B-Instruct on HuggingFace](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct)
256
+ - **Outlines**: [Outlines 0.1.1 Documentation](https://outlines-dev.github.io/outlines/)
257
+ - **PyTorch MPS**: [Apple Silicon GPU Acceleration](https://pytorch.org/docs/stable/notes/mps.html)
258
+
259
+ ## Success Indicators
260
+
261
+ ✅ **Model loads on MPS** (`mps:0`)
262
+ ✅ **FP16 dtype enabled** (`torch.float16`)
263
+ ✅ **Fast loading** (~5 seconds)
264
+ ✅ **Memory efficient** (~2-3GB)
265
+ ✅ **Inference working** (generates output)
266
+ ⚠️ **Outlines reliability** (known issue with Qwen 1.5B)
267
+
268
+ ---
269
+
270
+ **Status**: V4 is fully operational on your M4 MacBook Pro! 🎉
V4_TESTING_LEARNINGS.md ADDED
@@ -0,0 +1,254 @@
1
+ # V4 API Testing & Model Comparison - Key Learnings
2
+
3
+ ## Overview
4
+ This document summarizes the key learnings from testing the V4 structured summarization API with different models (Qwen 1.5B vs 3B) and endpoints (NDJSON vs Outlines JSON).
5
+
6
+ ---
7
+
8
+ ## 🎯 Key Findings
9
+
10
+ ### 1. **Endpoint Performance Comparison**
11
+
12
+ #### NDJSON Endpoint (`/stream-ndjson`)
13
+ - **Speed**: ~26 seconds (43% faster than Outlines JSON)
14
+ - **Advantages**:
15
+ - Progressive streaming updates (8+ patches)
16
+ - First content arrives in ~1-2 seconds
17
+ - No garbage character cleanup needed
18
+ - Better UX for Android app (real-time UI updates)
19
+ - **Disadvantages**:
20
+ - Streaming implementation had issues with 3B model
21
+ - Requires proper SSE parsing (`data: ` prefix handling)
22
+
23
+ #### Outlines JSON Endpoint (`/stream-json`)
24
+ - **Speed**: ~46 seconds (with 1.5B model)
25
+ - **Advantages**:
26
+ - Guaranteed schema compliance
27
+ - Works reliably with both 1.5B and 3B models
28
+ - Single final JSON response
29
+ - **Disadvantages**:
30
+ - Slower (constrained decoding overhead)
31
+ - Requires garbage character cleanup (55+ chars removed)
32
+ - No progressive updates (all-or-nothing)
33
+ - First content arrives after ~22 seconds
34
+
35
+ **Winner**: NDJSON for speed and UX, but Outlines JSON for reliability
36
+
37
+ ---
38
+
39
+ ### 2. **Model Quality Comparison**
40
+
41
+ #### Qwen 2.5-1.5B-Instruct (Original)
42
+ - **Performance**: 20-46 seconds per request
43
+ - **Memory**: ~2-3GB unified memory
44
+ - **Quality Issues**:
45
+ - Repetitive titles/summaries
46
+ - Incomplete sentences
47
+ - Lower factual accuracy
48
+ - Less coherent key points
49
+ - Example: "Water pipeline risk assessment issue" (generic)
50
+ - **Speed**: Fastest option
51
+
52
+ #### Qwen 2.5-3B-Instruct (Upgraded)
53
+ - **Performance**: 40-60 seconds per request (~2x slower)
54
+ - **Memory**: ~6-7GB unified memory
55
+ - **Quality Improvements**:
56
+ - Better titles: "Council Resilience Concerns Over River Flooding" (more descriptive)
57
+ - More coherent main summaries
58
+ - Higher quality, detailed key points
59
+ - Better sentence structure
60
+ - More accurate categorization
61
+ - **Trade-off**: 1.7x slower but significantly better content quality
62
+
63
+ **Recommendation**: Use 3B model for production (quality worth the speed trade-off)
64
+
65
+ ---
66
+
67
+ ### 3. **Performance Characteristics**
68
+
69
+ #### Speed Factors
70
+ 1. **Content Complexity**: Policy/political articles slower than tech articles
71
+ - Gisborne water article: 46s (4,161 chars)
72
+ - Victoria Uni article: 33s (5,542 chars) - despite being longer!
73
+ - M4 chip article: 17-22s (734 chars)
74
+
75
+ 2. **Model Size Impact**:
76
+ - 1.5B: 20-46s range
77
+ - 3B: 40-60s range (expected ~75s with Outlines JSON)
78
+
79
+ 3. **Caching**: Scraped articles cached for 1 hour
80
+ - Cache hit: 0ms scraping time
81
+ - Cache miss: 200-500ms scraping time
82
+
83
+ 4. **GPU State**: Thermal throttling and background processes affect speed
84
+
85
+ #### Generation Speed Patterns
86
+ - **Cold start**: Slower first request
87
+ - **Warmed up**: Faster subsequent requests
88
+ - **Content-dependent**: Complex topics require more "thinking"
89
+
90
+ ---
91
+
92
+ ### 4. **Technical Implementation Learnings**
93
+
94
+ #### SSE Format Handling
95
+ - NDJSON endpoint uses Server-Sent Events (SSE) format
96
+ - Lines start with `data: ` prefix
97
+ - Must strip prefix before parsing JSON
98
+ - Example: `data: {"op": "set", "field": "title", "value": "..."}`
99
+
100
+ #### NDJSON Patch Format
101
+ - Uses JSON Patch operations:
102
+ - `{"op": "set", "field": "title", "value": "..."}`
103
+ - `{"op": "append", "field": "key_points", "value": "..."}`
104
+ - `{"op": "done"}` signals completion
105
+ - Note: Server uses `"field"` not `"path"` in patches
106
+
107
+ #### Outlines JSON Cleaning
108
+ - Outlines library sometimes generates malformed JSON
109
+ - Automatic cleanup removes garbage characters (16-133 chars)
110
+ - Pattern: `#RR!R#!R#!###!!#` or similar
111
+ - Cleanup is reliable and preserves valid JSON structure
112
+
113
+ ---
114
+
115
+ ### 5. **Web Scraping Performance**
116
+
117
+ #### V3 Scraping Service
118
+ - **Speed**: 200-500ms typical (294-441ms in tests)
119
+ - **Cache hit**: <10ms (instant)
120
+ - **Success rate**: 95%+ article extraction
121
+ - **Method**: trafilatura (static scraping, no JavaScript)
122
+ - **Metadata**: Extracts title, author, date, site_name
123
+
124
+ #### Article Quality
125
+ - Minimum content: 100 characters required
126
+ - Maximum: 50,000 characters
127
+ - Validation: Sentence structure checks
128
+ - User-agent rotation: Enabled to avoid anti-scraping
129
+
130
+ ---
131
+
132
+ ### 6. **Production Recommendations**
133
+
134
+ #### For Android App
135
+ 1. **Primary Endpoint**: `/api/v4/scrape-and-summarize/stream-ndjson`
136
+ - Progressive updates for better UX
137
+ - Faster overall completion
138
+ - Real-time UI updates
139
+
140
+ 2. **Model**: Qwen 2.5-3B-Instruct
141
+ - Better quality summaries
142
+ - Acceptable speed (40-60s)
143
+ - Fits in 24GB M4 MacBook Pro memory
144
+
145
+ 3. **Fallback**: `/api/v4/scrape-and-summarize/stream-json`
146
+ - Use if NDJSON streaming fails
147
+ - More reliable but slower
148
+ - Single final JSON response
149
+
150
+ #### Performance Expectations
151
+ | Endpoint | Model | Expected Time | Quality |
152
+ |----------|-------|---------------|---------|
153
+ | NDJSON | 1.5B | 26s | ⭐⭐ |
154
+ | NDJSON | 3B | ~45s | ⭐⭐⭐⭐ |
155
+ | Outlines JSON | 1.5B | 46s | ⭐⭐ |
156
+ | Outlines JSON | 3B | ~75s | ⭐⭐⭐⭐ |
157
+
158
+ ---
159
+
160
+ ### 7. **Issues Encountered & Solutions**
161
+
162
+ #### Issue 1: NDJSON Streaming Not Working with 3B Model
163
+ - **Symptom**: Server generates content but client receives empty response
164
+ - **Root Cause**: SSE parsing issue in test scripts
165
+ - **Solution**: Properly handle `data: ` prefix in SSE format
166
+ - **Status**: Partially resolved (needs further investigation)
167
+
168
+ #### Issue 2: Outlines Garbage Characters
169
+ - **Symptom**: Malformed JSON with extra characters
170
+ - **Root Cause**: Outlines library constraint enforcement quirks
171
+ - **Solution**: Automatic JSON cleaning (already implemented)
172
+ - **Status**: ✅ Resolved
173
+
174
+ #### Issue 3: Token Limit Hit
175
+ - **Symptom**: Incomplete summaries (124/256 tokens)
176
+ - **Root Cause**: `max_tokens=256` too low for complex articles
177
+ - **Solution**: Increase `max_tokens` to 512 for better completeness
178
+ - **Status**: ⚠️ Needs configuration update
179
+
180
+ ---
181
+
182
+ ### 8. **Configuration Insights**
183
+
184
+ #### Optimal Settings for 3B Model
185
+ ```env
186
+ V4_MODEL_ID=Qwen/Qwen2.5-3B-Instruct
187
+ V4_MAX_TOKENS=512 # Increased from 256
188
+ V4_TEMPERATURE=0.2
189
+ ENABLE_V4_WARMUP=true
190
+ ```
191
+
192
+ #### Model Download
193
+ - 3B model: ~6GB download (2 shards)
194
+ - Download time: ~56 seconds
195
+ - Load time: ~2 seconds
196
+ - Total startup: ~60 seconds (first time)
197
+
198
+ ---
199
+
200
+ ### 9. **Testing Methodology**
201
+
202
+ #### Test Scripts Created
203
+ 1. `compare_endpoints.py` - Compare NDJSON vs Outlines JSON
204
+ 2. `show_both_outputs.py` - Side-by-side output comparison
205
+ 3. `test_v4_url.py` - URL scraping + summarization test
206
+ 4. `test_3b_model.py` - 3B model testing script
207
+
208
+ #### Test Articles Used
209
+ - NZ Herald: Victoria University email controversy (5,542 chars)
210
+ - NZ Herald: Gisborne water supply threat (4,161 chars)
211
+ - M4 chip article (734 chars)
212
+
213
+ ---
214
+
215
+ ### 10. **Key Takeaways**
216
+
217
+ ✅ **NDJSON is faster** (43% improvement) and provides better UX
218
+ ✅ **3B model quality** significantly better than 1.5B
219
+ ✅ **Outlines JSON** more reliable but slower
220
+ ✅ **Web scraping** fast and reliable (200-500ms)
221
+ ✅ **Caching** provides instant retrieval for repeated URLs
222
+ ⚠️ **NDJSON streaming** needs debugging for 3B model
223
+ ⚠️ **Token limits** should be increased to 512 for completeness
224
+
225
+ ---
226
+
227
+ ## 🎯 Final Recommendation
228
+
229
+ **For Production Android App:**
230
+ - **Endpoint**: `/api/v4/scrape-and-summarize/stream-ndjson`
231
+ - **Model**: Qwen 2.5-3B-Instruct
232
+ - **Max Tokens**: 512 (instead of 256)
233
+ - **Expected Performance**: ~45 seconds with progressive updates
234
+ - **Quality**: ⭐⭐⭐⭐ (much better than 1.5B)
235
+
236
+ **Fallback Option:**
237
+ - **Endpoint**: `/api/v4/scrape-and-summarize/stream-json`
238
+ - **Model**: Qwen 2.5-3B-Instruct
239
+ - **Expected Performance**: ~75 seconds (slower but more reliable)
240
+
241
+ ---
242
+
243
+ ## 📊 Performance Summary Table
244
+
245
+ | Metric | 1.5B + NDJSON | 1.5B + Outlines | 3B + NDJSON | 3B + Outlines |
246
+ |--------|---------------|-----------------|-------------|---------------|
247
+ | Speed | 26s | 46s | ~45s | ~75s |
248
+ | Quality | ⭐⭐ | ⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
249
+ | UX | βœ… Progressive | ❌ All-or-nothing | βœ… Progressive | ❌ All-or-nothing |
250
+ | Reliability | ⚠️ Issues | βœ… Reliable | ⚠️ Issues | βœ… Reliable |
251
+
252
+ **Best Overall**: 3B + NDJSON (once streaming issues resolved)
253
+ **Most Reliable**: 3B + Outlines JSON (slower but works)
254
+
app/api/v1/schemas.py CHANGED
@@ -2,8 +2,6 @@
2
  Pydantic schemas for API request/response models.
3
  """
4
 
5
- from typing import Optional
6
-
7
  from pydantic import BaseModel, Field, validator
8
 
9
 
@@ -13,16 +11,16 @@ class SummarizeRequest(BaseModel):
13
  text: str = Field(
14
  ..., min_length=1, max_length=32000, description="Text to summarize"
15
  )
16
- max_tokens: Optional[int] = Field(
17
  default=256, ge=1, le=2048, description="Maximum tokens for summary"
18
  )
19
- temperature: Optional[float] = Field(
20
  default=0.3, ge=0.0, le=2.0, description="Sampling temperature for generation"
21
  )
22
- top_p: Optional[float] = Field(
23
  default=0.9, ge=0.0, le=1.0, description="Nucleus sampling parameter"
24
  )
25
- prompt: Optional[str] = Field(
26
  default="Summarize the key points concisely:",
27
  max_length=500,
28
  description="Custom prompt for summarization",
@@ -41,8 +39,8 @@ class SummarizeResponse(BaseModel):
41
 
42
  summary: str = Field(..., description="Generated summary")
43
  model: str = Field(..., description="Model used for summarization")
44
- tokens_used: Optional[int] = Field(None, description="Number of tokens used")
45
- latency_ms: Optional[float] = Field(
46
  None, description="Processing time in milliseconds"
47
  )
48
 
@@ -53,7 +51,7 @@ class HealthResponse(BaseModel):
53
  status: str = Field(..., description="Service status")
54
  service: str = Field(..., description="Service name")
55
  version: str = Field(..., description="Service version")
56
- ollama: Optional[str] = Field(None, description="Ollama service status")
57
 
58
 
59
  class StreamChunk(BaseModel):
@@ -61,12 +59,12 @@ class StreamChunk(BaseModel):
61
 
62
  content: str = Field(..., description="Content chunk from the stream")
63
  done: bool = Field(..., description="Whether this is the final chunk")
64
- tokens_used: Optional[int] = Field(None, description="Number of tokens used so far")
65
 
66
 
67
  class ErrorResponse(BaseModel):
68
  """Error response schema."""
69
 
70
  detail: str = Field(..., description="Error message")
71
- code: Optional[str] = Field(None, description="Error code")
72
- request_id: Optional[str] = Field(None, description="Request ID for tracking")
 
2
  Pydantic schemas for API request/response models.
3
  """
4
 
 
 
5
  from pydantic import BaseModel, Field, validator
6
 
7
 
 
11
  text: str = Field(
12
  ..., min_length=1, max_length=32000, description="Text to summarize"
13
  )
14
+ max_tokens: int | None = Field(
15
  default=256, ge=1, le=2048, description="Maximum tokens for summary"
16
  )
17
+ temperature: float | None = Field(
18
  default=0.3, ge=0.0, le=2.0, description="Sampling temperature for generation"
19
  )
20
+ top_p: float | None = Field(
21
  default=0.9, ge=0.0, le=1.0, description="Nucleus sampling parameter"
22
  )
23
+ prompt: str | None = Field(
24
  default="Summarize the key points concisely:",
25
  max_length=500,
26
  description="Custom prompt for summarization",
 
39
 
40
  summary: str = Field(..., description="Generated summary")
41
  model: str = Field(..., description="Model used for summarization")
42
+ tokens_used: int | None = Field(None, description="Number of tokens used")
43
+ latency_ms: float | None = Field(
44
  None, description="Processing time in milliseconds"
45
  )
46
 
 
51
  status: str = Field(..., description="Service status")
52
  service: str = Field(..., description="Service name")
53
  version: str = Field(..., description="Service version")
54
+ ollama: str | None = Field(None, description="Ollama service status")
55
 
56
 
57
  class StreamChunk(BaseModel):
 
59
 
60
  content: str = Field(..., description="Content chunk from the stream")
61
  done: bool = Field(..., description="Whether this is the final chunk")
62
+ tokens_used: int | None = Field(None, description="Number of tokens used so far")
63
 
64
 
65
  class ErrorResponse(BaseModel):
66
  """Error response schema."""
67
 
68
  detail: str = Field(..., description="Error message")
69
+ code: str | None = Field(None, description="Error code")
70
+ request_id: str | None = Field(None, description="Request ID for tracking")
app/api/v1/summarize.py CHANGED
@@ -25,7 +25,7 @@ async def summarize(payload: SummarizeRequest) -> SummarizeResponse:
25
  prompt=payload.prompt or "Summarize the following text concisely:",
26
  )
27
  return SummarizeResponse(**result)
28
- except httpx.TimeoutException as e:
29
  # Timeout error - provide helpful message
30
  raise HTTPException(
31
  status_code=504,
@@ -51,7 +51,7 @@ async def _stream_generator(payload: SummarizeRequest):
51
  sse_data = json.dumps(chunk)
52
  yield f"data: {sse_data}\n\n"
53
 
54
- except httpx.TimeoutException as e:
55
  # Send error event in SSE format
56
  error_chunk = {
57
  "content": "",
 
25
  prompt=payload.prompt or "Summarize the following text concisely:",
26
  )
27
  return SummarizeResponse(**result)
28
+ except httpx.TimeoutException:
29
  # Timeout error - provide helpful message
30
  raise HTTPException(
31
  status_code=504,
 
51
  sse_data = json.dumps(chunk)
52
  yield f"data: {sse_data}\n\n"
53
 
54
+ except httpx.TimeoutException:
55
  # Send error event in SSE format
56
  error_chunk = {
57
  "content": "",
app/api/v2/schemas.py CHANGED
@@ -3,8 +3,13 @@ V2 API schemas - reuses V1 schemas for compatibility.
3
  """
4
 
5
  # Import all schemas from V1 to maintain API compatibility
6
- from app.api.v1.schemas import (ErrorResponse, HealthResponse, StreamChunk,
7
- SummarizeRequest, SummarizeResponse)
 
 
 
 
 
8
 
9
  # Re-export for V2 API
10
  __all__ = [
 
3
  """
4
 
5
  # Import all schemas from V1 to maintain API compatibility
6
+ from app.api.v1.schemas import (
7
+ ErrorResponse,
8
+ HealthResponse,
9
+ StreamChunk,
10
+ SummarizeRequest,
11
+ SummarizeResponse,
12
+ )
13
 
14
  # Re-export for V2 API
15
  __all__ = [
app/api/v2/summarize.py CHANGED
@@ -4,7 +4,7 @@ V2 Summarization endpoints using HuggingFace streaming.
4
 
5
  import json
6
 
7
- from fastapi import APIRouter, HTTPException
8
  from fastapi.responses import StreamingResponse
9
 
10
  from app.api.v2.schemas import SummarizeRequest
 
4
 
5
  import json
6
 
7
+ from fastapi import APIRouter
8
  from fastapi.responses import StreamingResponse
9
 
10
  from app.api.v2.schemas import SummarizeRequest
app/api/v3/schemas.py CHANGED
@@ -3,7 +3,6 @@ Request and response schemas for V3 API.
3
  """
4
 
5
  import re
6
- from typing import Optional
7
 
8
  from pydantic import BaseModel, Field, field_validator, model_validator
9
 
@@ -11,36 +10,36 @@ from pydantic import BaseModel, Field, field_validator, model_validator
11
  class ScrapeAndSummarizeRequest(BaseModel):
12
  """Request schema supporting both URL scraping and direct text summarization."""
13
 
14
- url: Optional[str] = Field(
15
  None,
16
  description="URL of article to scrape and summarize",
17
  example="https://example.com/article",
18
  )
19
- text: Optional[str] = Field(
20
  None,
21
  description="Direct text to summarize (alternative to URL)",
22
  example="Your article text here...",
23
  )
24
- max_tokens: Optional[int] = Field(
25
  default=256, ge=1, le=2048, description="Maximum tokens in summary"
26
  )
27
- temperature: Optional[float] = Field(
28
  default=0.3,
29
  ge=0.0,
30
  le=2.0,
31
  description="Sampling temperature (lower = more focused)",
32
  )
33
- top_p: Optional[float] = Field(
34
  default=0.9, ge=0.0, le=1.0, description="Nucleus sampling parameter"
35
  )
36
- prompt: Optional[str] = Field(
37
  default="Summarize this article concisely:",
38
  description="Custom summarization prompt",
39
  )
40
- include_metadata: Optional[bool] = Field(
41
  default=True, description="Include article metadata in response"
42
  )
43
- use_cache: Optional[bool] = Field(
44
  default=True, description="Use cached content if available (URL mode only)"
45
  )
46
 
@@ -55,7 +54,7 @@ class ScrapeAndSummarizeRequest(BaseModel):
55
 
56
  @field_validator("url")
57
  @classmethod
58
- def validate_url(cls, v: Optional[str]) -> Optional[str]:
59
  """Validate URL format and security."""
60
  if v is None:
61
  return v
@@ -83,29 +82,28 @@ class ScrapeAndSummarizeRequest(BaseModel):
83
 
84
  parsed = urlparse(v)
85
  hostname = parsed.hostname
86
- if hostname:
87
- # Check for private IP ranges
88
- if (
89
- hostname.startswith("10.")
90
- or hostname.startswith("192.168.")
91
- or hostname.startswith("172.16.")
92
- or hostname.startswith("172.17.")
93
- or hostname.startswith("172.18.")
94
- or hostname.startswith("172.19.")
95
- or hostname.startswith("172.20.")
96
- or hostname.startswith("172.21.")
97
- or hostname.startswith("172.22.")
98
- or hostname.startswith("172.23.")
99
- or hostname.startswith("172.24.")
100
- or hostname.startswith("172.25.")
101
- or hostname.startswith("172.26.")
102
- or hostname.startswith("172.27.")
103
- or hostname.startswith("172.28.")
104
- or hostname.startswith("172.29.")
105
- or hostname.startswith("172.30.")
106
- or hostname.startswith("172.31.")
107
- ):
108
- raise ValueError("Cannot scrape private IP addresses")
109
 
110
  # Block file:// and other dangerous schemes
111
  if not v.startswith(("http://", "https://")):
@@ -119,7 +117,7 @@ class ScrapeAndSummarizeRequest(BaseModel):
119
 
120
  @field_validator("text")
121
  @classmethod
122
- def validate_text(cls, v: Optional[str]) -> Optional[str]:
123
  """Validate text content if provided."""
124
  if v is None:
125
  return v
@@ -141,10 +139,10 @@ class ScrapeAndSummarizeRequest(BaseModel):
141
  class ArticleMetadata(BaseModel):
142
  """Article metadata extracted during scraping."""
143
 
144
- title: Optional[str] = Field(None, description="Article title")
145
- author: Optional[str] = Field(None, description="Author name")
146
- date_published: Optional[str] = Field(None, description="Publication date")
147
- site_name: Optional[str] = Field(None, description="Website name")
148
  url: str = Field(..., description="Original URL")
149
  extracted_text_length: int = Field(..., description="Length of extracted text")
150
  scrape_method: str = Field(..., description="Scraping method used")
@@ -156,4 +154,4 @@ class ErrorResponse(BaseModel):
156
 
157
  detail: str = Field(..., description="Error message")
158
  code: str = Field(..., description="Error code")
159
- request_id: Optional[str] = Field(None, description="Request tracking ID")
 
3
  """
4
 
5
  import re
 
6
 
7
  from pydantic import BaseModel, Field, field_validator, model_validator
8
 
 
10
  class ScrapeAndSummarizeRequest(BaseModel):
11
  """Request schema supporting both URL scraping and direct text summarization."""
12
 
13
+ url: str | None = Field(
14
  None,
15
  description="URL of article to scrape and summarize",
16
  example="https://example.com/article",
17
  )
18
+ text: str | None = Field(
19
  None,
20
  description="Direct text to summarize (alternative to URL)",
21
  example="Your article text here...",
22
  )
23
+ max_tokens: int | None = Field(
24
  default=256, ge=1, le=2048, description="Maximum tokens in summary"
25
  )
26
+ temperature: float | None = Field(
27
  default=0.3,
28
  ge=0.0,
29
  le=2.0,
30
  description="Sampling temperature (lower = more focused)",
31
  )
32
+ top_p: float | None = Field(
33
  default=0.9, ge=0.0, le=1.0, description="Nucleus sampling parameter"
34
  )
35
+ prompt: str | None = Field(
36
  default="Summarize this article concisely:",
37
  description="Custom summarization prompt",
38
  )
39
+ include_metadata: bool | None = Field(
40
  default=True, description="Include article metadata in response"
41
  )
42
+ use_cache: bool | None = Field(
43
  default=True, description="Use cached content if available (URL mode only)"
44
  )
45
 
 
54
 
55
  @field_validator("url")
56
  @classmethod
57
+ def validate_url(cls, v: str | None) -> str | None:
58
  """Validate URL format and security."""
59
  if v is None:
60
  return v
 
82
 
83
  parsed = urlparse(v)
84
  hostname = parsed.hostname
85
+ # Check for private IP ranges
86
+ if hostname and (
87
+ hostname.startswith("10.")
88
+ or hostname.startswith("192.168.")
89
+ or hostname.startswith("172.16.")
90
+ or hostname.startswith("172.17.")
91
+ or hostname.startswith("172.18.")
92
+ or hostname.startswith("172.19.")
93
+ or hostname.startswith("172.20.")
94
+ or hostname.startswith("172.21.")
95
+ or hostname.startswith("172.22.")
96
+ or hostname.startswith("172.23.")
97
+ or hostname.startswith("172.24.")
98
+ or hostname.startswith("172.25.")
99
+ or hostname.startswith("172.26.")
100
+ or hostname.startswith("172.27.")
101
+ or hostname.startswith("172.28.")
102
+ or hostname.startswith("172.29.")
103
+ or hostname.startswith("172.30.")
104
+ or hostname.startswith("172.31.")
105
+ ):
106
+ raise ValueError("Cannot scrape private IP addresses")
 
107
 
108
  # Block file:// and other dangerous schemes
109
  if not v.startswith(("http://", "https://")):
 
117
 
118
  @field_validator("text")
119
  @classmethod
120
+ def validate_text(cls, v: str | None) -> str | None:
121
  """Validate text content if provided."""
122
  if v is None:
123
  return v
 
139
  class ArticleMetadata(BaseModel):
140
  """Article metadata extracted during scraping."""
141
 
142
+ title: str | None = Field(None, description="Article title")
143
+ author: str | None = Field(None, description="Author name")
144
+ date_published: str | None = Field(None, description="Publication date")
145
+ site_name: str | None = Field(None, description="Website name")
146
  url: str = Field(..., description="Original URL")
147
  extracted_text_length: int = Field(..., description="Length of extracted text")
148
  scrape_method: str = Field(..., description="Scraping method used")
 
154
 
155
  detail: str = Field(..., description="Error message")
156
  code: str = Field(..., description="Error code")
157
+ request_id: str | None = Field(None, description="Request tracking ID")
app/api/v4/schemas.py CHANGED
@@ -4,7 +4,6 @@ Request and response schemas for V4 structured summarization API.
4
 
5
  import re
6
  from enum import Enum
7
- from typing import List, Optional
8
 
9
  from pydantic import BaseModel, Field, field_validator, model_validator
10
 
@@ -28,12 +27,12 @@ class Sentiment(str, Enum):
28
  class StructuredSummaryRequest(BaseModel):
29
  """Request schema for V4 structured summarization."""
30
 
31
 import re
 from enum import Enum

 from pydantic import BaseModel, Field, field_validator, model_validator

 class StructuredSummaryRequest(BaseModel):
     """Request schema for V4 structured summarization."""

-    url: Optional[str] = Field(
+    url: str | None = Field(
         None,
         description="URL of article to scrape and summarize",
         example="https://example.com/article",
     )
-    text: Optional[str] = Field(
+    text: str | None = Field(
         None,
         description="Direct text to summarize (alternative to URL)",
         example="Your article text here...",
@@ -42,13 +41,13 @@ class StructuredSummaryRequest(BaseModel):
         default=SummarizationStyle.EXECUTIVE,
         description="Summarization style to apply",
     )
-    max_tokens: Optional[int] = Field(
+    max_tokens: int | None = Field(
         default=1024, ge=128, le=2048, description="Maximum tokens to generate"
     )
-    include_metadata: Optional[bool] = Field(
+    include_metadata: bool | None = Field(
         default=True, description="Include scraping metadata in first SSE event"
     )
-    use_cache: Optional[bool] = Field(
+    use_cache: bool | None = Field(
         default=True, description="Use cached content if available (URL mode only)"
     )

@@ -63,7 +62,7 @@ class StructuredSummaryRequest(BaseModel):

     @field_validator("url")
     @classmethod
-    def validate_url(cls, v: Optional[str]) -> Optional[str]:
+    def validate_url(cls, v: str | None) -> str | None:
         """Validate URL format and security."""
         if v is None:
             return v
@@ -91,29 +90,28 @@ class StructuredSummaryRequest(BaseModel):

         parsed = urlparse(v)
         hostname = parsed.hostname
-        if hostname:
-            # Check for private IP ranges
-            if (
-                hostname.startswith("10.")
-                or hostname.startswith("192.168.")
-                or hostname.startswith("172.16.")
-                or hostname.startswith("172.17.")
-                or hostname.startswith("172.18.")
-                or hostname.startswith("172.19.")
-                or hostname.startswith("172.20.")
-                or hostname.startswith("172.21.")
-                or hostname.startswith("172.22.")
-                or hostname.startswith("172.23.")
-                or hostname.startswith("172.24.")
-                or hostname.startswith("172.25.")
-                or hostname.startswith("172.26.")
-                or hostname.startswith("172.27.")
-                or hostname.startswith("172.28.")
-                or hostname.startswith("172.29.")
-                or hostname.startswith("172.30.")
-                or hostname.startswith("172.31.")
-            ):
-                raise ValueError("Cannot scrape private IP addresses")
+        # Check for private IP ranges
+        if hostname and (
+            hostname.startswith("10.")
+            or hostname.startswith("192.168.")
+            or hostname.startswith("172.16.")
+            or hostname.startswith("172.17.")
+            or hostname.startswith("172.18.")
+            or hostname.startswith("172.19.")
+            or hostname.startswith("172.20.")
+            or hostname.startswith("172.21.")
+            or hostname.startswith("172.22.")
+            or hostname.startswith("172.23.")
+            or hostname.startswith("172.24.")
+            or hostname.startswith("172.25.")
+            or hostname.startswith("172.26.")
+            or hostname.startswith("172.27.")
+            or hostname.startswith("172.28.")
+            or hostname.startswith("172.29.")
+            or hostname.startswith("172.30.")
+            or hostname.startswith("172.31.")
+        ):
+            raise ValueError("Cannot scrape private IP addresses")

         # Block file:// and other dangerous schemes
         if not v.startswith(("http://", "https://")):
@@ -127,7 +125,7 @@ class StructuredSummaryRequest(BaseModel):

     @field_validator("text")
     @classmethod
-    def validate_text(cls, v: Optional[str]) -> Optional[str]:
+    def validate_text(cls, v: str | None) -> str | None:
         """Validate text content if provided."""
         if v is None:
             return v
@@ -151,7 +149,11 @@ class StructuredSummary(BaseModel):

     title: str = Field(..., description="A click-worthy, engaging title")
     main_summary: str = Field(..., description="The main summary content")
-    key_points: List[str] = Field(..., description="List of 3-5 distinct key facts")
-    category: str = Field(..., description="Topic category (e.g., Tech, Politics, Health)")
+    key_points: list[str] = Field(..., description="List of 3-5 distinct key facts")
+    category: str = Field(
+        ..., description="Topic category (e.g., Tech, Politics, Health)"
+    )
     sentiment: Sentiment = Field(..., description="Overall sentiment of the article")
-    read_time_min: int = Field(..., description="Estimated minutes to read the original article", ge=1)
+    read_time_min: int = Field(
+        ..., description="Estimated minutes to read the original article", ge=1
+    )
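The prefix-string checks in `validate_url` above enumerate the RFC 1918 ranges by hand. A stdlib `ipaddress`-based check (a sketch, not the code in this commit — the helper name is hypothetical) covers the same ranges plus loopback and link-local, and also handles IPv6 literals:

```python
import ipaddress


def is_private_host(hostname: str) -> bool:
    """Return True if hostname is a literal IP in a private/loopback/link-local range."""
    try:
        ip = ipaddress.ip_address(hostname)
    except ValueError:
        return False  # not a literal IP (e.g. a domain name) - resolve-and-check separately
    return ip.is_private or ip.is_loopback or ip.is_link_local
```

Note that a hostname that *resolves* to a private IP would still pass this check, as it does the `startswith` version; full SSRF protection needs a post-resolution check as well.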
app/api/v4/structured_summary.py CHANGED
@@ -267,7 +267,9 @@ async def _stream_generator_ndjson(text: str, payload, metadata: dict, request_i
     summarization_start = time.time()

     try:
-        async for event in structured_summarizer_service.summarize_structured_stream_ndjson(
+        async for (
+            event
+        ) in structured_summarizer_service.summarize_structured_stream_ndjson(
             text=text,
             style=payload.style.value,
             max_tokens=payload.max_tokens,
@@ -374,7 +376,9 @@ async def scrape_and_summarize_stream_json(

     # Now stream the JSON tokens from the service
     try:
-        async for token in structured_summarizer_service.summarize_structured_stream_json(
+        async for (
+            token
+        ) in structured_summarizer_service.summarize_structured_stream_json(
             text=text_to_summarize,
             style=payload.style.value,
         ):
app/core/cache.py CHANGED
@@ -4,7 +4,7 @@ Simple in-memory cache with TTL for V3 web scraping API.

 import time
 from threading import Lock
-from typing import Any, Dict, Optional
+from typing import Any

 from app.core.logging import get_logger

@@ -22,7 +22,7 @@ class SimpleCache:
             ttl_seconds: Time-to-live for cache entries in seconds (default: 1 hour)
             max_size: Maximum number of entries to store (default: 1000)
         """
-        self._cache: Dict[str, Dict[str, Any]] = {}
+        self._cache: dict[str, dict[str, Any]] = {}
         self._lock = Lock()
         self._ttl = ttl_seconds
         self._max_size = max_size
@@ -30,7 +30,7 @@ class SimpleCache:
         self._misses = 0
         logger.info(f"Cache initialized with TTL={ttl_seconds}s, max_size={max_size}")

-    def get(self, key: str) -> Optional[Dict[str, Any]]:
+    def get(self, key: str) -> dict[str, Any] | None:
         """
         Get cached content for key.

@@ -59,7 +59,7 @@ class SimpleCache:
         logger.debug(f"Cache hit for key: {key[:50]}...")
         return entry["data"]

-    def set(self, key: str, data: Dict[str, Any]) -> None:
+    def set(self, key: str, data: dict[str, Any]) -> None:
         """
         Cache content with TTL.

@@ -116,7 +116,7 @@ class SimpleCache:
         self._misses = 0
         logger.info(f"Cleared all {count} cache entries")

-    def stats(self) -> Dict[str, int]:
+    def stats(self) -> dict[str, int]:
         """
         Get cache statistics.

app/core/config.py CHANGED
@@ -2,9 +2,6 @@
 Configuration management for the text summarizer backend.
 """

-import os
-from typing import Optional
-
 from pydantic import Field, validator
 from pydantic_settings import BaseSettings

@@ -24,7 +21,7 @@ class Settings(BaseSettings):

     # Optional: API Security
     api_key_enabled: bool = Field(default=False, env="API_KEY_ENABLED")
-    api_key: Optional[str] = Field(default=None, env="API_KEY")
+    api_key: str | None = Field(default=None, env="API_KEY")

     # Optional: Rate Limiting
     rate_limit_enabled: bool = Field(default=False, env="RATE_LIMIT_ENABLED")
@@ -99,7 +96,9 @@ class Settings(BaseSettings):

     # V4 Structured Output Configuration
     enable_v4_structured: bool = Field(
-        default=True, env="ENABLE_V4_STRUCTURED", description="Enable V4 structured summarization API"
+        default=True,
+        env="ENABLE_V4_STRUCTURED",
+        description="Enable V4 structured summarization API",
     )
     enable_v4_warmup: bool = Field(
         default=False,
@@ -112,10 +111,18 @@ class Settings(BaseSettings):
         description="Model ID for V4 structured output (1.5B params, fits HF 16GB limit)",
     )
     v4_max_tokens: int = Field(
-        default=256, env="V4_MAX_TOKENS", ge=128, le=2048, description="Max tokens for V4 generation"
+        default=256,
+        env="V4_MAX_TOKENS",
+        ge=128,
+        le=2048,
+        description="Max tokens for V4 generation",
     )
     v4_temperature: float = Field(
-        default=0.2, env="V4_TEMPERATURE", ge=0.0, le=2.0, description="Temperature for V4 (low for stable JSON)"
+        default=0.2,
+        env="V4_TEMPERATURE",
+        ge=0.0,
+        le=2.0,
+        description="Temperature for V4 (low for stable JSON)",
     )
     v4_enable_quantization: bool = Field(
         default=True,
@@ -139,6 +146,7 @@ class Settings(BaseSettings):
     class Config:
         env_file = ".env"
         case_sensitive = False
+        extra = "ignore"  # Ignore extra fields from environment (e.g., old v4_phi_* variables)


 # Global settings instance
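The `extra = "ignore"` line is the build-failure fix from the commit message: without it, stale environment variables (like the old `v4_phi_*` names) that no longer match a `Settings` field make settings loading raise instead of being dropped. A toy stdlib illustration of that failure mode and the fix (field names here are a hypothetical subset; pydantic-settings does this internally):

```python
KNOWN_FIELDS = {"api_key", "rate_limit_enabled"}  # hypothetical subset of Settings fields


def load_settings(environ: dict[str, str], extra: str = "forbid") -> dict[str, str]:
    """With extra='forbid', unknown variables raise (the old failure mode);
    with extra='ignore', they are silently dropped."""
    unknown = {k for k in environ if k.lower() not in KNOWN_FIELDS}
    if unknown and extra == "forbid":
        raise ValueError(f"Extra inputs are not permitted: {sorted(unknown)}")
    return {k.lower(): v for k, v in environ.items() if k.lower() in KNOWN_FIELDS}
```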
app/core/logging.py CHANGED
@@ -4,7 +4,7 @@ Logging configuration for the text summarizer backend.

 import logging
 import sys
-from typing import Any, Dict
+from typing import Any

 from app.core.config import settings

app/core/middleware.py CHANGED
@@ -4,7 +4,7 @@ Custom middlewares for request ID and timing/logging.

 import time
 import uuid
-from typing import Callable
+from collections.abc import Callable

 from fastapi import Request, Response

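The `typing.Callable` to `collections.abc.Callable` swap is one of Ruff's pyupgrade-style rewrites: since Python 3.9 the `collections.abc` generics are subscriptable directly, so the `typing` aliases are redundant. A minimal sketch of the modern form (the wrapper function is hypothetical, just mirroring the middleware's `Callable` annotation):

```python
from collections.abc import Callable


def with_prefix(handler: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a handler, tagging its result - the shape a middleware factory uses."""

    def wrapped(path: str) -> str:
        return f"handled:{handler(path)}"

    return wrapped
```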
app/services/article_scraper.py CHANGED
@@ -4,7 +4,7 @@ Article scraping service for V3 API using trafilatura.

 import random
 import time
-from typing import Any, Dict, Optional
+from typing import Any
 from urllib.parse import urlparse

 import httpx
@@ -34,8 +34,7 @@ USER_AGENTS = [
     "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
     "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
     # Firefox on Windows
-    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) "
-    "Gecko/20100101 Firefox/121.0",
+    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0",
     # Safari on macOS
     "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
     "(KHTML, like Gecko) Version/17.1 Safari/605.1.15",
@@ -55,7 +54,7 @@ class ArticleScraperService:
         else:
             logger.info("✅ Article scraper service initialized")

-    async def scrape_article(self, url: str, use_cache: bool = True) -> Dict[str, Any]:
+    async def scrape_article(self, url: str, use_cache: bool = True) -> dict[str, Any]:
         """
         Scrape article content from URL with caching support.

@@ -176,7 +175,7 @@ class ArticleScraperService:
         logger.error(f"Scraping failed for URL {url}: {e}")
         raise

-    def _get_random_headers(self) -> Dict[str, str]:
+    def _get_random_headers(self) -> dict[str, str]:
         """
         Generate realistic browser headers with random user-agent.

@@ -249,7 +248,7 @@ class ArticleScraperService:
         except Exception:
             return "Unknown"

-    def _extract_title_fallback(self, html: str) -> Optional[str]:
+    def _extract_title_fallback(self, html: str) -> str | None:
         """
         Fallback method to extract title from HTML if metadata extraction fails.

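The `USER_AGENTS` pool and `_get_random_headers` work together so repeated scrapes don't present a uniform fingerprint. A standalone sketch of that rotation (a shortened hypothetical pool; the service's list and full header set are longer):

```python
import random

# Hypothetical two-entry pool; the service rotates over a larger USER_AGENTS list.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.1 Safari/605.1.15",
]


def get_random_headers() -> dict[str, str]:
    """Pick a User-Agent per request and pair it with plausible browser headers."""
    return {
        "User-Agent": random.choice(USER_AGENTS),
        "Accept": "text/html,application/xhtml+xml",
        "Accept-Language": "en-US,en;q=0.9",
    }
```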
app/services/hf_streaming_summarizer.py CHANGED
@@ -5,7 +5,8 @@ HuggingFace streaming service for V2 API using lower-level transformers API with
 import asyncio
 import threading
 import time
-from typing import Any, AsyncGenerator, Dict, Optional
+from collections.abc import AsyncGenerator
+from typing import Any

 from app.core.config import settings
 from app.core.logging import get_logger
@@ -15,8 +16,7 @@ logger = get_logger(__name__)
 # Try to import transformers, but make it optional
 try:
     import torch
-    from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
-                              TextIteratorStreamer)
+    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, TextIteratorStreamer
     from transformers.tokenization_utils_base import BatchEncoding

     TRANSFORMERS_AVAILABLE = True
@@ -58,8 +58,8 @@ class HFStreamingSummarizer:

     def __init__(self):
         """Initialize the HuggingFace model and tokenizer."""
-        self.tokenizer: Optional[AutoTokenizer] = None
-        self.model: Optional[AutoModelForSeq2SeqLM] = None
+        self.tokenizer: AutoTokenizer | None = None
+        self.model: AutoModelForSeq2SeqLM | None = None

         if not TRANSFORMERS_AVAILABLE:
             logger.warning("⚠️ Transformers not available - V2 endpoints will not work")
@@ -171,7 +171,7 @@ class HFStreamingSummarizer:
         temperature: float = None,
         top_p: float = None,
         prompt: str = "Summarize the key points concisely:",
-    ) -> AsyncGenerator[Dict[str, Any], None]:
+    ) -> AsyncGenerator[dict[str, Any], None]:
         """
         Stream text summarization using HuggingFace's TextIteratorStreamer.

@@ -277,14 +277,14 @@ class HFStreamingSummarizer:
             inputs = {"input_ids": inputs_raw}

         # Ensure attention_mask only if missing AND input_ids is a Tensor
-        if "attention_mask" not in inputs and "input_ids" in inputs:
-            # Check if torch is available and input is a tensor
-            if (
-                TRANSFORMERS_AVAILABLE
-                and "torch" in globals()
-                and isinstance(inputs["input_ids"], torch.Tensor)
-            ):
-                inputs["attention_mask"] = torch.ones_like(inputs["input_ids"])
+        if (
+            "attention_mask" not in inputs
+            and "input_ids" in inputs
+            and TRANSFORMERS_AVAILABLE
+            and "torch" in globals()
+            and isinstance(inputs["input_ids"], torch.Tensor)
+        ):
+            inputs["attention_mask"] = torch.ones_like(inputs["input_ids"])

         # --- HARDEN: force singleton batch across all tensor fields ---
         def _to_singleton_batch(d):
@@ -333,8 +333,10 @@ class HFStreamingSummarizer:
         # Move inputs to model device (required even with device_map="auto")
         # For encoder-decoder models, inputs need to be on the encoder device
         model_device = next(self.model.parameters()).device
-        inputs = {k: v.to(model_device) if isinstance(v, torch.Tensor) else v
-                  for k, v in inputs.items()}
+        inputs = {
+            k: v.to(model_device) if isinstance(v, torch.Tensor) else v
+            for k, v in inputs.items()
+        }

         # Validate pad/eos ids
         pad_id = self.tokenizer.pad_token_id
@@ -452,7 +454,7 @@ class HFStreamingSummarizer:
         temperature: float,
         top_p: float,
         prompt: str,
-    ) -> AsyncGenerator[Dict[str, Any], None]:
+    ) -> AsyncGenerator[dict[str, Any], None]:
         """
         Recursively summarize long text by chunking and summarizing each chunk,
         then summarizing the summaries if there are multiple chunks.
@@ -468,7 +470,7 @@ class HFStreamingSummarizer:

         # Summarize each chunk
         for i, chunk in enumerate(chunks):
-            logger.info(f"Summarizing chunk {i+1}/{len(chunks)}")
+            logger.info(f"Summarizing chunk {i + 1}/{len(chunks)}")

             # Use smaller max_new_tokens for individual chunks
             chunk_max_tokens = min(max_new_tokens, 80)
@@ -520,7 +522,7 @@ class HFStreamingSummarizer:
         temperature: float,
         top_p: float,
         prompt: str,
-    ) -> AsyncGenerator[Dict[str, Any], None]:
+    ) -> AsyncGenerator[dict[str, Any], None]:
         """
         Summarize a single chunk of text using the same logic as the main method
         but without the recursive check.
@@ -591,13 +593,14 @@ class HFStreamingSummarizer:
         else:
             inputs = {"input_ids": inputs_raw}

-        if "attention_mask" not in inputs and "input_ids" in inputs:
-            if (
-                TRANSFORMERS_AVAILABLE
-                and "torch" in globals()
-                and isinstance(inputs["input_ids"], torch.Tensor)
-            ):
-                inputs["attention_mask"] = torch.ones_like(inputs["input_ids"])
+        if (
+            "attention_mask" not in inputs
+            and "input_ids" in inputs
+            and TRANSFORMERS_AVAILABLE
+            and "torch" in globals()
+            and isinstance(inputs["input_ids"], torch.Tensor)
+        ):
+            inputs["attention_mask"] = torch.ones_like(inputs["input_ids"])

         def _to_singleton_batch(d):
             out = {}
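The recursive path above first splits long input into chunks, summarizes each, then summarizes the summaries. The model-facing parts need transformers, but the chunking step can be sketched standalone (a naive whitespace-aware splitter for illustration; not the service's actual chunker):

```python
def chunk_text(text: str, max_chars: int = 400) -> list[str]:
    """Split text on word boundaries into pieces of at most max_chars characters."""
    words = text.split()
    chunks: list[str] = []
    current: list[str] = []
    length = 0
    for word in words:
        # +1 accounts for the joining space
        if current and length + len(word) + 1 > max_chars:
            chunks.append(" ".join(current))
            current, length = [], 0
        current.append(word)
        length += len(word) + 1
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each chunk would then be fed through the single-chunk summarizer with a reduced `max_new_tokens` budget, as the `min(max_new_tokens, 80)` line above does.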
app/services/structured_summarizer.py CHANGED
@@ -6,7 +6,8 @@ import asyncio
6
  import json
7
  import threading
8
  import time
9
- from typing import Any, AsyncGenerator, Dict, Optional
 
10
 
11
  from app.core.config import settings
12
  from app.core.logging import get_logger
@@ -20,6 +21,7 @@ import os
20
 
21
  _original_getuser = getpass.getuser
22
 
 
23
  def _mock_getuser():
24
  """Mock getuser for HF Spaces compatibility."""
25
  try:
@@ -28,6 +30,7 @@ def _mock_getuser():
28
  # Fallback for containerized environments without proper user database
29
  return os.environ.get("USER", os.environ.get("USERNAME", "user"))
30
 
 
31
  getpass.getuser = _mock_getuser
32
 
33
  # Try to import transformers
@@ -58,38 +61,47 @@ outlines_generate = None
58
 
59
  try:
60
  import outlines
 
61
  # Check what's available in outlines module
62
- available_attrs = [attr for attr in dir(outlines) if not attr.startswith('_')]
63
  logger.info(f"Outlines module attributes: {available_attrs}")
64
-
65
  # Try to import models
66
  try:
67
  from outlines import models as outlines_models
68
  except ImportError:
69
  logger.warning("Could not import outlines.models")
70
  raise
71
-
72
  # Try to import generate module (for outlines.generate.json)
73
  try:
74
  from outlines import generate as outlines_generate
 
75
  logger.info("βœ… Found outlines.generate module")
76
  except ImportError as e:
77
  logger.warning(f"Could not import outlines.generate: {e}")
78
  outlines_generate = None
79
-
80
  if outlines_generate is None:
81
- raise ImportError(f"Could not import outlines.generate. Available in outlines: {available_attrs[:10]}...")
82
-
 
 
83
  OUTLINES_AVAILABLE = True
84
  logger.info("βœ… Outlines library imported successfully")
85
  except ImportError as e:
86
- logger.warning(f"Outlines library not available: {e}. V4 JSON streaming endpoints will be disabled.")
 
 
87
  except Exception as e:
88
- logger.warning(f"Error importing Outlines library: {e}. V4 JSON streaming endpoints will be disabled.")
 
 
89
 
90
 
91
  class StructuredSummary(BaseModel):
92
  """Pydantic schema for structured summary output."""
 
93
  title: str
94
  main_summary: str
95
  key_points: list[str]
@@ -103,8 +115,8 @@ class StructuredSummarizer:
103
 
104
  def __init__(self):
105
  """Initialize the Qwen model and tokenizer with GPU/INT4 when possible."""
106
- self.tokenizer: Optional[AutoTokenizer] = None
107
- self.model: Optional[AutoModelForCausalLM] = None
108
  self.outlines_model = None # Outlines wrapper over the HF model
109
 
110
  if not TRANSFORMERS_AVAILABLE:
@@ -135,14 +147,16 @@ class StructuredSummarizer:
135
  # OR FP16 for speed (2-3x faster, uses more memory)
136
  # ------------------------------------------------------------------
137
  use_fp16_for_speed = getattr(settings, "v4_use_fp16_for_speed", False)
138
-
139
  if (
140
  use_cuda
141
  and not use_fp16_for_speed
142
  and getattr(settings, "v4_enable_quantization", True)
143
  and HAS_BITSANDBYTES
144
  ):
145
- logger.info("Applying 4-bit NF4 quantization (bitsandbytes) to V4 model...")
 
 
146
  quant_config = BitsAndBytesConfig(
147
  load_in_4bit=True,
148
  bnb_4bit_compute_dtype=torch.bfloat16,
@@ -158,10 +172,12 @@ class StructuredSummarizer:
158
  trust_remote_code=True,
159
  )
160
  quantization_desc = "4-bit NF4 (bitsandbytes, GPU)"
161
-
162
  elif use_cuda and use_fp16_for_speed:
163
  # Use FP16 for 2-3x faster inference (uses ~2-3GB GPU memory)
164
- logger.info("Loading V4 model in FP16 for maximum speed (2-3x faster than 4-bit)...")
 
 
165
  self.model = AutoModelForCausalLM.from_pretrained(
166
  settings.v4_model_id,
167
  dtype=torch.float16,
@@ -194,7 +210,9 @@ class StructuredSummarizer:
194
  # Optional dynamic INT8 quantization on CPU
195
  if getattr(settings, "v4_enable_quantization", True) and not use_cuda:
196
  try:
197
- logger.info("Applying dynamic INT8 quantization to V4 model on CPU...")
 
 
198
  self.model = torch.quantization.quantize_dynamic(
199
  self.model, {torch.nn.Linear}, dtype=torch.qint8
200
  )
@@ -219,13 +237,17 @@ class StructuredSummarizer:
219
  # Wrap the HF model + tokenizer in an Outlines Transformers model
220
  if OUTLINES_AVAILABLE:
221
  try:
222
- self.outlines_model = outlines_models.Transformers(self.model, self.tokenizer)
 
 
223
  logger.info("βœ… Outlines model wrapper initialized for V4")
224
  except Exception as e:
225
  logger.error(f"❌ Failed to initialize Outlines wrapper: {e}")
226
  self.outlines_model = None
227
  else:
228
- logger.warning("⚠️ Outlines not available - V4 JSON streaming endpoints will be disabled")
 
 
229
  self.outlines_model = None
230
 
231
  except Exception as e:
@@ -251,17 +273,23 @@ class StructuredSummarizer:
251
  logger.error(f"❌ V4 model warmup failed: {e}")
252
 
253
  # Also warm up Outlines JSON generation
254
- if OUTLINES_AVAILABLE and self.outlines_model is not None and outlines_generate is not None:
 
 
 
 
255
  try:
256
  # Use outlines.generate.json(model, schema) pattern
257
- json_generator = outlines_generate.json(self.outlines_model, StructuredSummary)
258
-
 
 
259
  # Try to call it with a simple prompt
260
  result = json_generator("Warmup text for Outlines structured summary.")
261
  # Consume the generator if it's a generator
262
- if hasattr(result, '__iter__') and not isinstance(result, str):
263
  _ = list(result)[:1] # Just consume first item for warmup
264
-
265
  logger.info("βœ… V4 Outlines JSON warmup successful")
266
  except Exception as e:
267
  logger.warning(f"⚠️ V4 Outlines JSON warmup failed: {e}")
@@ -359,7 +387,7 @@ Rules:
359
  }
360
  return style_prompts.get(style, style_prompts["executive"])
361
 
362
- def _empty_state(self) -> Dict[str, Any]:
363
  """Initial empty structured state that patches will build up."""
364
  return {
365
  "title": None,
@@ -370,7 +398,7 @@ Rules:
370
  "read_time_min": None,
371
  }
372
 
373
- def _apply_patch(self, state: Dict[str, Any], patch: Dict[str, Any]) -> bool:
374
  """
375
  Apply a single patch to the state.
376
  Returns True if this is a 'done' patch (signals logical completion).
@@ -396,8 +424,8 @@ Rules:
396
  def _fallback_fill_missing_fields(
397
  self,
398
  text: str,
399
- state: Dict[str, Any],
400
- ) -> Dict[str, Any]:
401
  """
402
  Fallback to fill missing fields when the model stopped early
403
  and did not provide title, main_summary, or read_time_min.
@@ -483,8 +511,8 @@ Rules:
483
  self,
484
  text: str,
485
  style: str = "executive",
486
- max_tokens: Optional[int] = None,
487
- ) -> AsyncGenerator[Dict[str, Any], None]:
488
  """
489
  Stream structured summarization using Phi-3.
490
 
@@ -533,7 +561,8 @@ Rules:
533
  "do_sample": True,
534
  "temperature": settings.v4_temperature,
535
  "top_p": 0.9,
536
- "pad_token_id": self.tokenizer.pad_token_id or self.tokenizer.eos_token_id,
 
537
  "eos_token_id": self.tokenizer.eos_token_id,
538
  }
539
 
@@ -582,8 +611,8 @@ Rules:
582
  self,
583
  text: str,
584
  style: str = "executive",
585
- max_tokens: Optional[int] = None,
586
- ) -> AsyncGenerator[Dict[str, Any], None]:
587
  """
588
  Stream structured summarization using NDJSON patch-based protocol.
589
 
@@ -646,14 +675,15 @@ Rules:
646
  "streamer": streamer,
647
  "max_new_tokens": max_new_tokens,
648
  "do_sample": False,
649
- "pad_token_id": self.tokenizer.pad_token_id or self.tokenizer.eos_token_id,
 
650
  "eos_token_id": self.tokenizer.eos_token_id,
651
  }
652
 
653
  # DEBUG: Log generation config
654
- logger.info(f"πŸŽ›οΈ Generation config:")
655
  logger.info(f" max_new_tokens: {max_new_tokens}")
656
- logger.info(f" do_sample: False (greedy decoding for speed)")
657
  logger.info(f" eos_token_id: {self.tokenizer.eos_token_id}")
658
  logger.info(f" pad_token_id: {gen_kwargs['pad_token_id']}")
659
 
@@ -687,7 +717,9 @@ Rules:
687
  continue
688
 
689
  # DEBUG: Log every line BEFORE filtering
690
- logger.info(f"πŸ“„ Raw line (at token #{token_count}): {line[:100]}...")
 
 
691
 
692
  # Heuristic: skip anything that clearly isn't a JSON patch object
693
  # This filters out lines like "#include <bits/stdc++.h>" or random prose.
@@ -701,16 +733,20 @@ Rules:
701
  patch = None
702
  try:
703
  patch = json.loads(line)
704
-
705
  # Log each valid patch received from model
706
  op = patch.get("op")
707
  if op == "done":
708
  logger.info("βœ… Model emitted done patch")
709
  elif op == "set":
710
- logger.info(f"πŸ“ Model set: {patch.get('field')} = {str(patch.get('value'))[:50]}...")
 
 
711
  elif op == "append":
712
- logger.info(f"βž• Model append: {patch.get('field')} += {str(patch.get('value'))[:50]}...")
713
-
 
 
714
  except json.JSONDecodeError as e:
715
  logger.warning(
716
  f"Failed to parse NDJSON line: {line[:150]}... Error: {e}"
@@ -722,54 +758,72 @@ Rules:
722
  brace_count = 0
723
  end_pos = -1
724
  for i, char in enumerate(line):
725
- if char == '{':
726
  brace_count += 1
727
- elif char == '}':
728
  brace_count -= 1
729
  if brace_count == 0:
730
  end_pos = i + 1
731
  break
732
-
733
  if end_pos > 0:
734
  # Found a complete JSON object, try parsing just that part
735
  try:
736
  patch = json.loads(line[:end_pos])
737
- logger.info(f"βœ… Extracted valid JSON from incomplete line")
 
 
738
  except:
739
  pass
740
-
741
  # Strategy 2: If still failed, try to fix common quote issues
742
  if patch is None and '"value":"' in line:
743
  # Try to escape unescaped quotes in the value field
744
  import re
 
745
  # Simple heuristic: if we see a pattern like "value":"...text with 'quote'..."
746
  # try to escape the inner quotes
747
  def try_fix_quotes(text):
748
  # Try to find and close the value string properly
749
- match = re.match(r'(\{"op":"[^"]+","field":"[^"]+","value":")(.*?)(.*)$', text)
 
 
 
750
  if match:
751
  prefix = match.group(1)
752
  value_content = match.group(2)
753
  rest = match.group(3)
754
  # Escape any unescaped quotes in the value
755
- value_content = value_content.replace('\\"', '__TEMP__')
756
- value_content = value_content.replace('"', '\\"')
757
- value_content = value_content.replace('__TEMP__', '\\"')
 
 
 
 
 
 
758
  # Try to reconstruct: prefix + escaped_value + "}"
759
  if rest.startswith('"}'):
760
  try:
761
- return json.loads(prefix + value_content + rest)
 
 
762
  except:
763
  pass
764
  return None
765
-
766
  repaired = try_fix_quotes(line)
767
  if repaired:
768
  patch = repaired
769
- logger.info(f"βœ… Repaired JSON by escaping quotes")
 
 
770
  except Exception as repair_error:
771
- logger.debug(f"JSON repair attempt failed: {repair_error}")
772
-
 
 
773
  if patch is None:
774
  continue
775
 
@@ -824,9 +878,13 @@ Rules:
824
  "tokens_used": token_count,
825
  }
826
  except json.JSONDecodeError:
827
- logger.warning(f"⚠️ Could not parse remaining buffer as JSON: {buffer_cleaned[:100]}")
 
 
828
  else:
829
- logger.warning(f"πŸ—‘οΈ Unparsed buffer remaining (not JSON): {repr(buffer[:200])}")
 
 
830
  else:
831
  logger.info("βœ… Buffer was fully consumed (no partial lines)")
832
 
@@ -837,7 +895,13 @@ Rules:
837
 
838
  # If the model never emitted {"op":"done"} OR left required fields missing,
839
  # run a fallback to fill the gaps and emit synthetic patch events.
840
- required_fields = ["title", "main_summary", "category", "sentiment", "read_time_min"]
841
  missing_required = [f for f in required_fields if state.get(f) is None]
842
 
843
  if missing_required:
@@ -921,13 +985,20 @@ Rules:
921
  logger.error("❌ Outlines model not available for V4")
922
  # Provide detailed error information
923
  if not OUTLINES_AVAILABLE:
924
- error_msg = "Outlines library not installed. Please install outlines>=0.0.34."
 
 
925
  elif not self.model or not self.tokenizer:
926
- error_msg = "Base V4 model not loaded. Outlines wrapper cannot be created."
 
 
927
  else:
928
  error_msg = "Outlines model wrapper initialization failed. Check server logs for details."
929
-
930
- error_obj = {"error": "V4 Outlines model not available", "detail": error_msg}
931
  yield json.dumps(error_obj)
932
  return
933
 
@@ -942,7 +1013,9 @@ Rules:
942
  # Truncate text to prevent token overflow (reuse your existing max_chars idea)
943
  max_chars = 10000
944
  if len(text) > max_chars:
945
- logger.warning(f"Truncating input text from {len(text)} to {max_chars} chars for V4 JSON streaming.")
 
 
946
  text = text[:max_chars]
947
 
948
  # Build a compact prompt; Outlines will handle the schema, so no huge system prompt needed
@@ -963,7 +1036,9 @@ Rules:
963
  try:
964
  # Check if Outlines is available
965
  if not OUTLINES_AVAILABLE or outlines_generate is None:
966
- error_obj = {"error": "Outlines library not available. Please install outlines>=0.0.34."}
 
 
967
  yield json.dumps(error_obj)
968
  return
969
 
@@ -971,12 +1046,14 @@ Rules:
971
 
972
  # Create an Outlines generator bound to the StructuredSummary schema
973
  # Modern Outlines API: outlines.generate.json(model, schema)
974
- json_generator = outlines_generate.json(self.outlines_model, StructuredSummary)
 
 
975
 
976
  # Call the generator with the prompt to get streaming tokens
977
  # The generator returns an iterable of string tokens
978
  token_iter = json_generator(prompt)
979
-
980
  # Stream tokens; each token is a string fragment of the final JSON object
981
  for token in token_iter:
982
  # Each `token` is a raw string fragment; just pass it through
@@ -986,7 +1063,9 @@ Rules:
986
  await asyncio.sleep(0)
987
 
988
  latency_ms = (time.time() - start_time) * 1000.0
989
- logger.info(f"βœ… V4 Outlines JSON streaming completed in {latency_ms:.2f}ms")
 
 
990
 
991
  except Exception as e:
992
  logger.exception("❌ V4 Outlines JSON streaming failed")
 
6
  import json
7
  import threading
8
  import time
9
+ from collections.abc import AsyncGenerator
10
+ from typing import Any
11
 
12
  from app.core.config import settings
13
  from app.core.logging import get_logger
 
21
 
22
  _original_getuser = getpass.getuser
23
 
24
+
25
  def _mock_getuser():
26
  """Mock getuser for HF Spaces compatibility."""
27
  try:
 
30
  # Fallback for containerized environments without proper user database
31
  return os.environ.get("USER", os.environ.get("USERNAME", "user"))
32
 
33
+
34
  getpass.getuser = _mock_getuser
35
 
36
  # Try to import transformers
 
61
 
62
  try:
63
  import outlines
64
+
65
  # Check what's available in outlines module
66
+ available_attrs = [attr for attr in dir(outlines) if not attr.startswith("_")]
67
  logger.info(f"Outlines module attributes: {available_attrs}")
68
+
69
  # Try to import models
70
  try:
71
  from outlines import models as outlines_models
72
  except ImportError:
73
  logger.warning("Could not import outlines.models")
74
  raise
75
+
76
  # Try to import generate module (for outlines.generate.json)
77
  try:
78
  from outlines import generate as outlines_generate
79
+
80
  logger.info("βœ… Found outlines.generate module")
81
  except ImportError as e:
82
  logger.warning(f"Could not import outlines.generate: {e}")
83
  outlines_generate = None
84
+
85
  if outlines_generate is None:
86
+ raise ImportError(
87
+ f"Could not import outlines.generate. Available in outlines: {available_attrs[:10]}..."
88
+ )
89
+
90
  OUTLINES_AVAILABLE = True
91
  logger.info("βœ… Outlines library imported successfully")
92
  except ImportError as e:
93
+ logger.warning(
94
+ f"Outlines library not available: {e}. V4 JSON streaming endpoints will be disabled."
95
+ )
96
  except Exception as e:
97
+ logger.warning(
98
+ f"Error importing Outlines library: {e}. V4 JSON streaming endpoints will be disabled."
99
+ )
100
 
101
 
102
  class StructuredSummary(BaseModel):
103
  """Pydantic schema for structured summary output."""
104
+
105
  title: str
106
  main_summary: str
107
  key_points: list[str]
 
115
 
116
  def __init__(self):
117
  """Initialize the Qwen model and tokenizer with GPU/INT4 when possible."""
118
+ self.tokenizer: AutoTokenizer | None = None
119
+ self.model: AutoModelForCausalLM | None = None
120
  self.outlines_model = None # Outlines wrapper over the HF model
121
 
122
  if not TRANSFORMERS_AVAILABLE:
 
147
  # OR FP16 for speed (2-3x faster, uses more memory)
148
  # ------------------------------------------------------------------
149
  use_fp16_for_speed = getattr(settings, "v4_use_fp16_for_speed", False)
150
+
151
  if (
152
  use_cuda
153
  and not use_fp16_for_speed
154
  and getattr(settings, "v4_enable_quantization", True)
155
  and HAS_BITSANDBYTES
156
  ):
157
+ logger.info(
158
+ "Applying 4-bit NF4 quantization (bitsandbytes) to V4 model..."
159
+ )
160
  quant_config = BitsAndBytesConfig(
161
  load_in_4bit=True,
162
  bnb_4bit_compute_dtype=torch.bfloat16,
 
172
  trust_remote_code=True,
173
  )
174
  quantization_desc = "4-bit NF4 (bitsandbytes, GPU)"
175
+
176
  elif use_cuda and use_fp16_for_speed:
177
  # Use FP16 for 2-3x faster inference (uses ~2-3GB GPU memory)
178
+ logger.info(
179
+ "Loading V4 model in FP16 for maximum speed (2-3x faster than 4-bit)..."
180
+ )
181
  self.model = AutoModelForCausalLM.from_pretrained(
182
  settings.v4_model_id,
183
  dtype=torch.float16,
 
210
  # Optional dynamic INT8 quantization on CPU
211
  if getattr(settings, "v4_enable_quantization", True) and not use_cuda:
212
  try:
213
+ logger.info(
214
+ "Applying dynamic INT8 quantization to V4 model on CPU..."
215
+ )
216
  self.model = torch.quantization.quantize_dynamic(
217
  self.model, {torch.nn.Linear}, dtype=torch.qint8
218
  )
 
237
  # Wrap the HF model + tokenizer in an Outlines Transformers model
238
  if OUTLINES_AVAILABLE:
239
  try:
240
+ self.outlines_model = outlines_models.Transformers(
241
+ self.model, self.tokenizer
242
+ )
243
  logger.info("βœ… Outlines model wrapper initialized for V4")
244
  except Exception as e:
245
  logger.error(f"❌ Failed to initialize Outlines wrapper: {e}")
246
  self.outlines_model = None
247
  else:
248
+ logger.warning(
249
+ "⚠️ Outlines not available - V4 JSON streaming endpoints will be disabled"
250
+ )
251
  self.outlines_model = None
252
 
253
  except Exception as e:
 
273
  logger.error(f"❌ V4 model warmup failed: {e}")
274
 
275
  # Also warm up Outlines JSON generation
276
+ if (
277
+ OUTLINES_AVAILABLE
278
+ and self.outlines_model is not None
279
+ and outlines_generate is not None
280
+ ):
281
  try:
282
  # Use outlines.generate.json(model, schema) pattern
283
+ json_generator = outlines_generate.json(
284
+ self.outlines_model, StructuredSummary
285
+ )
286
+
287
  # Try to call it with a simple prompt
288
  result = json_generator("Warmup text for Outlines structured summary.")
289
  # Consume the generator if it's a generator
290
+ if hasattr(result, "__iter__") and not isinstance(result, str):
291
  _ = list(result)[:1] # Just consume first item for warmup
292
+
293
  logger.info("βœ… V4 Outlines JSON warmup successful")
294
  except Exception as e:
295
  logger.warning(f"⚠️ V4 Outlines JSON warmup failed: {e}")
 
387
  }
388
  return style_prompts.get(style, style_prompts["executive"])
389
 
390
+ def _empty_state(self) -> dict[str, Any]:
391
  """Initial empty structured state that patches will build up."""
392
  return {
393
  "title": None,
 
398
  "read_time_min": None,
399
  }
400
 
401
+ def _apply_patch(self, state: dict[str, Any], patch: dict[str, Any]) -> bool:
402
  """
403
  Apply a single patch to the state.
404
  Returns True if this is a 'done' patch (signals logical completion).
 
424
  def _fallback_fill_missing_fields(
425
  self,
426
  text: str,
427
+ state: dict[str, Any],
428
+ ) -> dict[str, Any]:
429
  """
430
  Fallback to fill missing fields when the model stopped early
431
  and did not provide title, main_summary, or read_time_min.
 
511
  self,
512
  text: str,
513
  style: str = "executive",
514
+ max_tokens: int | None = None,
515
+ ) -> AsyncGenerator[dict[str, Any], None]:
516
  """
517
  Stream structured summarization using Phi-3.
518
 
 
561
  "do_sample": True,
562
  "temperature": settings.v4_temperature,
563
  "top_p": 0.9,
564
+ "pad_token_id": self.tokenizer.pad_token_id
565
+ or self.tokenizer.eos_token_id,
566
  "eos_token_id": self.tokenizer.eos_token_id,
567
  }
568
 
 
611
  self,
612
  text: str,
613
  style: str = "executive",
614
+ max_tokens: int | None = None,
615
+ ) -> AsyncGenerator[dict[str, Any], None]:
616
  """
617
  Stream structured summarization using NDJSON patch-based protocol.
618
 
 
675
  "streamer": streamer,
676
  "max_new_tokens": max_new_tokens,
677
  "do_sample": False,
678
+ "pad_token_id": self.tokenizer.pad_token_id
679
+ or self.tokenizer.eos_token_id,
680
  "eos_token_id": self.tokenizer.eos_token_id,
681
  }
682
 
683
  # DEBUG: Log generation config
684
+ logger.info("πŸŽ›οΈ Generation config:")
685
  logger.info(f" max_new_tokens: {max_new_tokens}")
686
+ logger.info(" do_sample: False (greedy decoding for speed)")
687
  logger.info(f" eos_token_id: {self.tokenizer.eos_token_id}")
688
  logger.info(f" pad_token_id: {gen_kwargs['pad_token_id']}")
689
 
 
717
  continue
718
 
719
  # DEBUG: Log every line BEFORE filtering
720
+ logger.info(
721
+ f"πŸ“„ Raw line (at token #{token_count}): {line[:100]}..."
722
+ )
723
 
724
  # Heuristic: skip anything that clearly isn't a JSON patch object
725
  # This filters out lines like "#include <bits/stdc++.h>" or random prose.
 
733
  patch = None
734
  try:
735
  patch = json.loads(line)
736
+
737
  # Log each valid patch received from model
738
  op = patch.get("op")
739
  if op == "done":
740
  logger.info("βœ… Model emitted done patch")
741
  elif op == "set":
742
+ logger.info(
743
+ f"πŸ“ Model set: {patch.get('field')} = {str(patch.get('value'))[:50]}..."
744
+ )
745
  elif op == "append":
746
+ logger.info(
747
+ f"βž• Model append: {patch.get('field')} += {str(patch.get('value'))[:50]}..."
748
+ )
749
+
750
  except json.JSONDecodeError as e:
751
  logger.warning(
752
  f"Failed to parse NDJSON line: {line[:150]}... Error: {e}"
 
758
  brace_count = 0
759
  end_pos = -1
760
  for i, char in enumerate(line):
761
+ if char == "{":
762
  brace_count += 1
763
+ elif char == "}":
764
  brace_count -= 1
765
  if brace_count == 0:
766
  end_pos = i + 1
767
  break
768
+
769
  if end_pos > 0:
770
  # Found a complete JSON object, try parsing just that part
771
  try:
772
  patch = json.loads(line[:end_pos])
773
+ logger.info(
774
+ "βœ… Extracted valid JSON from incomplete line"
775
+ )
776
  except:
777
  pass
778
+
779
  # Strategy 2: If still failed, try to fix common quote issues
780
  if patch is None and '"value":"' in line:
781
  # Try to escape unescaped quotes in the value field
782
  import re
783
+
784
  # Simple heuristic: if we see a pattern like "value":"...text with 'quote'..."
785
  # try to escape the inner quotes
786
  def try_fix_quotes(text):
787
  # Try to find and close the value string properly
788
+ match = re.match(
789
+ r'(\{"op":"[^"]+","field":"[^"]+","value":")(.*?)(.*)$',
790
+ text,
791
+ )
792
  if match:
793
  prefix = match.group(1)
794
  value_content = match.group(2)
795
  rest = match.group(3)
796
  # Escape any unescaped quotes in the value
797
+ value_content = value_content.replace(
798
+ '\\"', "__TEMP__"
799
+ )
800
+ value_content = value_content.replace(
801
+ '"', '\\"'
802
+ )
803
+ value_content = value_content.replace(
804
+ "__TEMP__", '\\"'
805
+ )
806
  # Try to reconstruct: prefix + escaped_value + "}"
807
  if rest.startswith('"}'):
808
  try:
809
+ return json.loads(
810
+ prefix + value_content + rest
811
+ )
812
  except:
813
  pass
814
  return None
815
+
816
  repaired = try_fix_quotes(line)
817
  if repaired:
818
  patch = repaired
819
+ logger.info(
820
+ "βœ… Repaired JSON by escaping quotes"
821
+ )
822
  except Exception as repair_error:
823
+ logger.debug(
824
+ f"JSON repair attempt failed: {repair_error}"
825
+ )
826
+
827
  if patch is None:
828
  continue
829
 
 
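The NDJSON patch loop above builds a `state` dict from lines like `{"op":"set","field":...,"value":...}` via `_apply_patch`. A standalone sketch of that reducer is below; the `op`/`field`/`value` keys and the `set`/`append`/`done` operations come from the log lines in this diff, but the append semantics for scalar fields (string concatenation) are an assumption, not the file's actual implementation:

```python
def apply_patch(state: dict, patch: dict) -> bool:
    """Apply one NDJSON patch to the state; True signals the 'done' patch."""
    op = patch.get("op")
    if op == "done":
        return True
    field, value = patch.get("field"), patch.get("value")
    if op == "set":
        state[field] = value
    elif op == "append":
        if isinstance(state.get(field), list):
            state[field].append(value)
        else:
            # Assumed behaviour for scalar fields: concatenate as text.
            state[field] = (state.get(field) or "") + str(value)
    return False
```

The caller treats a `True` return as logical completion, which is why the streamer still runs a fallback when the model never emits `{"op":"done"}`.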
878
  "tokens_used": token_count,
879
  }
880
  except json.JSONDecodeError:
881
+ logger.warning(
882
+ f"⚠️ Could not parse remaining buffer as JSON: {buffer_cleaned[:100]}"
883
+ )
884
  else:
885
+ logger.warning(
886
+ f"πŸ—‘οΈ Unparsed buffer remaining (not JSON): {repr(buffer[:200])}"
887
+ )
888
  else:
889
  logger.info("βœ… Buffer was fully consumed (no partial lines)")
890
 
 
895
 
896
  # If the model never emitted {"op":"done"} OR left required fields missing,
897
  # run a fallback to fill the gaps and emit synthetic patch events.
898
+ required_fields = [
899
+ "title",
900
+ "main_summary",
901
+ "category",
902
+ "sentiment",
903
+ "read_time_min",
904
+ ]
905
  missing_required = [f for f in required_fields if state.get(f) is None]
906
 
907
  if missing_required:
 
985
  logger.error("❌ Outlines model not available for V4")
986
  # Provide detailed error information
987
  if not OUTLINES_AVAILABLE:
988
+ error_msg = (
989
+ "Outlines library not installed. Please install outlines>=0.0.34."
990
+ )
991
  elif not self.model or not self.tokenizer:
992
+ error_msg = (
993
+ "Base V4 model not loaded. Outlines wrapper cannot be created."
994
+ )
995
  else:
996
  error_msg = "Outlines model wrapper initialization failed. Check server logs for details."
997
+
998
+ error_obj = {
999
+ "error": "V4 Outlines model not available",
1000
+ "detail": error_msg,
1001
+ }
1002
  yield json.dumps(error_obj)
1003
  return
1004
 
 
1013
  # Truncate text to prevent token overflow (reuse your existing max_chars idea)
1014
  max_chars = 10000
1015
  if len(text) > max_chars:
1016
+ logger.warning(
1017
+ f"Truncating input text from {len(text)} to {max_chars} chars for V4 JSON streaming."
1018
+ )
1019
  text = text[:max_chars]
1020
 
1021
  # Build a compact prompt; Outlines will handle the schema, so no huge system prompt needed
 
1036
  try:
1037
  # Check if Outlines is available
1038
  if not OUTLINES_AVAILABLE or outlines_generate is None:
1039
+ error_obj = {
1040
+ "error": "Outlines library not available. Please install outlines>=0.0.34."
1041
+ }
1042
  yield json.dumps(error_obj)
1043
  return
1044
 
 
1046
 
1047
  # Create an Outlines generator bound to the StructuredSummary schema
1048
  # Modern Outlines API: outlines.generate.json(model, schema)
1049
+ json_generator = outlines_generate.json(
1050
+ self.outlines_model, StructuredSummary
1051
+ )
1052
 
1053
  # Call the generator with the prompt to get streaming tokens
1054
  # The generator returns an iterable of string tokens
1055
  token_iter = json_generator(prompt)
1056
+
1057
  # Stream tokens; each token is a string fragment of the final JSON object
1058
  for token in token_iter:
1059
  # Each `token` is a raw string fragment; just pass it through
 
1063
  await asyncio.sleep(0)
1064
 
1065
  latency_ms = (time.time() - start_time) * 1000.0
1066
+ logger.info(
1067
+ f"βœ… V4 Outlines JSON streaming completed in {latency_ms:.2f}ms"
1068
+ )
1069
 
1070
  except Exception as e:
1071
  logger.exception("❌ V4 Outlines JSON streaming failed")
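The "Strategy 1" repair in this file scans for a balanced `{...}` prefix and re-parses just that slice. The same idea can be factored into a small standalone helper; this is a sketch of the technique, not the file's actual code, and like the inline version it does not account for braces inside string values:

```python
import json


def extract_first_json_object(line: str):
    """Parse the first balanced {...} object found in `line`, or return None."""
    start = line.find("{")
    if start == -1:
        return None  # no object at all (e.g. stray prose or code)
    depth = 0
    for i in range(start, len(line)):
        if line[i] == "{":
            depth += 1
        elif line[i] == "}":
            depth -= 1
            if depth == 0:
                try:
                    return json.loads(line[start : i + 1])
                except json.JSONDecodeError:
                    return None
    return None  # braces never balanced: the line was truncated
```

Trailing garbage after the first object is tolerated, which is exactly what the streaming loop needs when the model appends stray tokens after a patch.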
app/services/summarizer.py CHANGED
@@ -4,7 +4,8 @@ Ollama service integration for text summarization.
4
 
5
  import json
6
  import time
7
- from typing import Any, AsyncGenerator, Dict
 
8
  from urllib.parse import urljoin
9
 
10
  import httpx
@@ -50,7 +51,7 @@ class OllamaService:
50
  text: str,
51
  max_tokens: int = 100,
52
  prompt: str = "Summarize concisely:",
53
- ) -> Dict[str, Any]:
54
  """
55
  Summarize text using Ollama.
56
  Raises httpx.HTTPError (and subclasses) on failure.
@@ -136,13 +137,13 @@ class OllamaService:
136
  text: str,
137
  max_tokens: int = 100,
138
  prompt: str = "Summarize concisely:",
139
- ) -> AsyncGenerator[Dict[str, Any], None]:
140
  """
141
  Stream text summarization using Ollama.
142
  Yields chunks as they arrive from Ollama.
143
  Raises httpx.HTTPError (and subclasses) on failure.
144
  """
145
- start_time = time.time()
146
 
147
  # Optimized timeout: base + 3s per extra 1000 chars (cap 90s)
148
  text_length = len(text)
@@ -274,7 +275,7 @@ class OllamaService:
274
  async with httpx.AsyncClient(timeout=5.0) as client:
275
  resp = await client.get(tags_url)
276
  resp.raise_for_status()
277
- tags = resp.json()
278
 
279
  # If you want to *require* the model to exist, uncomment below:
280
  # available = {m.get("name") for m in tags.get("models", []) if isinstance(m, dict)}
 
4
 
5
  import json
6
  import time
7
+ from collections.abc import AsyncGenerator
8
+ from typing import Any
9
  from urllib.parse import urljoin
10
 
11
  import httpx
 
51
  text: str,
52
  max_tokens: int = 100,
53
  prompt: str = "Summarize concisely:",
54
+ ) -> dict[str, Any]:
55
  """
56
  Summarize text using Ollama.
57
  Raises httpx.HTTPError (and subclasses) on failure.
 
137
  text: str,
138
  max_tokens: int = 100,
139
  prompt: str = "Summarize concisely:",
140
+ ) -> AsyncGenerator[dict[str, Any], None]:
141
  """
142
  Stream text summarization using Ollama.
143
  Yields chunks as they arrive from Ollama.
144
  Raises httpx.HTTPError (and subclasses) on failure.
145
  """
146
+ time.time()
147
 
148
  # Optimized timeout: base + 3s per extra 1000 chars (cap 90s)
149
  text_length = len(text)
 
275
  async with httpx.AsyncClient(timeout=5.0) as client:
276
  resp = await client.get(tags_url)
277
  resp.raise_for_status()
278
+ resp.json()
279
 
280
  # If you want to *require* the model to exist, uncomment below:
281
  # available = {m.get("name") for m in tags.get("models", []) if isinstance(m, dict)}
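The streaming path above sizes its timeout per the comment "base + 3s per extra 1000 chars (cap 90s)". A sketch of that rule is below; the 30-second base and the integer-thousands rounding are assumed values, since the diff shows only the comment, not the constants:

```python
def adaptive_timeout(text_length: int, base: float = 30.0, cap: float = 90.0) -> float:
    """Sketch of the 'base + 3s per extra 1000 chars (cap 90s)' timeout rule.

    `base` is an assumption; the diff does not show the real constant.
    """
    extra_thousands = max(0, text_length - 1000) // 1000
    return min(base + 3.0 * extra_thousands, cap)
```

The cap keeps very large inputs from requesting multi-minute upstream timeouts, which is the 502-prevention behaviour the test suite below exercises.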
app/services/transformers_summarizer.py CHANGED
@@ -4,7 +4,8 @@ Transformers service for fast text summarization using Hugging Face models.
4
 
5
  import asyncio
6
  import time
7
- from typing import Any, AsyncGenerator, Dict, Optional
 
8
 
9
  from app.core.logging import get_logger
10
 
@@ -27,7 +28,7 @@ class TransformersSummarizer:
27
 
28
  def __init__(self):
29
  """Initialize the Transformers pipeline with distilbart model."""
30
- self.summarizer: Optional[Any] = None
31
 
32
  if not TRANSFORMERS_AVAILABLE:
33
  logger.warning(
@@ -39,7 +40,9 @@ class TransformersSummarizer:
39
 
40
  try:
41
  self.summarizer = pipeline(
42
- "summarization", model="sshleifer/distilbart-cnn-6-6", device=-1 # CPU
 
 
43
  )
44
  logger.info("βœ… Transformers pipeline initialized successfully")
45
  except Exception as e:
@@ -77,7 +80,7 @@ class TransformersSummarizer:
77
  text: str,
78
  max_length: int = 130,
79
  min_length: int = 30,
80
- ) -> AsyncGenerator[Dict[str, Any], None]:
81
  """
82
  Stream text summarization results word-by-word.
83
 
 
4
 
5
  import asyncio
6
  import time
7
+ from collections.abc import AsyncGenerator
8
+ from typing import Any
9
 
10
  from app.core.logging import get_logger
11
 
 
28
 
29
  def __init__(self):
30
  """Initialize the Transformers pipeline with distilbart model."""
31
+ self.summarizer: Any | None = None
32
 
33
  if not TRANSFORMERS_AVAILABLE:
34
  logger.warning(
 
40
 
41
  try:
42
  self.summarizer = pipeline(
43
+ "summarization",
44
+ model="sshleifer/distilbart-cnn-6-6",
45
+ device=-1, # CPU
46
  )
47
  logger.info("βœ… Transformers pipeline initialized successfully")
48
  except Exception as e:
 
80
  text: str,
81
  max_length: int = 130,
82
  min_length: int = 30,
83
+ ) -> AsyncGenerator[dict[str, Any], None]:
84
  """
85
  Stream text summarization results word-by-word.
86
 
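Streaming "word-by-word" implies splitting the pipeline's output into small chunks before yielding them. A hypothetical `_split_into_chunks`-style helper is sketched below (the real signature is not shown in this diff, only referenced from the tests):

```python
def split_into_chunks(text: str, max_chars: int = 1000) -> list[str]:
    """Hypothetical word-preserving chunker for streamed summary output.

    A single word longer than max_chars still becomes its own oversized chunk.
    """
    chunks: list[str] = []
    current: list[str] = []
    length = 0
    for word in text.split():
        if current and length + len(word) + 1 > max_chars:
            chunks.append(" ".join(current))
            current, length = [], 0
        current.append(word)
        length += len(word) + 1  # +1 for the joining space
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Splitting on whitespace keeps words intact, so joining the chunks back with spaces reproduces the original single-spaced text.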
requirements.txt CHANGED
@@ -29,9 +29,7 @@ pytest-cov>=4.0.0,<5.0.0
29
  pytest-mock>=3.10.0,<4.0.0
30
 
31
  # Development tools
32
- black>=22.0.0,<24.0.0
33
- isort>=5.10.0,<6.0.0
34
- flake8>=5.0.0,<7.0.0
35
 
36
  # Optional: for better performance
37
  uvloop>=0.17.0,<0.20.0
 
29
  pytest-mock>=3.10.0,<4.0.0
30
 
31
  # Development tools
32
+ ruff>=0.1.0
 
 
33
 
34
  # Optional: for better performance
35
  uvloop>=0.17.0,<0.20.0
ruff.toml ADDED
@@ -0,0 +1,61 @@
1
+ # Ruff configuration file
2
+ # Fast Python linter and formatter, written in Rust
3
+
4
+ # Line length (Black-compatible default)
5
+ line-length = 88
6
+
7
+ # Target Python version
8
+ target-version = "py310"
9
+
10
+ # Exclude patterns
11
+ exclude = [
12
+ "__pycache__",
13
+ "*.pyc",
14
+ ".git",
15
+ ".venv",
16
+ "venv",
17
+ "htmlcov",
18
+ ".pytest_cache",
19
+ "dist",
20
+ "build",
21
+ ]
22
+
23
+ # Linter configuration
24
+ [lint]
25
+ # Enable rule sets
26
+ select = [
27
+ "E", # pycodestyle errors
28
+ "W", # pycodestyle warnings
29
+ "F", # pyflakes
30
+ "I", # isort (import sorting)
31
+ "UP", # pyupgrade
32
+ "B", # flake8-bugbear
33
+ "C4", # flake8-comprehensions
34
+ "SIM", # flake8-simplify
35
+ ]
36
+
37
+ # Ignore specific rules
38
+ ignore = [
39
+ "E501", # Line too long (handled by formatter)
40
+ "B008", # Do not perform function calls in argument defaults (common in FastAPI)
41
+ "C901", # Too complex (may be too strict for this project)
42
+ "B904", # Allow raising exceptions without 'from' in error handlers (FastAPI pattern)
43
+ ]
44
+
45
+ # Per-file ignores
46
+ [lint.per-file-ignores]
47
+ "tests/*" = ["S101"] # Use of assert in tests is fine
48
+ "app/services/structured_summarizer.py" = ["E402", "E722"] # Intentional imports after patch, bare except for JSON parsing
49
+ "app/services/summarizer.py" = ["SIM117"] # Nested async with necessary (client.stream depends on client context)
50
+
51
+ # Import sorting configuration (isort-compatible)
52
+ [lint.isort]
53
+ known-first-party = ["app"]
54
+
55
+ # Format configuration
56
+ [format]
57
+ quote-style = "double"
58
+ indent-style = "space"
59
+ skip-magic-trailing-comma = false
60
+ line-ending = "auto"
61
+
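The `[lint.per-file-ignores]` table above is glob-matched against file paths, so `tests/*` suppresses `S101` for every test module while the two service files get targeted exemptions. A minimal sketch of how such a mapping resolves (a hypothetical helper, not ruff's actual matching logic):

```python
from fnmatch import fnmatch

# The per-file-ignores table from ruff.toml above.
PER_FILE_IGNORES = {
    "tests/*": {"S101"},
    "app/services/structured_summarizer.py": {"E402", "E722"},
    "app/services/summarizer.py": {"SIM117"},
}


def ignored_rules(path: str) -> set[str]:
    """Union of rule codes suppressed for `path` (illustrative resolver)."""
    rules: set[str] = set()
    for pattern, codes in PER_FILE_IGNORES.items():
        if fnmatch(path, pattern):
            rules |= codes
    return rules
```

Paths that match no pattern fall back to the global `ignore` list only.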
tests/conftest.py CHANGED
@@ -3,7 +3,7 @@ Test configuration and fixtures for the text summarizer backend.
3
  """
4
 
5
  import asyncio
6
- from typing import AsyncGenerator, Generator
7
 
8
  import pytest
9
  from httpx import AsyncClient
@@ -38,12 +38,12 @@ async def async_client() -> AsyncGenerator[AsyncClient, None]:
38
  def sample_text() -> str:
39
  """Sample text for testing summarization."""
40
  return """
41
- Artificial intelligence (AI) is intelligence demonstrated by machines,
42
- in contrast to the natural intelligence displayed by humans and animals.
43
- Leading AI textbooks define the field as the study of "intelligent agents":
44
- any device that perceives its environment and takes actions that maximize
45
- its chance of successfully achieving its goals. The term "artificial intelligence"
46
- is often used to describe machines that mimic "cognitive" functions that humans
47
  associate with the human mind, such as "learning" and "problem solving".
48
  """
49
 
 
3
  """
4
 
5
  import asyncio
6
+ from collections.abc import AsyncGenerator, Generator
7
 
8
  import pytest
9
  from httpx import AsyncClient
 
38
  def sample_text() -> str:
39
  """Sample text for testing summarization."""
40
  return """
41
+ Artificial intelligence (AI) is intelligence demonstrated by machines,
42
+ in contrast to the natural intelligence displayed by humans and animals.
43
+ Leading AI textbooks define the field as the study of "intelligent agents":
44
+ any device that perceives its environment and takes actions that maximize
45
+ its chance of successfully achieving its goals. The term "artificial intelligence"
46
+ is often used to describe machines that mimic "cognitive" functions that humans
47
  associate with the human mind, such as "learning" and "problem solving".
48
  """
49
 
tests/test_502_prevention.py CHANGED
@@ -44,7 +44,7 @@ class Test502BadGatewayPrevention:
44
  with patch("httpx.AsyncClient") as mock_client:
45
  mock_client.return_value = StubAsyncClient(post_result=StubAsyncResponse())
46
 
47
- resp = client.post(
48
  "/api/v1/summarize/", json={"text": large_text, "max_tokens": 256}
49
  )
50
 
@@ -64,7 +64,7 @@ class Test502BadGatewayPrevention:
64
  with patch("httpx.AsyncClient") as mock_client:
65
  mock_client.return_value = StubAsyncClient(post_result=StubAsyncResponse())
66
 
67
- resp = client.post(
68
  "/api/v1/summarize/", json={"text": very_large_text, "max_tokens": 256}
69
  )
70
 
@@ -83,7 +83,7 @@ class Test502BadGatewayPrevention:
83
  with patch("httpx.AsyncClient") as mock_client:
84
  mock_client.return_value = StubAsyncClient(post_result=StubAsyncResponse())
85
 
86
- resp = client.post(
87
  "/api/v1/summarize/", json={"text": small_text, "max_tokens": 256}
88
  )
89
 
@@ -100,7 +100,7 @@ class Test502BadGatewayPrevention:
100
  with patch("httpx.AsyncClient") as mock_client:
101
  mock_client.return_value = StubAsyncClient(post_result=StubAsyncResponse())
102
 
103
- resp = client.post(
104
  "/api/v1/summarize/", json={"text": medium_text, "max_tokens": 256}
105
  )
106
 
@@ -217,7 +217,7 @@ class Test502BadGatewayPrevention:
217
  post_result=StubAsyncResponse()
218
  )
219
 
220
- resp = client.post(
221
  "/api/v1/summarize/", json={"text": test_text, "max_tokens": 256}
222
  )
223
 
@@ -225,6 +225,6 @@ class Test502BadGatewayPrevention:
225
  mock_client.assert_called_once()
226
  call_args = mock_client.call_args
227
  actual_timeout = call_args[1]["timeout"]
228
- assert (
229
- actual_timeout == expected_timeout
230
- ), f"Text length {text_length} should have timeout {expected_timeout}, got {actual_timeout}"
 
44
  with patch("httpx.AsyncClient") as mock_client:
45
  mock_client.return_value = StubAsyncClient(post_result=StubAsyncResponse())
46
 
47
+ client.post(
48
  "/api/v1/summarize/", json={"text": large_text, "max_tokens": 256}
49
  )
50
 
 
64
  with patch("httpx.AsyncClient") as mock_client:
65
  mock_client.return_value = StubAsyncClient(post_result=StubAsyncResponse())
66
 
67
+ client.post(
68
  "/api/v1/summarize/", json={"text": very_large_text, "max_tokens": 256}
69
  )
70
 
 
83
  with patch("httpx.AsyncClient") as mock_client:
84
  mock_client.return_value = StubAsyncClient(post_result=StubAsyncResponse())
85
 
86
+ client.post(
87
  "/api/v1/summarize/", json={"text": small_text, "max_tokens": 256}
88
  )
89
 
 
100
  with patch("httpx.AsyncClient") as mock_client:
101
  mock_client.return_value = StubAsyncClient(post_result=StubAsyncResponse())
102
 
103
+ client.post(
104
  "/api/v1/summarize/", json={"text": medium_text, "max_tokens": 256}
105
  )
106
 
 
217
  post_result=StubAsyncResponse()
218
  )
219
 
220
+ client.post(
221
  "/api/v1/summarize/", json={"text": test_text, "max_tokens": 256}
222
  )
223
 
 
225
  mock_client.assert_called_once()
226
  call_args = mock_client.call_args
227
  actual_timeout = call_args[1]["timeout"]
228
+ assert actual_timeout == expected_timeout, (
229
+ f"Text length {text_length} should have timeout {expected_timeout}, got {actual_timeout}"
230
+ )
tests/test_api.py CHANGED
@@ -92,9 +92,7 @@ def test_summarize_endpoint_large_text_handling():
92
  with patch("httpx.AsyncClient") as mock_client:
93
  mock_client.return_value = StubAsyncClient(post_result=StubAsyncResponse())
94
 
95
- resp = client.post(
96
- "/api/v1/summarize/", json={"text": large_text, "max_tokens": 256}
97
- )
98
 
99
  # Verify the client was called with extended timeout
100
  mock_client.assert_called_once()
 
92
  with patch("httpx.AsyncClient") as mock_client:
93
  mock_client.return_value = StubAsyncClient(post_result=StubAsyncResponse())
94
 
95
+ client.post("/api/v1/summarize/", json={"text": large_text, "max_tokens": 256})
 
 
96
 
97
  # Verify the client was called with extended timeout
98
  mock_client.assert_called_once()
tests/test_cache.py CHANGED
@@ -4,8 +4,6 @@ Tests for the cache service.
4
 
5
  import time
6
 
7
- import pytest
8
-
9
  from app.core.cache import SimpleCache
10
 
11
 
 
4
 
5
  import time
6
 
 
 
7
  from app.core.cache import SimpleCache
8
 
9
 
tests/test_config.py CHANGED
@@ -2,8 +2,6 @@
2
  Tests for configuration management.
3
  """
4
 
5
- import os
6
-
7
  import pytest
8
 
9
  from app.core.config import Settings, settings
@@ -80,7 +78,7 @@ class TestSettings:
80
  monkeypatch.setenv("API_KEY_ENABLED", "invalid")
81
  monkeypatch.setenv("RATE_LIMIT_ENABLED", "maybe")
82
 
83
- with pytest.raises(Exception): # Pydantic validation error
84
  Settings()
85
 
86
  def test_invalid_integer_environment_variables(self, monkeypatch):
@@ -89,7 +87,7 @@ class TestSettings:
89
  monkeypatch.setenv("SERVER_PORT", "not-a-number")
90
  monkeypatch.setenv("MAX_TEXT_LENGTH", "abc")
91
 
92
- with pytest.raises(Exception): # Pydantic validation error
93
  Settings()
94
 
95
  def test_negative_integer_environment_variables(self, monkeypatch):
@@ -98,7 +96,7 @@ class TestSettings:
98
  monkeypatch.setenv("SERVER_PORT", "-1")
99
  monkeypatch.setenv("MAX_TEXT_LENGTH", "-1000")
100
 
101
- with pytest.raises(Exception): # Pydantic validation error
102
  Settings()
103
 
104
  def test_settings_validation(self):
 
2
  Tests for configuration management.
3
  """
4
 
 
 
5
  import pytest
6
 
7
  from app.core.config import Settings, settings
 
78
  monkeypatch.setenv("API_KEY_ENABLED", "invalid")
79
  monkeypatch.setenv("RATE_LIMIT_ENABLED", "maybe")
80
 
81
+ with pytest.raises(ValueError): # Pydantic validation error
82
  Settings()
83
 
84
  def test_invalid_integer_environment_variables(self, monkeypatch):
 
87
  monkeypatch.setenv("SERVER_PORT", "not-a-number")
88
  monkeypatch.setenv("MAX_TEXT_LENGTH", "abc")
89
 
90
+ with pytest.raises(ValueError): # Pydantic validation error
91
  Settings()
92
 
93
  def test_negative_integer_environment_variables(self, monkeypatch):
 
96
  monkeypatch.setenv("SERVER_PORT", "-1")
97
  monkeypatch.setenv("MAX_TEXT_LENGTH", "-1000")
98
 
99
+ with pytest.raises(ValueError): # Pydantic validation error
100
  Settings()
101
 
102
  def test_settings_validation(self):
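Tightening `pytest.raises(Exception)` to `pytest.raises(ValueError)` (ruff's B017) works here because pydantic's `ValidationError` subclasses `ValueError`. A stdlib-only sketch of the same subclass-catching principle, using a stand-in exception class rather than pydantic itself:

```python
class ValidationError(ValueError):
    """Stand-in for pydantic's ValidationError, which subclasses ValueError."""


def load_port(raw: str) -> int:
    """Toy settings parser that raises the subclass on bad input."""
    try:
        return int(raw)
    except ValueError:
        raise ValidationError(f"invalid port: {raw!r}")


# Catching the base class still catches the subclass, so a test asserting
# ValueError remains correct while satisfying B017's "no bare Exception" rule.
caught = None
try:
    load_port("not-a-number")
except ValueError as exc:
    caught = exc
assert isinstance(caught, ValidationError)
```

The same reasoning explains why the stricter assertion cannot produce false failures for these config tests.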
tests/test_errors.py CHANGED
@@ -2,7 +2,7 @@
2
  Tests for error handling functionality.
3
  """
4
 
5
- from unittest.mock import Mock, patch
6
 
7
  import pytest
8
  from fastapi import FastAPI, Request
 
2
  Tests for error handling functionality.
3
  """
4
 
5
+ from unittest.mock import Mock
6
 
7
  import pytest
8
  from fastapi import FastAPI, Request
tests/test_hf_streaming.py CHANGED
@@ -2,13 +2,14 @@
  Tests for HuggingFace streaming service.
  """

- import asyncio
- from unittest.mock import AsyncMock, MagicMock, patch
+ from unittest.mock import MagicMock, patch

  import pytest

- from app.services.hf_streaming_summarizer import (HFStreamingSummarizer,
-                                                   hf_streaming_service)
+ from app.services.hf_streaming_summarizer import (
+     HFStreamingSummarizer,
+     hf_streaming_service,
+ )


  class TestHFStreamingSummarizer:
@@ -106,9 +107,9 @@ class TestHFStreamingSummarizer:

          # Test that the method exists and handles the case when torch is not available
          try:
-             dtype = service._get_torch_dtype()
+             service._get_torch_dtype()
              # If it doesn't raise an exception, that's good enough for this test
-             assert dtype is not None or True  # Always pass since torch not available
+             assert True  # Always pass since torch not available
          except NameError:
              # Expected when torch is not available
              pass
@@ -119,9 +120,9 @@ class TestHFStreamingSummarizer:

          # Test that the method exists and handles the case when torch is not available
          try:
-             dtype = service._get_torch_dtype()
+             service._get_torch_dtype()
              # If it doesn't raise an exception, that's good enough for this test
-             assert dtype is not None or True  # Always pass since torch not available
+             assert True  # Always pass since torch not available
          except NameError:
              # Expected when torch is not available
              pass
tests/test_hf_streaming_improvements.py CHANGED
@@ -2,12 +2,14 @@
  Tests for HuggingFace streaming summarizer improvements.
  """

- from unittest.mock import AsyncMock, MagicMock, patch
+ from unittest.mock import MagicMock, patch

  import pytest

- from app.services.hf_streaming_summarizer import (HFStreamingSummarizer,
-                                                   _split_into_chunks)
+ from app.services.hf_streaming_summarizer import (
+     HFStreamingSummarizer,
+     _split_into_chunks,
+ )


  class TestSplitIntoChunks:
@@ -142,37 +144,37 @@ class TestHFStreamingSummarizerImprovements:
          mock_streamer = MagicMock()
          mock_streamer.__iter__ = MagicMock(return_value=iter(["test", "summary"]))

-         with patch(
-             "app.services.hf_streaming_summarizer.TextIteratorStreamer",
-             return_value=mock_streamer,
+         with (
+             patch(
+                 "app.services.hf_streaming_summarizer.TextIteratorStreamer",
+                 return_value=mock_streamer,
+             ),
+             patch("app.services.hf_streaming_summarizer.settings") as mock_settings,
          ):
-             with patch(
-                 "app.services.hf_streaming_summarizer.settings"
-             ) as mock_settings:
-                 mock_settings.hf_model_id = "test-model"
-
-                 results = []
-                 async for chunk in mock_summarizer._single_chunk_summarize(
-                     "Test text",
-                     max_new_tokens=80,
-                     temperature=0.3,
-                     top_p=0.9,
-                     prompt="Test prompt",
-                 ):
-                     results.append(chunk)
-
-                 # Should have content chunks + final done
-                 assert len(results) >= 2
-
-                 # Check that generation was called with correct parameters
-                 mock_summarizer.model.generate.assert_called_once()
-                 call_kwargs = mock_summarizer.model.generate.call_args[1]
-
-                 assert call_kwargs["max_new_tokens"] == 80
-                 assert call_kwargs["temperature"] == 0.3
-                 assert call_kwargs["top_p"] == 0.9
-                 assert call_kwargs["length_penalty"] == 1.0  # Should be neutral
-                 assert call_kwargs["min_new_tokens"] <= 50  # Should be conservative
+             mock_settings.hf_model_id = "test-model"
+
+             results = []
+             async for chunk in mock_summarizer._single_chunk_summarize(
+                 "Test text",
+                 max_new_tokens=80,
+                 temperature=0.3,
+                 top_p=0.9,
+                 prompt="Test prompt",
+             ):
+                 results.append(chunk)
+
+             # Should have content chunks + final done
+             assert len(results) >= 2
+
+             # Check that generation was called with correct parameters
+             mock_summarizer.model.generate.assert_called_once()
+             call_kwargs = mock_summarizer.model.generate.call_args[1]
+
+             assert call_kwargs["max_new_tokens"] == 80
+             assert call_kwargs["temperature"] == 0.3
+             assert call_kwargs["top_p"] == 0.9
+             assert call_kwargs["length_penalty"] == 1.0  # Should be neutral
+             assert call_kwargs["min_new_tokens"] <= 50  # Should be conservative

      @pytest.mark.asyncio
      async def test_single_chunk_summarize_defaults(self, mock_summarizer):
@@ -186,32 +188,32 @@ class TestHFStreamingSummarizerImprovements:
          mock_streamer = MagicMock()
          mock_streamer.__iter__ = MagicMock(return_value=iter(["test", "summary"]))

-         with patch(
-             "app.services.hf_streaming_summarizer.TextIteratorStreamer",
-             return_value=mock_streamer,
+         with (
+             patch(
+                 "app.services.hf_streaming_summarizer.TextIteratorStreamer",
+                 return_value=mock_streamer,
+             ),
+             patch("app.services.hf_streaming_summarizer.settings") as mock_settings,
          ):
-             with patch(
-                 "app.services.hf_streaming_summarizer.settings"
-             ) as mock_settings:
-                 mock_settings.hf_model_id = "test-model"
-
-                 results = []
-                 async for chunk in mock_summarizer._single_chunk_summarize(
-                     "Test text",
-                     max_new_tokens=None,
-                     temperature=None,
-                     top_p=None,
-                     prompt="Test prompt",
-                 ):
-                     results.append(chunk)
-
-                 # Check that generation was called with correct defaults
-                 mock_summarizer.model.generate.assert_called_once()
-                 call_kwargs = mock_summarizer.model.generate.call_args[1]
-
-                 assert call_kwargs["max_new_tokens"] == 80  # Default
-                 assert call_kwargs["temperature"] == 0.3  # Default
-                 assert call_kwargs["top_p"] == 0.9  # Default
+             mock_settings.hf_model_id = "test-model"
+
+             results = []
+             async for chunk in mock_summarizer._single_chunk_summarize(
+                 "Test text",
+                 max_new_tokens=None,
+                 temperature=None,
+                 top_p=None,
+                 prompt="Test prompt",
+             ):
+                 results.append(chunk)
+
+             # Check that generation was called with correct defaults
+             mock_summarizer.model.generate.assert_called_once()
+             call_kwargs = mock_summarizer.model.generate.call_args[1]
+
+             assert call_kwargs["max_new_tokens"] == 80  # Default
+             assert call_kwargs["temperature"] == 0.3  # Default
+             assert call_kwargs["top_p"] == 0.9  # Default

      @pytest.mark.asyncio
      async def test_recursive_summarization_error_handling(self, mock_summarizer):
@@ -310,26 +312,26 @@ class TestHFStreamingSummarizerIntegration:
          mock_streamer = MagicMock()
          mock_streamer.__iter__ = MagicMock(return_value=iter(["short", "summary"]))

-         with patch(
-             "app.services.hf_streaming_summarizer.TextIteratorStreamer",
-             return_value=mock_streamer,
+         with (
+             patch(
+                 "app.services.hf_streaming_summarizer.TextIteratorStreamer",
+                 return_value=mock_streamer,
+             ),
+             patch("app.services.hf_streaming_summarizer.settings") as mock_settings,
          ):
-             with patch(
-                 "app.services.hf_streaming_summarizer.settings"
-             ) as mock_settings:
-                 mock_settings.hf_model_id = "test-model"
-                 mock_settings.hf_temperature = 0.3
-                 mock_settings.hf_top_p = 0.9
-
-                 # Short text (<1500 chars)
-                 short_text = "This is a short text."
-
-                 results = []
-                 async for chunk in summarizer.summarize_text_stream(short_text):
-                     results.append(chunk)
-
-                 # Should have used normal flow (not recursive)
-                 assert len(results) >= 2
-                 assert results[0]["content"] == "short"
-                 assert results[1]["content"] == "summary"
-                 assert results[-1]["done"] is True
+             mock_settings.hf_model_id = "test-model"
+             mock_settings.hf_temperature = 0.3
+             mock_settings.hf_top_p = 0.9
+
+             # Short text (<1500 chars)
+             short_text = "This is a short text."
+
+             results = []
+             async for chunk in summarizer.summarize_text_stream(short_text):
+                 results.append(chunk)
+
+             # Should have used normal flow (not recursive)
+             assert len(results) >= 2
+             assert results[0]["content"] == "short"
+             assert results[1]["content"] == "summary"
+             assert results[-1]["done"] is True
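The hunks above flatten nested `with` blocks into a single parenthesized `with` statement, the Python 3.10+ form that ruff's SIM117 rule recommends. A minimal stdlib sketch of the before/after shapes, with a toy `tag` context manager standing in for `patch(...)`:

```python
from contextlib import contextmanager


@contextmanager
def tag(name):
    # Toy context manager standing in for patch(...) in the tests above
    print(f"enter {name}")
    try:
        yield name
    finally:
        print(f"exit {name}")


# Before: nested with-blocks (flagged by SIM117)
with tag("outer"):
    with tag("inner"):
        pass

# After: one parenthesized with-statement (Python 3.10+);
# enter/exit order is identical to the nested form
with (
    tag("outer") as a,
    tag("inner") as b,
):
    print(a, b)
```

Both forms enter `outer` before `inner` and exit in reverse order; the rewrite only removes an indentation level.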
tests/test_imports.py ADDED
@@ -0,0 +1,386 @@
+ """
+ Comprehensive import tests to ensure all dependencies and modules are importable.
+
+ This test suite validates that:
+ 1. All external dependencies from requirements.txt can be imported
+ 2. All app modules can be imported without errors
+ 3. No circular import issues exist
+ 4. All public APIs are accessible
+
+ Run this test before pushing to catch import errors early.
+ """
+
+ import pytest
+
+
+ class TestExternalDependencies:
+     """Test that all external dependencies can be imported."""
+
+     def test_fastapi_import(self):
+         """Test FastAPI can be imported."""
+         import fastapi  # noqa: F401
+
+         assert True
+
+     def test_uvicorn_import(self):
+         """Test uvicorn can be imported."""
+         import uvicorn  # noqa: F401
+
+         assert True
+
+     def test_httpx_import(self):
+         """Test httpx can be imported."""
+         import httpx  # noqa: F401
+
+         assert True
+
+     def test_pydantic_import(self):
+         """Test pydantic can be imported."""
+         from pydantic import BaseModel  # noqa: F401
+
+         assert True
+
+     def test_pydantic_settings_import(self):
+         """Test pydantic-settings can be imported."""
+         from pydantic_settings import BaseSettings  # noqa: F401
+
+         assert True
+
+     def test_python_dotenv_import(self):
+         """Test python-dotenv can be imported."""
+         import dotenv  # noqa: F401
+
+         assert True
+
+     def test_transformers_import(self):
+         """Test transformers can be imported."""
+         try:
+             from transformers import AutoModelForCausalLM, AutoTokenizer  # noqa: F401
+
+             assert True
+         except ImportError:
+             pytest.skip("transformers not available (optional)")
+
+     def test_torch_import(self):
+         """Test torch can be imported."""
+         try:
+             import torch  # noqa: F401
+
+             assert True
+         except ImportError:
+             pytest.skip("torch not available (optional)")
+
+     def test_outlines_import(self):
+         """Test outlines can be imported."""
+         try:
+             import outlines  # noqa: F401
+
+             assert True
+         except ImportError:
+             pytest.skip("outlines not available (optional)")
+
+     def test_trafilatura_import(self):
+         """Test trafilatura can be imported."""
+         try:
+             import trafilatura  # noqa: F401
+
+             assert True
+         except ImportError:
+             pytest.skip("trafilatura not available (optional for V3)")
+
+     def test_lxml_import(self):
+         """Test lxml can be imported."""
+         try:
+             import lxml  # noqa: F401
+
+             assert True
+         except ImportError:
+             pytest.skip("lxml not available (optional for V3)")
+
+     def test_ruff_import(self):
+         """Test ruff can be imported (development tool)."""
+         try:
+             import ruff  # noqa: F401
+
+             assert True
+         except ImportError:
+             pytest.skip("ruff not available (dev dependency)")
+
+
+ class TestCoreModuleImports:
+     """Test that all core modules can be imported."""
+
+     def test_config_import(self):
+         """Test core.config can be imported."""
+         from app.core.config import Settings, settings  # noqa: F401
+
+         assert True
+
+     def test_logging_import(self):
+         """Test core.logging can be imported."""
+         from app.core.logging import get_logger, setup_logging  # noqa: F401
+
+         assert True
+
+     def test_middleware_import(self):
+         """Test core.middleware can be imported."""
+         from app.core.middleware import request_context_middleware  # noqa: F401
+
+         assert True
+
+     def test_errors_import(self):
+         """Test core.errors can be imported."""
+         from app.core.errors import init_exception_handlers  # noqa: F401
+
+         assert True
+
+     def test_cache_import(self):
+         """Test core.cache can be imported."""
+         from app.core.cache import SimpleCache, scraping_cache  # noqa: F401
+
+         assert True
+
+
+ class TestServiceImports:
+     """Test that all service modules can be imported."""
+
+     def test_summarizer_import(self):
+         """Test services.summarizer can be imported."""
+         from app.services.summarizer import OllamaService, ollama_service  # noqa: F401
+
+         assert True
+
+     def test_transformers_summarizer_import(self):
+         """Test services.transformers_summarizer can be imported."""
+         from app.services.transformers_summarizer import (  # noqa: F401
+             TransformersService,
+             transformers_service,
+         )
+
+         assert True
+
+     def test_hf_streaming_summarizer_import(self):
+         """Test services.hf_streaming_summarizer can be imported."""
+         from app.services.hf_streaming_summarizer import (  # noqa: F401
+             HFStreamingSummarizer,
+             hf_streaming_service,
+         )
+
+         assert True
+
+     def test_article_scraper_import(self):
+         """Test services.article_scraper can be imported."""
+         from app.services.article_scraper import ArticleScraperService  # noqa: F401
+
+         assert True
+
+     def test_structured_summarizer_import(self):
+         """Test services.structured_summarizer can be imported."""
+         try:
+             from app.services.structured_summarizer import (  # noqa: F401
+                 StructuredSummarizer,
+                 structured_summarizer_service,
+             )
+
+             assert True
+         except ImportError:
+             pytest.skip("structured_summarizer dependencies not available")
+
+
+ class TestV1APIImports:
+     """Test that V1 API modules can be imported."""
+
+     def test_v1_routes_import(self):
+         """Test api.v1.routes can be imported."""
+         from app.api.v1.routes import api_router  # noqa: F401
+
+         assert True
+
+     def test_v1_schemas_import(self):
+         """Test api.v1.schemas can be imported."""
+         from app.api.v1.schemas import (  # noqa: F401
+             ErrorResponse,
+             HealthResponse,
+             SummarizeRequest,
+             SummarizeResponse,
+         )
+
+         assert True
+
+     def test_v1_summarize_import(self):
+         """Test api.v1.summarize can be imported."""
+         from app.api.v1.summarize import summarize_text  # noqa: F401
+
+         assert True
+
+
+ class TestV2APIImports:
+     """Test that V2 API modules can be imported."""
+
+     def test_v2_routes_import(self):
+         """Test api.v2.routes can be imported."""
+         from app.api.v2.routes import api_router  # noqa: F401
+
+         assert True
+
+     def test_v2_schemas_import(self):
+         """Test api.v2.schemas can be imported."""
+         from app.api.v2.schemas import (  # noqa: F401
+             ErrorResponse,
+             HealthResponse,
+             SummarizeRequest,
+             SummarizeResponse,
+         )
+
+         assert True
+
+     def test_v2_summarize_import(self):
+         """Test api.v2.summarize can be imported."""
+         from app.api.v2.summarize import summarize_text_stream  # noqa: F401
+
+         assert True
+
+
+ class TestV3APIImports:
+     """Test that V3 API modules can be imported."""
+
+     def test_v3_routes_import(self):
+         """Test api.v3.routes can be imported."""
+         from app.api.v3.routes import api_router  # noqa: F401
+
+         assert True
+
+     def test_v3_schemas_import(self):
+         """Test api.v3.schemas can be imported."""
+         from app.api.v3.schemas import (  # noqa: F401
+             ErrorResponse,
+             HealthResponse,
+             ScrapeSummarizeRequest,
+             ScrapeSummarizeResponse,
+         )
+
+         assert True
+
+     def test_v3_scrape_summarize_import(self):
+         """Test api.v3.scrape_summarize can be imported."""
+         from app.api.v3.scrape_summarize import (
+             scrape_and_summarize_stream,  # noqa: F401
+         )
+
+         assert True
+
+
+ class TestV4APIImports:
+     """Test that V4 API modules can be imported."""
+
+     def test_v4_routes_import(self):
+         """Test api.v4.routes can be imported."""
+         try:
+             from app.api.v4.routes import api_router  # noqa: F401
+
+             assert True
+         except ImportError:
+             pytest.skip("V4 API dependencies not available")
+
+     def test_v4_schemas_import(self):
+         """Test api.v4.schemas can be imported."""
+         try:
+             from app.api.v4.schemas import (  # noqa: F401
+                 ErrorResponse,
+                 HealthResponse,
+                 StructuredSummary,
+                 StructuredSummaryRequest,
+                 StructuredSummaryResponse,
+                 SummarizationStyle,
+             )
+
+             assert True
+         except ImportError:
+             pytest.skip("V4 API dependencies not available")
+
+     def test_v4_structured_summary_import(self):
+         """Test api.v4.structured_summary can be imported."""
+         try:
+             from app.api.v4.structured_summary import (  # noqa: F401
+                 generate_structured_summary_stream,
+             )
+
+             assert True
+         except ImportError:
+             pytest.skip("V4 API dependencies not available")
+
+
+ class TestMainAppImport:
+     """Test that the main app can be imported."""
+
+     def test_main_app_import(self):
+         """Test app.main can be imported."""
+         from app.main import app  # noqa: F401
+
+         assert True
+
+     def test_main_app_has_attributes(self):
+         """Test that main app has expected attributes."""
+         from app.main import app
+
+         assert hasattr(app, "title")
+         assert hasattr(app, "version")
+         assert app.title == "Text Summarizer API"
+         assert app.version == "4.0.0"
+
+
+ class TestCircularImports:
+     """Test that there are no circular import issues."""
+
+     def test_repeated_imports(self):
+         """Test that modules can be imported multiple times without issues."""
+         # Import all major modules twice to catch circular import issues
+         import importlib
+
+         modules_to_test = [
+             "app.core.config",
+             "app.core.logging",
+             "app.core.middleware",
+             "app.core.errors",
+             "app.services.summarizer",
+             "app.services.transformers_summarizer",
+             "app.services.hf_streaming_summarizer",
+             "app.api.v1.routes",
+             "app.api.v2.routes",
+             "app.main",
+         ]
+
+         for module_name in modules_to_test:
+             # First import
+             mod1 = importlib.import_module(module_name)
+             # Reload (simulates second import)
+             mod2 = importlib.reload(mod1)
+             # Should be the same module
+             assert mod1 is mod2
+
+
+ class TestRuffMigrationImports:
+     """Test that imports still work after ruff migration."""
+
+     def test_all_app_modules_importable(self):
+         """Test that all app modules can be imported after ruff formatting."""
+         # This test ensures ruff didn't break any imports
+         from app import __version__  # noqa: F401
+         from app.core import config, errors, logging, middleware  # noqa: F401
+         from app.services import (  # noqa: F401
+             article_scraper,
+             hf_streaming_summarizer,
+             summarizer,
+             transformers_summarizer,
+         )
+
+         assert True
+
+     def test_import_statements_formatted(self):
+         """Test that import statements are properly formatted by ruff."""
+         # This is a meta-test - if imports work, ruff formatting is likely correct
+         from app.core.config import settings  # noqa: F401
+         from app.main import app  # noqa: F401
+         from app.services.summarizer import ollama_service  # noqa: F401
+
+         assert True
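The per-dependency test methods above are deliberately explicit, one per module. An alternative is to drive `importlib.import_module` from `pytest.mark.parametrize`; the sketch below uses stdlib module names as stand-ins for the real `app.*` modules, so it runs anywhere:

```python
import importlib

import pytest

# Stand-ins for the app modules listed above; in the real suite this
# list would hold names like "app.core.config" and "app.api.v1.routes".
MODULES = ["json", "os.path", "unittest.mock"]


@pytest.mark.parametrize("module_name", MODULES)
def test_module_importable(module_name):
    # importlib.import_module raises ImportError on failure, failing the test
    assert importlib.import_module(module_name) is not None
```

The trade-off is granularity: parametrization collapses the boilerplate, while the explicit methods above make it easy to attach per-module `pytest.skip` handling for optional dependencies.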
tests/test_logging.py CHANGED
@@ -5,8 +5,6 @@ Tests for logging configuration.
  import logging
  from unittest.mock import Mock, patch

- import pytest
-
  from app.core.logging import get_logger, setup_logging
tests/test_main.py CHANGED
@@ -2,11 +2,6 @@
  Tests for main FastAPI application.
  """

- import pytest
- from fastapi.testclient import TestClient
-
- from app.main import app
-

  class TestMainApp:
      """Test main FastAPI application."""
tests/test_middleware.py CHANGED
@@ -110,7 +110,7 @@ class TestRequestContextMiddleware:
              return response

          # Test the middleware
-         result = await request_context_middleware(request, mock_call_next)
+         await request_context_middleware(request, mock_call_next)

          # Verify logging was called
          mock_logger.log_request.assert_called_once_with(
tests/test_schemas.py CHANGED
@@ -5,8 +5,12 @@ Tests for Pydantic schemas.
  import pytest
  from pydantic import ValidationError

- from app.api.v1.schemas import (ErrorResponse, HealthResponse,
-                                 SummarizeRequest, SummarizeResponse)
+ from app.api.v1.schemas import (
+     ErrorResponse,
+     HealthResponse,
+     SummarizeRequest,
+     SummarizeResponse,
+ )


  class TestSummarizeRequest:
tests/test_services.py CHANGED
@@ -121,12 +121,16 @@ class TestOllamaService:
      @pytest.mark.asyncio
      async def test_summarize_text_timeout(self, ollama_service):
          """Test timeout handling."""
-         with patch(
-             "httpx.AsyncClient",
-             return_value=StubAsyncClient(post_exc=httpx.TimeoutException("Timeout")),
+         with (
+             patch(
+                 "httpx.AsyncClient",
+                 return_value=StubAsyncClient(
+                     post_exc=httpx.TimeoutException("Timeout")
+                 ),
+             ),
+             pytest.raises(httpx.TimeoutException),
          ):
-             with pytest.raises(httpx.TimeoutException):
-                 await ollama_service.summarize_text("Test text")
+             await ollama_service.summarize_text("Test text")

      @pytest.mark.asyncio
      async def test_summarize_text_http_error(self, ollama_service):
@@ -135,11 +139,14 @@ class TestOllamaService:
              "Bad Request", request=MagicMock(), response=MagicMock()
          )
          stub_response = StubAsyncResponse(raise_for_status_exc=http_error)
-         with patch(
-             "httpx.AsyncClient", return_value=StubAsyncClient(post_result=stub_response)
+         with (
+             patch(
+                 "httpx.AsyncClient",
+                 return_value=StubAsyncClient(post_result=stub_response),
+             ),
+             pytest.raises(httpx.HTTPError),
          ):
-             with pytest.raises(httpx.HTTPError):
-                 await ollama_service.summarize_text("Test text")
+             await ollama_service.summarize_text("Test text")

      @pytest.mark.asyncio
      async def test_check_health_success(self, ollama_service):
@@ -168,7 +175,6 @@ class TestOllamaService:
      ):
          """Test dynamic timeout calculation for small text (should use base timeout)."""
          stub_response = StubAsyncResponse(json_data=mock_ollama_response)
-         captured_timeout = None

          class TimeoutCaptureClient(StubAsyncClient):
              def __init__(self, *args, **kwargs):
@@ -185,7 +191,7 @@ class TestOllamaService:
              mock_client.return_value = TimeoutCaptureClient(post_result=stub_response)
              mock_client.return_value.timeout = 30  # Test environment base timeout

-             result = await ollama_service.summarize_text("Short text")
+             await ollama_service.summarize_text("Short text")

          # Verify the client was called with the base timeout
          mock_client.assert_called_once()
@@ -203,7 +209,7 @@ class TestOllamaService:
          with patch("httpx.AsyncClient") as mock_client:
              mock_client.return_value = StubAsyncClient(post_result=stub_response)

-             result = await ollama_service.summarize_text(large_text)
+             await ollama_service.summarize_text(large_text)

          # Verify the client was called with extended timeout
          # Timeout calculated with ORIGINAL text length (5000 chars): 30 + (5000-1000)/1000 * 3 = 30 + 12 = 42s
@@ -223,7 +229,7 @@ class TestOllamaService:
          with patch("httpx.AsyncClient") as mock_client:
              mock_client.return_value = StubAsyncClient(post_result=stub_response)

-             result = await ollama_service.summarize_text(very_large_text)
+             await ollama_service.summarize_text(very_large_text)

          # Verify the timeout is capped at 90 seconds (actual cap)
          mock_client.assert_called_once()
@@ -404,11 +410,13 @@ class TestOllamaService:
              def stream(self, method, url, **kwargs):
                  raise httpx.TimeoutException("Timeout")

-         with patch("httpx.AsyncClient", return_value=MockStreamClient()):
-             with pytest.raises(httpx.TimeoutException):
-                 chunks = []
-                 async for chunk in ollama_service.summarize_text_stream("Test text"):
-                     chunks.append(chunk)
+         with (
+             patch("httpx.AsyncClient", return_value=MockStreamClient()),
+             pytest.raises(httpx.TimeoutException),
+         ):
+             chunks = []
+             async for chunk in ollama_service.summarize_text_stream("Test text"):
+                 chunks.append(chunk)

      @pytest.mark.asyncio
      async def test_summarize_text_stream_http_error(self, ollama_service):
@@ -427,11 +435,13 @@ class TestOllamaService:
              def stream(self, method, url, **kwargs):
                  raise http_error

-         with patch("httpx.AsyncClient", return_value=MockStreamClient()):
-             with pytest.raises(httpx.HTTPStatusError):
-                 chunks = []
-                 async for chunk in ollama_service.summarize_text_stream("Test text"):
-                     chunks.append(chunk)
+         with (
+             patch("httpx.AsyncClient", return_value=MockStreamClient()),
+             pytest.raises(httpx.HTTPStatusError),
+         ):
+             chunks = []
+             async for chunk in ollama_service.summarize_text_stream("Test text"):
+                 chunks.append(chunk)

      @pytest.mark.asyncio
      async def test_summarize_text_stream_empty_response(self, ollama_service):
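The comments in the hunks above spell out the dynamic timeout rule the tests verify: a 30 s base, plus 3 s per 1000 characters beyond the first 1000, capped at 90 s. A sketch of that arithmetic follows; `dynamic_timeout` is a hypothetical helper reproducing the formula quoted in the test comments, not the service's actual implementation:

```python
def dynamic_timeout(
    text_length: int,
    base: float = 30.0,
    per_kchar: float = 3.0,
    threshold: int = 1000,
    cap: float = 90.0,
) -> float:
    """Timeout rule from the test comments:
    30 + (len - 1000) / 1000 * 3, capped at 90 seconds."""
    extra = max(0, text_length - threshold) / threshold * per_kchar
    return min(base + extra, cap)


print(dynamic_timeout(500))     # 30.0 -> small text keeps the base timeout
print(dynamic_timeout(5000))    # 42.0 -> matches the "30 + 12 = 42s" comment
print(dynamic_timeout(100000))  # 90.0 -> hits the cap
```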
tests/test_startup_script.py CHANGED
@@ -4,12 +4,9 @@ Tests for the startup script functionality.

  import os
  import shutil
- import subprocess
  import tempfile
  from unittest.mock import MagicMock, patch

- import pytest
-

  class TestStartupScript:
      """Test the start-server.sh script functionality."""
@@ -49,7 +46,7 @@ class TestStartupScript:

          # We can't actually run the script in tests due to uvicorn, but we can test the logic
          # by checking if the .env creation logic is present in the script
-         with open(script_path, "r") as f:
+         with open(script_path) as f:
              script_content = f.read()

          assert "if [ ! -f .env ]" in script_content
@@ -60,7 +57,7 @@ class TestStartupScript:
          """Test that script includes Ollama service health check."""
          script_path = os.path.join(self.original_cwd, "start-server.sh")

-         with open(script_path, "r") as f:
+         with open(script_path) as f:
              script_content = f.read()

          assert "curl -s http://127.0.0.1:11434/api/tags" in script_content
@@ -70,7 +67,7 @@ class TestStartupScript:
          """Test that script checks for model availability."""
          script_path = os.path.join(self.original_cwd, "start-server.sh")

-         with open(script_path, "r") as f:
+         with open(script_path) as f:
              script_content = f.read()

          assert "Model" in script_content
@@ -80,7 +77,7 @@ class TestStartupScript:
          """Test that script includes process cleanup logic."""
          script_path = os.path.join(self.original_cwd, "start-server.sh")

-         with open(script_path, "r") as f:
+         with open(script_path) as f:
              script_content = f.read()

          # Check for multiple process killing methods
@@ -93,7 +90,7 @@ class TestStartupScript:
          """Test that script verifies port is free after cleanup."""
          script_path = os.path.join(self.original_cwd, "start-server.sh")

-         with open(script_path, "r") as f:
+         with open(script_path) as f:
              script_content = f.read()

          assert "Port" in script_content
@@ -104,7 +101,7 @@ class TestStartupScript:
          """Test that script starts uvicorn with correct parameters."""
          script_path = os.path.join(self.original_cwd, "start-server.sh")

-         with open(script_path, "r") as f:
+         with open(script_path) as f:
              script_content = f.read()

          assert "uvicorn app.main:app" in script_content
@@ -116,7 +113,7 @@ class TestStartupScript:
          """Test that script provides helpful user feedback."""
          script_path = os.path.join(self.original_cwd, "start-server.sh")

-         with open(script_path, "r") as f:
+         with open(script_path) as f:
              script_content = f.read()

          # Check for emoji and helpful messages
@@ -132,7 +129,7 @@ class TestStartupScript:
          """Test that script handles Ollama not running gracefully."""
          script_path = os.path.join(self.original_cwd, "start-server.sh")

-         with open(script_path, "r") as f:
+         with open(script_path) as f:
              script_content = f.read()

          assert "Ollama is not running" in script_content
@@ -143,7 +140,7 @@ class TestStartupScript:
          """Test that script handles model not available gracefully."""
          script_path = os.path.join(self.original_cwd, "start-server.sh")

-         with open(script_path, "r") as f:
+         with open(script_path) as f:
              script_content = f.read()

          assert "Model" in script_content
95
 
96
  assert "Port" in script_content
 
101
  """Test that script starts uvicorn with correct parameters."""
102
  script_path = os.path.join(self.original_cwd, "start-server.sh")
103
 
104
+ with open(script_path) as f:
105
  script_content = f.read()
106
 
107
  assert "uvicorn app.main:app" in script_content
 
113
  """Test that script provides helpful user feedback."""
114
  script_path = os.path.join(self.original_cwd, "start-server.sh")
115
 
116
+ with open(script_path) as f:
117
  script_content = f.read()
118
 
119
  # Check for emoji and helpful messages
 
129
  """Test that script handles Ollama not running gracefully."""
130
  script_path = os.path.join(self.original_cwd, "start-server.sh")
131
 
132
+ with open(script_path) as f:
133
  script_content = f.read()
134
 
135
  assert "Ollama is not running" in script_content
 
140
  """Test that script handles model not available gracefully."""
141
  script_path = os.path.join(self.original_cwd, "start-server.sh")
142
 
143
+ with open(script_path) as f:
144
  script_content = f.read()
145
 
146
  assert "Model" in script_content
tests/test_timeout_optimization.py CHANGED
@@ -6,14 +6,9 @@ the issue of excessive timeout values (100+ seconds) by implementing
6
  more reasonable timeout calculations.
7
  """
8
 
9
- from unittest.mock import MagicMock, patch
10
-
11
- import httpx
12
- import pytest
13
- from fastapi.testclient import TestClient
14
 
15
  from app.core.config import Settings
16
- from app.main import app
17
  from app.services.summarizer import OllamaService
18
 
19
 
@@ -27,9 +22,9 @@ class TestTimeoutOptimization:
27
  settings = Settings()
28
  # The actual default in the code is 60, but .env file overrides it to 30
29
  # This test verifies the code default is correct
30
- assert (
31
- settings.ollama_timeout == 30
32
- ), "Current .env timeout should be 30 seconds"
33
 
34
  def test_timeout_optimization_formula_improvement(self):
35
  """Test that the timeout optimization formula provides better values."""
@@ -59,9 +54,9 @@ class TestTimeoutOptimization:
59
  )
60
  dynamic_timeout = min(dynamic_timeout, max_cap)
61
 
62
- assert (
63
- dynamic_timeout == expected_timeout
64
- ), f"Text length {text_length} should have timeout {expected_timeout}, got {dynamic_timeout}"
65
 
66
  def test_timeout_scaling_factor_optimization(self):
67
  """Test that the scaling factor is optimized from +10s to +5s per 1000 chars."""
@@ -75,9 +70,9 @@ class TestTimeoutOptimization:
75
  )
76
 
77
  # Should be 60 + 1*5 = 65 seconds (not 60 + 1*10 = 70)
78
- assert (
79
- dynamic_timeout == 65
80
- ), f"Scaling factor should be +5s per 1000 chars, got {dynamic_timeout - 60}"
81
 
82
  def test_maximum_timeout_cap_optimization(self):
83
  """Test that the maximum timeout cap is optimized from 300s to 120s."""
@@ -93,15 +88,15 @@ class TestTimeoutOptimization:
93
  )
94
 
95
  # Should be much higher than 90 without cap
96
- assert (
97
- uncapped_timeout > 90
98
- ), f"Uncapped timeout should be > 90s, got {uncapped_timeout}"
99
 
100
  # With cap, should be exactly 90
101
  capped_timeout = min(uncapped_timeout, max_cap)
102
- assert (
103
- capped_timeout == 90
104
- ), f"Capped timeout should be 90s, got {capped_timeout}"
105
 
106
  def test_timeout_optimization_prevents_excessive_waits(self):
107
  """Test that optimized timeouts prevent excessive waits like 100+ seconds."""
@@ -119,16 +114,16 @@ class TestTimeoutOptimization:
119
  dynamic_timeout = min(dynamic_timeout, max_cap)
120
 
121
  # No timeout should exceed 90 seconds (actual cap)
122
- assert (
123
- dynamic_timeout <= 90
124
- ), f"Timeout for {text_length} chars should not exceed 90s, got {dynamic_timeout}"
125
 
126
  # No timeout should be excessively long (like 100+ seconds for typical text)
127
  if text_length <= 20000: # Typical text sizes
128
  # Allow up to 90 seconds for 20k chars (which is reasonable and capped)
129
- assert (
130
- dynamic_timeout <= 90
131
- ), f"Timeout for typical text size {text_length} should not exceed 90s, got {dynamic_timeout}"
132
 
133
  def test_timeout_optimization_performance_improvement(self):
134
  """Test that timeout optimization provides better performance characteristics."""
@@ -154,15 +149,15 @@ class TestTimeoutOptimization:
154
  new_timeout = min(new_timeout, new_cap) # Capped at 90
155
 
156
  # New timeout should be significantly better
157
- assert (
158
- new_timeout < old_timeout
159
- ), f"New timeout {new_timeout}s should be less than old {old_timeout}s"
160
- assert (
161
- new_timeout == 90
162
- ), f"New timeout should be 90s for 10k chars (capped), got {new_timeout}"
163
- assert (
164
- old_timeout == 210
165
- ), f"Old timeout should be 210s for 10k chars, got {old_timeout}"
166
 
167
  def test_timeout_optimization_edge_cases(self):
168
  """Test timeout optimization with edge cases."""
@@ -186,9 +181,9 @@ class TestTimeoutOptimization:
186
  )
187
  dynamic_timeout = min(dynamic_timeout, max_cap)
188
 
189
- assert (
190
- dynamic_timeout == expected_timeout
191
- ), f"Edge case {text_length} chars should have timeout {expected_timeout}, got {dynamic_timeout}"
192
 
193
  def test_timeout_optimization_prevents_100_second_issue(self):
194
  """Test that timeout optimization specifically prevents the 100+ second issue."""
@@ -206,23 +201,23 @@ class TestTimeoutOptimization:
206
 
207
  # Should be 30 + (19000//1000)*3 = 30 + 19*3 = 87, capped at 90
208
  expected_timeout = 87 # Not capped
209
- assert (
210
- dynamic_timeout == expected_timeout
211
- ), f"Problematic text length should have timeout {expected_timeout}s, got {dynamic_timeout}"
212
 
213
  # Should not be 100+ seconds
214
- assert (
215
- dynamic_timeout <= 90
216
- ), f"Optimized timeout should not exceed 90s, got {dynamic_timeout}"
217
 
218
  # Should be much better than the old calculation
219
  old_timeout = 120 + max(
220
  0, (problematic_text_length - 1000) // 1000 * 10
221
  ) # 120 + 19*10 = 310
222
  old_timeout = min(old_timeout, 300) # Capped at 300
223
- assert (
224
- dynamic_timeout < old_timeout
225
- ), f"Optimized timeout {dynamic_timeout}s should be much better than old {old_timeout}s"
226
 
227
  def test_timeout_optimization_configuration_values(self):
228
  """Test that the timeout optimization configuration values are correct."""
@@ -231,13 +226,13 @@ class TestTimeoutOptimization:
231
  settings = Settings()
232
 
233
  # The current .env file has 30 seconds, but the code default is 60
234
- assert (
235
- settings.ollama_timeout == 30
236
- ), f"Current .env timeout should be 30s, got {settings.ollama_timeout}"
237
 
238
  # Test that the service uses the same timeout (test environment uses 30)
239
  service = OllamaService()
240
  # The service should use the test environment timeout of 30
241
- assert (
242
- service.timeout == 30
243
- ), f"Service timeout should be 30s (test environment), got {service.timeout}"
 
6
  more reasonable timeout calculations.
7
  """
8
 
9
+ from unittest.mock import patch
 
 
 
 
10
 
11
  from app.core.config import Settings
 
12
  from app.services.summarizer import OllamaService
13
 
14
 
 
22
  settings = Settings()
23
  # The actual default in the code is 60, but .env file overrides it to 30
24
  # This test verifies the code default is correct
25
+ assert settings.ollama_timeout == 30, (
26
+ "Current .env timeout should be 30 seconds"
27
+ )
28
 
29
  def test_timeout_optimization_formula_improvement(self):
30
  """Test that the timeout optimization formula provides better values."""
 
54
  )
55
  dynamic_timeout = min(dynamic_timeout, max_cap)
56
 
57
+ assert dynamic_timeout == expected_timeout, (
58
+ f"Text length {text_length} should have timeout {expected_timeout}, got {dynamic_timeout}"
59
+ )
60
 
61
  def test_timeout_scaling_factor_optimization(self):
62
  """Test that the scaling factor is optimized from +10s to +5s per 1000 chars."""
 
70
  )
71
 
72
  # Should be 60 + 1*5 = 65 seconds (not 60 + 1*10 = 70)
73
+ assert dynamic_timeout == 65, (
74
+ f"Scaling factor should be +5s per 1000 chars, got {dynamic_timeout - 60}"
75
+ )
76
 
77
  def test_maximum_timeout_cap_optimization(self):
78
  """Test that the maximum timeout cap is optimized from 300s to 120s."""
 
88
  )
89
 
90
  # Should be much higher than 90 without cap
91
+ assert uncapped_timeout > 90, (
92
+ f"Uncapped timeout should be > 90s, got {uncapped_timeout}"
93
+ )
94
 
95
  # With cap, should be exactly 90
96
  capped_timeout = min(uncapped_timeout, max_cap)
97
+ assert capped_timeout == 90, (
98
+ f"Capped timeout should be 90s, got {capped_timeout}"
99
+ )
100
 
101
  def test_timeout_optimization_prevents_excessive_waits(self):
102
  """Test that optimized timeouts prevent excessive waits like 100+ seconds."""
 
114
  dynamic_timeout = min(dynamic_timeout, max_cap)
115
 
116
  # No timeout should exceed 90 seconds (actual cap)
117
+ assert dynamic_timeout <= 90, (
118
+ f"Timeout for {text_length} chars should not exceed 90s, got {dynamic_timeout}"
119
+ )
120
 
121
  # No timeout should be excessively long (like 100+ seconds for typical text)
122
  if text_length <= 20000: # Typical text sizes
123
  # Allow up to 90 seconds for 20k chars (which is reasonable and capped)
124
+ assert dynamic_timeout <= 90, (
125
+ f"Timeout for typical text size {text_length} should not exceed 90s, got {dynamic_timeout}"
126
+ )
127
 
128
  def test_timeout_optimization_performance_improvement(self):
129
  """Test that timeout optimization provides better performance characteristics."""
 
149
  new_timeout = min(new_timeout, new_cap) # Capped at 90
150
 
151
  # New timeout should be significantly better
152
+ assert new_timeout < old_timeout, (
153
+ f"New timeout {new_timeout}s should be less than old {old_timeout}s"
154
+ )
155
+ assert new_timeout == 90, (
156
+ f"New timeout should be 90s for 10k chars (capped), got {new_timeout}"
157
+ )
158
+ assert old_timeout == 210, (
159
+ f"Old timeout should be 210s for 10k chars, got {old_timeout}"
160
+ )
161
 
162
  def test_timeout_optimization_edge_cases(self):
163
  """Test timeout optimization with edge cases."""
 
181
  )
182
  dynamic_timeout = min(dynamic_timeout, max_cap)
183
 
184
+ assert dynamic_timeout == expected_timeout, (
185
+ f"Edge case {text_length} chars should have timeout {expected_timeout}, got {dynamic_timeout}"
186
+ )
187
 
188
  def test_timeout_optimization_prevents_100_second_issue(self):
189
  """Test that timeout optimization specifically prevents the 100+ second issue."""
 
201
 
202
  # Should be 30 + (19000//1000)*3 = 30 + 19*3 = 87, capped at 90
203
  expected_timeout = 87 # Not capped
204
+ assert dynamic_timeout == expected_timeout, (
205
+ f"Problematic text length should have timeout {expected_timeout}s, got {dynamic_timeout}"
206
+ )
207
 
208
  # Should not be 100+ seconds
209
+ assert dynamic_timeout <= 90, (
210
+ f"Optimized timeout should not exceed 90s, got {dynamic_timeout}"
211
+ )
212
 
213
  # Should be much better than the old calculation
214
  old_timeout = 120 + max(
215
  0, (problematic_text_length - 1000) // 1000 * 10
216
  ) # 120 + 19*10 = 310
217
  old_timeout = min(old_timeout, 300) # Capped at 300
218
+ assert dynamic_timeout < old_timeout, (
219
+ f"Optimized timeout {dynamic_timeout}s should be much better than old {old_timeout}s"
220
+ )
221
 
222
  def test_timeout_optimization_configuration_values(self):
223
  """Test that the timeout optimization configuration values are correct."""
 
226
  settings = Settings()
227
 
228
  # The current .env file has 30 seconds, but the code default is 60
229
+ assert settings.ollama_timeout == 30, (
230
+ f"Current .env timeout should be 30s, got {settings.ollama_timeout}"
231
+ )
232
 
233
  # Test that the service uses the same timeout (test environment uses 30)
234
  service = OllamaService()
235
  # The service should use the test environment timeout of 30
236
+ assert service.timeout == 30, (
237
+ f"Service timeout should be 30s (test environment), got {service.timeout}"
238
+ )
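The timeout asserts in the hunks above all exercise one underlying calculation. A minimal sketch of the optimized formula, reconstructed from the asserted values (30s base, +3s per 1000 chars, 90s cap — these constants and the function name are assumptions read from the test expectations, not code from the repo):

```python
def dynamic_timeout(text_length: int, base: int = 30, per_1000: int = 3, cap: int = 90) -> int:
    """Hypothetical reconstruction of the optimized timeout formula the tests assert."""
    # Scale with input size, then clamp to the hard cap
    return min(base + (text_length // 1000) * per_1000, cap)

# 19,000 chars: 30 + 19*3 = 87 (just under the 90s cap)
# 100,000 chars: 330 uncapped, clamped to 90 instead of the old 300s ceiling
```

This matches the "problematic text length" case in the tests (87s for 19k chars) while staying well under the old 100+ second waits.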
tests/test_v2_api.py CHANGED
@@ -3,13 +3,11 @@ Tests for V2 API endpoints.
3
  """
4
 
5
  import json
6
- from unittest.mock import AsyncMock, MagicMock, patch
7
 
8
  import pytest
9
  from fastapi.testclient import TestClient
10
 
11
- from app.main import app
12
-
13
 
14
  class TestV2SummarizeStream:
15
  """Test V2 streaming summarization endpoint."""
 
3
  """
4
 
5
  import json
6
+ from unittest.mock import patch
7
 
8
  import pytest
9
  from fastapi.testclient import TestClient
10
 
 
 
11
 
12
  class TestV2SummarizeStream:
13
  """Test V2 streaming summarization endpoint."""
tests/test_v3_api.py CHANGED
@@ -2,10 +2,10 @@
2
  Tests for V3 API endpoints.
3
  """
4
 
 
5
  import json
6
  from unittest.mock import patch
7
 
8
- import pytest
9
  from fastapi.testclient import TestClient
10
 
11
  from app.main import app
@@ -40,7 +40,6 @@ def test_scrape_and_summarize_stream_success(client: TestClient):
40
  "app.services.hf_streaming_summarizer.hf_streaming_service.summarize_text_stream",
41
  side_effect=mock_stream,
42
  ):
43
-
44
  response = client.post(
45
  "/api/v3/scrape-and-summarize/stream",
46
  json={
@@ -59,10 +58,8 @@ def test_scrape_and_summarize_stream_success(client: TestClient):
59
  events = []
60
  for line in response.text.split("\n"):
61
  if line.startswith("data: "):
62
- try:
63
  events.append(json.loads(line[6:]))
64
- except json.JSONDecodeError:
65
- pass
66
 
67
  assert len(events) > 0
68
 
@@ -81,7 +78,7 @@ def test_scrape_and_summarize_stream_success(client: TestClient):
81
  assert len(content_events) >= 3
82
 
83
  # Check done event
84
- done_events = [e for e in events if e.get("done") == True]
85
  assert len(done_events) == 1
86
 
87
 
@@ -176,7 +173,6 @@ def test_scrape_without_metadata(client: TestClient):
176
  "app.services.hf_streaming_summarizer.hf_streaming_service.summarize_text_stream",
177
  side_effect=mock_stream,
178
  ):
179
-
180
  response = client.post(
181
  "/api/v3/scrape-and-summarize/stream",
182
  json={"url": "https://example.com/test", "include_metadata": False},
@@ -188,10 +184,8 @@ def test_scrape_without_metadata(client: TestClient):
188
  events = []
189
  for line in response.text.split("\n"):
190
  if line.startswith("data: "):
191
- try:
192
  events.append(json.loads(line[6:]))
193
- except json.JSONDecodeError:
194
- pass
195
 
196
  # Should not have metadata event
197
  metadata_events = [e for e in events if e.get("type") == "metadata"]
@@ -225,7 +219,6 @@ def test_scrape_with_cache(client: TestClient):
225
  "app.services.hf_streaming_summarizer.hf_streaming_service.summarize_text_stream",
226
  side_effect=mock_stream,
227
  ):
228
-
229
  # First request - should call scraper
230
  response1 = client.post(
231
  "/api/v3/scrape-and-summarize/stream",
 
2
  Tests for V3 API endpoints.
3
  """
4
 
5
+ import contextlib
6
  import json
7
  from unittest.mock import patch
8
 
 
9
  from fastapi.testclient import TestClient
10
 
11
  from app.main import app
 
40
  "app.services.hf_streaming_summarizer.hf_streaming_service.summarize_text_stream",
41
  side_effect=mock_stream,
42
  ):
 
43
  response = client.post(
44
  "/api/v3/scrape-and-summarize/stream",
45
  json={
 
58
  events = []
59
  for line in response.text.split("\n"):
60
  if line.startswith("data: "):
61
+ with contextlib.suppress(json.JSONDecodeError):
62
  events.append(json.loads(line[6:]))
 
 
63
 
64
  assert len(events) > 0
65
 
 
78
  assert len(content_events) >= 3
79
 
80
  # Check done event
81
+ done_events = [e for e in events if e.get("done")]
82
  assert len(done_events) == 1
83
 
84
 
 
173
  "app.services.hf_streaming_summarizer.hf_streaming_service.summarize_text_stream",
174
  side_effect=mock_stream,
175
  ):
 
176
  response = client.post(
177
  "/api/v3/scrape-and-summarize/stream",
178
  json={"url": "https://example.com/test", "include_metadata": False},
 
184
  events = []
185
  for line in response.text.split("\n"):
186
  if line.startswith("data: "):
187
+ with contextlib.suppress(json.JSONDecodeError):
188
  events.append(json.loads(line[6:]))
 
 
189
 
190
  # Should not have metadata event
191
  metadata_events = [e for e in events if e.get("type") == "metadata"]
 
219
  "app.services.hf_streaming_summarizer.hf_streaming_service.summarize_text_stream",
220
  side_effect=mock_stream,
221
  ):
 
222
  # First request - should call scraper
223
  response1 = client.post(
224
  "/api/v3/scrape-and-summarize/stream",
tests/test_v4_api.py CHANGED
@@ -2,6 +2,7 @@
2
  Tests for V4 Structured Summarization API endpoints.
3
  """
4
 
 
5
  import json
6
  from unittest.mock import patch
7
 
@@ -59,7 +60,6 @@ def test_v4_scrape_and_summarize_stream_success(client: TestClient):
59
  "app.services.structured_summarizer.structured_summarizer_service.summarize_structured_stream",
60
  side_effect=mock_stream,
61
  ):
62
-
63
  response = client.post(
64
  "/api/v4/scrape-and-summarize/stream",
65
  json={
@@ -79,10 +79,8 @@ def test_v4_scrape_and_summarize_stream_success(client: TestClient):
79
  events = []
80
  for line in response.text.split("\n"):
81
  if line.startswith("data: "):
82
- try:
83
  events.append(json.loads(line[6:]))
84
- except json.JSONDecodeError:
85
- pass
86
 
87
  assert len(events) > 0
88
 
@@ -107,6 +105,7 @@ def test_v4_scrape_and_summarize_stream_success(client: TestClient):
107
 
108
  def test_v4_text_mode_success(client: TestClient):
109
  """Test V4 with direct text input (no scraping)."""
 
110
  async def mock_stream(*args, **kwargs):
111
  yield {
112
  "content": '{"title": "Summary", "main_summary": "Test"}',
@@ -119,7 +118,6 @@ def test_v4_text_mode_success(client: TestClient):
119
  "app.services.structured_summarizer.structured_summarizer_service.summarize_structured_stream",
120
  side_effect=mock_stream,
121
  ):
122
-
123
  response = client.post(
124
  "/api/v4/scrape-and-summarize/stream",
125
  json={
@@ -135,10 +133,8 @@ def test_v4_text_mode_success(client: TestClient):
135
  events = []
136
  for line in response.text.split("\n"):
137
  if line.startswith("data: "):
138
- try:
139
  events.append(json.loads(line[6:]))
140
- except json.JSONDecodeError:
141
- pass
142
 
143
  # Check metadata event for text mode
144
  metadata_events = [e for e in events if e.get("type") == "metadata"]
@@ -325,15 +321,19 @@ def test_v4_text_length_validation(client: TestClient):
325
  @pytest.mark.asyncio
326
  async def test_v4_sse_headers(client: TestClient):
327
  """Test V4 SSE response headers."""
 
328
  async def mock_stream(*args, **kwargs):
329
  yield {"content": "test", "done": False, "tokens_used": 1}
330
  yield {"content": "", "done": True, "latency_ms": 1000.0}
331
 
332
- with patch(
333
- "app.services.article_scraper.article_scraper_service.scrape_article"
334
- ) as mock_scrape, patch(
335
- "app.services.structured_summarizer.structured_summarizer_service.summarize_structured_stream",
336
- side_effect=mock_stream,
 
 
 
337
  ):
338
  mock_scrape.return_value = {
339
  "text": "Test article content. " * 20,
@@ -368,7 +368,8 @@ def test_v4_stream_json_url_mode_success(client: TestClient):
368
  mock_scrape.return_value = {
369
  "text": "Artificial intelligence is transforming modern technology. "
370
  "Machine learning algorithms are becoming more sophisticated. "
371
- "Deep learning models can now process vast amounts of data efficiently." * 10,
 
372
  "title": "AI Revolution 2024",
373
  "author": "Dr. Jane Smith",
374
  "date": "2024-11-30",
@@ -382,20 +383,20 @@ def test_v4_stream_json_url_mode_success(client: TestClient):
382
  async def mock_json_stream(*args, **kwargs):
383
  # Yield raw JSON token fragments (simulating Outlines output)
384
  yield '{"title": "'
385
- yield 'AI Revolution'
386
  yield '", "main_summary": "'
387
- yield 'Artificial intelligence is rapidly evolving'
388
  yield '", "key_points": ['
389
  yield '"AI is transforming technology"'
390
  yield ', "ML algorithms are improving"'
391
  yield ', "Deep learning processes data efficiently"'
392
  yield '], "category": "'
393
- yield 'Technology'
394
  yield '", "sentiment": "'
395
- yield 'positive'
396
  yield '", "read_time_min": '
397
- yield '3'
398
- yield '}'
399
 
400
  with patch(
401
  "app.services.structured_summarizer.structured_summarizer_service.summarize_structured_stream_json",
@@ -500,6 +501,7 @@ def test_v4_stream_json_text_mode_success(client: TestClient):
500
 
501
  def test_v4_stream_json_no_metadata(client: TestClient):
502
  """Test stream-json endpoint with include_metadata=false."""
 
503
  async def mock_json_stream(*args, **kwargs):
504
  yield '{"title": "Test", '
505
  yield '"main_summary": "Summary", '
@@ -515,7 +517,8 @@ def test_v4_stream_json_no_metadata(client: TestClient):
515
  response = client.post(
516
  "/api/v4/scrape-and-summarize/stream-json",
517
  json={
518
- "text": "Test article content for summary generation with enough characters to pass validation." * 2,
 
519
  "style": "eli5",
520
  "include_metadata": False,
521
  },
@@ -534,7 +537,9 @@ def test_v4_stream_json_no_metadata(client: TestClient):
534
  if events and events[0]:
535
  try:
536
  first_event = json.loads(events[0])
537
- assert first_event.get("type") != "metadata", "Metadata should not be included"
 
 
538
  except json.JSONDecodeError:
539
  # First event is not complete JSON, so it's raw tokens (good!)
540
  pass
@@ -550,22 +555,27 @@ def test_v4_stream_json_different_styles(client: TestClient):
550
  styles_to_test = ["skimmer", "executive", "eli5"]
551
 
552
  for style in styles_to_test:
553
- async def mock_json_stream(*args, **kwargs):
554
- yield f'{{"title": "{style.upper()}", '
555
- yield '"main_summary": "Test", '
556
- yield '"key_points": ["A"], '
557
- yield '"category": "Test", '
558
- yield '"sentiment": "positive", '
559
- yield '"read_time_min": 1}'
 
 
 
 
560
 
561
  with patch(
562
  "app.services.structured_summarizer.structured_summarizer_service.summarize_structured_stream_json",
563
- side_effect=mock_json_stream,
564
  ):
565
  response = client.post(
566
  "/api/v4/scrape-and-summarize/stream-json",
567
  json={
568
- "text": "Test content for different styles with sufficient character count to pass validation requirements." * 2,
 
569
  "style": style,
570
  "include_metadata": False,
571
  },
@@ -576,6 +586,7 @@ def test_v4_stream_json_different_styles(client: TestClient):
576
 
577
  def test_v4_stream_json_custom_max_tokens(client: TestClient):
578
  """Test stream-json endpoint with custom max_tokens parameter."""
 
579
  async def mock_json_stream(text, style, max_tokens=None):
580
  # Verify max_tokens is passed through
581
  assert max_tokens == 1536
@@ -593,7 +604,8 @@ def test_v4_stream_json_custom_max_tokens(client: TestClient):
593
  response = client.post(
594
  "/api/v4/scrape-and-summarize/stream-json",
595
  json={
596
- "text": "Test content with custom max tokens that meets minimum character requirements." * 3,
 
597
  "style": "executive",
598
  "max_tokens": 1536,
599
  "include_metadata": False,
@@ -715,6 +727,7 @@ def test_v4_stream_json_validation_errors(client: TestClient):
715
 
716
  def test_v4_stream_json_response_headers(client: TestClient):
717
  """Test stream-json endpoint returns correct SSE headers."""
 
718
  async def mock_json_stream(*args, **kwargs):
719
  yield '{"title": "Test", "main_summary": "Test", "key_points": [], '
720
  yield '"category": "Test", "sentiment": "neutral", "read_time_min": 1}'
 
2
  Tests for V4 Structured Summarization API endpoints.
3
  """
4
 
5
+ import contextlib
6
  import json
7
  from unittest.mock import patch
8
 
 
60
  "app.services.structured_summarizer.structured_summarizer_service.summarize_structured_stream",
61
  side_effect=mock_stream,
62
  ):
 
63
  response = client.post(
64
  "/api/v4/scrape-and-summarize/stream",
65
  json={
 
79
  events = []
80
  for line in response.text.split("\n"):
81
  if line.startswith("data: "):
82
+ with contextlib.suppress(json.JSONDecodeError):
83
  events.append(json.loads(line[6:]))
 
 
84
 
85
  assert len(events) > 0
86
 
 
105
 
106
  def test_v4_text_mode_success(client: TestClient):
107
  """Test V4 with direct text input (no scraping)."""
108
+
109
  async def mock_stream(*args, **kwargs):
110
  yield {
111
  "content": '{"title": "Summary", "main_summary": "Test"}',
 
118
  "app.services.structured_summarizer.structured_summarizer_service.summarize_structured_stream",
119
  side_effect=mock_stream,
120
  ):
 
121
  response = client.post(
122
  "/api/v4/scrape-and-summarize/stream",
123
  json={
 
133
  events = []
134
  for line in response.text.split("\n"):
135
  if line.startswith("data: "):
136
+ with contextlib.suppress(json.JSONDecodeError):
137
  events.append(json.loads(line[6:]))
 
 
138
 
139
  # Check metadata event for text mode
140
  metadata_events = [e for e in events if e.get("type") == "metadata"]
 
321
  @pytest.mark.asyncio
322
  async def test_v4_sse_headers(client: TestClient):
323
  """Test V4 SSE response headers."""
324
+
325
  async def mock_stream(*args, **kwargs):
326
  yield {"content": "test", "done": False, "tokens_used": 1}
327
  yield {"content": "", "done": True, "latency_ms": 1000.0}
328
 
329
+ with (
330
+ patch(
331
+ "app.services.article_scraper.article_scraper_service.scrape_article"
332
+ ) as mock_scrape,
333
+ patch(
334
+ "app.services.structured_summarizer.structured_summarizer_service.summarize_structured_stream",
335
+ side_effect=mock_stream,
336
+ ),
337
  ):
338
  mock_scrape.return_value = {
339
  "text": "Test article content. " * 20,
 
368
  mock_scrape.return_value = {
369
  "text": "Artificial intelligence is transforming modern technology. "
370
  "Machine learning algorithms are becoming more sophisticated. "
371
+ "Deep learning models can now process vast amounts of data efficiently."
372
+ * 10,
373
  "title": "AI Revolution 2024",
374
  "author": "Dr. Jane Smith",
375
  "date": "2024-11-30",
 
383
  async def mock_json_stream(*args, **kwargs):
384
  # Yield raw JSON token fragments (simulating Outlines output)
385
  yield '{"title": "'
386
+ yield "AI Revolution"
387
  yield '", "main_summary": "'
388
+ yield "Artificial intelligence is rapidly evolving"
389
  yield '", "key_points": ['
390
  yield '"AI is transforming technology"'
391
  yield ', "ML algorithms are improving"'
392
  yield ', "Deep learning processes data efficiently"'
393
  yield '], "category": "'
394
+ yield "Technology"
395
  yield '", "sentiment": "'
396
+ yield "positive"
397
  yield '", "read_time_min": '
398
+ yield "3"
399
+ yield "}"
400
 
401
  with patch(
402
  "app.services.structured_summarizer.structured_summarizer_service.summarize_structured_stream_json",
 
501
 
502
  def test_v4_stream_json_no_metadata(client: TestClient):
503
  """Test stream-json endpoint with include_metadata=false."""
504
+
505
  async def mock_json_stream(*args, **kwargs):
506
  yield '{"title": "Test", '
507
  yield '"main_summary": "Summary", '
 
517
  response = client.post(
518
  "/api/v4/scrape-and-summarize/stream-json",
519
  json={
520
+ "text": "Test article content for summary generation with enough characters to pass validation."
521
+ * 2,
522
  "style": "eli5",
523
  "include_metadata": False,
524
  },
 
537
  if events and events[0]:
538
  try:
539
  first_event = json.loads(events[0])
540
+ assert first_event.get("type") != "metadata", (
541
+ "Metadata should not be included"
542
+ )
543
  except json.JSONDecodeError:
544
  # First event is not complete JSON, so it's raw tokens (good!)
545
  pass
 
555
  styles_to_test = ["skimmer", "executive", "eli5"]
556
 
557
  for style in styles_to_test:
558
+ # Capture loop variable in closure
559
+ def make_mock_stream(style_name: str):
560
+ async def mock_json_stream(*args, **kwargs):
561
+ yield f'{{"title": "{style_name.upper()}", '
562
+ yield '"main_summary": "Test", '
563
+ yield '"key_points": ["A"], '
564
+ yield '"category": "Test", '
565
+ yield '"sentiment": "positive", '
566
+ yield '"read_time_min": 1}'
567
+
568
+ return mock_json_stream
569
 
570
  with patch(
571
  "app.services.structured_summarizer.structured_summarizer_service.summarize_structured_stream_json",
572
+ side_effect=make_mock_stream(style),
573
  ):
574
  response = client.post(
575
  "/api/v4/scrape-and-summarize/stream-json",
576
  json={
577
+ "text": "Test content for different styles with sufficient character count to pass validation requirements."
578
+ * 2,
579
  "style": style,
580
  "include_metadata": False,
581
  },
 
586
 
587
  def test_v4_stream_json_custom_max_tokens(client: TestClient):
588
  """Test stream-json endpoint with custom max_tokens parameter."""
589
+
590
  async def mock_json_stream(text, style, max_tokens=None):
591
  # Verify max_tokens is passed through
592
  assert max_tokens == 1536
 
604
  response = client.post(
605
  "/api/v4/scrape-and-summarize/stream-json",
606
  json={
607
+ "text": "Test content with custom max tokens that meets minimum character requirements."
608
+ * 3,
609
  "style": "executive",
610
  "max_tokens": 1536,
611
  "include_metadata": False,
 
727
 
728
  def test_v4_stream_json_response_headers(client: TestClient):
729
  """Test stream-json endpoint returns correct SSE headers."""
730
+
731
  async def mock_json_stream(*args, **kwargs):
732
  yield '{"title": "Test", "main_summary": "Test", "key_points": [], '
733
  yield '"category": "Test", "sentiment": "neutral", "read_time_min": 1}'
tests/test_v4_live.py CHANGED
@@ -9,6 +9,7 @@ Run with: pytest tests/test_v4_live.py -v
 """
 
 import json
+
 import pytest
 from pydantic import ValidationError
 
@@ -20,14 +21,16 @@ def test_outlines_library_imports():
     """Test that Outlines library can be imported successfully."""
     try:
         import outlines
-        from outlines import models as outlines_models
         from outlines import generate as outlines_generate
+        from outlines import models as outlines_models
 
         # Verify key components exist
         assert outlines is not None
         assert outlines_models is not None
         assert outlines_generate is not None
-        assert hasattr(outlines_generate, 'json'), "outlines.generate should have 'json' method"
+        assert hasattr(outlines_generate, "json"), (
+            "outlines.generate should have 'json' method"
+        )
 
         print("βœ… Outlines library imported successfully")
     except ImportError as e:
@@ -53,7 +56,7 @@ async def test_structured_summarizer_initialization():
     assert structured_summarizer_service is not None
 
     # Check that Outlines model wrapper was created
-    assert hasattr(structured_summarizer_service, 'outlines_model'), (
+    assert hasattr(structured_summarizer_service, "outlines_model"), (
         "StructuredSummarizer should have 'outlines_model' attribute"
     )
 
@@ -62,7 +65,7 @@ async def test_structured_summarizer_initialization():
         "Check StructuredSummarizer.__init__() for errors."
     )
 
-    print(f"βœ… StructuredSummarizer initialized with Outlines wrapper")
+    print("βœ… StructuredSummarizer initialized with Outlines wrapper")
 
 
 @pytest.mark.asyncio
@@ -76,8 +79,8 @@ async def test_outlines_json_streaming_basic():
     - The JSON schema binding fails
     - The streaming doesn't produce valid JSON
     """
-    from app.services.structured_summarizer import structured_summarizer_service
     from app.api.v4.schemas import StructuredSummary, SummarizationStyle
+    from app.services.structured_summarizer import structured_summarizer_service
 
     # Use a simple test text
     test_text = (
@@ -89,14 +92,12 @@ async def test_outlines_json_streaming_basic():
     # Call the actual Outlines-based streaming method
     json_tokens = []
     async for token in structured_summarizer_service.summarize_structured_stream_json(
-        text=test_text,
-        style=SummarizationStyle.EXECUTIVE,
-        max_tokens=256
+        text=test_text, style=SummarizationStyle.EXECUTIVE, max_tokens=256
     ):
         json_tokens.append(token)
 
     # Combine all tokens into complete JSON string
-    complete_json = ''.join(json_tokens)
+    complete_json = "".join(json_tokens)
 
     print(f"\nπŸ“ Generated JSON ({len(complete_json)} chars):")
     print(complete_json)
@@ -105,7 +106,9 @@ async def test_outlines_json_streaming_basic():
     try:
         parsed_json = json.loads(complete_json)
     except json.JSONDecodeError as e:
-        pytest.fail(f"Outlines generated invalid JSON: {e}\n\nGenerated content:\n{complete_json}")
+        pytest.fail(
+            f"Outlines generated invalid JSON: {e}\n\nGenerated content:\n{complete_json}"
+        )
 
     # Verify it matches the StructuredSummary schema
     try:
@@ -115,14 +118,16 @@ async def test_outlines_json_streaming_basic():
         assert structured_summary.title, "title should not be empty"
         assert structured_summary.main_summary, "main_summary should not be empty"
         assert structured_summary.key_points, "key_points should not be empty"
-        assert len(structured_summary.key_points) > 0, "key_points should have at least one item"
+        assert len(structured_summary.key_points) > 0, (
+            "key_points should have at least one item"
+        )
         assert structured_summary.category, "category should not be empty"
-        assert structured_summary.sentiment in ['positive', 'negative', 'neutral'], (
+        assert structured_summary.sentiment in ["positive", "negative", "neutral"], (
            f"sentiment should be valid enum value, got: {structured_summary.sentiment}"
         )
         assert structured_summary.read_time_min > 0, "read_time_min should be positive"
 
-        print(f"βœ… Outlines generated valid StructuredSummary:")
+        print("βœ… Outlines generated valid StructuredSummary:")
         print(f"   Title: {structured_summary.title}")
         print(f"   Summary: {structured_summary.main_summary[:100]}...")
         print(f"   Key Points: {len(structured_summary.key_points)} items")
@@ -131,47 +136,51 @@ async def test_outlines_json_streaming_basic():
         print(f"   Read Time: {structured_summary.read_time_min} min")
 
     except ValidationError as e:
-        pytest.fail(f"Outlines generated JSON doesn't match StructuredSummary schema: {e}\n\nGenerated JSON:\n{complete_json}")
+        pytest.fail(
+            f"Outlines generated JSON doesn't match StructuredSummary schema: {e}\n\nGenerated JSON:\n{complete_json}"
+        )
 
 
 @pytest.mark.asyncio
 async def test_outlines_json_streaming_different_styles():
     """Test that Outlines works with different summarization styles."""
-    from app.services.structured_summarizer import structured_summarizer_service
     from app.api.v4.schemas import StructuredSummary, SummarizationStyle
+    from app.services.structured_summarizer import structured_summarizer_service
 
     test_text = "Climate change is affecting global weather patterns. Scientists warn of rising temperatures."
 
     styles_to_test = [
         SummarizationStyle.SKIMMER,
         SummarizationStyle.EXECUTIVE,
-        SummarizationStyle.ELI5
+        SummarizationStyle.ELI5,
     ]
 
     for style in styles_to_test:
         json_tokens = []
-        async for token in structured_summarizer_service.summarize_structured_stream_json(
-            text=test_text,
-            style=style,
-            max_tokens=128
+        async for (
+            token
+        ) in structured_summarizer_service.summarize_structured_stream_json(
+            text=test_text, style=style, max_tokens=128
         ):
             json_tokens.append(token)
 
-        complete_json = ''.join(json_tokens)
+        complete_json = "".join(json_tokens)
 
         try:
             parsed_json = json.loads(complete_json)
-            structured_summary = StructuredSummary(**parsed_json)
+            StructuredSummary(**parsed_json)
             print(f"βœ… Style {style.value}: Generated valid summary")
         except (json.JSONDecodeError, ValidationError) as e:
-            pytest.fail(f"Failed to generate valid summary for style {style.value}: {e}")
+            pytest.fail(
+                f"Failed to generate valid summary for style {style.value}: {e}"
+            )
 
 
 @pytest.mark.asyncio
 async def test_outlines_with_longer_text():
     """Test Outlines with longer text that triggers truncation."""
-    from app.services.structured_summarizer import structured_summarizer_service
     from app.api.v4.schemas import StructuredSummary, SummarizationStyle
+    from app.services.structured_summarizer import structured_summarizer_service
 
     # Create a longer text (will be truncated to 10000 chars)
     test_text = (
@@ -182,17 +191,15 @@ async def test_outlines_with_longer_text():
 
     json_tokens = []
     async for token in structured_summarizer_service.summarize_structured_stream_json(
-        text=test_text,
-        style=SummarizationStyle.EXECUTIVE,
-        max_tokens=256
+        text=test_text, style=SummarizationStyle.EXECUTIVE, max_tokens=256
    ):
         json_tokens.append(token)
 
-    complete_json = ''.join(json_tokens)
+    complete_json = "".join(json_tokens)
 
     try:
         parsed_json = json.loads(complete_json)
-        structured_summary = StructuredSummary(**parsed_json)
+        StructuredSummary(**parsed_json)
         print(f"βœ… Long text: Generated valid summary from {len(test_text)} chars")
     except (json.JSONDecodeError, ValidationError) as e:
         pytest.fail(f"Failed to generate valid summary for long text: {e}")
@@ -201,8 +208,8 @@ async def test_outlines_with_longer_text():
 @pytest.mark.asyncio
 async def test_outlines_error_handling_when_model_unavailable():
     """Test that proper error JSON is returned if Outlines model is unavailable."""
-    from app.services.structured_summarizer import StructuredSummarizer
     from app.api.v4.schemas import SummarizationStyle
+    from app.services.structured_summarizer import StructuredSummarizer
 
     # Create a StructuredSummarizer instance without initializing the model
     # This simulates the case where Outlines is unavailable
@@ -213,18 +220,16 @@ async def test_outlines_error_handling_when_model_unavailable():
 
     json_tokens = []
     async for token in fake_summarizer.summarize_structured_stream_json(
-        text="Test text",
-        style=SummarizationStyle.EXECUTIVE,
-        max_tokens=128
+        text="Test text", style=SummarizationStyle.EXECUTIVE, max_tokens=128
     ):
         json_tokens.append(token)
 
-    complete_json = ''.join(json_tokens)
+    complete_json = "".join(json_tokens)
 
     # Should return error JSON
     try:
         parsed_json = json.loads(complete_json)
-        assert 'error' in parsed_json, "Error response should contain 'error' field"
+        assert "error" in parsed_json, "Error response should contain 'error' field"
         print(f"βœ… Error handling: {parsed_json['error']}")
     except json.JSONDecodeError as e:
         pytest.fail(f"Error response is not valid JSON: {e}")
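The live tests above repeatedly join streamed tokens and then validate the result against the `StructuredSummary` Pydantic schema. A stdlib-only sketch of the same validation contract (the exact field set and sentiment values are taken from the assertions in the tests; `validate_summary` is a name invented here, not an app function):

```python
import json

# Fields the tests above assert on for every generated summary
REQUIRED_FIELDS = {
    "title",
    "main_summary",
    "key_points",
    "category",
    "sentiment",
    "read_time_min",
}


def validate_summary(raw: str) -> dict:
    """Parse joined stream output and check the invariants the tests enforce."""
    parsed = json.loads(raw)  # raises JSONDecodeError on malformed output
    missing = REQUIRED_FIELDS - parsed.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if parsed["sentiment"] not in ("positive", "negative", "neutral"):
        raise ValueError(f"invalid sentiment: {parsed['sentiment']}")
    if parsed["read_time_min"] <= 0:
        raise ValueError("read_time_min should be positive")
    return parsed


summary = validate_summary(
    '{"title": "T", "main_summary": "S", "key_points": ["A"], '
    '"category": "Test", "sentiment": "positive", "read_time_min": 1}'
)
print(summary["read_time_min"])  # -> 1
```

In the real suite this role is played by `StructuredSummary(**parsed_json)`, which additionally type-checks each field; the sketch only mirrors the presence and range checks the tests spell out explicitly.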