ming Claude committed on
Commit 6b2de93 · 1 Parent(s): 5e83010

fix: Improve V3 summary completeness with enhanced token allocation


This commit addresses the remaining early-stopping issues by fixing three cascading problems
that caused summaries to truncate prematurely, especially for long articles.

Critical Changes:
1. Increased recursive chunk token limit: 80 → 200 tokens per chunk
   - Location: app/services/hf_streaming_summarizer.py:480
   - Formula: min(max_new_tokens // 2, 200)
   - Impact: 2.5x the per-chunk token budget for long articles (>1500 chars)

2. Raised min_new_tokens floor: 50 → 200 tokens
   - Location: app/services/hf_streaming_summarizer.py:389-391, 651-652
   - Formula: max(50, min(max_new_tokens // 2, 200))
   - Impact: prevents early stopping at 50 tokens when 512 are allocated

3. More aggressive adaptive formula: text_length // 4 → text_length // 3
   - Location: app/api/v3/scrape_summarize.py:121
   - Impact: 6000-char articles get ~2000 tokens before the 1024 cap (all three formulas are combined in the sketch below)
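
Taken together, the three changes form a single token-allocation policy. A minimal sketch of that policy (the helper names and standalone framing are illustrative; the real logic lives in the files listed above):

```python
def adaptive_max_tokens(text: str, user_max: int) -> int:
    # Change 3: scale with input length (~33% of chars),
    # floor at 300, respect the user's max, cap at 1024.
    return min(max(len(text) // 3, 300), user_max, 1024)


def min_new_tokens_floor(max_new_tokens: int) -> int:
    # Change 2: require at least 50 tokens, up to half of max, capped at 200,
    # so generation cannot stop at 50 tokens when 512 are allocated.
    return max(50, min(max_new_tokens // 2, 200))


def chunk_max_tokens(max_new_tokens: int) -> int:
    # Change 1: per-chunk budget for long-article recursion,
    # previously hard-capped at 80 tokens.
    return min(max_new_tokens // 2, 200)
```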

Expected Results:
- Short articles (<1500 chars): stronger minimum-quality guarantee (50 → 200 tokens)
- Medium articles (~2000 chars): more complete summaries (500 → ~667 adaptive tokens)
- Long articles (>3000 chars): significantly improved chunk quality (80 → 200 tokens per chunk; arithmetic worked below)
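
To make the long-article arithmetic concrete, here is the adaptive formula evaluated for a 6000-character article (an illustrative REPL session, assuming the user's max does not bind):

```python
>>> text_length = 6000
>>> max(text_length // 3, 300)              # raw adaptive value: ~2000 tokens
2000
>>> min(max(text_length // 3, 300), 1024)   # after the 1024 cap
1024
```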

Test Results:
- All V3 tests passing (16/16) ✅
- All HF generation parameter tests passing (3/3) ✅
- No performance degradation

Fixes: Summaries stopping early despite increased token limits
Related: Previous commit 5e83010 (adaptive token calculation)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

app/api/v3/scrape_summarize.py CHANGED
@@ -118,7 +118,7 @@ async def _stream_generator(text: str, payload, metadata: dict, request_id: str)
     # Formula: scale tokens with input length, but enforce min/max bounds
     text_length = len(text)
     adaptive_max_tokens = min(
-        max(text_length // 4, 300),  # At least 300 tokens, scale with length
+        max(text_length // 3, 300),  # At least 300 tokens, scale ~33% of input chars
         payload.max_tokens,  # Respect user's max if specified
         1024,  # Cap at 1024 to avoid excessive generation
     )
app/services/hf_streaming_summarizer.py CHANGED
@@ -385,9 +385,10 @@ class HFStreamingSummarizer:
         if min_length is not None:
             gen_kwargs["min_new_tokens"] = min_length
         else:
+            # Ensure minimum quality: at least 50 tokens, up to half of max (capped at 200)
             gen_kwargs["min_new_tokens"] = max(
-                20, min(50, max_new_tokens // 4)
-            )  # floor ~20-50
+                50, min(max_new_tokens // 2, 200)
+            )
         # Use slightly positive length_penalty to favor complete sentences
         gen_kwargs["length_penalty"] = 1.2
         # Reduce premature EOS in some checkpoints (optional)
@@ -475,8 +476,9 @@ class HFStreamingSummarizer:
         for i, chunk in enumerate(chunks):
             logger.info(f"Summarizing chunk {i+1}/{len(chunks)}")

-            # Use smaller max_new_tokens for individual chunks
-            chunk_max_tokens = min(max_new_tokens, 80)
+            # Use reasonable max_new_tokens for individual chunks
+            # Allow at least half of max, up to 200 tokens per chunk
+            chunk_max_tokens = min(max_new_tokens // 2, 200)

             chunk_summary = ""
             async for chunk_result in self._single_chunk_summarize(
@@ -646,7 +648,8 @@ class HFStreamingSummarizer:
         if min_length is not None:
             calculated_min_tokens = min_length
         else:
-            calculated_min_tokens = max(20, min(50, max_new_tokens // 4))
+            # Ensure minimum quality: at least 50 tokens, up to half of max (capped at 200)
+            calculated_min_tokens = max(50, min(max_new_tokens // 2, 200))

         gen_kwargs = {
             **inputs,
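
For context on where these values land: min_new_tokens, max_new_tokens, and length_penalty are standard Hugging Face generate() kwargs. A minimal end-to-end sketch under that assumption (the checkpoint and surrounding code are illustrative, not taken from this service):

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Illustrative checkpoint; the service's actual model is not shown in this diff.
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large-cnn")

article_text = "..."  # article body from the scraper (placeholder)
inputs = tokenizer(article_text, return_tensors="pt", truncation=True)

max_new_tokens = 512
gen_kwargs = {
    **inputs,
    "max_new_tokens": max_new_tokens,
    # Floor introduced by this commit: max(50, min(512 // 2, 200)) == 200
    "min_new_tokens": max(50, min(max_new_tokens // 2, 200)),
    "length_penalty": 1.2,  # favor complete sentences, as in the diff
}
with torch.no_grad():
    output_ids = model.generate(**gen_kwargs)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```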