Advanced RAG Chatbot - User Guide

What's New?

1. Multiple Images & Texts Support in /index API

The /index endpoint now supports indexing multiple texts and images in a single request (max 10 each).

Before:

# Old: Only 1 text and 1 image
data = {
    'id': 'doc1',
    'text': 'Single text',
}
files = {'image': open('image.jpg', 'rb')}

After:

import requests

# New: Multiple texts and images (max 10 each)
data = {
    'id': 'doc1',
    'texts': ['Text 1', 'Text 2', 'Text 3'],  # Up to 10
}
files = [
    ('images', open('image1.jpg', 'rb')),
    ('images', open('image2.jpg', 'rb')),
    ('images', open('image3.jpg', 'rb')),  # Up to 10
]
response = requests.post('http://localhost:8000/index', data=data, files=files)

Example with cURL (the sample texts are in Vietnamese and describe a music event in Hanoi, its date, and its venue):

curl -X POST "http://localhost:8000/index" \
  -F "id=event123" \
  -F "texts=Sự kiện âm nhạc tại Hà Nội" \
  -F "texts=Diễn ra vào ngày 20/10/2025" \
  -F "texts=Địa điểm: Trung tâm Hội nghị Quốc gia" \
  -F "images=@image1.jpg" \
  -F "images=@image2.jpg" \
  -F "images=@image3.jpg"
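Because the endpoint rejects requests with more than 10 texts or 10 images, it can help to check the limits on the client before uploading. The following is a minimal sketch, not part of the service: `check_index_limits` and `index_document` are hypothetical helper names, and the URL is the local address used in the examples above. It also closes the image files after the request, which the shorter examples skip.

```python
from contextlib import ExitStack

MAX_ITEMS = 10  # per-request limit stated in this guide


def check_index_limits(texts, image_paths, limit=MAX_ITEMS):
    """Raise ValueError if either list exceeds the per-request limit."""
    if len(texts) > limit:
        raise ValueError(f"too many texts: {len(texts)} > {limit}")
    if len(image_paths) > limit:
        raise ValueError(f"too many images: {len(image_paths)} > {limit}")


def index_document(doc_id, texts=(), image_paths=(),
                   url="http://localhost:8000/index"):
    """Validate locally, POST to /index, and close image files afterwards."""
    import requests  # imported here so the limit check has no dependencies

    texts, image_paths = list(texts), list(image_paths)
    check_index_limits(texts, image_paths)
    with ExitStack() as stack:
        # Every file is sent under the same form field name, "images".
        files = [("images", stack.enter_context(open(path, "rb")))
                 for path in image_paths]
        response = requests.post(url, data={"id": doc_id, "texts": texts},
                                 files=files)
    response.raise_for_status()
    return response.json()
```

A call such as `index_document('doc1', ['Text 1', 'Text 2'], ['image1.jpg'])` would then send the same form fields as the request shown above.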

2. Advanced RAG Pipeline in /chat API

The /chat endpoint now runs a multi-stage RAG pipeline intended to improve the relevance of the retrieved context and the quality of the answer:

Key Improvements:

  1. Query Expansion: Automatically expands your question with variations
  2. Multi-Query Retrieval: Searches with multiple query variants
  3. Reranking: Re-scores results for better relevance
  4. Contextual Compression: Keeps only the most relevant parts
  5. Better Prompt Engineering: Optimized prompts for the LLM
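To make the stages concrete, here is a toy, dependency-free sketch of how expansion, multi-query retrieval, reranking, and compression can fit together. It is an illustration under stated assumptions, not the service's implementation: all function names are hypothetical, and plain word overlap stands in for the embedding similarity a real pipeline would use. The returned statistics reuse the field names of the `rag_stats` object in the /chat response.

```python
import re


def tokens(text):
    """Lowercased set of words in the text."""
    return set(re.findall(r"\w+", text.lower()))


def overlap(query, text):
    """Jaccard word overlap; a stand-in for embedding similarity."""
    q, t = tokens(query), tokens(text)
    return len(q & t) / len(q | t) if q | t else 0.0


def expand_query(query):
    """Query expansion: the query plus simple variants (hypothetical rules)."""
    variants = [query, query.rstrip("?"), " ".join(sorted(tokens(query)))]
    return list(dict.fromkeys(variants))  # de-duplicate, keep order


def retrieve(queries, docs, top_k):
    """Multi-query retrieval: union of the top_k hits for each variant."""
    hits = {}
    for q in queries:
        ranked = sorted(docs, key=lambda d: overlap(q, docs[d]), reverse=True)
        for doc_id in ranked[:top_k]:
            hits[doc_id] = docs[doc_id]
    return hits


def rerank(query, hits, score_threshold, top_k):
    """Reranking: re-score against the original query, drop low scores."""
    scored = sorted(((overlap(query, text), doc_id)
                     for doc_id, text in hits.items()), reverse=True)
    return [(s, d) for s, d in scored if s >= score_threshold][:top_k]


def compress(query, text):
    """Compression: keep sentences sharing at least one word with the query."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return ". ".join(s for s in sentences if tokens(query) & tokens(s))


def run_pipeline(query, docs, top_k=5, score_threshold=0.1):
    """Run all stages and report how many documents survive each one."""
    queries = expand_query(query)
    hits = retrieve(queries, docs, top_k)
    ranked = rerank(query, hits, score_threshold, top_k)
    context = [compress(query, docs[doc_id]) for _, doc_id in ranked]
    stats = {
        "original_query": query,
        "expanded_queries": queries,
        "initial_results": len(hits),
        "after_rerank": len(ranked),
        "after_compression": len([c for c in context if c]),
    }
    return context, stats
```

The fifth item, prompt engineering, would take the compressed `context` and place it into the prompt sent to the LLM; that step is omitted here.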

How to Use:

Basic Usage (Auto-enabled):

import requests

response = requests.post('http://localhost:8000/chat', json={
    'message': 'Dao có nguy hiểm không?',  # "Are knives dangerous?"
    'use_rag': True,
    'use_advanced_rag': True,  # Default: True
    'hf_token': 'hf_xxxxx'
})

result = response.json()
print("Response:", result['response'])
print("RAG Stats:", result['rag_stats'])  # See pipeline statistics

Advanced Configuration:

response = requests.post('http://localhost:8000/chat', json={
    'message': 'Làm sao để tạo event mới?',  # "How do I create a new event?"
    'use_rag': True,
    'use_advanced_rag': True,

    # RAG Pipeline Options
    'use_query_expansion': True,    # Expand query with variations
    'use_reranking': True,          # Rerank results
    'use_compression': True,        # Compress context
    'score_threshold': 0.5,         # Min relevance score (0-1)
    'top_k': 5,                     # Number of documents to retrieve

    # LLM Options
    'max_tokens': 512,
    'temperature': 0.7,
    'hf_token': 'hf_xxxxx'
})

Disable Advanced RAG (Use Basic):

response = requests.post('http://localhost:8000/chat', json={
    'message': 'Your question',
    'use_rag': True,
    'use_advanced_rag': False,  # Use basic RAG
})

API Changes Summary

/index Endpoint

Old Parameters:

  • id: str (required)
  • text: str (required)
  • image: UploadFile (optional)

New Parameters:

  • id: str (required)
  • texts: List[str] (optional, max 10)
  • images: List[UploadFile] (optional, max 10)

Response:

{
  "success": true,
  "id": "doc123",
  "message": "Đã index thành công document doc123 với 3 texts và 2 images"
}

The message translates to "Successfully indexed document doc123 with 3 texts and 2 images".

/chat Endpoint

New Parameters:

  • use_advanced_rag: bool (default: True) - Enable advanced RAG
  • use_query_expansion: bool (default: True) - Expand query
  • use_reranking: bool (default: True) - Rerank results
  • use_compression: bool (default: True) - Compress context
  • score_threshold: float (default: 0.5) - Min relevance score
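If you build /chat requests in several places, it may be convenient to keep these parameters and their documented defaults in one object. This is a minimal client-side sketch: `AdvancedRagOptions` and `chat_payload` are hypothetical names, and the range check on `score_threshold` is an assumption based on the "(0-1)" note in the configuration example, not a documented server rule.

```python
from dataclasses import asdict, dataclass


@dataclass
class AdvancedRagOptions:
    """Client-side mirror of the new /chat parameters and their defaults."""
    use_advanced_rag: bool = True
    use_query_expansion: bool = True
    use_reranking: bool = True
    use_compression: bool = True
    score_threshold: float = 0.5

    def __post_init__(self):
        # Assumed valid range, taken from the "(0-1)" comment in this guide.
        if not 0.0 <= self.score_threshold <= 1.0:
            raise ValueError("score_threshold must be between 0 and 1")


def chat_payload(message, options=None, **extra):
    """Build the JSON body for /chat; extra keys (e.g. top_k) pass through."""
    payload = {"message": message, "use_rag": True}
    payload.update(asdict(options or AdvancedRagOptions()))
    payload.update(extra)
    return payload
```

The result can be passed directly as `json=chat_payload('Your question', top_k=3)` to `requests.post`.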

Response (New):

{
  "response": "AI generated answer...",
  "context_used": [...],
  "timestamp": "2025-10-29T...",
  "rag_stats": {
    "original_query": "Your question",
    "expanded_queries": ["Query variant 1", "Query variant 2"],
    "initial_results": 10,
    "after_rerank": 5,
    "after_compression": 5
  }
}

Complete Examples

Example 1: Index Multiple Social Media Posts

import requests

# Index a social media event with multiple posts and images
data = {
    'id': 'event_festival_2025',
    'texts': [
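        'Festival âm nhạc quốc tế Hà Nội 2025',  # Hanoi International Music Festival 2025
        'Ngày 15-17 tháng 11 năm 2025',  # November 15-17, 2025
        'Địa điểm: Công viên Thống Nhất',  # Venue: Thống Nhất Park
        'Line-up: Sơn Tùng MTP, Đen Vâu, Hoàng Thùy Linh',
        'Giá vé từ 500.000đ - 2.000.000đ'  # Tickets from 500,000đ to 2,000,000đ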
    ]
}

files = [
    ('images', open('poster_festival.jpg', 'rb')),
    ('images', open('lineup.jpg', 'rb')),
    ('images', open('venue_map.jpg', 'rb'))
]

response = requests.post('http://localhost:8000/index', data=data, files=files)
print(response.json())

Example 2: Advanced RAG Chat

import requests

# Chat with advanced RAG
chat_response = requests.post('http://localhost:8000/chat', json={
    # "When and where does the Hanoi music festival take place?"
    'message': 'Festival âm nhạc Hà Nội diễn ra khi nào và ở đâu?',
    'use_rag': True,
    'use_advanced_rag': True,
    'top_k': 3,
    'score_threshold': 0.6,
    'hf_token': 'your_hf_token_here'
})

result = chat_response.json()
print("Answer:", result['response'])
print("\nRetrieved Context:")
for ctx in result['context_used']:
    print(f"- [{ctx['id']}] Confidence: {ctx['confidence']:.2%}")

print("\nRAG Pipeline Stats:")
print(f"- Original query: {result['rag_stats']['original_query']}")
print(f"- Query variants: {result['rag_stats']['expanded_queries']}")
print(f"- Documents retrieved: {result['rag_stats']['initial_results']}")
print(f"- After reranking: {result['rag_stats']['after_rerank']}")
print(f"- After compression: {result['rag_stats']['after_compression']}")

Performance Comparison

| Feature | Basic RAG | Advanced RAG |
|---|---|---|
| Query Understanding | Single query | Multiple query variants |
| Retrieval Method | Direct vector search | Multi-query + hybrid |
| Result Ranking | Score from DB | Reranked with semantic similarity |
| Context Quality | Full text | Compressed, relevant parts only |
| Response Accuracy | Good | Better |
| Response Time | Faster | Slightly slower but better quality |
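
The table gives no numbers, so the latency difference is best measured on your own data. A small timing harness along these lines could be used; it is a sketch in which `compare_modes` is a hypothetical helper and `ask` is any callable you supply that sends one /chat request with `use_advanced_rag` set to the given value.

```python
import time
from statistics import mean


def compare_modes(ask, message, runs=3):
    """Return the mean latency in seconds for basic and advanced RAG.

    `ask(message, use_advanced_rag)` must perform one chat request.
    """
    results = {}
    for label, advanced in (("basic", False), ("advanced", True)):
        timings = []
        for _ in range(runs):
            start = time.perf_counter()
            ask(message, advanced)
            timings.append(time.perf_counter() - start)
        results[label] = mean(timings)
    return results


# Example `ask` for the local server used in this guide (assumes `requests`):
#
#   def ask(message, advanced):
#       return requests.post('http://localhost:8000/chat', json={
#           'message': message, 'use_rag': True,
#           'use_advanced_rag': advanced, 'hf_token': 'hf_xxxxx'}).json()
```

Averaging over a few runs smooths out one-off delays such as model warm-up.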

When to Use What?

Use Basic RAG when:

  • You need fast response time
  • Queries are straightforward
  • Context is already well-structured

Use Advanced RAG when:

  • You need higher accuracy
  • Queries are complex or ambiguous
  • Context documents are long
  • You want better relevance

Troubleshooting

Error: "Tối đa 10 texts" ("Maximum 10 texts")

The request contains more than 10 texts. Send at most 10 per /index request.

Error: "Tối đa 10 images" ("Maximum 10 images")

The request contains more than 10 images. Send at most 10 per /index request.

RAG stats show 0 results

Your score_threshold may be too high, so every retrieved document falls below the minimum relevance score. Try lowering it (e.g., 0.3-0.5).
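
One way to apply this advice automatically is to retry with progressively lower thresholds until some context comes back. The sketch below assumes a non-empty `context_used` list in the response means retrieval succeeded; `chat_with_fallback` is a hypothetical helper, and `ask` is a callable you supply that sends one /chat request with the given `score_threshold`.

```python
def chat_with_fallback(ask, message, thresholds=(0.5, 0.4, 0.3)):
    """Try each threshold in order and stop at the first one with context.

    `ask(message, score_threshold)` must return the parsed /chat response.
    Returns (threshold_that_worked, response), or (None, last_response)
    if no threshold produced any context.
    """
    result = None
    for threshold in thresholds:
        result = ask(message, threshold)
        if result.get("context_used"):
            return threshold, result
    return None, result
```

Each retry is a full chat request, so keep the list of thresholds short.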

Next Steps

To further improve RAG, consider:

  1. Add BM25 Hybrid Search: Combine dense + sparse retrieval
  2. Use Cross-Encoder for Reranking: Typically more accurate than embedding similarity, at a higher compute cost
  3. Implement Query Decomposition: Break complex queries into sub-queries
  4. Add Citation/Source Tracking: Show which document each fact comes from
  5. Integrate RAG-Anything: For advanced multimodal document processing

For RAG-Anything integration (more complex), see: https://github.com/HKUDS/RAG-Anything