Advanced RAG Chatbot - User Guide

What's New?

1. Multiple Images & Texts Support in `/index` API

The /index endpoint now supports indexing multiple texts and images in a single request (max 10 each).

Before:

# Old: Only 1 text and 1 image
data = {
    'id': 'doc1',
    'text': 'Single text',
}
files = {'image': open('image.jpg', 'rb')}

After:

import requests

# New: multiple texts and images (max 10 each)
data = {
    'id': 'doc1',
    'texts': ['Text 1', 'Text 2', 'Text 3'],  # Up to 10
}
files = [
    ('images', open('image1.jpg', 'rb')),
    ('images', open('image2.jpg', 'rb')),
    ('images', open('image3.jpg', 'rb')),  # Up to 10
]
response = requests.post('http://localhost:8000/index', data=data, files=files)

Example with cURL:

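# The three texts are Vietnamese for "Music event in Hanoi",
# "Takes place on 20/10/2025" and "Venue: National Convention Center".
curl -X POST "http://localhost:8000/index" \
  -F "id=event123" \
  -F "texts=Sự kiện âm nhạc tại Hà Nội" \
  -F "texts=Diễn ra vào ngày 20/10/2025" \
  -F "texts=Địa điểm: Trung tâm Hội nghị Quốc gia" \
  -F "images=@image1.jpg" \
  -F "images=@image2.jpg" \
  -F "images=@image3.jpg"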
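The limit of 10 texts and 10 images can also be checked on the client before anything is uploaded. The following is a minimal sketch, not part of the service: `build_index_request` and `index_document` are hypothetical helper names, and the only facts taken from this guide are the `/index` URL, the `id`, `texts` and `images` field names, and the limit of 10.

```python
from contextlib import ExitStack

MAX_ITEMS = 10  # limit stated in this guide for both texts and images


def build_index_request(doc_id, texts=(), image_paths=()):
    """Check the documented limits and return (form data, image paths)."""
    texts = list(texts)
    image_paths = list(image_paths)
    if len(texts) > MAX_ITEMS:
        raise ValueError(f"at most {MAX_ITEMS} texts per request, got {len(texts)}")
    if len(image_paths) > MAX_ITEMS:
        raise ValueError(f"at most {MAX_ITEMS} images per request, got {len(image_paths)}")
    return {"id": doc_id, "texts": texts}, image_paths


def index_document(doc_id, texts=(), image_paths=(), base_url="http://localhost:8000"):
    """POST to /index and close every image file once the request finishes."""
    import requests  # imported here so the validation helper works without it

    data, paths = build_index_request(doc_id, texts, image_paths)
    with ExitStack() as stack:
        # ExitStack closes all opened files, even if the request raises.
        files = [("images", stack.enter_context(open(p, "rb"))) for p in paths]
        response = requests.post(f"{base_url}/index", data=data, files=files or None)
    response.raise_for_status()
    return response.json()
```

Unlike the bare `open(...)` calls in the example above, this version does not leave file handles open after the upload.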

2. Advanced RAG Pipeline in `/chat` API

The chat endpoint now uses modern RAG techniques for better response quality:

Key Improvements:

Query Expansion: Automatically expands your question with variations
Multi-Query Retrieval: Searches with multiple query variants
Reranking: Re-scores results for better relevance
Contextual Compression: Keeps only the most relevant parts
Better Prompt Engineering: Optimized prompts for LLM
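To make the first four steps concrete, here is a toy, self-contained sketch of the same flow. It is not the service's implementation: it swaps in bag-of-words cosine similarity for real embeddings, a hand-written synonym table for query expansion, and sentence filtering for compression. Every function name and the sample data are illustrative assumptions.

```python
import math
import re
from collections import Counter

STOPWORDS = {"the", "is", "a", "an", "of", "to", "in", "at", "are", "from"}


def tokenize(text):
    """Lowercase word tokens with common stopwords removed."""
    return [t for t in re.findall(r"\w+", text.lower()) if t not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def expand_query(query, synonyms):
    """Query expansion: add one variant per known synonym substitution."""
    tokens = tokenize(query)
    variants = [query]
    for word, alternatives in synonyms.items():
        if word in tokens:
            pattern = rf"\b{re.escape(word)}\b"
            variants += [re.sub(pattern, alt, query, flags=re.IGNORECASE) for alt in alternatives]
    return variants


def retrieve(queries, docs, top_k):
    """Multi-query retrieval: keep each document's best score over all variants."""
    best = {}
    for q in queries:
        q_vec = Counter(tokenize(q))
        for doc_id, text in docs.items():
            score = cosine(q_vec, Counter(tokenize(text)))
            best[doc_id] = max(best.get(doc_id, 0.0), score)
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return [doc_id for doc_id, score in ranked[:top_k] if score > 0]


def rerank(query, doc_ids, docs):
    """Reranking: re-score the candidates against the original query only."""
    q_vec = Counter(tokenize(query))
    return sorted(doc_ids, key=lambda d: cosine(q_vec, Counter(tokenize(docs[d]))), reverse=True)


def compress(query, text):
    """Contextual compression: keep only sentences sharing a word with the query."""
    q_words = set(tokenize(query))
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return " ".join(s for s in sentences if q_words & set(tokenize(s)))


docs = {
    "schedule": "The festival runs from 15 to 17 November. Tickets are sold online.",
    "venue": "The concert venue is Thong Nhat Park. Parking is limited.",
    "food": "Food stalls open at noon.",
}
query = "When is the festival?"
variants = expand_query(query, {"festival": ["concert"]})
candidates = retrieve(variants, docs, top_k=2)   # found via either variant
ordered = rerank(query, candidates, docs)        # best match for the original query first
context = [c for c in (compress(query, docs[d]) for d in ordered) if c]
print(variants, candidates, ordered, context, sep="\n")
```

In this toy run the "venue" document is only reachable through the expanded variant, reranking moves "schedule" back to the top, and compression keeps a single sentence as context.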

How to Use:

Basic Usage (Auto-enabled):

import requests

response = requests.post('http://localhost:8000/chat', json={
    'message': 'Dao có nguy hiểm không?',  # "Are knives dangerous?"
    'use_rag': True,
    'use_advanced_rag': True,  # Default: True
    'hf_token': 'hf_xxxxx'
})

result = response.json()
print("Response:", result['response'])
print("RAG Stats:", result['rag_stats'])  # See pipeline statistics

Advanced Configuration:

response = requests.post('http://localhost:8000/chat', json={
    'message': 'Làm sao để tạo event mới?',  # "How do I create a new event?"
    'use_rag': True,
    'use_advanced_rag': True,

    # RAG Pipeline Options
    'use_query_expansion': True,    # Expand query with variations
    'use_reranking': True,          # Rerank results
    'use_compression': True,        # Compress context
    'score_threshold': 0.5,         # Min relevance score (0-1)
    'top_k': 5,                     # Number of documents to retrieve

    # LLM Options
    'max_tokens': 512,
    'temperature': 0.7,
    'hf_token': 'hf_xxxxx'
})

Disable Advanced RAG (Use Basic):

response = requests.post('http://localhost:8000/chat', json={
    'message': 'Your question',
    'use_rag': True,
    'use_advanced_rag': False,  # Use basic RAG
})
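Because the options are plain JSON keys, a misspelled name such as `use_rerank` may simply go unnoticed, depending on how the server handles unknown fields. A small client-side check can catch that early. This is a sketch only: `build_chat_payload` is a hypothetical helper, the set of names is copied from the examples in this guide, and the `top_k >= 1` check is an assumption rather than a documented rule.

```python
KNOWN_CHAT_OPTIONS = {
    "use_rag", "use_advanced_rag", "use_query_expansion", "use_reranking",
    "use_compression", "score_threshold", "top_k", "max_tokens",
    "temperature", "hf_token",
}


def build_chat_payload(message, **options):
    """Return a JSON-ready dict for /chat, rejecting unknown or out-of-range options."""
    unknown = set(options) - KNOWN_CHAT_OPTIONS
    if unknown:
        raise ValueError(f"unknown /chat option(s): {sorted(unknown)}")
    threshold = options.get("score_threshold")
    if threshold is not None and not 0 <= threshold <= 1:
        raise ValueError("score_threshold must be between 0 and 1")
    top_k = options.get("top_k")
    if top_k is not None and top_k < 1:  # assumed lower bound
        raise ValueError("top_k must be at least 1")
    return {"message": message, **options}


payload = build_chat_payload("Your question", use_rag=True, use_advanced_rag=False)
```

The resulting `payload` can be passed as `json=payload` to `requests.post('http://localhost:8000/chat', ...)`.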

API Changes Summary

`/index` Endpoint

Old Parameters:

id: str (required)
text: str (required)
image: UploadFile (optional)

New Parameters:

id: str (required)
texts: List[str] (optional, max 10)
images: List[UploadFile] (optional, max 10)

Response:

{
  "success": true,
  "id": "doc123",
  "message": "Đã index thành công document doc123 với 3 texts và 2 images"
}

The `message` value is Vietnamese for "Successfully indexed document doc123 with 3 texts and 2 images".

`/chat` Endpoint

New Parameters:

use_advanced_rag: bool (default: True) - Enable advanced RAG
use_query_expansion: bool (default: True) - Expand query
use_reranking: bool (default: True) - Rerank results
use_compression: bool (default: True) - Compress context
score_threshold: float (default: 0.5) - Min relevance score

Response (New):

{
  "response": "AI generated answer...",
  "context_used": [...],
  "timestamp": "2025-10-29T...",
  "rag_stats": {
    "original_query": "Your question",
    "expanded_queries": ["Query variant 1", "Query variant 2"],
    "initial_results": 10,
    "after_rerank": 5,
    "after_compression": 5
  }
}
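The `rag_stats` block can be turned into a quick per-stage summary. The sketch below assumes that `initial_results`, `after_rerank` and `after_compression` are document counts at successive stages, which is how the examples in this guide print them. `summarize_rag_stats` is a hypothetical helper name.

```python
def summarize_rag_stats(stats):
    """Return how many documents each pipeline stage dropped and how many remain."""
    initial = stats["initial_results"]
    reranked = stats["after_rerank"]
    compressed = stats["after_compression"]
    return {
        "query_variants": len(stats["expanded_queries"]),
        "dropped_by_rerank": initial - reranked,
        "dropped_by_compression": reranked - compressed,
        "kept": compressed,
    }


sample = {
    "original_query": "Your question",
    "expanded_queries": ["Query variant 1", "Query variant 2"],
    "initial_results": 10,
    "after_rerank": 5,
    "after_compression": 5,
}
summary = summarize_rag_stats(sample)
```

For the sample response above, reranking drops 5 of the 10 retrieved documents and compression drops none.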

Complete Examples

Example 1: Index Multiple Social Media Posts

import requests

# Index a social media event with multiple posts and images
data = {
    'id': 'event_festival_2025',
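    'texts': [
        'Festival âm nhạc quốc tế Hà Nội 2025',             # Hanoi International Music Festival 2025
        'Ngày 15-17 tháng 11 năm 2025',                     # 15-17 November 2025
        'Địa điểm: Công viên Thống Nhất',                   # Venue: Thong Nhat Park
        'Line-up: Sơn Tùng MTP, Đen Vâu, Hoàng Thùy Linh',  # Line-up (artist names)
        'Giá vé từ 500.000đ - 2.000.000đ'                   # Tickets from 500,000 to 2,000,000 VND
    ]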
}

files = [
    ('images', open('poster_festival.jpg', 'rb')),
    ('images', open('lineup.jpg', 'rb')),
    ('images', open('venue_map.jpg', 'rb'))
]

response = requests.post('http://localhost:8000/index', data=data, files=files)
print(response.json())

Example 2: Advanced RAG Chat

import requests

# Chat with advanced RAG
chat_response = requests.post('http://localhost:8000/chat', json={
    # "When and where does the Hanoi music festival take place?"
    'message': 'Festival âm nhạc Hà Nội diễn ra khi nào và ở đâu?',
    'use_rag': True,
    'use_advanced_rag': True,
    'top_k': 3,
    'score_threshold': 0.6,
    'hf_token': 'your_hf_token_here'
})

result = chat_response.json()
print("Answer:", result['response'])
print("\nRetrieved Context:")
for ctx in result['context_used']:
    print(f"- [{ctx['id']}] Confidence: {ctx['confidence']:.2%}")

print("\nRAG Pipeline Stats:")
print(f"- Original query: {result['rag_stats']['original_query']}")
print(f"- Query variants: {result['rag_stats']['expanded_queries']}")
print(f"- Documents retrieved: {result['rag_stats']['initial_results']}")
print(f"- After reranking: {result['rag_stats']['after_rerank']}")

Performance Comparison

| Feature | Basic RAG | Advanced RAG |
|---|---|---|
| Query Understanding | Single query | Multiple query variants |
| Retrieval Method | Direct vector search | Multi-query + hybrid |
| Result Ranking | Score from DB | Reranked with semantic similarity |
| Context Quality | Full text | Compressed, relevant parts only |
| Response Accuracy | Good | Better |
| Response Time | Faster | Slightly slower but better quality |

When to Use What?

Use Basic RAG when:

You need fast response time
Queries are straightforward
Context is already well-structured

Use Advanced RAG when:

You need higher accuracy
Queries are complex or ambiguous
Context documents are long
You want better relevance

Troubleshooting

Error: "Tối đa 10 texts" (Vietnamese for "Maximum 10 texts")

The request contains more than 10 texts. Send at most 10 per request.

Error: "Tối đa 10 images" (Vietnamese for "Maximum 10 images")

The request contains more than 10 images. Send at most 10 per request.
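If a document really has more than 10 texts, one option is to split them across several requests. The sketch below only builds the request bodies. The `_partN` ids are a hypothetical naming scheme: this guide does not say what happens when the same `id` is indexed twice, so each batch gets its own id here.

```python
def batches(items, size=10):
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


texts = [f"Post {n}" for n in range(1, 24)]  # 23 texts, over the limit
requests_to_send = [
    {"id": f"event123_part{i}", "texts": chunk}
    for i, chunk in enumerate(batches(texts), start=1)
]
```

Each dict in `requests_to_send` can then be posted to `/index` as the `data` argument, as in the examples above.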

RAG stats show 0 results

Your score_threshold may be too high, so every retrieved document is filtered out. Try a lower value, for example between 0.3 and 0.5.
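Lowering the threshold can also be automated: retry with progressively smaller values until some context comes back. This is a sketch under assumptions: `ask_with_fallback` and `ask_server` are hypothetical helpers, the threshold steps are arbitrary, and an empty `context_used` list is taken to mean that nothing passed the threshold.

```python
def ask_with_fallback(ask, thresholds=(0.5, 0.4, 0.3)):
    """Call ask(threshold) with decreasing thresholds until context is returned.

    Returns (threshold, result) for the first call with non-empty context,
    or (None, last_result) if every threshold came back empty.
    """
    result = None
    for threshold in thresholds:
        result = ask(threshold)
        if result.get("context_used"):
            return threshold, result
    return None, result


def ask_server(threshold, message="Your question"):
    """Hypothetical wrapper around the /chat call shown in this guide."""
    import requests

    response = requests.post("http://localhost:8000/chat", json={
        "message": message,
        "use_rag": True,
        "score_threshold": threshold,
    })
    return response.json()
```

Calling `ask_with_fallback(ask_server)` returns the first threshold that produced context, which is a useful starting value for later requests.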

Next Steps

To further improve RAG, consider:

Add BM25 Hybrid Search: Combine dense + sparse retrieval
Use Cross-Encoder for Reranking: Better than embedding similarity
Implement Query Decomposition: Break complex queries into sub-queries
Add Citation/Source Tracking: Show which document each fact comes from
Integrate RAG-Anything: For advanced multimodal document processing

For RAG-Anything integration (more complex), see: https://github.com/HKUDS/RAG-Anything
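
For the first suggestion above, hybrid search, a common way to combine a dense (vector) ranking with a sparse (BM25) ranking is reciprocal rank fusion. The sketch below shows only the fusion step on two already-ranked lists of document ids. The ids are made up, and `k=60` is the constant conventionally used with this method, not a value taken from this project.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of document ids; higher fused score ranks first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every document it contains.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)


dense = ["doc_a", "doc_b", "doc_c"]   # e.g. from vector search
sparse = ["doc_c", "doc_a", "doc_d"]  # e.g. from BM25 keyword search
fused = reciprocal_rank_fusion([dense, sparse])
```

Documents that appear near the top of both lists end up first, without having to put the two scoring scales on a common footing.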