Advanced RAG Chatbot - User Guide
What's New?
1. Multiple Images & Texts Support in /index API
The /index endpoint now supports indexing multiple texts and images in a single request (max 10 each).
Before:
```python
# Old: only 1 text and 1 image
data = {
    'id': 'doc1',
    'text': 'Single text',
}
files = {'image': open('image.jpg', 'rb')}
```
After:
```python
# New: multiple texts and images (max 10 each)
import requests

data = {
    'id': 'doc1',
    'texts': ['Text 1', 'Text 2', 'Text 3'],  # up to 10
}
files = [
    ('images', open('image1.jpg', 'rb')),
    ('images', open('image2.jpg', 'rb')),
    ('images', open('image3.jpg', 'rb')),  # up to 10
]
response = requests.post('http://localhost:8000/index', data=data, files=files)
```
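Example with cURL. Repeat the `texts` and `images` fields once per item. The texts are Vietnamese for "Music event in Hanoi", "Taking place on 20/10/2025" and "Venue: National Convention Center"; the image file names are placeholders.
```bash
curl -X POST "http://localhost:8000/index" \
  -F "id=event123" \
  -F "texts=Sự kiện âm nhạc tại Hà Nội" \
  -F "texts=Diễn ra vào ngày 20/10/2025" \
  -F "texts=Địa điểm: Trung tâm Hội nghị Quốc gia" \
  -F "images=@image1.jpg" \
  -F "images=@image2.jpg" \
  -F "images=@image3.jpg"
```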
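Because the server rejects requests with more than 10 texts or 10 images, it can help to check the limits on the client and to make sure the image files get closed after the upload. The sketch below does both. `build_index_payload` is a hypothetical helper, not part of the API, and it assumes only the `id`, `texts` and `images` fields documented in this guide.

```python
from contextlib import ExitStack

MAX_ITEMS = 10  # documented /index limit: at most 10 texts and 10 images


def build_index_payload(doc_id, texts, image_paths, stack):
    """Build (data, files) for requests.post(url, data=data, files=files).

    Hypothetical helper. Image files are opened through `stack`
    (a contextlib.ExitStack), so they are closed when the caller's
    `with ExitStack() as stack:` block exits.
    """
    if len(texts) > MAX_ITEMS:
        raise ValueError(f"at most {MAX_ITEMS} texts allowed, got {len(texts)}")
    if len(image_paths) > MAX_ITEMS:
        raise ValueError(f"at most {MAX_ITEMS} images allowed, got {len(image_paths)}")
    # A list value makes requests send one "texts" form field per item.
    data = {"id": doc_id, "texts": list(texts)}
    # Repeating the "images" field name sends several files under one key.
    files = [("images", stack.enter_context(open(path, "rb"))) for path in image_paths]
    return data, files


# Usage (requires a running server and the `requests` package):
# import requests
# with ExitStack() as stack:
#     data, files = build_index_payload("doc1", ["Text 1", "Text 2"], ["image1.jpg"], stack)
#     response = requests.post("http://localhost:8000/index", data=data, files=files)
#     print(response.json())
```

The limit checks run before any file is opened, so an oversized request fails locally without touching the disk or the network.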
2. Advanced RAG Pipeline in /chat API
The /chat endpoint now runs a multi-stage retrieval pipeline before generating an answer, to improve response quality:
Key Improvements:
- Query Expansion: Automatically expands your question with variations
- Multi-Query Retrieval: Searches with multiple query variants
- Reranking: Re-scores results for better relevance
- Contextual Compression: Keeps only the most relevant parts
- Better Prompt Engineering: Optimized prompts for the LLM
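To make the stages above concrete, here is a toy, dependency-free model of how they can fit together and how they map onto the `rag_stats` fields returned by /chat. It is not the server's implementation: word overlap replaces embedding search, a hand-written `SYNONYMS` table (hypothetical) replaces query expansion, Jaccard similarity replaces the reranker, and the 2 × `top_k` candidate pool is an assumption. Prompt construction is not modelled.

```python
import re

# Toy model of the /chat stages; every scoring rule here is a stand-in.
STOPWORDS = {"is", "the", "a", "of", "and", "does", "do"}
SYNONYMS = {"where": "venue", "when": "date"}  # hypothetical expansion table


def keywords(text):
    """Lower-cased words minus stopwords (the regex is Unicode-aware)."""
    return set(re.findall(r"\w+", text.lower())) - STOPWORDS


def expand_query(query):
    """Query expansion: the original plus one variant per word with a synonym."""
    variants = [query]
    for word, synonym in SYNONYMS.items():
        if word in keywords(query):
            variants.append(re.sub(rf"\b{word}\b", synonym, query, flags=re.IGNORECASE))
    return variants


def overlap(query, text):
    """Retrieval score in [0, 1]: share of the query's keywords found in text."""
    q = keywords(query)
    return len(q & keywords(text)) / len(q) if q else 0.0


def jaccard(a, b):
    """Rerank score: |a & b| / |a | b|, which penalises long, unfocused documents."""
    return len(a & b) / len(a | b) if a | b else 0.0


def run_pipeline(query, docs, top_k=5, score_threshold=0.5):
    queries = expand_query(query)
    # Multi-query retrieval: each doc keeps its best score across all variants.
    best = {doc_id: max(overlap(q, text) for q in queries) for doc_id, text in docs.items()}
    ranked = sorted(best, key=best.get, reverse=True)[: 2 * top_k]
    candidates = [d for d in ranked if best[d] >= score_threshold]
    # Reranking: reorder candidates with a second score, keep the top_k.
    query_words = set().union(*(keywords(q) for q in queries))
    reranked = sorted(candidates, key=lambda d: jaccard(query_words, keywords(docs[d])),
                      reverse=True)[:top_k]
    # Contextual compression: keep only sentences that share a query keyword.
    context = {}
    for doc_id in reranked:
        sentences = re.split(r"(?<=[.!?])\s+", docs[doc_id])
        kept = [s for s in sentences if keywords(s) & query_words]
        if kept:
            context[doc_id] = " ".join(kept)
    stats = {
        "original_query": query,
        "expanded_queries": queries[1:],
        "initial_results": len(candidates),
        "after_rerank": len(reranked),
        "after_compression": len(context),
    }
    return context, stats


docs = {
    "fest": "Hanoi music festival 2025. Venue: Thong Nhat Park. Date: 15-17 November.",
    "food": "Street food guide for Hanoi. Try pho and banh mi.",
}
context, stats = run_pipeline("Where is the Hanoi festival?", docs)
print(context)  # {'fest': 'Hanoi music festival 2025. Venue: Thong Nhat Park.'}
print(stats["initial_results"], stats["after_rerank"], stats["after_compression"])  # 1 1 1
```

In this toy run the expanded variant ("venue is the Hanoi festival?") is what lifts the festival document to a full match, and compression then drops the sentence about the date because it shares no keyword with the question.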
How to Use:
Basic Usage (Auto-enabled):
```python
import requests

response = requests.post('http://localhost:8000/chat', json={
    'message': 'Dao có nguy hiểm không?',  # "Are knives dangerous?"
    'use_rag': True,
    'use_advanced_rag': True,  # default: True
    'hf_token': 'hf_xxxxx',
})
result = response.json()
print("Response:", result['response'])
print("RAG Stats:", result['rag_stats'])  # pipeline statistics
```
Advanced Configuration:
```python
response = requests.post('http://localhost:8000/chat', json={
    'message': 'Làm sao để tạo event mới?',  # "How do I create a new event?"
    'use_rag': True,
    'use_advanced_rag': True,
    # RAG pipeline options
    'use_query_expansion': True,  # expand query with variations
    'use_reranking': True,        # rerank results
    'use_compression': True,      # compress context
    'score_threshold': 0.5,       # min relevance score (0-1)
    'top_k': 5,                   # number of documents to retrieve
    # LLM options
    'max_tokens': 512,
    'temperature': 0.7,
    'hf_token': 'hf_xxxxx',
})
```
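With this many options, a misspelled key is easy to miss. A minimal sketch of building the payload in one place so that an unknown option fails locally; this guide does not say how the server treats unknown keys, so treating them as errors is an assumption. `chat_payload` and `CHAT_OPTIONS` are hypothetical names, and the allowed keys are exactly the ones shown in this guide.

```python
# Options that appear in this guide's /chat examples.
CHAT_OPTIONS = {
    "use_rag", "use_advanced_rag", "use_query_expansion", "use_reranking",
    "use_compression", "score_threshold", "top_k", "max_tokens",
    "temperature", "hf_token",
}


def chat_payload(message, **options):
    """Return a JSON body for POST /chat, rejecting unknown or out-of-range options."""
    unknown = set(options) - CHAT_OPTIONS
    if unknown:
        raise ValueError(f"unknown /chat options: {sorted(unknown)}")
    threshold = options.get("score_threshold", 0.5)  # documented default
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("score_threshold must be between 0 and 1")
    if "top_k" in options and options["top_k"] < 1:
        raise ValueError("top_k must be a positive integer")
    # Options left out are not sent, so the server applies its own defaults.
    return {"message": message, **options}


# Usage:
# payload = chat_payload("Làm sao để tạo event mới?", use_rag=True, top_k=5, hf_token="hf_xxxxx")
# response = requests.post("http://localhost:8000/chat", json=payload)
```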
Disable Advanced RAG (Use Basic):
```python
response = requests.post('http://localhost:8000/chat', json={
    'message': 'Your question',
    'use_rag': True,
    'use_advanced_rag': False,  # use basic RAG
})
```
API Changes Summary
/index Endpoint
Old Parameters:
- id: str (required)
- text: str (required)
- image: UploadFile (optional)
New Parameters:
- id: str (required)
- texts: List[str] (optional, max 10)
- images: List[UploadFile] (optional, max 10)
Response:
```json
{
  "success": true,
  "id": "doc123",
  "message": "Đã index thành công document doc123 với 3 texts và 2 images"
}
```
The message reads: "Successfully indexed document doc123 with 3 texts and 2 images".
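Client code written for the old single `text`/`image` fields can be adapted mechanically. A minimal sketch, assuming the old code built `data` and `files` dictionaries as in the "Before" example; `migrate_index_request` is a hypothetical name.

```python
def migrate_index_request(data, files=None):
    """Convert an old-style /index request (text, image) to the new one (texts, images)."""
    new_data = {"id": data["id"]}
    if data.get("text"):
        # The single text becomes a one-element list.
        new_data["texts"] = [data["text"]]
    new_files = []
    if files and "image" in files:
        # The new API expects a list of ("images", file) tuples.
        new_files.append(("images", files["image"]))
    return new_data, new_files


# Usage:
# data, files = migrate_index_request({'id': 'doc1', 'text': 'Single text'},
#                                     {'image': open('image.jpg', 'rb')})
# requests.post('http://localhost:8000/index', data=data, files=files)
```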
/chat Endpoint
New Parameters:
- use_advanced_rag: bool (default: True) - Enable advanced RAG
- use_query_expansion: bool (default: True) - Expand query
- use_reranking: bool (default: True) - Rerank results
- use_compression: bool (default: True) - Compress context
- score_threshold: float (default: 0.5) - Min relevance score
Response (New):
```json
{
  "response": "AI generated answer...",
  "context_used": [...],
  "timestamp": "2025-10-29T...",
  "rag_stats": {
    "original_query": "Your question",
    "expanded_queries": ["Query variant 1", "Query variant 2"],
    "initial_results": 10,
    "after_rerank": 5,
    "after_compression": 5
  }
}
```
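The `rag_stats` counts read as a funnel: each stage keeps or drops documents from the previous one. A small sketch that turns them into a one-line summary. It relies only on the field names shown above; handling a missing `rag_stats` (for example when `use_advanced_rag` is `False`) is an assumption, since this guide does not say whether basic mode returns it. `summarize_rag_stats` is a hypothetical helper.

```python
def summarize_rag_stats(result):
    """Describe how many documents each pipeline stage kept, from a /chat response dict."""
    stats = result.get("rag_stats")
    if not stats:
        return "no rag_stats in response (basic RAG or RAG disabled?)"
    stages = [
        ("retrieved", stats.get("initial_results", 0)),
        ("after rerank", stats.get("after_rerank", 0)),
        ("after compression", stats.get("after_compression", 0)),
    ]
    parts = [f"{name}: {count}" for name, count in stages]
    variants = len(stats.get("expanded_queries", []))
    return f"{variants} query variant(s); " + ", ".join(parts)


# Usage:
# print(summarize_rag_stats(response.json()))
```

For the example response above this prints `2 query variant(s); retrieved: 10, after rerank: 5, after compression: 5`.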
Complete Examples
Example 1: Index Multiple Social Media Posts
```python
import requests

# Index a social media event with multiple posts and images
data = {
    'id': 'event_festival_2025',
    'texts': [
        'Festival âm nhạc quốc tế Hà Nội 2025',  # Hanoi International Music Festival 2025
        'Ngày 15-17 tháng 11 năm 2025',  # November 15-17, 2025
        'Địa điểm: Công viên Thống Nhất',  # Venue: Thống Nhất Park
        'Line-up: Sơn Tùng MTP, Đen Vâu, Hoàng Thùy Linh',
        'Giá vé từ 500.000đ - 2.000.000đ',  # Tickets from 500,000 to 2,000,000 VND
    ],
}
files = [
    ('images', open('poster_festival.jpg', 'rb')),
    ('images', open('lineup.jpg', 'rb')),
    ('images', open('venue_map.jpg', 'rb')),
]
response = requests.post('http://localhost:8000/index', data=data, files=files)
print(response.json())
for _, fh in files:
    fh.close()  # release the image file handles
```
Example 2: Advanced RAG Chat
```python
import requests

# Chat with advanced RAG
chat_response = requests.post('http://localhost:8000/chat', json={
    # "When and where does the Hanoi music festival take place?"
    'message': 'Festival âm nhạc Hà Nội diễn ra khi nào và ở đâu?',
    'use_rag': True,
    'use_advanced_rag': True,
    'top_k': 3,
    'score_threshold': 0.6,
    'hf_token': 'your_hf_token_here',
})
result = chat_response.json()
print("Answer:", result['response'])

print("\nRetrieved Context:")
for ctx in result['context_used']:
    print(f"- [{ctx['id']}] Confidence: {ctx['confidence']:.2%}")

print("\nRAG Pipeline Stats:")
print(f"- Original query: {result['rag_stats']['original_query']}")
print(f"- Query variants: {result['rag_stats']['expanded_queries']}")
print(f"- Documents retrieved: {result['rag_stats']['initial_results']}")
print(f"- After reranking: {result['rag_stats']['after_rerank']}")
```
Performance Comparison
| Feature | Basic RAG | Advanced RAG |
|---|---|---|
| Query Understanding | Single query | Multiple query variants |
| Retrieval Method | Direct vector search | Multi-query + hybrid |
| Result Ranking | Score from DB | Reranked with semantic similarity |
| Context Quality | Full text | Compressed, relevant parts only |
| Response Accuracy | Good | Better |
| Response Time | Faster | Slower (extra expansion, retrieval and reranking steps) |
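How much slower advanced RAG is depends on your data, model and hardware, so it is worth measuring on your own deployment. A minimal sketch that sends the same question in both modes and times each call. `compare_modes` is hypothetical; the HTTP function is passed in (for example `requests.post`) so the helper has no hard dependency on `requests`.

```python
import time


def compare_modes(message, post, url="http://localhost:8000/chat", **options):
    """Time one basic-RAG and one advanced-RAG request for the same message.

    `post` is any callable used like requests.post here:
    post(url, json=..., timeout=...) returning an object with
    raise_for_status() and json().
    """
    results = {}
    for mode, advanced in (("basic", False), ("advanced", True)):
        payload = {"message": message, "use_rag": True,
                   "use_advanced_rag": advanced, **options}
        start = time.perf_counter()
        response = post(url, json=payload, timeout=120)
        elapsed = time.perf_counter() - start
        response.raise_for_status()
        results[mode] = {"seconds": elapsed, "answer": response.json()["response"]}
    return results


# Usage (requires a running server):
# import requests
# for mode, r in compare_modes("Your question", requests.post, hf_token="hf_xxxxx").items():
#     print(f"{mode}: {r['seconds']:.2f}s")
```

A single pair of requests is noisy, since LLM generation time varies between calls; repeating the comparison a few times gives a more reliable picture.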
When to Use What?
Use Basic RAG when:
- You need fast response time
- Queries are straightforward
- Context is already well-structured
Use Advanced RAG when:
- You need higher accuracy
- Queries are complex or ambiguous
- Context documents are long
- You want better relevance
Troubleshooting
Error: "Tối đa 10 texts" ("At most 10 texts")
You sent more than 10 texts in one request. Reduce the list to 10 or fewer.
Error: "Tối đa 10 images" ("At most 10 images")
You sent more than 10 images in one request. Reduce the list to 10 or fewer.
RAG stats show 0 results
Your score_threshold might be too high for your data. Try lowering it into the 0.3-0.5 range (the default is 0.5).
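If you would rather not tune `score_threshold` by hand, a client can retry with progressively lower values until the pipeline retrieves something. A sketch assuming advanced RAG is on and that `rag_stats["initial_results"]` is present in the response, as in the response example in this guide. `chat_with_fallback` is hypothetical, the HTTP function is passed in (for example `requests.post`), and the threshold steps are illustrative, not recommended values.

```python
def chat_with_fallback(message, post, url="http://localhost:8000/chat",
                       thresholds=(0.5, 0.4, 0.3), **options):
    """Retry /chat with lower score_threshold values until some context is retrieved.

    Returns (threshold_used, response_json). If every threshold retrieves
    nothing, returns the last threshold and its response.
    """
    if not thresholds:
        raise ValueError("thresholds must not be empty")
    result = None
    for threshold in thresholds:
        payload = {"message": message, "use_rag": True, "use_advanced_rag": True,
                   **options, "score_threshold": threshold}
        response = post(url, json=payload, timeout=120)
        response.raise_for_status()
        result = response.json()
        if result.get("rag_stats", {}).get("initial_results", 0) > 0:
            return threshold, result
    return thresholds[-1], result


# Usage (requires a running server):
# import requests
# used, result = chat_with_fallback("Your question", requests.post, hf_token="hf_xxxxx")
```

Each retry is a full /chat request, including LLM generation, so keep the list of thresholds short.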
Next Steps
To further improve RAG, consider:
- Add BM25 Hybrid Search: Combine dense + sparse retrieval
- Use Cross-Encoder for Reranking: Better than embedding similarity
- Implement Query Decomposition: Break complex queries into sub-queries
- Add Citation/Source Tracking: Show which document each fact comes from
- Integrate RAG-Anything: For advanced multimodal document processing
For RAG-Anything integration (more complex), see: https://github.com/HKUDS/RAG-Anything
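For the first item, hybrid search, one common recipe is to score documents with BM25 (sparse, keyword-based), keep the existing vector search (dense), and merge the two rankings with reciprocal rank fusion (RRF). The sketch below is a plain-Python illustration of that recipe, not part of this project; `k1=1.5`, `b=0.75` and `k=60` are commonly used defaults, and the dense ranking in the usage lines is made up.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of every doc (dict id -> text) for the query."""
    if not docs:
        return {}
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized.values()) / n
    # Document frequency: in how many docs each term appears.
    df = Counter(term for toks in tokenized.values() for term in set(toks))
    scores = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in set(tokenize(query)):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            score += idf * freq * (k1 + 1) / (freq + k1 * (1 - b + b * len(toks) / avgdl))
        scores[doc_id] = score
    return scores


def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of doc ids: score(d) = sum over lists of 1 / (k + rank)."""
    fused = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return [doc_id for doc_id, _ in fused.most_common()]


docs = {
    "a": "Hanoi music festival at Thong Nhat Park",
    "b": "Ticket prices for the festival",
    "c": "Street food guide for Hanoi",
}
sparse = bm25_scores("festival ticket prices", docs)
sparse_ranking = sorted(sparse, key=sparse.get, reverse=True)  # ['b', 'a', 'c']
dense_ranking = ["a", "c", "b"]  # hypothetical output of the existing vector search
print(reciprocal_rank_fusion([sparse_ranking, dense_ranking]))  # ['a', 'b', 'c']
```

RRF only looks at ranks, so it can merge BM25 scores and cosine similarities without first normalising them onto a common scale.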