# Multimodal PDF Guide - PDFs with Text + Images

## Overview

The system now supports multimodal PDFs, i.e. PDFs that contain:
- ✅ Instructional text
- ✅ Image URLs (links to images)
- ✅ Markdown images: `![alt](url)`
- ✅ HTML images: `<img src="url">`

Well suited for: user guides with screenshots, tutorials with diagrams, and documentation with visual aids.

	---

## Why Multimodal?

### The Problem with Regular PDFs

Instructional PDFs often look like this:
```
Step 1: Open the homepage
[See image: https://example.com/homepage.png]

Step 2: Click "Create new"
![Create button](https://example.com/create-button.png)

Step 3: Fill in the information
<img src="https://example.com/form.png" alt="Form" />
```

The old PDF parser only extracts plain text, so image URLs are not detected or kept as image references, and the chatbot cannot tell which images relate to which content.

The new multimodal PDF parser:
- ✓ Extracts text
- ✓ Detects image URLs in all supported formats
- ✓ Links each image to the text chunk it appears in
- ✓ Stores the URLs in the chunk metadata
- ✓ Returns the images together with the text during chat
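To make the "link images to text chunks" step concrete, here is a minimal sketch of how a parser could attach detected URLs to each chunk's metadata. It assumes a simple fixed-size character splitter and a simplified URL regex; `build_chunks` is a hypothetical helper, not the project's `MultimodalPDFParser`. The metadata keys (`text`, `page`, `has_images`, `num_images`, `image_urls`) mirror the ones read in the chat examples in this guide.

```python
import re

# Simplified image-URL pattern (an assumption, not the parser's real regex).
IMAGE_URL_RE = re.compile(
    r"https?://[^\s()<>\"']+?\.(?:png|jpe?g|gif|svg|webp)\b", re.IGNORECASE
)

def build_chunks(page_text, page, chunk_size=500, chunk_overlap=50):
    """Hypothetical helper: split one page's text into overlapping character
    chunks and record the image URLs that appear inside each chunk."""
    step = chunk_size - chunk_overlap  # must be > 0
    chunks = []
    start = 0
    while True:
        text = page_text[start:start + chunk_size]
        urls = list(dict.fromkeys(IMAGE_URL_RE.findall(text)))  # dedupe, keep order
        chunks.append({
            "text": text,
            "page": page,
            "has_images": bool(urls),
            "num_images": len(urls),
            "image_urls": urls,
        })
        if start + chunk_size >= len(page_text):
            break
        start += step
    return chunks
```

The overlap matters here: a URL that straddles a chunk boundary is cut in one chunk but can still appear whole in the next.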

	---

## Comparison: Regular PDF vs Multimodal PDF

| Feature | Regular PDF (`/upload-pdf`) | Multimodal PDF (`/upload-pdf-multimodal`) |
|---------|-----------------------------|-------------------------------------------|
| Extract text | ✓ | ✓ |
| Detect image URLs | ✗ | ✓ |
| Link images to chunks | ✗ | ✓ |
| Return images in chat | ✗ | ✓ |
| Supported image URL formats | ✗ | `http://`, `https://`, Markdown, HTML |
| Use case | Simple text documents | User guides, tutorials, docs with images |

	---

## Usage

### 1. Upload a Multimodal PDF

Endpoint: `POST /upload-pdf-multimodal`

cURL:
```bash
curl -X POST "http://localhost:8000/upload-pdf-multimodal" \
  -F "file=@user_guide_with_images.pdf" \
  -F "title=System User Guide" \
  -F "description=User guide with screenshots" \
  -F "category=user_guide"
```

Python:
```python
import requests

with open('user_guide_with_images.pdf', 'rb') as f:
    response = requests.post(
        'http://localhost:8000/upload-pdf-multimodal',
        files={'file': f},
        data={
            'title': 'User Guide with Screenshots',
            'category': 'user_guide',
        },
    )

response.raise_for_status()  # fail early on HTTP errors
result = response.json()
print(f"Indexed: {result['chunks_indexed']} chunks")
# The image count is reported inside the message text
print(f"Message: {result['message']}")
```

Response:
```json
{
  "success": true,
  "document_id": "pdf_multimodal_20251029_150000",
  "filename": "user_guide_with_images.pdf",
  "chunks_indexed": 25,
  "message": "PDF 'user_guide_with_images.pdf' indexed successfully with 25 chunks and 15 images"
}
```
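The response has no dedicated image-count field; the count only appears inside `message`. If you need it programmatically, a small sketch like the following can pull it out. It assumes the message keeps the `... with N chunks and M images` wording shown above, which may change between versions, so treat it as fragile. `parse_index_counts` is a hypothetical helper.

```python
import re

def parse_index_counts(message):
    """Extract (chunks, images) from an upload message such as
    "... indexed successfully with 25 chunks and 15 images".
    Returns None for any count that is not present."""
    chunks = re.search(r"(\d+)\s+chunks?", message)
    images = re.search(r"(\d+)\s+images?", message)
    return (
        int(chunks.group(1)) if chunks else None,
        int(images.group(1)) if images else None,
    )

msg = "PDF 'user_guide_with_images.pdf' indexed successfully with 25 chunks and 15 images"
print(parse_index_counts(msg))  # (25, 15)
```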

### 2. Chat with Multimodal Context

```python
import requests

response = requests.post('http://localhost:8000/chat', json={
    'message': 'How do I create a new event?',
    'use_rag': True,
    'use_advanced_rag': True,
    'top_k': 3,
    'hf_token': 'your_token',
})
response.raise_for_status()
result = response.json()

# Response text
print("Answer:", result['response'])

# Retrieved context with images
for ctx in result['context_used']:
    meta = ctx['metadata']
    print(f"\n--- Source: Page {meta.get('page')} ---")
    print(f"Text: {meta.get('text', '')[:200]}...")

    # Check whether this chunk has images
    if meta.get('has_images'):
        print(f"Images ({meta.get('num_images', 0)}):")
        for img_url in meta.get('image_urls', []):
            print(f" - {img_url}")
```

Example output:
```
Answer: To create a new event, follow these steps:
1. Open the homepage and click the "Create Event" button (see the illustration)
2. Fill in the event information...

--- Source: Page 5 ---
Text: Step 1: Open the homepage and click the "Create Event" button...
Images (2):
 - https://example.com/homepage.png
 - https://example.com/create-button.png
```
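When several retrieved chunks point to the same screenshot, you will usually want to show each image only once. A minimal sketch, assuming the `context_used` shape used in the example above (each item has a `metadata` dict with optional `image_urls` and `page`); `collect_images` is a hypothetical helper:

```python
def collect_images(context_used):
    """Return unique image URLs from retrieved chunks, in retrieval order,
    each paired with the page number of the first chunk that mentions it."""
    seen = {}  # dicts keep insertion order, so retrieval order is preserved
    for ctx in context_used:
        meta = ctx.get("metadata", {})
        for url in meta.get("image_urls", []):
            if url not in seen:
                seen[url] = meta.get("page")
    return list(seen.items())

# Usage with the chat result from above:
# for url, page in collect_images(result['context_used']):
#     print(f"Page {page}: {url}")
```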

	---

## Preparing Your PDF

### Supported Formats

The multimodal parser detects the following formats:

1. Standard URLs:
```
See image: https://example.com/image.png
Screenshot: http://cdn.example.com/screenshot.jpg
```

	2. Markdown Images:
	```markdown
	![Homepage](https://example.com/homepage.png)
	![Button](https://example.com/button.png)
	```

	3. HTML Images:
	```html
	<img src="https://example.com/form.png" alt="Form" />
	<img src="http://example.com/result.jpg">
	```

	4. Image Extensions:
	```
	https://example.com/pic.jpg
	https://example.com/chart.png
	https://example.com/diagram.svg
	```
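To check ahead of time whether your text will be picked up, here is a standalone sketch that covers the four formats listed above with regular expressions. The patterns are assumptions for illustration; the project's `MultimodalPDFParser.extract_image_urls` may be broader or stricter. In particular, this sketch only accepts a plain URL when it ends in a common image extension.

```python
import re

# Hypothetical patterns, one per format listed above.
MARKDOWN_IMG = re.compile(r"!\[[^\]]*\]\((https?://[^\s)]+)\)")
HTML_IMG = re.compile(r"<img\b[^>]*?\bsrc=[\"'](https?://[^\"']+)[\"']", re.IGNORECASE)
# Plain URLs are only accepted when they end in a common image extension.
PLAIN_IMG = re.compile(r"https?://[^\s()<>\"']+\.(?:png|jpe?g|gif|svg|webp)\b", re.IGNORECASE)

def extract_image_urls(text):
    """Return image URLs in order of appearance, without duplicates."""
    hits = []
    for pattern in (MARKDOWN_IMG, HTML_IMG, PLAIN_IMG):
        for m in pattern.finditer(text):
            # Markdown/HTML patterns capture the URL in group 1.
            url = m.group(1) if pattern.groups else m.group(0)
            hits.append((m.start(), url))
    urls = []
    for _, url in sorted(hits):
        if url not in urls:
            urls.append(url)
    return urls
```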

### Best Practices

#### ✓ Good

Example PDF content:
```
# Guide: Creating an Event

## Step 1: Open the Homepage

Go to the system's homepage.

![Homepage Screenshot](https://docs.example.com/images/homepage.png)

You will see the main screen with a menu on the left.

## Step 2: Click "Create Event"

Find and click the "Create Event" button in the top-right corner.

![Create Event Button](https://docs.example.com/images/create-button.png)

## Step 3: Fill In the Information

Fill in the following fields in the form:
- Event name
- Date and time
- Location

See the sample form: https://docs.example.com/images/event-form.png
```

Why it's good:
- Clear structure (headings)
- Each step has text plus an image
- URLs are explicit and easy to detect
- Each image sits right next to the text it illustrates

#### ✗ Avoid

```
See the images below [1] [2] [3]

[Images at the end of the document]

...

[1] homepage.png
[2] button.png
[3] form.png
```

Why it's bad:
- Image references are bare file names, not full URLs, so they cannot be detected
- Images are separated from the text they illustrate
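Before uploading, a quick check can flag image file names written without a full URL. A rough sketch, assuming that any token ending in a common image extension and not starting with `http://` or `https://` is a bare file name; `find_bare_image_refs` is a hypothetical helper:

```python
import re

# Tokens that end in a common image extension (brackets/quotes excluded).
IMAGE_TOKEN = re.compile(r"[^\s()\[\]<>\"']+\.(?:png|jpe?g|gif|svg|webp)\b", re.IGNORECASE)

def find_bare_image_refs(text):
    """Return image references that lack an http(s):// prefix."""
    return [
        tok for tok in IMAGE_TOKEN.findall(text)
        if not tok.lower().startswith(("http://", "https://"))
    ]
```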

	---

## Real-World Example

### Creating a Multimodal Guide PDF

File: `chatbot_guide_with_images.md`

````markdown
# ChatbotRAG User Guide

## 1. Upload a PDF

### Step 1: Prepare the PDF file

Make sure your PDF file is ready.

![PDF File Icon](https://via.placeholder.com/150?text=PDF+File)

### Step 2: Use cURL or Python

With cURL:

```bash
curl -X POST "http://localhost:8000/upload-pdf-multimodal" \
  -F "file=@your_file.pdf"
```

![cURL Command Example](https://via.placeholder.com/400x100?text=cURL+Command)

With Python:

```python
import requests
# Upload code here
```

### Step 3: Verify the Upload

Check the upload result:

https://via.placeholder.com/500x300?text=Upload+Success+Message

## 2. Chat with the Chatbot

After uploading, you can ask the chatbot:

![Chat Interface](https://via.placeholder.com/600x400?text=Chat+Interface)

Example questions:
- "How do I upload a PDF?"
- "What are the steps to create an event?"

![Chat Example](https://via.placeholder.com/600x300?text=Chat+Example)

## 3. View the Results

The chatbot answers based on the PDF content:

https://via.placeholder.com/600x350?text=Chat+Response+with+Images
````

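Convert to PDF:
```bash
pandoc chatbot_guide_with_images.md -o chatbot_guide_with_images.pdf
```

Pandoc produces PDF through a LaTeX engine, so a TeX distribution must be installed. Also note that pandoc renders `![alt](url)` as an embedded picture (fetching the remote file) rather than printing the URL as text. If the upload message reports fewer images than expected, write those URLs as plain text in the source as well.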

	Upload:
	```bash
	curl -X POST "http://localhost:8000/upload-pdf-multimodal" \
	-F "file=@chatbot_guide_with_images.pdf" \
	-F "title=ChatbotRAG Guide" \
	-F "category=user_guide"
	```

	---

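## Advanced: Custom Image Handling

### Option 1: Local Images

If your images are stored locally, you need to serve them over HTTP. Keep in mind that `localhost` URLs only resolve on the machine running the server, so use a host name your chatbot users can reach if they are on other devices: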

	```bash
	# Simple HTTP server
	cd /path/to/images
	python -m http.server 8080

	# Images available at:
	# http://localhost:8080/image1.png
	# http://localhost:8080/image2.png
	```

Then reference them in the PDF:
	```
	![Image](http://localhost:8080/image1.png)
	```
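With many screenshots, you can generate these references instead of typing them. A small sketch, assuming the base URL `http://localhost:8080` from the server above and using the file name (without extension) as alt text; `image_refs` is a hypothetical helper:

```python
import os
from urllib.parse import quote

IMAGE_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp")

def image_refs(filenames, base_url="http://localhost:8080"):
    """Build one Markdown image line per image file name, sorted by name."""
    refs = []
    for name in sorted(filenames):
        if name.lower().endswith(IMAGE_SUFFIXES):
            stem = os.path.splitext(name)[0]
            # quote() percent-encodes spaces so the URL stays one token
            refs.append(f"![{stem}]({base_url.rstrip('/')}/{quote(name)})")
    return refs

# Usage (the path is an example): print references for the served folder.
# for line in image_refs(os.listdir("/path/to/images")):
#     print(line)
```

Percent-encoding matters here because a URL containing a raw space is likely to be cut short by URL detection.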

### Option 2: Cloud Storage

Upload your images to a cloud service (AWS S3, Cloudinary, Imgur, etc.):

```python
# Example: upload to Imgur (requires an Imgur API client ID)
import requests

def upload_to_imgur(image_path, client_id='YOUR_CLIENT_ID'):
    """Upload one image and return its public link."""
    headers = {'Authorization': f'Client-ID {client_id}'}
    with open(image_path, 'rb') as img:
        response = requests.post(
            'https://api.imgur.com/3/image',
            headers=headers,
            files={'image': img},
        )
    response.raise_for_status()
    return response.json()['data']['link']

# Upload images
url1 = upload_to_imgur('screenshot1.png')
url2 = upload_to_imgur('screenshot2.png')

# Use the URLs in your Markdown source before converting it to PDF
print(f"![Screenshot 1]({url1})")
print(f"![Screenshot 2]({url2})")
```
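Once the images are uploaded, the local paths in your Markdown source have to be swapped for the hosted URLs before converting to PDF. A minimal sketch, assuming you keep a dict from local path to hosted URL (for example built from the `upload_to_imgur` calls above); `rewrite_image_paths` is a hypothetical helper that only touches `![alt](path)` syntax, and the `i.imgur.com` URLs in the usage are made up:

```python
import re

# ![alt]( + path + )
MD_IMAGE = re.compile(r"(!\[[^\]]*\]\()([^)\s]+)(\))")

def rewrite_image_paths(markdown, url_map):
    """Replace local image paths in ![alt](path) with hosted URLs.
    Paths missing from url_map are left unchanged."""
    def swap(match):
        prefix, path, suffix = match.groups()
        return prefix + url_map.get(path, path) + suffix
    return MD_IMAGE.sub(swap, markdown)

src = "![Step 1](screenshot1.png)\n![Logo](logo.png)"
print(rewrite_image_paths(src, {"screenshot1.png": "https://i.imgur.com/abc.png"}))
```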

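### Option 3: Render Pages as Base64 Images

If the images are embedded in the PDF itself (no URLs), one workaround is to render each page to an image and use it as a `data:` URL. Note that this captures the whole page rather than the individual embedded images, and base64 page images are large, so they are likely too big to store comfortably in chunk metadata.

```python
import base64
import io

import pypdfium2 as pdfium  # .to_pil() also requires Pillow

def render_pages_as_base64(pdf_path, scale=2.0):
    """Render every PDF page to PNG and return base64 strings / data URLs."""
    pdf = pdfium.PdfDocument(pdf_path)
    images = []
    try:
        for page_num in range(len(pdf)):
            page = pdf[page_num]
            # Render the full page (scale=1 is 72 DPI, so 2.0 is about 144 DPI)
            pil_image = page.render(scale=scale).to_pil()

            # Encode as PNG, then base64
            buffered = io.BytesIO()
            pil_image.save(buffered, format="PNG")
            img_str = base64.b64encode(buffered.getvalue()).decode()

            images.append({
                'page': page_num + 1,
                'base64': img_str,
                'url': f'data:image/png;base64,{img_str}',
            })
    finally:
        pdf.close()
    return images
```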

	---

## Troubleshooting

### Images are not detected

Possible causes:
- URLs are malformed (e.g., missing `http://` or `https://`)
- URLs are broken across lines
- Incorrect Markdown syntax

Fix: test URL detection directly:
	```python
	# Test URL detection
	from multimodal_pdf_parser import MultimodalPDFParser

	parser = MultimodalPDFParser()
	test_text = """
See image: https://example.com/image.png
	![Alt](https://example.com/pic.jpg)
	"""

	urls = parser.extract_image_urls(test_text)
	print("Found URLs:", urls)
	```
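For the line-break case, PDF text extraction can split a long URL across two lines. One heuristic you could apply to the text before detection (an assumption, not something the parser is documented to do) is to rejoin a URL with the start of the following line when the URL does not yet end in an image extension. It can wrongly glue ordinary words onto non-image links, and it only repairs one break per URL, so review the result. `rejoin_broken_urls` is a hypothetical helper.

```python
import re

IMAGE_EXTS = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp")
BROKEN_URL = re.compile(r"(https?://\S+)\r?\n(\S+)")

def rejoin_broken_urls(text):
    """Heuristic: glue a URL to the start of the next line when the URL does
    not already end in an image extension (i.e. it was probably cut)."""
    def join(m):
        head, tail = m.group(1), m.group(2)
        if head.lower().endswith(IMAGE_EXTS):
            return m.group(0)  # URL looks complete; keep the line break
        return head + tail
    return BROKEN_URL.sub(join, text)
```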

### The chatbot does not return images

Check:
1. Verify that the PDF was indexed with the multimodal parser:
```bash
curl http://localhost:8000/documents/pdf
# Look for "type": "multimodal_pdf"
```

2. Check that the chunk metadata contains `image_urls`:
```python
import requests

# Send the same JSON payload as in "2. Chat with Multimodal Context"
response = requests.post('http://localhost:8000/chat', ...)
for ctx in response.json()['context_used']:
    print(ctx['metadata'].get('image_urls', []))
```

### Too many images → chunks become large

Solution: reduce the number of images per chunk by using smaller chunks:

```python
# In multimodal_pdf_parser.py
parser = MultimodalPDFParser(
    chunk_size=300,  # Smaller chunks
    chunk_overlap=30,
    extract_images=True,
)
```

	---

## Conclusion

### When Should You Use Multimodal PDF?

✓ Use `/upload-pdf-multimodal` when:
- The PDF contains illustrations (screenshots, diagrams)
- You need the chatbot to reference images in its answers
- You are indexing user guides or tutorials with visual instructions
- You are indexing documentation with charts or tables as images

✓ Use the regular `/upload-pdf` when:
- The PDF is plain text only
- You don't need images in the context
- You are indexing simple documents or FAQs

### Complete Workflow

1. Create a PDF with text + image URLs (Markdown/HTML)
2. Upload it via `/upload-pdf-multimodal`
3. Verify that the images were detected
4. Chat: images are automatically included in the context
5. Display the images in your own UI

	---

## Example: Full Workflow

```python
"""
Complete workflow: create, upload, and chat with a multimodal PDF
"""
import requests

BASE_URL = 'http://localhost:8000'

# 1. Upload multimodal PDF
print("=== Uploading Multimodal PDF ===")
with open('user_guide_with_images.pdf', 'rb') as f:
    response = requests.post(
        f'{BASE_URL}/upload-pdf-multimodal',
        files={'file': f},
        data={'title': 'User Guide', 'category': 'guide'},
    )
response.raise_for_status()

result = response.json()
print(f"✓ Indexed: {result['chunks_indexed']} chunks")
print(f"✓ Message: {result['message']}")

# 2. Chat with multimodal context
print("\n=== Chatting ===")
response = requests.post(f'{BASE_URL}/chat', json={
    'message': 'How do I create a new event? Show me the illustrations.',
    'use_rag': True,
    'use_advanced_rag': True,
    'top_k': 3,
    'hf_token': 'your_token',
})
response.raise_for_status()

chat_result = response.json()
print(f"Answer: {chat_result['response']}\n")

# 3. Display context with images
print("=== Context with Images ===")
for i, ctx in enumerate(chat_result['context_used'], 1):
    meta = ctx['metadata']
    print(f"\n[{i}] Page {meta.get('page')}, Confidence: {ctx['confidence']:.2%}")
    print(f"Text: {meta.get('text', '')[:150]}...")

    if meta.get('has_images'):
        print(f"Images ({meta.get('num_images', 0)}):")
        for url in meta.get('image_urls', []):
            print(f" 🖼️ {url}")
```
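For the last step of the complete workflow (displaying images in your UI), here is one way to turn a chat result into an HTML fragment. It is a sketch that assumes the `response` and `context_used` fields used in the example above; `render_chat_html` is hypothetical. All values are escaped with `html.escape` because the text comes from uploaded documents, and only `http://`/`https://` URLs are emitted, which also means `data:` URLs are skipped.

```python
import html

def render_chat_html(chat_result):
    """Build a simple HTML fragment: the answer, then each unique image once."""
    parts = [f"<p>{html.escape(chat_result.get('response', ''))}</p>"]
    seen = set()
    for ctx in chat_result.get("context_used", []):
        for url in ctx.get("metadata", {}).get("image_urls", []):
            # Skip duplicates and non-HTTP schemes such as javascript: or data:
            if url in seen or not url.startswith(("http://", "https://")):
                continue
            seen.add(url)
            parts.append(f'<img src="{html.escape(url, quote=True)}" alt="" loading="lazy">')
    return "\n".join(parts)
```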

	---

Your PDFs with illustrations are now ready to use with the chatbot! 🎨📄