
RAG System Setup Guide

Overview

The Edge LLM platform now includes a simple RAG (Retrieval-Augmented Generation) system that allows you to upload documents to enhance AI responses with relevant context.

Features

  • 📁 Document Upload: Support for PDF, TXT, DOCX, and MD files
  • 🔍 Semantic Search: Find relevant information from your documents
  • ⚙️ Configurable Retrieval: Adjust how many document chunks to use for context
  • 🎯 Easy Integration: Toggle RAG on/off in the Assistant Studio

Installation

Backend Dependencies

Install the required Python packages:

pip install -r requirements.txt

The RAG system requires these additional packages:

  • langchain: LangChain framework
  • pypdf: PDF processing
  • python-docx: Word document processing
  • faiss-cpu: Vector similarity search
  • sentence-transformers: Text embeddings
  • unstructured: Document parsing
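
If you want to install only the RAG additions rather than the full requirements file, the corresponding fragment of requirements.txt would look like this (package names taken from the list above; no version pins are specified by the project, so none are shown here):

```text
langchain
pypdf
python-docx
faiss-cpu
sentence-transformers
unstructured
```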

Frontend

No additional frontend dependencies are needed. The Documents tab is included in the main build.

Usage

1. Access the Documents Tab

  1. Open Assistant Studio
  2. Navigate to the Documents tab (next to Parameters and Instructions)

2. Upload Documents

  1. Click "Click to upload documents" in the upload area
  2. Select PDF, TXT, DOCX, or MD files
  3. Files will be processed and chunked automatically
  4. Uploaded documents appear in the "Uploaded Documents" section

3. Configure RAG

  1. Enable RAG: Toggle the "Enable RAG" switch (only available when documents are uploaded)
  2. Retrieval Count: Adjust the slider to set how many document chunks to retrieve (1-10)
    • 1-3: Focused responses with minimal context
    • 4-7: Balanced responses with moderate context
    • 8-10: Comprehensive responses with extensive context

4. Chat with RAG Enhancement

Once RAG is enabled:

  1. Ask questions normally in the chat
  2. The system will automatically search your uploaded documents
  3. Relevant information will be added to the AI's context
  4. The AI will incorporate document information into responses when relevant

API Endpoints

Document Management

  • POST /rag/upload - Upload multiple documents
  • GET /rag/documents - List uploaded documents
  • DELETE /rag/documents/{doc_id} - Delete a document
  • POST /rag/search - Search through documents
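
As a sketch of how a client might call these endpoints, the helpers below build the URLs and JSON bodies. The request field names (`query`, `k`) and the host/port are assumptions for illustration; check the backend code for the actual request schema:

```python
import json

BASE_URL = "http://localhost:8000"  # assumed host/port; adjust to your deployment

def search_body(query: str, k: int = 4) -> str:
    """JSON body for POST /rag/search ('query'/'k' field names are assumptions)."""
    return json.dumps({"query": query, "k": k})

def document_url(doc_id: str) -> str:
    """URL for GET/DELETE on /rag/documents/{doc_id}."""
    return f"{BASE_URL}/rag/documents/{doc_id}"
```

With a running backend, the search body could then be sent with any HTTP client, e.g. `requests.post(f"{BASE_URL}/rag/search", data=search_body("warranty period"), headers={"Content-Type": "application/json"})`.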

Enhanced Generation

The existing /generate endpoint now supports RAG when:

  • Documents are uploaded to the RAG system
  • The request includes RAG configuration (handled automatically by frontend)

Technical Details

Document Processing

  1. Files are uploaded and temporarily stored
  2. LangChain loaders extract text content
  3. Text is split into chunks (1000 characters with a 200-character overlap)
  4. Chunks are embedded using sentence-transformers/all-MiniLM-L6-v2
  5. Embeddings are stored in FAISS vector database
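
Step 3 above (fixed-size chunks with overlap) can be sketched in plain Python. The real system uses a LangChain text splitter, so this is only an illustration of the windowing behavior:

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into windows of chunk_size characters, where each window
    starts `overlap` characters before the previous window ended."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    step = chunk_size - overlap  # how far the window advances each iteration
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks
```

With the defaults, a 2500-character document yields three chunks, and consecutive chunks share 200 characters, which helps keep sentences that straddle a boundary retrievable.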

RAG Pipeline

  1. User query is embedded using the same model
  2. Similarity search finds relevant document chunks
  3. Retrieved chunks are added to the system prompt
  4. AI generates response with document context
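
The retrieval steps can be illustrated with a toy embedding and cosine similarity. The actual system uses sentence-transformers vectors and FAISS, so the `embed` function here is a stand-in for the real model, not part of the platform:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' standing in for the real model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Steps 1-2: embed the query and return the k most similar chunks."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(system: str, query: str, chunks: list[str], k: int = 3) -> str:
    """Step 3: prepend the retrieved chunks to the system prompt."""
    context = "\n".join(retrieve(query, chunks, k))
    return f"{system}\n\nContext:\n{context}"
```

Raising `k` (the "Retrieval Count" slider) adds more chunks to the context, trading focus for coverage, which is why low values give tighter answers and high values more comprehensive ones.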

Limitations & Notes

  • In-Memory Storage: Documents are stored in memory and do not persist across restarts
  • CPU Only: Uses CPU-based embeddings for compatibility
  • File Size: Large files may take time to process
  • Language: Optimized for English content

Troubleshooting

"RAG system not available" Error

  • Ensure LangChain dependencies are installed
  • Check that rag_system.py is in the correct location
  • Verify that the embedding model was downloaded successfully

Documents Not Uploading

  • Check file format (PDF, TXT, DOCX, MD supported)
  • Ensure file size is reasonable (<50MB recommended)
  • Check browser console for error messages

Poor RAG Performance

  • Try adjusting retrieval count
  • Ensure documents contain relevant information
  • Check that document text was extracted correctly

Future Improvements

  • Persistent vector storage (ChromaDB, Pinecone)
  • GPU acceleration for embeddings
  • More document formats (PPT, HTML, etc.)
  • Advanced chunking strategies
  • Custom embedding models
  • Query expansion and reranking