RAG System Setup Guide
Overview
The Edge LLM platform now includes a simple RAG (Retrieval-Augmented Generation) system that allows you to upload documents to enhance AI responses with relevant context.
Features
- 📁 Document Upload: Support for PDF, TXT, DOCX, and MD files
- 🔍 Semantic Search: Find relevant information from your documents
- ⚙️ Configurable Retrieval: Adjust how many document chunks to use for context
- 🎯 Easy Integration: Toggle RAG on/off in the Assistant Studio
Installation
Backend Dependencies
Install the required Python packages:
pip install -r requirements.txt
The RAG system requires these additional packages:
- langchain: LangChain framework
- pypdf: PDF processing
- python-docx: Word document processing
- faiss-cpu: Vector similarity search
- sentence-transformers: Text embeddings
- unstructured: Document parsing
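As a quick sanity check after installation, a small standard-library script like the one below can confirm that each package is importable. Note that for some packages the pip name differs from the import name (e.g. python-docx imports as docx); this snippet is a convenience sketch, not part of the platform itself.

```python
import importlib.util

# Map pip package names to their importable module names.
required = {
    "langchain": "langchain",
    "pypdf": "pypdf",
    "python-docx": "docx",
    "faiss-cpu": "faiss",
    "sentence-transformers": "sentence_transformers",
    "unstructured": "unstructured",
}

# Collect any packages whose module cannot be found on this system.
missing = [pip_name for pip_name, module in required.items()
           if importlib.util.find_spec(module) is None]

print("missing packages:", missing or "none")
```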
Frontend
No additional frontend dependencies needed. The Documents tab is included in the main build.
Usage
1. Access the Documents Tab
- Open Assistant Studio
- Navigate to the Documents tab (next to Parameters and Instructions)
2. Upload Documents
- Click "Click to upload documents" in the upload area
- Select PDF, TXT, DOCX, or MD files
- Files will be processed and chunked automatically
- Uploaded documents appear in the "Uploaded Documents" section
3. Configure RAG
- Enable RAG: Toggle the "Enable RAG" switch (only available when documents are uploaded)
- Retrieval Count: Adjust the slider to set how many document chunks to retrieve (1-10)
- 1-3: Focused responses with minimal context
- 4-7: Balanced responses with moderate context
- 8-10: Comprehensive responses with extensive context
4. Chat with RAG Enhancement
Once RAG is enabled:
- Ask questions normally in the chat
- The system will automatically search your uploaded documents
- Relevant information will be added to the AI's context
- The AI will incorporate document information into responses when relevant
API Endpoints
Document Management
- POST /rag/upload - Upload multiple documents
- GET /rag/documents - List uploaded documents
- DELETE /rag/documents/{doc_id} - Delete a document
- POST /rag/search - Search through documents
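As an illustration, a search request against POST /rag/search could be built with the standard library as below. The endpoint path comes from the list above, but the host/port and the JSON field names (query, top_k) are assumptions; check your backend's request schema before relying on them.

```python
import json
import urllib.request

BASE = "http://localhost:8000"  # assumed backend host/port; adjust as needed

def build_search_request(query, top_k=3):
    """Build (but do not send) a POST /rag/search request.

    The JSON field names here are assumptions, not confirmed by this guide.
    """
    body = json.dumps({"query": query, "top_k": top_k}).encode("utf-8")
    return urllib.request.Request(
        BASE + "/rag/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_search_request("what is RAG?")
# urllib.request.urlopen(req)  # uncomment once the backend is running
```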
Enhanced Generation
The existing /generate endpoint now supports RAG when:
- Documents are uploaded to the RAG system
- The request includes RAG configuration (handled automatically by frontend)
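A /generate request carrying RAG settings might look like the sketch below. The frontend assembles this automatically; the field names (rag, enabled, retrieval_count) are hypothetical placeholders, not a documented schema.

```python
import json

# Hypothetical /generate payload with RAG configuration attached.
# Field names are illustrative assumptions only.
payload = {
    "prompt": "Summarize the uploaded design doc.",
    "rag": {
        "enabled": True,
        "retrieval_count": 4,  # mirrors the 1-10 slider in the Documents tab
    },
}
body = json.dumps(payload)
```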
Technical Details
Document Processing
- Files are uploaded and temporarily stored
- LangChain loaders extract text content
- Text is split into chunks (1000 chars with 200 char overlap)
- Chunks are embedded using sentence-transformers/all-MiniLM-L6-v2
- Embeddings are stored in a FAISS vector database
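The chunking step above (1000-character chunks with 200 characters of overlap) can be sketched in plain Python. This is a simplified character-based splitter for illustration; the actual system presumably uses a LangChain text splitter, which also respects sentence and paragraph boundaries.

```python
def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into overlapping character chunks.

    Each chunk is at most chunk_size characters, and consecutive chunks
    share `overlap` characters, matching the settings described above.
    """
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # the final chunk already reaches the end of the text
    return chunks

parts = chunk_text("x" * 2500)
```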
RAG Pipeline
- User query is embedded using the same model
- Similarity search finds relevant document chunks
- Retrieved chunks are added to the system prompt
- AI generates response with document context
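The four steps above can be sketched end to end without any dependencies. This toy version replaces the sentence-transformers embeddings and FAISS index with hand-made 2-D vectors and plain cosine similarity, purely to show the retrieve-then-prompt flow; the prompt format is an illustrative assumption.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, chunk_vecs, chunks, k=2):
    """Return the k chunks most similar to the query embedding."""
    scored = sorted(zip(chunks, chunk_vecs),
                    key=lambda pair: cosine(query_vec, pair[1]),
                    reverse=True)
    return [chunk for chunk, _ in scored[:k]]

def build_prompt(system_prompt, retrieved):
    """Prepend retrieved chunks to the system prompt as context."""
    context = "\n\n".join(retrieved)
    return f"{system_prompt}\n\nRelevant context:\n{context}"

chunks = ["RAG adds context.", "FAISS does search.", "Cats sleep a lot."]
vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # toy embeddings
top = retrieve([1.0, 0.0], vecs, chunks, k=2)
prompt = build_prompt("You are a helpful assistant.", top)
```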
Limitations & Notes
- Memory Storage: Documents are stored in memory (not persistent across restarts)
- CPU Only: Uses CPU-based embeddings for compatibility
- File Size: Large files may take time to process
- Language: Optimized for English content
Troubleshooting
"RAG system not available" Error
- Ensure LangChain dependencies are installed
- Check that rag_system.py is in the correct location
- Verify the embeddings model downloaded successfully
Documents Not Uploading
- Check file format (PDF, TXT, DOCX, MD supported)
- Ensure file size is reasonable (<50MB recommended)
- Check browser console for error messages
Poor RAG Performance
- Try adjusting retrieval count
- Ensure documents contain relevant information
- Check that document text was extracted correctly
Future Improvements
- Persistent vector storage (ChromaDB, Pinecone)
- GPU acceleration for embeddings
- More document formats (PPT, HTML, etc.)
- Advanced chunking strategies
- Custom embedding models
- Query expansion and reranking