
RAG System Setup Guide

Overview

The Edge LLM platform now includes a simple RAG (Retrieval-Augmented Generation) system that allows you to upload documents to enhance AI responses with relevant context.

Features

  • 📁 Document Upload: Support for PDF, TXT, DOCX, and MD files
  • 🔍 Semantic Search: Find relevant information from your documents
  • ⚙️ Configurable Retrieval: Adjust how many document chunks to use for context
  • 🎯 Easy Integration: Toggle RAG on/off in the Assistant Studio

Installation

Backend Dependencies

Install the required Python packages:

pip install -r requirements.txt

The RAG system requires these additional packages:

  • langchain: LangChain framework
  • pypdf: PDF processing
  • python-docx: Word document processing
  • faiss-cpu: Vector similarity search
  • sentence-transformers: Text embeddings
  • unstructured: Document parsing
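
If you want to install only the RAG additions rather than the full requirements file, the corresponding fragment of requirements.txt would look like this (package names taken from the list above; no version pins are specified by the project, so none are shown here):

```text
langchain
pypdf
python-docx
faiss-cpu
sentence-transformers
unstructured
```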

Frontend

No additional frontend dependencies are needed. The Documents tab is included in the main build.

Usage

1. Access the Documents Tab

  1. Open Assistant Studio
  2. Navigate to the Documents tab (next to Parameters and Instructions)

2. Upload Documents

  1. Click "Click to upload documents" in the upload area
  2. Select PDF, TXT, DOCX, or MD files
  3. Files will be processed and chunked automatically
  4. Uploaded documents appear in the "Uploaded Documents" section

3. Configure RAG

  1. Enable RAG: Toggle the "Enable RAG" switch (only available when documents are uploaded)
  2. Retrieval Count: Adjust the slider to set how many document chunks to retrieve (1-10)
    • 1-3: Focused responses with minimal context
    • 4-7: Balanced responses with moderate context
    • 8-10: Comprehensive responses with extensive context

4. Chat with RAG Enhancement

Once RAG is enabled:

  1. Ask questions normally in the chat
  2. The system will automatically search your uploaded documents
  3. Relevant information will be added to the AI's context
  4. The AI will incorporate document information into responses when relevant

API Endpoints

Document Management

  • POST /rag/upload - Upload multiple documents
  • GET /rag/documents - List uploaded documents
  • DELETE /rag/documents/{doc_id} - Delete a document
  • POST /rag/search - Search through documents
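
As a sketch of how a client might call these endpoints, the helpers below build the URLs and JSON bodies. The request field names (`query`, `k`) and the host/port are assumptions for illustration; check the backend code for the actual request schema:

```python
import json

BASE_URL = "http://localhost:8000"  # assumed host/port; adjust to your deployment

def search_body(query: str, k: int = 4) -> str:
    """JSON body for POST /rag/search ('query'/'k' field names are assumptions)."""
    return json.dumps({"query": query, "k": k})

def document_url(doc_id: str) -> str:
    """URL for GET/DELETE on /rag/documents/{doc_id}."""
    return f"{BASE_URL}/rag/documents/{doc_id}"
```

With a running backend, the search body could then be sent with any HTTP client, e.g. `requests.post(f"{BASE_URL}/rag/search", data=search_body("warranty period"), headers={"Content-Type": "application/json"})`.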

Enhanced Generation

The existing /generate endpoint now supports RAG when:

  • Documents are uploaded to the RAG system
  • The request includes RAG configuration (handled automatically by frontend)

Technical Details

Document Processing

  1. Files are uploaded and temporarily stored
  2. LangChain loaders extract text content
  3. Text is split into chunks (1000 characters with a 200-character overlap)
  4. Chunks are embedded using sentence-transformers/all-MiniLM-L6-v2
  5. Embeddings are stored in FAISS vector database
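
Step 3 above (fixed-size chunks with overlap) can be sketched in plain Python. The real system uses a LangChain text splitter, so this is only an illustration of the windowing behavior:

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into windows of chunk_size characters, where each window
    starts `overlap` characters before the previous window ended."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    step = chunk_size - overlap  # how far the window advances each iteration
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks
```

With the defaults, a 2500-character document yields three chunks, and consecutive chunks share 200 characters, which helps keep sentences that straddle a boundary retrievable.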

RAG Pipeline

  1. User query is embedded using the same model
  2. Similarity search finds relevant document chunks
  3. Retrieved chunks are added to the system prompt
  4. AI generates response with document context
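
The retrieval steps can be illustrated with a toy embedding and cosine similarity. The actual system uses sentence-transformers vectors and FAISS, so the `embed` function here is a stand-in for the real model, not part of the platform:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' standing in for the real model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Steps 1-2: embed the query and return the k most similar chunks."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(system: str, query: str, chunks: list[str], k: int = 3) -> str:
    """Step 3: prepend the retrieved chunks to the system prompt."""
    context = "\n".join(retrieve(query, chunks, k))
    return f"{system}\n\nContext:\n{context}"
```

Raising `k` (the "Retrieval Count" slider) adds more chunks to the context, trading focus for coverage, which is why low values give tighter answers and high values more comprehensive ones.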

Limitations & Notes

  • In-Memory Storage: Documents are stored in memory and do not persist across restarts
  • CPU Only: Uses CPU-based embeddings for compatibility
  • File Size: Large files may take time to process
  • Language: Optimized for English content

Troubleshooting

"RAG system not available" Error

  • Ensure LangChain dependencies are installed
  • Check that rag_system.py is in the correct location
  • Verify that the embedding model was downloaded successfully

Documents Not Uploading

  • Check file format (PDF, TXT, DOCX, MD supported)
  • Ensure file size is reasonable (<50MB recommended)
  • Check browser console for error messages

Poor RAG Performance

  • Try adjusting retrieval count
  • Ensure documents contain relevant information
  • Check that document text was extracted correctly

Future Improvements

  • Persistent vector storage (ChromaDB, Pinecone)
  • GPU acceleration for embeddings
  • More document formats (PPT, HTML, etc.)
  • Advanced chunking strategies
  • Custom embedding models
  • Query expansion and reranking