---
title: GraphLLM - PDF Knowledge Graph RAG
emoji: 🕸️
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: mit
app_port: 7860
---
# 🕸️ GraphLLM - PDF Knowledge Graph + RAG System
Transform PDFs into interactive knowledge graphs with AI-powered Q&A.
## 🚀 Features
- **📄 PDF Processing:** Extract text, tables, and images from PDFs
- **🕸️ Knowledge Graph Generation:** Build semantic graphs using Gemini AI
- **🔍 Vector Search:** FAISS-powered semantic search with sentence transformers
- **💬 RAG Chat:** Ask questions and get answers with source citations
- **🎨 Interactive Visualization:** Explore knowledge graphs in your browser
## 🛠️ Technology Stack
- **LLM:** Google Gemini (gemini-2.5-flash)
- **Embeddings:** sentence-transformers/all-MiniLM-L6-v2
- **Vector Store:** FAISS with HNSW index
- **Graph:** NetworkX (in-memory)
- **Backend:** FastAPI + Uvicorn
- **Frontend:** Vanilla JS with D3.js/Cytoscape
## 📋 Setup
### Required: Gemini API Key
This app requires a Google Gemini API key:
1. Get your API key from [Google AI Studio](https://makersuite.google.com/app/apikey)
2. Add it as a **Secret** in Hugging Face Spaces settings:
   - Name: `GEMINI_API_KEY`
   - Value: Your API key
### Configuration (Optional)
You can set these environment variables in Space Settings:
```bash
# LLM Settings
GEMINI_MODEL=gemini-2.5-flash   # Gemini model
LLM_TEMPERATURE=0.0             # Temperature for extraction

# Embedding Settings
EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2

# Environment
ENVIRONMENT=production
LOG_LEVEL=INFO
```
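
As a rough illustration of how a backend might consume these variables, here is a minimal sketch using only the Python standard library. The `Settings` class and `load_settings` function are hypothetical names, not taken from this project, and using the example values above as fallback defaults is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the variables listed above."""
    gemini_api_key: str
    gemini_model: str
    llm_temperature: float
    embedding_model: str
    environment: str
    log_level: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from the environment; fail fast if the API key is missing."""
    api_key = env.get("GEMINI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError(
            "GEMINI_API_KEY is not set; add it as a Secret in the Space settings"
        )
    # Fallback values mirror the example block above (assumed, not confirmed defaults).
    return Settings(
        gemini_api_key=api_key,
        gemini_model=env.get("GEMINI_MODEL", "gemini-2.5-flash"),
        llm_temperature=float(env.get("LLM_TEMPERATURE", "0.0")),
        embedding_model=env.get(
            "EMBEDDING_MODEL", "sentence-transformers/all-MiniLM-L6-v2"
        ),
        environment=env.get("ENVIRONMENT", "production"),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to exercise in tests without touching the real environment.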
## 🎯 Usage
1. **Upload PDF:** Click "Upload PDF" and select your document
2. **Wait for Processing:** The system will:
   - Extract text chunks
   - Generate embeddings
   - Build knowledge graph with Gemini
3. **Explore Graph:** Click nodes to see details and related concepts
4. **Ask Questions:** Use the chat interface for Q&A with citations
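
To make the Q&A step more concrete, the sketch below shows one way retrieved chunks could be ranked and turned into a prompt with numbered, page-level citations. It is an illustration only: it uses brute-force cosine similarity over toy vectors as a stand-in for the FAISS index and sentence-transformers embeddings, and `Chunk`, `retrieve`, and `build_prompt` are hypothetical names rather than this project's API.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    page: int
    vector: list[float]  # embedding of the chunk (toy values in this sketch)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vector: list[float], chunks: list[Chunk], k: int = 2) -> list[Chunk]:
    """Exact top-k search (stand-in for an approximate FAISS HNSW lookup)."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vector, c.vector), reverse=True)
    return ranked[:k]


def build_prompt(question: str, hits: list[Chunk]) -> str:
    """Number each source so the model can cite it as [1], [2], ..."""
    sources = "\n".join(
        f"[{i}] (page {c.page}) {c.text}" for i, c in enumerate(hits, start=1)
    )
    return (
        "Answer using only the numbered sources and cite them like [1].\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )
```

Keeping the page number on each chunk is what allows the answer to point back to its source in the PDF.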
## 📊 Graph Generation
- **Per-Page Extraction:** Max 2 concepts per page (quality over quantity)
- **Parallel Processing:** All pages processed concurrently via Gemini API
- **Strict Filtering:** Only technical/domain-specific concepts
- **Co-occurrence Relationships:** Concepts on the same page are linked
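
The rules above can be sketched roughly as follows. This is not the project's implementation: `extract_concepts` is a hypothetical stub that stands in for the Gemini call (it simply treats capitalized words as concepts), and the graph is kept as plain sets instead of a NetworkX graph so the example runs with the standard library alone.

```python
import asyncio
from itertools import combinations

MAX_CONCEPTS_PER_PAGE = 2  # cap stated in the list above


async def extract_concepts(page_text: str) -> list[str]:
    """Hypothetical stand-in for the per-page LLM call."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    seen: list[str] = []
    for word in page_text.split():
        token = word.strip(".,;:()")
        if token[:1].isupper() and token not in seen:
            seen.append(token)
    return seen[:MAX_CONCEPTS_PER_PAGE]


async def build_graph(pages: list[str]) -> tuple[set[str], set[tuple[str, str]]]:
    """Extract concepts from all pages concurrently, then link co-occurring ones."""
    per_page = await asyncio.gather(*(extract_concepts(p) for p in pages))
    nodes: set[str] = set()
    edges: set[tuple[str, str]] = set()
    for concepts in per_page:
        nodes.update(concepts)
        # Every pair of concepts found on the same page becomes an undirected edge;
        # sorting gives each edge one canonical orientation.
        for a, b in combinations(sorted(concepts), 2):
            edges.add((a, b))
    return nodes, edges
```

Calling `asyncio.run(build_graph(pages))` with a list of page strings returns the node and edge sets. With at most two concepts per page, each page contributes at most one edge, which keeps the graph sparse.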
## 🎨 Frontend
The frontend is a single-page application located in `/frontend/`:
- `index.html` - Main UI
- `app.js` - Graph visualization & API calls
- `styles.css` - Styling
Access it at: `http://your-space-url.hf.space/frontend/`
## 📦 Docker
This Space uses Docker for deployment:
- Base: `python:3.12-slim`
- Port: 7860 (HF Spaces default)
- Health check enabled
- Persistent data directory
## 🀝 Credits
- **LLM:** Google Gemini
- **Embeddings:** Hugging Face sentence-transformers
---