Update README.md
README.md
CHANGED
@@ -10,4 +10,66 @@ pinned: false
license: apache-2.0
---

# AI-powered Web Search and PDF Chat Assistant

This project combines large language models with web search and PDF document analysis in a single chat assistant. Users can ask questions about their uploaded PDF documents or use web search to answer general queries.

## Features

- **PDF Document Chat**: Upload and interact with multiple PDF documents.
- **Web Search Integration**: Optionally use web search to answer queries.
- **Multiple AI Models**: Choose from a selection of language models.
- **Customizable Responses**: Adjust the temperature and the number of API calls to tune outputs.
- **User-friendly Interface**: Built with Gradio for an intuitive chat experience.
- **Document Selection**: Choose which uploaded documents to include in your queries.

## How It Works

1. **Document Processing**:
   - Uploaded PDF documents are parsed with either PyPDF or LlamaParse.
   - The parsed text is indexed in a FAISS vector store for efficient retrieval.

2. **Embedding**:
   - HuggingFace embeddings (default: 'sentence-transformers/all-mpnet-base-v2') are used for document indexing and query matching.

3. **Query Processing**:
   - For PDF queries, relevant document sections are retrieved from the FAISS index.
   - For web searches, results are fetched using the DuckDuckGo search API.

4. **Response Generation**:
   - The selected AI model (options include Mistral, Mixtral, and others) processes the query.
   - The response is generated from the retrieved context (PDF sections or web search results).

5. **User Interaction**:
   - Users chat with the AI, asking questions about uploaded documents or general queries.
   - The interface allows adjusting model parameters and switching between PDF and web search modes.
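
The retrieve-then-generate flow described in the steps above can be illustrated with a small, dependency-free sketch. It is not the project's code: a word-count vector stands in for the HuggingFace embedding, a sorted list stands in for the FAISS index, and the helper names (`embed`, `cosine`, `retrieve`, `build_prompt`) and the prompt wording are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a sentence-transformer)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query (stand-in for a FAISS lookup)."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context_chunks: list[str]) -> str:
    """Assemble the retrieved context and the question into one prompt (wording is assumed)."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


# Example chunks, as they might come from parsed PDFs or web search results.
chunks = [
    "FAISS stores dense vectors and supports fast nearest-neighbour search.",
    "Gradio builds web interfaces for machine learning demos.",
    "DuckDuckGo returns web search results for a text query.",
]
query = "How does FAISS search vectors?"
top = retrieve(query, chunks, k=1)
prompt = build_prompt(query, top)
print(prompt)
```

In the real application the prompt would be sent to the selected model. The same `build_prompt` step could serve both modes, since only the source of the context chunks differs.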

## Setup and Usage

1. Install the required dependencies (list of dependencies to be added).
2. Set the necessary API keys and tokens as environment variables.
3. Run the main script to launch the Gradio interface.
4. Upload PDF documents using the file input at the top of the interface.
5. Select the documents to query using the checkboxes.
6. Toggle between PDF chat and web search modes as needed.
7. Adjust the temperature and the number of API calls to fine-tune responses.
8. Start chatting and asking questions!
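
Because the required API keys are not listed here, a quick pre-flight check can help catch a missing one before launching the interface. The sketch below is only an illustration: the variable names `HUGGINGFACE_TOKEN` and `LLAMA_CLOUD_API_KEY` are assumptions, so substitute whatever names the main script actually reads.

```python
import os

# Hypothetical variable names; replace with the ones the main script expects.
REQUIRED_ENV_VARS = ("HUGGINGFACE_TOKEN", "LLAMA_CLOUD_API_KEY")


def missing_env_vars(names, environ=None):
    """Return the names that are unset or blank in the given environment mapping."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name, "").strip()]


# Demonstration with a fake environment, so the example does not depend on your shell.
example_env = {"HUGGINGFACE_TOKEN": "hf_example", "LLAMA_CLOUD_API_KEY": " "}
print(missing_env_vars(REQUIRED_ENV_VARS, example_env))  # prints ['LLAMA_CLOUD_API_KEY']
```

Calling `missing_env_vars(REQUIRED_ENV_VARS)` with no second argument checks the real process environment instead.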

## Models

The project supports multiple AI models, including:
- mistralai/Mistral-7B-Instruct-v0.3
- mistralai/Mixtral-8x7B-Instruct-v0.1
- meta/llama-3.1-8b-instruct
- mistralai/Mistral-Nemo-Instruct-2407
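
One way to keep the model choice and the adjustable parameters together is a small validated settings object. This is a sketch under assumptions rather than the project's implementation: the `ChatSettings` class, its field names, the default values, and the 0.0 to 1.0 temperature range are all hypothetical; only the model identifiers come from the list above.

```python
from dataclasses import dataclass

# Model identifiers copied from the list above.
SUPPORTED_MODELS = (
    "mistralai/Mistral-7B-Instruct-v0.3",
    "mistralai/Mixtral-8x7B-Instruct-v0.1",
    "meta/llama-3.1-8b-instruct",
    "mistralai/Mistral-Nemo-Instruct-2407",
)


@dataclass(frozen=True)
class ChatSettings:
    """Hypothetical container for the options exposed in the interface."""

    model: str = SUPPORTED_MODELS[0]
    temperature: float = 0.2      # assumed default
    num_calls: int = 1            # assumed default number of API calls
    use_web_search: bool = False  # False means PDF chat mode

    def __post_init__(self):
        # Reject values the rest of the pipeline could not handle.
        if self.model not in SUPPORTED_MODELS:
            raise ValueError(f"Unsupported model: {self.model!r}")
        if not 0.0 <= self.temperature <= 1.0:  # assumed valid range
            raise ValueError("temperature must be between 0.0 and 1.0")
        if self.num_calls < 1:
            raise ValueError("num_calls must be at least 1")


settings = ChatSettings(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
    temperature=0.5,
    num_calls=2,
)
print(settings)
```

Validating in `__post_init__` means a bad value from the interface fails immediately with a clear message instead of surfacing later as an API error.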

## Future Improvements

- Integration of more embedding models for improved performance.
- Enhanced PDF parsing capabilities.
- Support for additional file formats beyond PDF.
- Improved caching for faster response times.

## Contribution

Contributions are welcome! Please submit issues or pull requests on the project's GitHub repository.