Push code to Hugging Face Space repo
- README.md +96 -0
- app.py +96 -0
- requirements.txt +6 -0
README.md
CHANGED
@@ -11,3 +11,99 @@ short_description: AI chatbot with a crafted personality (e.g., Wise Mentor)
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# 🤖 Prompt-Engineered Persona Agent with Mini-RAG

This project is an agentic chatbot built on a quantized LLM (Gemma 3 1B) that behaves according to a customizable persona prompt. It features a lightweight Retrieval-Augmented Generation (RAG) system using **TF-IDF + FAISS**, plus **dynamic context length estimation** to keep inference time down, making it well suited to CPU-only environments such as Hugging Face Spaces.

---

## 🚀 Features

* ✅ **Customizable persona** via the system prompt
* ✅ **Mini-RAG** using TF-IDF + FAISS to retrieve relevant past conversation turns
* ✅ **Efficient memory**: only the most relevant chat history is used
* ✅ **Dynamic context length** estimation to speed up responses
* ✅ Gradio-powered UI
* ✅ Runs on the free CPU tier

---

## 🧠 How It Works

1. **The user submits a query** along with a system persona prompt.
2. **The top-k most similar past turns** are retrieved with FAISS over TF-IDF vectors.
3. Only the **relevant chat history** is used to build the final prompt.
4. The LLM generates a response from the combined system prompt, retrieved context, and current user message.
5. The context length (`n_ctx`) is estimated dynamically to minimize resource usage.

A minimal sketch of steps 2 and 5 follows below.

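The snippet below is an illustrative, self-contained sketch of the retrieval and context-estimation ideas, not a copy of `app.py`; the toy `documents` list and the name `retrieve` are ours.

```python
import faiss
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy chat memory; in app.py this list grows as the conversation proceeds.
documents = [
    "user: how do I focus better? bot: try timeboxing your work.",
    "user: what is FAISS? bot: a library for fast similarity search.",
]

# Fit TF-IDF over the stored turns and index the dense vectors by L2 distance.
vectorizer = TfidfVectorizer()
vectors = vectorizer.fit_transform(documents).toarray().astype("float32")
index = faiss.IndexFlatL2(vectors.shape[1])
index.add(vectors)

def retrieve(query, k=2):
    """Return the k stored turns closest to the query."""
    query_vec = vectorizer.transform([query]).toarray().astype("float32")
    _, ids = index.search(query_vec, k)
    return "\n".join(documents[i] for i in ids[0] if i < len(documents))

def estimate_n_ctx(full_prompt, buffer=500):
    """Whitespace word count as a rough token proxy, plus a safety buffer."""
    return min(3500, len(full_prompt.split()) + buffer)

print(retrieve("tell me about similarity search"))
```
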
---

## 🧪 Example Personas

You can change the persona in the UI system prompt box:

* 📚 `"You are a wise academic advisor who offers up to 3 concise, practical suggestions."`
* 🧘 `"You are a calm mindfulness coach. Always reply gently and with encouragement."`
* 🕵️ `"You are an investigative assistant. Be logical, skeptical, and fact-focused."`

---

## 📦 Installation

**For local setup:**

```bash
git clone https://huggingface.co/spaces/YOUR_USERNAME/Prompt-Persona-Agent
cd Prompt-Persona-Agent
pip install -r requirements.txt
```

Set your Hugging Face token as an environment variable (used to download the model):

```bash
export HF_TOKEN=your_huggingface_token
```

Then run:

```bash
python app.py
```

---

## 📁 Files

* `app.py`: main application with chat, mini-RAG, and dynamic context estimation
* `requirements.txt`: Python dependencies
* `README.md`: this file

---

## 🛠️ Tech Stack

* [Gradio](https://gradio.app/)
* [llama-cpp-python](https://github.com/abetlen/llama-cpp-python)
* [FAISS](https://github.com/facebookresearch/faiss)
* [scikit-learn (TF-IDF)](https://scikit-learn.org/)
* [Gemma 3 1B IT QAT GGUF](https://huggingface.co/google/gemma-3-1b-it-qat-q4_0-gguf)

---

## 📌 Limitations

* Retrieval uses basic TF-IDF + FAISS; it could be extended with semantic embedding models (a sketch follows below).
* Not every LLM strictly follows the persona; prompt tuning helps but is not perfect.
* For longer-term memory, a database plus a summarizer would be a better fit.

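As a sketch of that first extension, the TF-IDF vectorizer could be swapped for a sentence-embedding model. This assumes the `sentence-transformers` package (not in the current `requirements.txt`); `all-MiniLM-L6-v2` is just one common choice.

```python
import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
documents = [
    "user: hello bot: hi there",
    "user: what is RAG? bot: retrieval-augmented generation",
]

# Normalized embeddings let inner product act as cosine similarity.
embeddings = model.encode(documents, normalize_embeddings=True)
index = faiss.IndexFlatIP(embeddings.shape[1])
index.add(embeddings)

def retrieve(query, k=2):
    """Return the k stored turns most semantically similar to the query."""
    query_vec = model.encode([query], normalize_embeddings=True)
    _, ids = index.search(query_vec, k)
    return "\n".join(documents[i] for i in ids[0] if i < len(documents))
```
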
---

## 📤 Deploy to Hugging Face Spaces

> Uses only CPU; no paid GPU required.

Make sure your `HF_TOKEN` is set as a secret or environment variable in your Hugging Face Space.

---

app.py
ADDED
@@ -0,0 +1,96 @@
import os
import gradio as gr
from llama_cpp import Llama
from huggingface_hub import snapshot_download, login
from sklearn.feature_extraction.text import TfidfVectorizer
import faiss
import numpy as np

# -------------------- MODEL SETUP --------------------
MODEL_REPO = "google/gemma-3-1b-it-qat-q4_0-gguf"
MODEL_PATH = "./gemma-3-1b-it-qat-q4_0/gemma-3-1b-it-q4_0.gguf"
MODEL_DIR = "./gemma-3-1b-it-qat-q4_0"
DEFAULT_SYSTEM_PROMPT = (
    "You are a Wise Mentor. Speak in a calm and concise manner. "
    "If asked for advice, give a maximum of 3 actionable steps. "
    "Avoid unnecessary elaboration. Decline unethical or harmful requests."
)

# Hugging Face token and model download
hf_token = os.environ.get("HF_TOKEN")
if not os.path.exists(MODEL_PATH):
    if not hf_token:
        raise ValueError("HF_TOKEN is missing. Set it as an environment variable.")

    login(hf_token)
    snapshot_download(repo_id=MODEL_REPO, local_dir=MODEL_DIR, local_dir_use_symlinks=False)

# -------------------- RAG SETUP --------------------
documents = []  # stores all chat turns as "user: ... bot: ..." strings
vectorizer = TfidfVectorizer()
index = None

def update_rag_index():
    """Rebuild the FAISS index over all stored conversation turns."""
    global index
    if not documents:
        return
    vectors = vectorizer.fit_transform(documents).toarray().astype('float32')
    index = faiss.IndexFlatL2(vectors.shape[1])
    index.add(vectors)

def retrieve_relevant_docs(query, k=2):
    """Return the k stored turns most similar to the query."""
    if not documents or index is None:
        return ""

    query_vec = vectorizer.transform([query]).toarray().astype('float32')
    D, I = index.search(query_vec, k)
    return "\n".join(documents[i] for i in I[0] if i < len(documents))


# -------------------- CONTEXT LENGTH ESTIMATION --------------------
def estimate_n_ctx(full_prompt, buffer=500):
    # Whitespace word count is a rough proxy for the true token count.
    total_tokens = len(full_prompt.split())
    return min(3500, total_tokens + buffer)

# -------------------- CHAT FUNCTION --------------------
def chat(user_input, history, system_prompt):
    # The retrieved turns are plain "user: ... bot: ..." strings, so they go
    # directly into the system block as context.
    relevant_context = retrieve_relevant_docs(user_input)

    full_prompt = (
        f"<s>[INST] <<SYS>>\n{system_prompt}\nContext:\n{relevant_context}\n<</SYS>>\n"
        f"<user>{user_input}</user>[/INST]"
    )

    # Dynamically estimate n_ctx; the model is loaded per call so the
    # context window can be sized to the prompt.
    n_ctx = estimate_n_ctx(full_prompt=full_prompt)

    llm = Llama(
        model_path=MODEL_PATH,
        n_ctx=n_ctx,
        n_threads=2,
        n_batch=128
    )

    output = llm(full_prompt, max_tokens=256, stop=["</s>", "<user>"])
    bot_reply = output["choices"][0]["text"].strip()

    documents.append(f"user: {user_input} bot: {bot_reply}")
    update_rag_index()

    history.append((user_input, bot_reply))
    return "", history

# -------------------- UI --------------------
with gr.Blocks() as demo:
    gr.Markdown("# 🤖 Persona Agent with Mini-RAG + Dynamic Context")
    with gr.Row():
        system_prompt_box = gr.Textbox(label="System Prompt", value=DEFAULT_SYSTEM_PROMPT, lines=3)
    chatbot = gr.Chatbot()
    msg = gr.Textbox(label="Your Message")
    clear = gr.Button("🗑️ Clear")

    msg.submit(chat, [msg, chatbot, system_prompt_box], [msg, chatbot])
    clear.click(lambda: [], None, chatbot)

demo.launch()
requirements.txt
ADDED
@@ -0,0 +1,6 @@
gradio
llama-cpp-python
huggingface_hub
scikit-learn
faiss-cpu
numpy