BonelliLab committed on
Commit f37a598 · 1 Parent(s): 714138b

Apply production improvements from demo branch: inference adapter, history, UI, rate limiting, tests, docs

Files changed (5)
  1. README.md +206 -39
  2. api/ask.py +159 -0
  3. api/history.py +60 -0
  4. public/index.html +200 -0
  5. tests/test_api.py +80 -0
README.md CHANGED
@@ -33,57 +33,224 @@ A simple implementation of a cognitive language model using Qwen3-7B-Instruct fr
  pip install -r requirements.txt
  ```
 
- ## Usage
- 
- 1. Run the interactive CLI:
- 
- ```bash
- python cognitive_llm.py
- ```
- 
- 2. Enter your prompt at the `>>` prompt and press Enter
- 3. Type 'quit' or 'exit' to exit the program
- 
- ### Example Usage
- 
- ```python
- from cognitive_llm import CognitiveLLM
- 
- # Initialize the LLM
- llm = CognitiveLLM()
- 
- # Generate text
- response = llm.generate(
-     "Explain quantum computing in simple terms.",
-     max_new_tokens=256,
-     temperature=0.7
- )
- print(response)
- ```
- 
- ## Configuration
- 
- You can customize the model and generation parameters:
- 
- ```python
- llm = CognitiveLLM(
-     model_name="Qwen/Qwen3-7B-Instruct",  # Model name or path
-     device="cuda"                         # 'cuda', 'mps', or 'cpu'
- )
- 
- # Generate with custom parameters
- response = llm.generate(
-     "Your prompt here",
-     max_new_tokens=512,
-     temperature=0.7,
-     top_p=0.9,
-     do_sample=True
- )
- ```
- 
- ## Note
- 
- - First run will download the model weights (several GB)
- - A CUDA-compatible GPU is recommended for reasonable performance
- - Ensure you have sufficient disk space for the model weights
- - Internet connection is required for the initial download
+ ---
+ title: Eidolon
+ ---
+ 
+ # Eidolon Interactive Tutor Demo
+ 
+ Production-ready demo application: a static frontend with a serverless API that accepts prompts and returns adaptive responses. Built for easy deployment to Vercel or Hugging Face Spaces with optional inference backend integration.
+ 
+ ## ✨ Features
+ 
+ - **Demo Mode**: Safe, deterministic responses for public demos (no API keys or model hosting required)
+ - **External Inference**: Plug in any hosted inference API (Hugging Face, Replicate, custom endpoints)
+ - **Conversation History**: SQLite-backed session storage with history retrieval
+ - **Rate Limiting**: Configurable IP-based rate limiting to prevent abuse
+ - **Modern UI**: Interactive interface with example prompts, copy buttons, and loading states
+ - **Retry Logic**: Automatic retries with exponential backoff for inference calls
+ - **CORS Support**: Cross-origin requests enabled for flexible deployment
+ 
+ ## Quick Start (Demo Mode)
+ 
+ Run the demo locally without any external services:
+ 
+ ```powershell
+ # Install lightweight dependencies
+ pip install -r dev-requirements.txt
+ 
+ # Start demo (PowerShell)
+ .\scripts\run_demo.ps1
+ 
+ # Or manually
+ $env:DEMO_MODE = "1"
+ python app.py
+ ```
+ 
+ Visit the Gradio URL shown in the terminal (usually http://localhost:7860).
+ 
+ ## Project Structure
+ 
+ ```
+ ├── api/
+ │   ├── ask.py            # FastAPI serverless endpoint (main API)
+ │   └── history.py        # Conversation history storage (SQLite)
+ ├── public/
+ │   ├── index.html        # Static demo UI
+ │   └── assets/           # UI assets (screenshot, etc.)
+ ├── tests/
+ │   └── test_api.py       # API tests
+ ├── scripts/
+ │   └── run_demo.ps1      # Quick demo launcher
+ ├── app.py                # Gradio UI (optional local interface)
+ ├── dev-requirements.txt  # Lightweight dependencies (FastAPI, pytest, etc.)
+ ├── vercel.json           # Vercel deployment config
+ └── README.md
+ ```
+ 
+ ## Environment Variables
+ 
+ ### Core Settings
+ 
+ | Variable | Description | Default | Required |
+ |----------|-------------|---------|----------|
+ | `DEMO_MODE` | Enable demo responses (no external services) | `0` | No |
+ | `INFERENCE_API_URL` | URL of hosted inference endpoint | - | No (required for real inference) |
+ | `INFERENCE_API_KEY` | Bearer token for inference API | - | No |
+ 
+ ### Rate Limiting
+ 
+ | Variable | Description | Default |
+ |----------|-------------|---------|
+ | `RATE_LIMIT_REQUESTS` | Max requests per window | `10` |
+ | `RATE_LIMIT_WINDOW` | Window size in seconds | `60` |
+ 
+ ### Storage
+ 
+ | Variable | Description | Default |
+ |----------|-------------|---------|
+ | `HISTORY_DB_PATH` | SQLite database path | `conversation_history.db` |
+ 
+ ## Deployment
+ 
+ ### Vercel (Recommended)
+ 
+ 1. Set environment variables in Vercel project settings:
+    - `DEMO_MODE=1` (for public demo)
+    - Or `INFERENCE_API_URL` + `INFERENCE_API_KEY` (for real inference)
+ 
+ 2. Deploy:
+ 
+ ```powershell
+ vercel --prod
+ ```
+ 
+ The `vercel.json` config automatically serves `public/` as static files and `api/*.py` as Python serverless functions.
+ 
+ ### Hugging Face Spaces
+ 
+ 1. In Space Settings:
+    - Set Branch to `demo`
+    - Add environment variable: `DEMO_MODE` = `1`
+    - Restart the Space
+ 
+ 2. Or use the `main` branch with `INFERENCE_API_URL` configured to call a hosted model.
+ 
+ ### One-Click Deploy
+ 
+ [![Deploy to Vercel](https://vercel.com/button)](https://vercel.com/new/git/external?repository-url=https://github.com/Zwin-ux/Eidolon-Cognitive-Tutor)
+ 
+ ## API Reference
+ 
+ ### POST `/api/ask`
+ 
+ Request body:
+ 
+ ```json
+ {
+   "prompt": "Your question here",
+   "max_tokens": 512,
+   "temperature": 0.7,
+   "session_id": "optional-session-id"
+ }
+ ```
+ 
+ Response:
+ 
+ ```json
+ {
+   "result": "Response text",
+   "source": "demo",
+   "session_id": "generated-or-provided-session-id"
+ }
+ ```
+ 
+ ### GET `/api/history/{session_id}`
+ 
+ Retrieve conversation history for a session.
+ 
+ Response:
+ 
+ ```json
+ {
+   "session_id": "...",
+   "history": [
+     {
+       "prompt": "...",
+       "response": "...",
+       "source": "demo",
+       "timestamp": "2025-11-06 12:34:56"
+     }
+   ]
+ }
+ ```
+ 
+ ## Testing
+ 
+ Run the test suite:
+ 
+ ```powershell
+ pip install -r dev-requirements.txt
+ pytest -v
+ ```
+ 
+ CI is configured via `.github/workflows/ci.yml` and runs automatically on push/PR.
+ 
+ ## Development
+ 
+ ### Running with a Real Inference Backend
+ 
+ Set environment variables and run:
+ 
+ ```powershell
+ $env:INFERENCE_API_URL = "https://api-inference.huggingface.co/models/your-org/your-model"
+ $env:INFERENCE_API_KEY = "hf_..."
+ python app.py
+ ```
+ 
+ The API will automatically retry failed requests and fall back to demo mode if the backend is unavailable.
+ 
+ ### Conversation History
+ 
+ History is stored in SQLite (`conversation_history.db` by default). The UI includes a "View History" button that loads past conversations for the current session.
+ 
+ ## Production Recommendations
+ 
+ - **Inference Backend**: Use a hosted service (Hugging Face Inference Endpoints, Replicate, or a self-hosted container) rather than loading models in serverless functions.
+ - **Rate Limiting**: Adjust `RATE_LIMIT_REQUESTS` and `RATE_LIMIT_WINDOW` based on your traffic expectations.
+ - **Caching**: Consider adding Redis or similar for distributed rate limiting in multi-instance deployments.
+ - **Authentication**: Add API key authentication for production usage (not included in the demo).
+ - **Monitoring**: Set up logging and error tracking (Sentry, Datadog, etc.).
+ 
+ ## Current Stage
+ 
+ **Demo-ready for public presentation.** Key milestones:
+ 
+ - ✅ Demo mode with safe, deterministic responses
+ - ✅ External inference adapter with retries
+ - ✅ Conversation history storage
+ - ✅ Rate limiting
+ - ✅ Modern, interactive UI
+ - ✅ CI/CD with tests and linting
+ - ✅ One-click deployment options
+ 
+ ## Troubleshooting
+ 
+ ### "Repository Not Found" error on Hugging Face Spaces
+ 
+ - **Cause**: The Space is trying to load a model at startup (e.g., `Qwen/Qwen3-7B-Instruct`) but the model is gated, private, or doesn't exist.
+ - **Fix**: Set `DEMO_MODE=1` in Space environment variables and restart, or switch the Space to use the `demo` branch.
+ 
+ ### Rate limit errors in testing
+ 
+ - **Cause**: The default rate limit is 10 requests per 60 seconds.
+ - **Fix**: Set `RATE_LIMIT_REQUESTS=100` or higher when running local tests.
+ 
+ ### Conversation history not persisting
+ 
+ - **Cause**: The SQLite database may not be writable in some serverless environments.
+ - **Fix**: Set `HISTORY_DB_PATH` to a writable location or use an external database (Postgres, etc.) for production.
+ 
+ ## Contributing
+ 
+ Issues and PRs welcome at https://github.com/Zwin-ux/Eidolon-Cognitive-Tutor
+ 
+ ## License
+ 
+ Apache 2.0
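The `RATE_LIMIT_REQUESTS` / `RATE_LIMIT_WINDOW` settings documented above drive a sliding-window check over recent request timestamps. A minimal standalone sketch of that behavior (demo values and an injectable clock for illustration, not the API's actual middleware):

```python
from collections import defaultdict
import time

RATE_LIMIT_REQUESTS = 3   # max requests per window (demo value; default is 10)
RATE_LIMIT_WINDOW = 60    # window size in seconds

_store = defaultdict(list)

def check_rate_limit(client_ip, now=None):
    """Return True if the request is allowed, False if the window is full."""
    now = time.time() if now is None else now
    window_start = now - RATE_LIMIT_WINDOW
    # Keep only timestamps still inside the sliding window
    _store[client_ip] = [t for t in _store[client_ip] if t > window_start]
    if len(_store[client_ip]) >= RATE_LIMIT_REQUESTS:
        return False
    _store[client_ip].append(now)
    return True

# Three requests pass, the fourth is rejected, and capacity returns
# once old timestamps age out of the window.
allowed = [check_rate_limit("1.2.3.4", now=t) for t in (0, 1, 2, 3)]
late = check_rate_limit("1.2.3.4", now=120)
```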
 
 
 
api/ask.py ADDED
@@ -0,0 +1,159 @@
+ from fastapi import FastAPI, Request, HTTPException
+ from fastapi.middleware.cors import CORSMiddleware
+ from pydantic import BaseModel
+ import asyncio
+ import os
+ import httpx
+ import time
+ import uuid
+ from collections import defaultdict
+ from typing import Optional
+ from .history import save_conversation, get_conversation_history
+ 
+ app = FastAPI(title="Eidolon Tutor API", version="0.2.0")
+ 
+ # CORS for local development and cross-origin requests
+ app.add_middleware(
+     CORSMiddleware,
+     allow_origins=["*"],
+     allow_credentials=True,
+     allow_methods=["*"],
+     allow_headers=["*"],
+ )
+ 
+ # Simple in-memory rate limiter (IP-based)
+ _rate_limit_store = defaultdict(list)
+ RATE_LIMIT_REQUESTS = int(os.getenv("RATE_LIMIT_REQUESTS", "10"))
+ RATE_LIMIT_WINDOW = int(os.getenv("RATE_LIMIT_WINDOW", "60"))  # seconds
+ 
+ 
+ def check_rate_limit(client_ip: str) -> bool:
+     """Simple sliding-window rate limiter."""
+     now = time.time()
+     window_start = now - RATE_LIMIT_WINDOW
+     # Drop requests that have aged out of the window
+     _rate_limit_store[client_ip] = [
+         req_time for req_time in _rate_limit_store[client_ip] if req_time > window_start
+     ]
+     if len(_rate_limit_store[client_ip]) >= RATE_LIMIT_REQUESTS:
+         return False
+     _rate_limit_store[client_ip].append(now)
+     return True
+ 
+ 
+ class AskIn(BaseModel):
+     prompt: str
+     max_tokens: Optional[int] = 512
+     temperature: Optional[float] = 0.7
+     session_id: Optional[str] = None  # for conversation history
+ 
+ 
+ class AskOut(BaseModel):
+     result: Optional[str] = None
+     error: Optional[str] = None
+     source: str = "demo"  # "demo", "inference", or "error"
+     session_id: str = ""  # returned session ID
+ 
+ 
+ def get_demo_response(prompt: str) -> str:
+     """Generate deterministic demo responses."""
+     p = prompt.strip().lower()
+     if not p:
+         return "Please enter a question for the demo tutor."
+     if "explain" in p or "what is" in p:
+         return f"**Demo Explanation:**\n\nHere's a concise explanation for your question: *\"{prompt}\"*.\n\n[Demo mode active. Configure `INFERENCE_API_URL` to use a real model.]"
+     if "code" in p or "how to" in p or "implement" in p:
+         return f"**Demo Steps:**\n\n1. Understand the problem: *\"{prompt}\"*\n2. Break it down into smaller steps\n3. Implement and test\n4. Iterate and refine\n\n[Demo-mode response]"
+     if "compare" in p or "difference" in p:
+         return f"**Demo Comparison:**\n\nKey differences related to *\"{prompt}\"*:\n- Point A vs Point B\n- Tradeoffs and use cases\n\n[Demo mode]"
+     # Generic fallback
+     return f"**Demo Response:**\n\nI understood your prompt: *\"{prompt}\"*.\n\nThis is a demo response showing how the tutor would reply. Set `INFERENCE_API_URL` to enable real model inference."
+ 
+ 
+ async def call_inference_api(
+     prompt: str, api_url: str, api_key: Optional[str], max_tokens: int, temperature: float
+ ) -> dict:
+     """Call external inference API with retries and timeout."""
+     payload = {
+         "inputs": prompt,
+         "parameters": {"max_new_tokens": max_tokens, "temperature": temperature},
+     }
+     headers = {"Accept": "application/json", "Content-Type": "application/json"}
+     if api_key:
+         headers["Authorization"] = f"Bearer {api_key}"
+ 
+     # Retry logic: 2 attempts with exponential backoff
+     for attempt in range(2):
+         try:
+             async with httpx.AsyncClient(timeout=60.0) as client:
+                 resp = await client.post(api_url, json=payload, headers=headers)
+                 resp.raise_for_status()
+                 data = resp.json()
+ 
+             # Normalize response
+             if isinstance(data, dict) and "error" in data:
+                 return {"error": data.get("error"), "source": "inference"}
+             if isinstance(data, list) and len(data) > 0:
+                 first = data[0]
+                 if isinstance(first, dict) and "generated_text" in first:
+                     return {"result": first["generated_text"], "source": "inference"}
+                 if isinstance(first, str):
+                     return {"result": first, "source": "inference"}
+             if isinstance(data, dict) and "generated_text" in data:
+                 return {"result": data["generated_text"], "source": "inference"}
+             return {"result": str(data), "source": "inference"}
+ 
+         except httpx.HTTPError as e:
+             if attempt == 0:
+                 await asyncio.sleep(2 ** attempt)  # non-blocking exponential backoff
+                 continue
+             return {"error": f"Inference API failed after retries: {str(e)}", "source": "error"}
+ 
+     return {"error": "Inference API failed", "source": "error"}
+ 
+ 
+ @app.post("/", response_model=AskOut)
+ async def ask(in_data: AskIn, request: Request):
+     """
+     Main API endpoint: accepts a prompt and returns a response.
+ 
+     Supports:
+     - Demo mode (DEMO_MODE=1): returns canned responses
+     - External inference (INFERENCE_API_URL set): calls a hosted model
+     - Rate limiting (configurable via RATE_LIMIT_REQUESTS/RATE_LIMIT_WINDOW)
+     - Conversation history (optional session_id)
+     """
+     # Rate limiting
+     client_ip = request.client.host if request.client else "unknown"
+     if not check_rate_limit(client_ip):
+         raise HTTPException(status_code=429, detail="Rate limit exceeded. Try again later.")
+ 
+     # Generate or use provided session ID
+     session_id = in_data.session_id or str(uuid.uuid4())
+ 
+     api_url = os.environ.get("INFERENCE_API_URL")
+     api_key = os.environ.get("INFERENCE_API_KEY")
+     demo_mode = os.environ.get("DEMO_MODE", "0").lower() in ("1", "true", "yes")
+ 
+     # Demo mode
+     if demo_mode or not api_url:
+         result_text = get_demo_response(in_data.prompt)
+         save_conversation(session_id, in_data.prompt, result_text, "demo")
+         return AskOut(result=result_text, source="demo", session_id=session_id)
+ 
+     # Call inference API
+     result = await call_inference_api(
+         in_data.prompt, api_url, api_key, in_data.max_tokens, in_data.temperature
+     )
+ 
+     # Save to history
+     if result.get("result"):
+         save_conversation(session_id, in_data.prompt, result["result"], result.get("source", "inference"))
+ 
+     return AskOut(**result, session_id=session_id)
+ 
+ 
+ @app.get("/history/{session_id}")
+ async def get_history(session_id: str, limit: int = 10):
+     """Retrieve conversation history for a session."""
+     return {"session_id": session_id, "history": get_conversation_history(session_id, limit)}
api/history.py ADDED
@@ -0,0 +1,60 @@
+ """Simple conversation history storage using SQLite."""
+ import sqlite3
+ import os
+ from typing import List, Dict
+ 
+ DB_PATH = os.getenv("HISTORY_DB_PATH", "conversation_history.db")
+ 
+ 
+ def init_db():
+     """Initialize the conversation history database."""
+     conn = sqlite3.connect(DB_PATH)
+     cursor = conn.cursor()
+     cursor.execute("""
+         CREATE TABLE IF NOT EXISTS conversations (
+             id INTEGER PRIMARY KEY AUTOINCREMENT,
+             session_id TEXT NOT NULL,
+             prompt TEXT NOT NULL,
+             response TEXT NOT NULL,
+             source TEXT DEFAULT 'demo',
+             timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
+         )
+     """)
+     conn.commit()
+     conn.close()
+ 
+ 
+ def save_conversation(session_id: str, prompt: str, response: str, source: str = "demo"):
+     """Save a conversation turn to the database."""
+     init_db()
+     conn = sqlite3.connect(DB_PATH)
+     cursor = conn.cursor()
+     cursor.execute(
+         "INSERT INTO conversations (session_id, prompt, response, source) VALUES (?, ?, ?, ?)",
+         (session_id, prompt, response, source),
+     )
+     conn.commit()
+     conn.close()
+ 
+ 
+ def get_conversation_history(session_id: str, limit: int = 10) -> List[Dict]:
+     """Retrieve conversation history for a session."""
+     init_db()
+     conn = sqlite3.connect(DB_PATH)
+     cursor = conn.cursor()
+     cursor.execute(
+         "SELECT prompt, response, source, timestamp FROM conversations WHERE session_id = ? ORDER BY timestamp DESC LIMIT ?",
+         (session_id, limit),
+     )
+     rows = cursor.fetchall()
+     conn.close()
+     return [
+         {
+             "prompt": row[0],
+             "response": row[1],
+             "source": row[2],
+             "timestamp": row[3],
+         }
+         for row in reversed(rows)
+     ]
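The storage layer boils down to one table keyed by `session_id`. A self-contained sketch of the same schema and queries against an in-memory database (the module itself writes to `HISTORY_DB_PATH` on disk):

```python
import sqlite3

# In-memory database for illustration; the module uses HISTORY_DB_PATH on disk.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE IF NOT EXISTS conversations (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        session_id TEXT NOT NULL,
        prompt TEXT NOT NULL,
        response TEXT NOT NULL,
        source TEXT DEFAULT 'demo',
        timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
    )
""")
conn.execute(
    "INSERT INTO conversations (session_id, prompt, response, source) VALUES (?, ?, ?, ?)",
    ("s1", "What is gravity?", "Demo answer", "demo"),
)
conn.commit()
# Most recent turns first, limited, as in get_conversation_history
rows = conn.execute(
    "SELECT prompt, response, source FROM conversations WHERE session_id = ? ORDER BY timestamp DESC LIMIT ?",
    ("s1", 10),
).fetchall()
conn.close()
```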
public/index.html ADDED
@@ -0,0 +1,200 @@
+ <!doctype html>
+ <html lang="en">
+ <head>
+   <meta charset="utf-8" />
+   <meta name="viewport" content="width=device-width,initial-scale=1" />
+   <title>Eidolon Cognitive Tutor</title>
+   <style>
+     * { box-sizing: border-box; }
+     body { font-family: Inter, system-ui, -apple-system, 'Segoe UI', Roboto, sans-serif; background:#f7f8fb; color:#0f1724; display:flex; align-items:center; justify-content:center; min-height:100vh; margin:0; padding:20px }
+     .card { background:#fff; padding:32px; border-radius:16px; box-shadow:0 6px 40px rgba(20,20,40,0.1); width:820px; max-width:100% }
+     h1 { margin:0 0 8px 0; font-size:24px; font-weight:700 }
+     p.lead { margin:0 0 24px 0; color:#475569; line-height:1.5 }
+     .examples { display:flex; gap:8px; flex-wrap:wrap; margin-bottom:16px }
+     .example-btn { background:#e0e7ff; color:#3730a3; border:none; padding:8px 14px; border-radius:8px; font-size:13px; cursor:pointer; transition:all 0.2s }
+     .example-btn:hover { background:#c7d2fe; transform:translateY(-1px) }
+     textarea { width:100%; height:140px; padding:14px; border-radius:10px; border:1px solid #e6e9ef; resize:vertical; font-size:15px; font-family:inherit; transition:border 0.2s }
+     textarea:focus { outline:none; border-color:#0b84ff }
+     .controls { display:flex; gap:10px; margin-top:14px; align-items:center }
+     button.primary { background:#0b84ff; color:white; border:none; padding:12px 20px; border-radius:10px; font-weight:600; cursor:pointer; font-size:15px; transition:all 0.2s }
+     button.primary:hover { background:#0070e0; transform:translateY(-1px); box-shadow:0 4px 12px rgba(11,132,255,0.3) }
+     button.primary:disabled { background:#cbd5e1; cursor:not-allowed; transform:none }
+     button.secondary { background:#f1f5f9; color:#334155; border:none; padding:10px 16px; border-radius:8px; cursor:pointer; font-size:14px; transition:all 0.2s }
+     button.secondary:hover { background:#e2e8f0 }
+     .out { margin-top:20px; padding:16px; border-radius:10px; background:#f8fafc; min-height:100px; white-space:pre-wrap; border:1px solid #e2e8f0; position:relative }
+     .out.loading { background:#fef3c7; border-color:#fde047 }
+     .out.error { background:#fee; border-color:#fca5a5 }
+     .spinner { display:inline-block; width:16px; height:16px; border:2px solid #cbd5e1; border-top-color:#0b84ff; border-radius:50%; animation:spin 0.7s linear infinite; margin-right:8px }
+     @keyframes spin { to { transform: rotate(360deg) } }
+     .copy-btn { position:absolute; top:12px; right:12px; background:#fff; border:1px solid #e2e8f0; padding:6px 12px; border-radius:6px; font-size:12px; cursor:pointer; transition:all 0.2s }
+     .copy-btn:hover { background:#f1f5f9; border-color:#cbd5e1 }
+     .history { margin-top:24px; padding-top:24px; border-top:1px solid #e2e8f0 }
+     .history h3 { margin:0 0 12px 0; font-size:16px; color:#64748b }
+     .history-item { background:#f8fafc; padding:10px; border-radius:8px; margin-bottom:8px; font-size:13px; border-left:3px solid #0b84ff }
+     .meta { font-size:13px; color:#64748b; margin-top:8px }
+   </style>
+ </head>
+ <body>
+   <div class="card">
+     <h1>🧠 Eidolon Cognitive Tutor</h1>
+     <p class="lead">Interactive tutor demo powered by adaptive responses. Try the examples below or ask your own question.</p>
+ 
+     <div class="examples">
+       <button class="example-btn" data-prompt="Explain Newton's laws in simple terms">📐 Newton's Laws</button>
+       <button class="example-btn" data-prompt="How do I implement a binary search in Python?">💻 Binary Search</button>
+       <button class="example-btn" data-prompt="Compare supervised vs unsupervised learning">🤖 ML Comparison</button>
+       <button class="example-btn" data-prompt="What is the difference between HTTP and HTTPS?">🔒 HTTP vs HTTPS</button>
+     </div>
+ 
+     <textarea id="prompt" placeholder="Type your question here..."></textarea>
+ 
+     <div class="controls">
+       <button id="ask" class="primary">Ask Tutor</button>
+       <button id="clear" class="secondary">Clear</button>
+       <button id="history-btn" class="secondary">View History</button>
+     </div>
+ 
+     <div class="out" id="out">Awaiting your question...</div>
+ 
+     <div class="history" id="history" style="display:none">
+       <h3>Recent Conversations</h3>
+       <div id="history-list"></div>
+     </div>
+ 
+     <div class="meta">
+       <span id="status">Demo mode active</span> |
+       <a href="https://github.com/Zwin-ux/Eidolon-Cognitive-Tutor" target="_blank" style="color:#0b84ff; text-decoration:none">View on GitHub</a>
+     </div>
+   </div>
+ 
+   <script>
+     const btn = document.getElementById('ask');
+     const clearBtn = document.getElementById('clear');
+     const historyBtn = document.getElementById('history-btn');
+     const out = document.getElementById('out');
+     const promptEl = document.getElementById('prompt');
+     const statusEl = document.getElementById('status');
+     const historyEl = document.getElementById('history');
+     const historyList = document.getElementById('history-list');
+ 
+     let sessionId = localStorage.getItem('session_id') || '';
+     let conversationHistory = [];
+ 
+     // Example button handlers
+     document.querySelectorAll('.example-btn').forEach(exampleBtn => {
+       exampleBtn.addEventListener('click', () => {
+         promptEl.value = exampleBtn.dataset.prompt;
+         promptEl.focus();
+       });
+     });
+ 
+     // Copy button
+     function addCopyButton() {
+       if (document.querySelector('.copy-btn')) return;
+       const copyBtn = document.createElement('button');
+       copyBtn.className = 'copy-btn';
+       copyBtn.textContent = 'Copy';
+       copyBtn.onclick = () => {
+         navigator.clipboard.writeText(out.textContent);
+         copyBtn.textContent = 'Copied!';
+         setTimeout(() => copyBtn.textContent = 'Copy', 2000);
+       };
+       out.appendChild(copyBtn);
+     }
+ 
+     // Clear functionality
+     clearBtn.addEventListener('click', () => {
+       promptEl.value = '';
+       out.textContent = 'Awaiting your question...';
+       out.className = 'out';
+       const copyBtn = out.querySelector('.copy-btn');
+       if (copyBtn) copyBtn.remove();
+     });
+ 
+     // History toggle
+     historyBtn.addEventListener('click', () => {
+       if (historyEl.style.display === 'none') {
+         loadHistory();
+         historyEl.style.display = 'block';
+         historyBtn.textContent = 'Hide History';
+       } else {
+         historyEl.style.display = 'none';
+         historyBtn.textContent = 'View History';
+       }
+     });
+ 
+     // Load history from server
+     async function loadHistory() {
+       if (!sessionId) return;
+       try {
+         const resp = await fetch(`/api/history/${sessionId}`);
+         const data = await resp.json();
+         if (data.history && data.history.length > 0) {
+           historyList.innerHTML = data.history.map(item =>
+             `<div class="history-item"><strong>Q:</strong> ${item.prompt.substring(0, 60)}...<br><strong>A:</strong> ${item.response.substring(0, 80)}...</div>`
+           ).join('');
+         } else {
+           historyList.innerHTML = '<div style="color:#94a3b8">No conversation history yet.</div>';
+         }
+       } catch (e) {
+         historyList.innerHTML = '<div style="color:#94a3b8">Could not load history.</div>';
+       }
+     }
+ 
+     // Main ask functionality
+     btn.addEventListener('click', async () => {
+       const prompt = promptEl.value.trim();
+       if (!prompt) {
+         out.textContent = 'Please enter a question.';
+         out.className = 'out error';
+         return;
+       }
+ 
+       out.innerHTML = '<span class="spinner"></span>Thinking...';
+       out.className = 'out loading';
+       btn.disabled = true;
+ 
+       try {
+         const resp = await fetch('/api/ask', {
+           method: 'POST',
+           headers: { 'Content-Type': 'application/json' },
+           body: JSON.stringify({
+             prompt,
+             session_id: sessionId || undefined
+           })
+         });
+ 
+         const data = await resp.json();
+ 
+         if (data.session_id && !sessionId) {
+           sessionId = data.session_id;
+           localStorage.setItem('session_id', sessionId);
+         }
+ 
+         if (!resp.ok || data.error) {
+           // Rate-limit and API errors arrive as {"detail": ...} or {"error": ...}
+           out.textContent = '❌ Error: ' + (data.detail || data.error || resp.statusText);
+           out.className = 'out error';
+         } else {
+           out.textContent = data.result || JSON.stringify(data, null, 2);
+           out.className = 'out';
+           addCopyButton();
+           statusEl.textContent = `Response from: ${data.source}`;
+           conversationHistory.push({ prompt, response: data.result });
+         }
+       } catch (e) {
+         out.textContent = '❌ Request failed: ' + e.message;
+         out.className = 'out error';
+       } finally {
+         btn.disabled = false;
+       }
+     });
+ 
+     // Ctrl+Enter to submit
+     promptEl.addEventListener('keydown', (e) => {
+       if (e.key === 'Enter' && e.ctrlKey) {
+         btn.click();
+       }
+     });
+   </script>
+ </body>
+ </html>
tests/test_api.py ADDED
@@ -0,0 +1,80 @@
+ import os
+ import pytest
+ from fastapi.testclient import TestClient
+ 
+ os.environ["DEMO_MODE"] = "1"
+ os.environ["RATE_LIMIT_REQUESTS"] = "100"  # high limit for tests
+ 
+ from api.ask import app  # import after setting env vars
+ 
+ 
+ @pytest.fixture
+ def client():
+     return TestClient(app)
+ 
+ 
+ def test_api_demo_mode_basic(client):
+     """Test basic demo mode response."""
+     payload = {"prompt": "Explain gravity in simple terms"}
+     resp = client.post("/", json=payload)
+     assert resp.status_code == 200
+     data = resp.json()
+     assert isinstance(data, dict)
+     assert "result" in data
+     assert data["source"] == "demo"
+     assert "Demo" in data["result"] or "explain" in data["result"].lower()
+ 
+ 
+ def test_api_demo_mode_code_prompt(client):
+     """Test demo mode with code-related prompt."""
+     payload = {"prompt": "How to implement quicksort"}
+     resp = client.post("/", json=payload)
+     assert resp.status_code == 200
+     data = resp.json()
+     assert "result" in data
+     assert "steps" in data["result"].lower() or "implement" in data["result"].lower()
+ 
+ 
+ def test_api_session_id_returned(client):
+     """Test that a session ID is returned."""
+     payload = {"prompt": "Test prompt"}
+     resp = client.post("/", json=payload)
+     assert resp.status_code == 200
+     data = resp.json()
+     assert "session_id" in data
+     assert len(data["session_id"]) > 0
+ 
+ 
+ def test_api_session_id_persistence(client):
+     """Test that a provided session ID is returned unchanged."""
+     session_id = "test-session-123"
+     payload = {"prompt": "Test prompt", "session_id": session_id}
+     resp = client.post("/", json=payload)
+     assert resp.status_code == 200
+     data = resp.json()
+     assert data["session_id"] == session_id
+ 
+ 
+ def test_api_empty_prompt(client):
+     """Test API with empty prompt."""
+     payload = {"prompt": ""}
+     resp = client.post("/", json=payload)
+     assert resp.status_code == 200
+     data = resp.json()
+     assert "result" in data
+     assert "Please enter" in data["result"]
+ 
+ 
+ def test_api_history_endpoint(client):
+     """Test history retrieval endpoint."""
+     # First make a request
+     session_id = "test-history-session"
+     payload = {"prompt": "Test question", "session_id": session_id}
+     client.post("/", json=payload)
+ 
+     # Then retrieve history
+     resp = client.get(f"/history/{session_id}")
+     assert resp.status_code == 200
+     data = resp.json()
+     assert "history" in data
+     assert isinstance(data["history"], list)
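The retry behavior the adapter relies on can also be sketched in isolation: an async wrapper that retries a flaky callable and returns an error payload only after the final attempt fails. This is a minimal sketch of the pattern, not the repository's test code; the backoff is shortened to zero here so the example runs instantly (production would sleep `2 ** attempt` seconds):

```python
import asyncio

async def call_with_retries(fn, attempts=2):
    """Retry an async callable, returning an error dict after the last failure."""
    for attempt in range(attempts):
        try:
            return await fn()
        except Exception as exc:
            if attempt == attempts - 1:
                return {"error": f"failed after retries: {exc}", "source": "error"}
            await asyncio.sleep(0)  # backoff placeholder: 2 ** attempt seconds in production

calls = {"n": 0}

async def flaky():
    # Fails on the first call, succeeds on the second
    calls["n"] += 1
    if calls["n"] < 2:
        raise RuntimeError("transient")
    return {"result": "ok", "source": "inference"}

result = asyncio.run(call_with_retries(flaky))
```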