---
title: Web MCP
emoji: 🔎
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.36.2
app_file: app.py
pinned: false
short_description: Search & fetch the web with per-tool analytics
thumbnail: >-
  https://cdn-uploads.huggingface.co/production/uploads/5f17f0a0925b9863e28ad517/tfYtTMw9FgiWdyyIYz6A6.png
---

# Web MCP Server

A Model Context Protocol (MCP) server that exposes two composable tools—`search` (Serper metadata) and `fetch` (single-page extraction)—alongside a live analytics dashboard that tracks daily usage for each tool. The UI runs on Gradio and can be reached directly or via MCP-compatible clients like Claude Desktop and Cursor.

## Highlights

- Dual MCP tools with shared rate limiting (`360 requests/hour`) and structured JSON responses.
- Daily analytics split by tool: the **Analytics** tab renders "Daily Search" (left) and "Daily Fetch" (right) bar charts covering the last 14 days.
- Persistent request counters keyed by UTC date and tool: `{"YYYY-MM-DD": {"search": n, "fetch": m}}`, with automatic migration from legacy totals.
- Pluggable storage: respects `ANALYTICS_DATA_DIR`, otherwise falls back to `/data` (if writable) or `./data` for local development.
- Ready-to-serve Gradio app with MCP endpoints exposed via `gr.api` for direct client consumption.
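
To illustrate the last point, here is a minimal sketch of how two such tools can be exposed from a Gradio app as MCP endpoints; the function bodies are placeholders and this is not the app's actual code.

```python
# Minimal sketch (placeholder bodies): exposing two functions as MCP tools.
import json
import gradio as gr

def search(query: str, search_type: str = "search", num_results: int = 4) -> str:
    """Return metadata-only search results as a JSON string (placeholder)."""
    return json.dumps({"query": query, "results": []})

def fetch(url: str, timeout: int = 20) -> str:
    """Return extracted page content as a JSON string (placeholder)."""
    return json.dumps({"url": url, "content": ""})

with gr.Blocks() as demo:
    gr.api(search)
    gr.api(fetch)

if __name__ == "__main__":
    # mcp_server=True serves the MCP SSE endpoint at /gradio_api/mcp/sse
    # alongside the regular web UI.
    demo.launch(mcp_server=True)
```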

## Requirements

- Python 3.8 or newer.
- Serper API key (`SERPER_API_KEY`) with access to the Search and News endpoints.
- Dependencies listed in `requirements.txt`, including `filelock` and `pandas` for analytics storage.

Install everything with:

```bash
pip install -r requirements.txt
```

## Configuration

1. Export your Serper API key:

   ```bash
   export SERPER_API_KEY="your-api-key"
   ```

2. (Optional) Override the analytics storage path:

   ```bash
   export ANALYTICS_DATA_DIR="/path/to/persistent/storage"
   ```

   If unset, the app automatically prefers `/data` when available, otherwise `./data`.

3. (Optional) Control private/local address policy for `fetch`:

   - `FETCH_ALLOW_PRIVATE` — set to `1`/`true` to disable the SSRF guard entirely (not recommended except for trusted, local testing).
   - `FETCH_PRIVATE_ALLOWLIST` — comma/space separated host patterns allowed even if they resolve to private/local IPs, e.g.:

     ```bash
     export FETCH_PRIVATE_ALLOWLIST="*.corp.local, my-proxy.internal"
     ```

   If neither is set, the fetcher refuses URLs whose host resolves to private, loopback, link-local, multicast, reserved, or unspecified addresses. It also re-checks the final redirect target.
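
For readers curious what such a guard can look like, here is a minimal sketch of the kind of private-address check described above; the function names are illustrative and this is not the app's actual implementation.

```python
# Minimal sketch of an SSRF-style guard as described above; illustrative only.
import fnmatch
import ipaddress
import os
import socket

def _allowlisted(host: str) -> bool:
    # FETCH_PRIVATE_ALLOWLIST: comma/space separated glob patterns.
    patterns = os.environ.get("FETCH_PRIVATE_ALLOWLIST", "").replace(",", " ").split()
    return any(fnmatch.fnmatch(host, pattern) for pattern in patterns)

def host_is_allowed(host: str) -> bool:
    """Return False if the host resolves to a private/local address."""
    if os.environ.get("FETCH_ALLOW_PRIVATE", "").lower() in ("1", "true"):
        return True  # guard disabled
    if _allowlisted(host):
        return True
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return False
    for info in infos:
        ip = ipaddress.ip_address(info[4][0].split("%")[0])  # strip IPv6 zone id
        if (ip.is_private or ip.is_loopback or ip.is_link_local
                or ip.is_multicast or ip.is_reserved or ip.is_unspecified):
            return False
    return True
```

The real fetcher additionally re-checks the final redirect target, which this sketch does not cover.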

The request counters live in `<DATA_DIR>/request_counts.json`, guarded by a file lock to support concurrent MCP calls.
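
The shape of that file and the locking pattern can be pictured with the sketch below; it uses the `filelock` package from `requirements.txt`, but the helper names and exact persistence logic are assumptions, not the app's code.

```python
# Illustrative counter update; format: {"YYYY-MM-DD": {"search": n, "fetch": m}}.
import json
import os
from datetime import datetime, timezone
from pathlib import Path

from filelock import FileLock

def resolve_data_dir() -> Path:
    # ANALYTICS_DATA_DIR wins; otherwise prefer /data when writable, else ./data.
    explicit = os.environ.get("ANALYTICS_DATA_DIR")
    if explicit:
        return Path(explicit)
    if os.access("/data", os.W_OK):
        return Path("/data")
    return Path("./data")

def record_request(tool: str) -> None:
    data_dir = resolve_data_dir()
    data_dir.mkdir(parents=True, exist_ok=True)
    counts_path = data_dir / "request_counts.json"
    today = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    # The lock serializes concurrent MCP calls writing to the same file.
    with FileLock(str(counts_path) + ".lock"):
        counts = json.loads(counts_path.read_text()) if counts_path.exists() else {}
        day = counts.setdefault(today, {"search": 0, "fetch": 0})
        day[tool] = day.get(tool, 0) + 1
        counts_path.write_text(json.dumps(counts, indent=2))
```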

## Running Locally

Launch the Gradio server (with MCP support enabled) via:

```bash
python app.py
```

This starts a local UI at `http://localhost:7860` and exposes the MCP SSE endpoint at `http://localhost:7860/gradio_api/mcp/sse`.

### Connecting From MCP Clients

- **Claude Desktop** – update `claude_desktop_config.json`:

  ```json
  {
    "mcpServers": {
      "web-search": {
        "command": "python",
        "args": ["/absolute/path/to/app.py"],
        "env": {
          "SERPER_API_KEY": "your-api-key"
        }
      }
    }
  }
  ```

- **URL-based MCP clients** – run `python app.py`, then point the client to `http://localhost:7860/gradio_api/mcp/sse`.

## Tool Reference

### `search`

- **Purpose**: Retrieve metadata-only results from Serper (general web or news).
- **Inputs**:
  - `query` *(str, required)* – search terms.
  - `search_type` *("search" | "news", default "search")* – switch to `news` for recency-focused results.
  - `num_results` *(int, default 4, range 1–20)* – number of hits to return.
- **Output**: JSON containing the query echo, result count, timing, and an array of entries with `position`, `title`, `link`, `domain`, and optional `source`/`date` for news.
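
An illustrative `search` response is shown below; the top-level field names are assumptions based on the description above, and the authoritative schema is whatever the app actually returns.

```json
{
  "query": "model context protocol",
  "search_type": "search",
  "result_count": 2,
  "duration_seconds": 0.42,
  "results": [
    {"position": 1, "title": "Example result", "link": "https://example.com/a", "domain": "example.com"},
    {"position": 2, "title": "Another result", "link": "https://example.org/b", "domain": "example.org"}
  ]
}
```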

### `fetch`

- **Purpose**: Download a single URL and extract the readable article text via Trafilatura.
- **Inputs**:
  - `url` *(str, required)* – must start with `http://` or `https://`.
  - `timeout` *(int, default 20 seconds)* – client timeout for the HTTP request.
- **Output**: JSON with the original and final URL, domain, HTTP status, title, ISO timestamp of the fetch, word count, cleaned `content`, and duration.
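
A stripped-down version of this fetch-and-extract step might look like the sketch below. It assumes `requests` for the download and `trafilatura` for extraction, and it omits the title extraction, rate limiting, SSRF guard, and analytics bookkeeping of the real tool.

```python
# Simplified fetch-and-extract sketch; not the app's actual implementation.
import json
from datetime import datetime, timezone
from urllib.parse import urlparse

import requests
import trafilatura

def fetch_page(url: str, timeout: int = 20) -> str:
    response = requests.get(url, timeout=timeout)
    # Trafilatura pulls the readable article text out of the raw HTML.
    content = trafilatura.extract(response.text) or ""
    return json.dumps({
        "url": url,
        "final_url": response.url,
        "domain": urlparse(response.url).netloc,
        "status": response.status_code,
        "fetched_at": datetime.now(timezone.utc).isoformat(),
        "word_count": len(content.split()),
        "content": content,
    })
```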

Both tools increment their respective analytics buckets on every invocation, including validation failures and rate-limit denials, ensuring the dashboard mirrors real traffic.

## Analytics Dashboard

Open the **Analytics** tab in the Gradio UI to inspect daily activity:

- **Daily Search Count** (left column) – bar chart for the past 14 days of `search` tool requests.
- **Daily Fetch Count** (right column) – bar chart for the past 14 days of `fetch` tool requests.
- Tooltips reveal the display label (e.g., `Sep 17`), raw count, and ISO date key.

Data is stored as JSON and can safely be moved to external persistent storage (for example via `ANALYTICS_DATA_DIR`) for long-term tracking. Existing totals in the legacy integer-only format are automatically migrated during the first write.
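
Because the counters are plain per-day, per-tool JSON, assembling the 14-day chart data is straightforward; the sketch below uses `pandas` (listed in `requirements.txt`) with illustrative column names rather than the app's actual ones.

```python
# Illustrative only: turn the counters dict into a 14-day frame for one tool.
from datetime import datetime, timedelta, timezone

import pandas as pd

def last_14_days_frame(counts: dict, tool: str) -> pd.DataFrame:
    today = datetime.now(timezone.utc).date()
    days = [today - timedelta(days=offset) for offset in range(13, -1, -1)]
    rows = [
        {
            "date": day.isoformat(),         # ISO date key, e.g. 2025-09-17
            "label": day.strftime("%b %d"),  # display label, e.g. Sep 17
            "count": counts.get(day.isoformat(), {}).get(tool, 0),
        }
        for day in days
    ]
    return pd.DataFrame(rows)

# Example: last_14_days_frame(counts, "search"), where counts was loaded
# from request_counts.json.
```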

## Rate Limiting & Error Handling

- Global moving-window limit of `360` requests per hour shared across both tools (powered by `limits`); a minimal sketch of this pattern appears after this list.
- Standardized error payloads for missing parameters, invalid URLs, Serper issues, HTTP failures, and rate-limit hits; each of these still increments the analytics counters.
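
A shared moving-window limiter built on the `limits` package can be wired up roughly as follows; the in-memory storage and the identifier are illustrative choices, not necessarily what the app uses.

```python
# Rough sketch of a shared moving-window rate limit with the `limits` package.
from limits import parse
from limits.storage import MemoryStorage
from limits.strategies import MovingWindowRateLimiter

storage = MemoryStorage()
limiter = MovingWindowRateLimiter(storage)
shared_limit = parse("360/hour")

def allow_request() -> bool:
    # Both tools check the same limit item, so the hourly budget is shared.
    return limiter.hit(shared_limit, "global")

# Example:
# if not allow_request():
#     return {"error": "Rate limit exceeded"}
```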

## Troubleshooting

- **`SERPER_API_KEY is not set`** – export the key in the environment where the server runs.
- **`Rate limit exceeded`** – pause requests or reduce client concurrency.
- **Empty extraction** – some sites block bots; try another URL.
- **Storage permissions** – ensure the chosen data directory is writable; adjust `ANALYTICS_DATA_DIR` if necessary.

## Licensing & Contributions

Feel free to fork and adapt for your MCP workflows. Contributions are welcome—open a PR or issue with proposed analytics enhancements, additional tooling, or documentation tweaks.