---
title: Web MCP
emoji: 🔎
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.36.2
app_file: app.py
pinned: false
short_description: Search & fetch the web with per-tool analytics
thumbnail: >-
  https://cdn-uploads.huggingface.co/production/uploads/5f17f0a0925b9863e28ad517/tfYtTMw9FgiWdyyIYz6A6.png
---
# Web MCP Server
A Model Context Protocol (MCP) server that exposes two composable tools—`search` (Serper metadata) and `fetch` (single-page extraction)—alongside a live analytics dashboard that tracks daily usage for each tool. The UI runs on Gradio and can be reached directly or via MCP-compatible clients like Claude Desktop and Cursor.
## Highlights
- Dual MCP tools with shared rate limiting (`360 requests/hour`) and structured JSON responses.
- Daily analytics split by tool: the **Analytics** tab renders "Daily Search" (left) and "Daily Fetch" (right) bar charts covering the last 14 days.
- Persistent request counters keyed by UTC date and tool: `{"YYYY-MM-DD": {"search": n, "fetch": m}}`, with automatic migration from legacy totals.
- Pluggable storage: respects `ANALYTICS_DATA_DIR`, otherwise falls back to `/data` (if writable) or `./data` for local development.
- Ready-to-serve Gradio app with MCP endpoints exposed via `gr.api` for direct client consumption.
## Requirements
- Python 3.10 or newer (required by Gradio 5.x).
- Serper API key (`SERPER_API_KEY`) with access to the Search and News endpoints.
- Dependencies listed in `requirements.txt`, including `filelock` and `pandas` for analytics storage.
Install everything with:
```bash
pip install -r requirements.txt
```
## Configuration
1. Export your Serper API key:
```bash
export SERPER_API_KEY="your-api-key"
```
2. (Optional) Override the analytics storage path:
```bash
export ANALYTICS_DATA_DIR="/path/to/persistent/storage"
```
If unset, the app automatically prefers `/data` when available, otherwise `./data`.
3. (Optional) Control private/local address policy for `fetch`:
- `FETCH_ALLOW_PRIVATE` — set to `1`/`true` to disable the SSRF guard entirely (not recommended except for trusted, local testing).
- `FETCH_PRIVATE_ALLOWLIST` — comma/space separated host patterns allowed even if they resolve to private/local IPs, e.g.:
```bash
export FETCH_PRIVATE_ALLOWLIST="*.corp.local, my-proxy.internal"
```
If neither is set, the fetcher refuses URLs whose host resolves to private, loopback, link-local, multicast, reserved, or unspecified addresses. It also re-checks the final redirect target; a sketch of this kind of guard follows below.
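A minimal sketch of such a guard, assuming `fnmatch`-style allowlist patterns and standard-library address checks (not the app's exact implementation):
```python
import fnmatch
import ipaddress
import socket

def host_is_blocked(host: str, allowlist: list[str]) -> bool:
    """Return True if the host is not allowlisted and any resolved
    address is private/local."""
    # Allowlisted hosts bypass the check entirely.
    if any(fnmatch.fnmatch(host, pat) for pat in allowlist):
        return False
    for *_, sockaddr in socket.getaddrinfo(host, None):
        ip = ipaddress.ip_address(sockaddr[0])
        if (ip.is_private or ip.is_loopback or ip.is_link_local
                or ip.is_multicast or ip.is_reserved or ip.is_unspecified):
            return True
    return False
```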
The request counters live in `<DATA_DIR>/request_counts.json`, guarded by a file lock to support concurrent MCP calls.
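A sketch of the locked read-modify-write cycle, including the directory fallback described above (assumed structure, not the app's exact code):
```python
import json
import os
from pathlib import Path

from filelock import FileLock

def data_dir() -> Path:
    # Fallback order described above: env override, then /data, then ./data.
    override = os.environ.get("ANALYTICS_DATA_DIR")
    if override:
        return Path(override)
    return Path("/data") if os.access("/data", os.W_OK) else Path("./data")

def record(tool: str, day: str) -> None:
    path = data_dir() / "request_counts.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    # The lock file sits next to the JSON so concurrent MCP calls serialize.
    with FileLock(str(path) + ".lock"):
        counts = json.loads(path.read_text()) if path.exists() else {}
        bucket = counts.setdefault(day, {"search": 0, "fetch": 0})
        bucket[tool] = bucket.get(tool, 0) + 1
        path.write_text(json.dumps(counts, indent=2))
```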
## Running Locally
Launch the Gradio server (with MCP support enabled) via:
```bash
python app.py
```
This starts a local UI at `http://localhost:7860` and exposes the MCP SSE endpoint at `http://localhost:7860/gradio_api/mcp/sse`.
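To verify the endpoint is up, you can open the SSE stream directly; the exact events you see depend on the Gradio and MCP versions in use:
```bash
curl -N http://localhost:7860/gradio_api/mcp/sse
```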
### Connecting From MCP Clients
- **Claude Desktop** – update `claude_desktop_config.json`:
```json
{
  "mcpServers": {
    "web-search": {
      "command": "python",
      "args": ["/absolute/path/to/app.py"],
      "env": {
        "SERPER_API_KEY": "your-api-key"
      }
    }
  }
}
```
- **URL-based MCP clients** – run `python app.py`, then point the client to `http://localhost:7860/gradio_api/mcp/sse`.
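For example, a Cursor-style `mcp.json` typically looks like the sketch below; the exact file location and schema vary by client, and clients that only speak stdio can bridge to the SSE endpoint with a proxy such as `mcp-remote`:
```json
{
  "mcpServers": {
    "web-search": {
      "url": "http://localhost:7860/gradio_api/mcp/sse"
    }
  }
}
```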
## Tool Reference
### `search`
- **Purpose**: Retrieve metadata-only results from Serper (general web or news).
- **Inputs**:
- `query` *(str, required)* – search terms.
- `search_type` *("search" | "news", default "search")* – switch to `news` for recency-focused results.
- `num_results` *(int, default 4, range 1–20)* – number of hits to return.
- **Output**: JSON containing the query echo, result count, timing, and an array of entries with `position`, `title`, `link`, `domain`, and optional `source`/`date` for news.
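An illustrative response shape, based on the fields listed above (key names are approximations, not guaranteed to match the app's exact output):
```json
{
  "query": "model context protocol",
  "result_count": 2,
  "duration_ms": 412,
  "results": [
    {"position": 1, "title": "Example result", "link": "https://example.com/a", "domain": "example.com"},
    {"position": 2, "title": "Example news item", "link": "https://example.org/b", "domain": "example.org", "source": "Example News", "date": "2024-05-01"}
  ]
}
```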
### `fetch`
- **Purpose**: Download a single URL and extract the readable article text via Trafilatura.
- **Inputs**:
- `url` *(str, required)* – must start with `http://` or `https://`.
- `timeout` *(int, default 20 seconds)* – client timeout for the HTTP request.
- **Output**: JSON with the original and final URL, domain, HTTP status, title, ISO timestamp of the fetch, word count, cleaned `content`, and duration.
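An illustrative response shape (again, key names are approximations):
```json
{
  "url": "https://example.com/article",
  "final_url": "https://example.com/article",
  "domain": "example.com",
  "status": 200,
  "title": "Example Article",
  "fetched_at": "2024-05-01T12:00:00Z",
  "word_count": 843,
  "content": "Cleaned article text...",
  "duration_ms": 1287
}
```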
Both tools increment their respective analytics buckets on every invocation, including validation failures and rate-limit denials, ensuring the dashboard mirrors real traffic.
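For quick manual testing outside an MCP client, the tools can also be called through the regular Gradio API. The endpoint names below are assumptions derived from the tool names; check the app's "Use via API" page for the exact routes:
```python
from gradio_client import Client

client = Client("http://localhost:7860")

# Endpoint names (/search, /fetch) are assumed; verify them on the
# app's "Use via API" page if these don't match.
results = client.predict(query="model context protocol",
                         search_type="search",
                         num_results=4,
                         api_name="/search")
page = client.predict(url="https://example.com", timeout=20, api_name="/fetch")
print(results)
print(page)
```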
## Analytics Dashboard
Open the **Analytics** tab in the Gradio UI to inspect daily activity:
- **Daily Search Count** (left column) – bar chart for the past 14 days of `search` tool requests.
- **Daily Fetch Count** (right column) – bar chart for the past 14 days of `fetch` tool requests.
- Tooltips reveal the display label (e.g., `Sep 17`), raw count, and ISO date key.
Counts are stored as plain JSON, so the file can be backed up or moved to external storage for long-term tracking. Totals in the legacy integer-only format are migrated automatically on the first write.
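The migration itself could look like the sketch below; attributing legacy totals to `search` is an assumption, since the README does not say how the app splits them:
```python
def migrate_counts(counts: dict) -> dict:
    """Upgrade legacy {"YYYY-MM-DD": int} entries to per-tool dicts."""
    migrated = {}
    for day, value in counts.items():
        if isinstance(value, int):
            # Assumption: legacy totals predate the fetch tool, so they are
            # credited to search. The real app may split them differently.
            migrated[day] = {"search": value, "fetch": 0}
        else:
            migrated[day] = value
    return migrated
```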
## Rate Limiting & Error Handling
- Global moving-window limit of `360` requests per hour shared across both tools (powered by `limits`).
- Standardized error payloads for missing parameters, invalid URLs, Serper errors, HTTP failures, and rate-limit hits; each error path still increments the analytics counters.
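A minimal sketch of a shared moving-window limiter with the `limits` package (the app's actual storage backend and limiter key may differ):
```python
from limits import parse
from limits.storage import MemoryStorage
from limits.strategies import MovingWindowRateLimiter

limiter = MovingWindowRateLimiter(MemoryStorage())
HOURLY = parse("360/hour")

def allow_request() -> bool:
    # hit() records the request and returns False once the window is full.
    return limiter.hit(HOURLY, "global")
```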
## Troubleshooting
- **`SERPER_API_KEY is not set`** – export the key in the environment where the server runs.
- **`Rate limit exceeded`** – pause requests or reduce client concurrency.
- **Empty extraction** – some sites block bots; try another URL.
- **Storage permissions** – ensure the chosen data directory is writable; adjust `ANALYTICS_DATA_DIR` if necessary.
## Licensing & Contributions
Feel free to fork and adapt for your MCP workflows. Contributions are welcome—open a PR or issue with proposed analytics enhancements, additional tooling, or documentation tweaks.