---
title: Web MCP
emoji: 🔎
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.36.2
app_file: app.py
pinned: false
short_description: Search & fetch the web with per-tool analytics
thumbnail: >-
  https://cdn-uploads.huggingface.co/production/uploads/5f17f0a0925b9863e28ad517/tfYtTMw9FgiWdyyIYz6A6.png
---

# Web MCP Server

A Model Context Protocol (MCP) server that exposes two composable tools—`search` (Serper metadata) and `fetch` (single-page extraction)—alongside a live analytics dashboard that tracks daily usage for each tool. The app runs on Gradio: the UI can be used directly in a browser, and the tools can be called from MCP-compatible clients such as Claude Desktop and Cursor.

## Highlights
- Dual MCP tools with shared rate limiting (`360 requests/hour`) and structured JSON responses.
- Daily analytics split by tool: the **Analytics** tab renders "Daily Search" (left) and "Daily Fetch" (right) bar charts covering the last 14 days.
- Persistent request counters keyed by UTC date and tool: `{"YYYY-MM-DD": {"search": n, "fetch": m}}`, with automatic migration from legacy totals.
- Pluggable storage: respects `ANALYTICS_DATA_DIR`, otherwise falls back to `/data` (if writable) or `./data` for local development.
- Ready-to-serve Gradio app with MCP endpoints exposed via `gr.api` for direct client consumption.

## Requirements
- Python 3.10 or newer (required by Gradio 5.x).
- Serper API key (`SERPER_API_KEY`) with access to the Search and News endpoints.
- Dependencies listed in `requirements.txt`, including `filelock` and `pandas` for analytics storage.

Install everything with:
```bash
pip install -r requirements.txt
```

## Configuration
1. Export your Serper API key:
   ```bash
   export SERPER_API_KEY="your-api-key"
   ```
2. (Optional) Override the analytics storage path:
   ```bash
   export ANALYTICS_DATA_DIR="/path/to/persistent/storage"
   ```
   If unset, the app automatically prefers `/data` when available, otherwise `./data`.

3. (Optional) Control private/local address policy for `fetch`:
   - `FETCH_ALLOW_PRIVATE` — set to `1`/`true` to disable the SSRF guard entirely (not recommended except for trusted, local testing).
   - `FETCH_PRIVATE_ALLOWLIST` — comma/space separated host patterns allowed even if they resolve to private/local IPs, e.g.:
     ```bash
     export FETCH_PRIVATE_ALLOWLIST="*.corp.local, my-proxy.internal"
     ```
   If neither is set, the fetcher refuses URLs whose host resolves to private, loopback, link-local, multicast, reserved, or unspecified addresses. It also re-checks the final redirect target.

The request counters live in `<DATA_DIR>/request_counts.json`, guarded by a file lock to support concurrent MCP calls.
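
The snippet below is a minimal sketch of that pattern, assuming the per-day counter format shown in the Highlights; the helper names (`resolve_data_dir`, `record_request`) are illustrative and not the actual functions in `app.py`.

```python
import json
import os
from datetime import datetime, timezone
from pathlib import Path

from filelock import FileLock


def resolve_data_dir() -> Path:
    """Pick the analytics directory: env override, then /data if writable, then ./data."""
    override = os.environ.get("ANALYTICS_DATA_DIR")
    if override:
        return Path(override)
    if os.access("/data", os.W_OK):
        return Path("/data")
    return Path("./data")


def record_request(tool: str) -> None:
    """Increment today's counter for `tool` ("search" or "fetch") under a file lock."""
    data_dir = resolve_data_dir()
    data_dir.mkdir(parents=True, exist_ok=True)
    counts_file = data_dir / "request_counts.json"

    with FileLock(str(counts_file) + ".lock"):
        counts = json.loads(counts_file.read_text()) if counts_file.exists() else {}
        today = datetime.now(timezone.utc).strftime("%Y-%m-%d")
        day = counts.setdefault(today, {"search": 0, "fetch": 0})
        day[tool] = day.get(tool, 0) + 1
        counts_file.write_text(json.dumps(counts, indent=2))
```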

## Running Locally
Launch the Gradio server (with MCP support enabled) via:
```bash
python app.py
```
This starts a local UI at `http://localhost:7860` and exposes the MCP SSE endpoint at `http://localhost:7860/gradio_api/mcp/sse`.
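
Gradio serves that MCP endpoint when `launch()` is called with `mcp_server=True`. The sketch below shows the general launch pattern with a placeholder tool; it is not the contents of `app.py`.

```python
import gradio as gr


def search(query: str) -> str:
    """Placeholder for the real Serper-backed search tool."""
    return f"results for {query}"


demo = gr.Interface(fn=search, inputs="text", outputs="text")

if __name__ == "__main__":
    # mcp_server=True serves /gradio_api/mcp/sse alongside the web UI.
    demo.launch(mcp_server=True)
```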

### Connecting From MCP Clients
- **Claude Desktop** – update `claude_desktop_config.json`:
  ```json
  {
    "mcpServers": {
      "web-search": {
        "command": "python",
        "args": ["/absolute/path/to/app.py"],
        "env": {
          "SERPER_API_KEY": "your-api-key"
        }
      }
    }
  }
  ```
- **URL-based MCP clients** – run `python app.py`, then point the client to `http://localhost:7860/gradio_api/mcp/sse`.

## Tool Reference
### `search`
- **Purpose**: Retrieve metadata-only results from Serper (general web or news).
- **Inputs**:
  - `query` *(str, required)* – search terms.
  - `search_type` *("search" | "news", default "search")* – switch to `news` for recency-focused results.
  - `num_results` *(int, default 4, range 1–20)* – number of hits to return.
- **Output**: JSON containing the query echo, result count, timing, and an array of entries with `position`, `title`, `link`, `domain`, and optional `source`/`date` for news.
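
  An illustrative payload (values are made up, and the exact key names for the count and timing fields may differ):
  ```json
  {
    "query": "model context protocol",
    "search_type": "search",
    "result_count": 2,
    "duration_seconds": 0.4,
    "results": [
      {"position": 1, "title": "Example result", "link": "https://example.com/mcp", "domain": "example.com"},
      {"position": 2, "title": "Another result", "link": "https://example.org/post", "domain": "example.org"}
    ]
  }
  ```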

### `fetch`
- **Purpose**: Download a single URL and extract the readable article text via Trafilatura.
- **Inputs**:
  - `url` *(str, required)* – must start with `http://` or `https://`.
  - `timeout` *(int, default 20 seconds)* – client timeout for the HTTP request.
- **Output**: JSON with the original and final URL, domain, HTTP status, title, ISO timestamp of the fetch, word count, cleaned `content`, and duration.
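
  An illustrative payload along the same lines (key names are a guess from the description above):
  ```json
  {
    "url": "https://example.com/article",
    "final_url": "https://example.com/article",
    "domain": "example.com",
    "status_code": 200,
    "title": "Example article",
    "fetched_at": "2025-01-01T12:00:00+00:00",
    "word_count": 843,
    "content": "Cleaned article text ...",
    "duration_seconds": 1.3
  }
  ```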

Both tools increment their respective analytics buckets on every invocation, including validation failures and rate-limit denials, ensuring the dashboard mirrors real traffic.

## Analytics Dashboard
Open the **Analytics** tab in the Gradio UI to inspect daily activity:
- **Daily Search Count** (left column) – bar chart for the past 14 days of `search` tool requests.
- **Daily Fetch Count** (right column) – bar chart for the past 14 days of `fetch` tool requests.
- Tooltips reveal the display label (e.g., `Sep 17`), raw count, and ISO date key.

Analytics data is stored as plain JSON, so it can live on persistent external storage (for example via `ANALYTICS_DATA_DIR`) for long-term tracking. Existing totals in the legacy integer-only format are automatically migrated during the first write.
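
A sketch of how the stored JSON maps onto those 14-day series (assuming the `{"YYYY-MM-DD": {"search": n, "fetch": m}}` format; the helper name is illustrative):

```python
from datetime import date, timedelta

import pandas as pd


def last_14_days(counts: dict, tool: str) -> pd.DataFrame:
    """Build a per-day frame for one tool over the trailing 14 days."""
    days = [date.today() - timedelta(days=i) for i in range(13, -1, -1)]
    rows = [
        {
            "date": d.isoformat(),                 # ISO date key, e.g. "2025-09-17"
            "label": d.strftime("%b %d"),          # display label, e.g. "Sep 17"
            "count": counts.get(d.isoformat(), {}).get(tool, 0),
        }
        for d in days
    ]
    return pd.DataFrame(rows)
```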

## Rate Limiting & Error Handling
- Global moving-window limit of `360` requests per hour shared across both tools (powered by `limits`).
- Standardized error payloads for missing parameters, invalid URLs, Serper issues, HTTP failures, and rate-limit hits, each preserving analytics increments.
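
A minimal sketch of such a shared moving-window limiter with the `limits` package (the identifier string and helper name are illustrative):

```python
from limits import parse
from limits.storage import MemoryStorage
from limits.strategies import MovingWindowRateLimiter

limiter = MovingWindowRateLimiter(MemoryStorage())
hourly_limit = parse("360/hour")


def allow_request() -> bool:
    """Consume one slot from the shared hourly budget; False means rate-limited."""
    return limiter.hit(hourly_limit, "web-mcp")
```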

## Troubleshooting
- **`SERPER_API_KEY is not set`** – export the key in the environment where the server runs.
- **`Rate limit exceeded`** – pause requests or reduce client concurrency.
- **Empty extraction** – some sites block bots; try another URL.
- **Storage permissions** – ensure the chosen data directory is writable; adjust `ANALYTICS_DATA_DIR` if necessary.

## Licensing & Contributions
Feel free to fork and adapt for your MCP workflows. Contributions are welcome—open a PR or issue with proposed analytics enhancements, additional tooling, or documentation tweaks.