ming committed on
Commit 4502cec · 1 Parent(s): 41494e9

feat: Update README for Hugging Face Spaces deployment

Files changed (1): README.md +19 -124
README.md CHANGED
@@ -11,29 +11,16 @@ app_port: 7860
 
 # Text Summarizer API
 
-A FastAPI-based text summarization service powered by Ollama and Llama 3.2 1B model.
-
-**🚀 Live Demo**: [https://huggingface.co/spaces/colin730/SummarizerApp](https://huggingface.co/spaces/colin730/SummarizerApp)
+A FastAPI-based text summarization service powered by Ollama and Mistral 7B model.
 
 ## 🚀 Features
 
 - **Fast text summarization** using local LLM inference
-- **Real-time streaming** with Server-Sent Events (SSE) for Android compatibility
 - **RESTful API** with FastAPI
 - **Health monitoring** and logging
 - **Docker containerized** for easy deployment
 - **Free deployment** on Hugging Face Spaces
 
-## 🌊 Streaming Benefits
-
-The streaming endpoint (`/api/v1/summarize/stream`) provides several advantages:
-
-- **Real-time feedback**: Users see text being generated as it happens
-- **Better UX**: No waiting for complete response before seeing results
-- **Android-friendly**: Uses Server-Sent Events (SSE) for easy mobile integration
-- **Progressive loading**: Content appears incrementally, improving perceived performance
-- **Error resilience**: Errors are sent as SSE events, maintaining connection
-
 ## 📡 API Endpoints
 
 ### Health Check
@@ -41,7 +28,7 @@ The streaming endpoint (`/api/v1/summarize/stream`) provides several advantages:
 GET /health
 ```
 
-### Summarize Text (Standard)
+### Summarize Text
 ```
 POST /api/v1/summarize
 Content-Type: application/json
@@ -49,33 +36,10 @@ Content-Type: application/json
 {
   "text": "Your long text to summarize here...",
   "max_tokens": 256,
-  "prompt": "Summarize the following text concisely:"
-}
-```
-
-### Summarize Text (Streaming)
-```
-POST /api/v1/summarize/stream
-Content-Type: application/json
-
-{
-  "text": "Your long text to summarize here...",
-  "max_tokens": 256,
-  "prompt": "Summarize the following text concisely:"
+  "temperature": 0.7
 }
 ```
 
-**Response Format**: Server-Sent Events (SSE)
-```
-data: {"content": "This", "done": false, "tokens_used": 1}
-
-data: {"content": " is", "done": false, "tokens_used": 2}
-
-data: {"content": " a", "done": false, "tokens_used": 3}
-
-data: {"content": " summary.", "done": true, "tokens_used": 4}
-```
-
 ### API Documentation
 - **Swagger UI**: `/docs`
 - **ReDoc**: `/redoc`
@@ -84,11 +48,11 @@ data: {"content": " summary.", "done": true, "tokens_used": 4}
 
 The service uses the following environment variables:
 
-- `OLLAMA_MODEL`: Model to use (default: `llama3.2:1b`)
-- `OLLAMA_HOST`: Ollama service host (default: `http://0.0.0.0:11434`)
-- `OLLAMA_TIMEOUT`: Request timeout in seconds (default: `60`)
-- `SERVER_HOST`: Server host (default: `127.0.0.1`)
-- `SERVER_PORT`: Server port (default: `8000`)
+- `OLLAMA_MODEL`: Model to use (default: `mistral:7b`)
+- `OLLAMA_HOST`: Ollama service host (default: `http://localhost:11434`)
+- `OLLAMA_TIMEOUT`: Request timeout in seconds (default: `30`)
+- `SERVER_HOST`: Server host (default: `0.0.0.0`)
+- `SERVER_PORT`: Server port (default: `7860`)
 - `LOG_LEVEL`: Logging level (default: `INFO`)
 
 ## 🐳 Docker Deployment
@@ -99,7 +63,7 @@ The service uses the following environment variables:
 docker-compose up --build
 
 # Or run directly
-docker build -t summarizer-app .
+docker build -f Dockerfile.hf -t summarizer-app .
 docker run -p 7860:7860 summarizer-app
 ```
 
@@ -108,11 +72,10 @@ This app is configured for deployment on Hugging Face Spaces using Docker SDK.
 
 ## 📊 Performance
 
-- **Model**: Llama 3.2 1B (~1GB RAM requirement)
-- **Startup time**: ~1-2 minutes (includes model download)
-- **Inference speed**: ~1-3 seconds per request
-- **Memory usage**: ~2GB RAM
-- **Streaming**: Real-time text generation with SSE for responsive user experience
+- **Model**: Mistral 7B (7GB RAM requirement)
+- **Startup time**: ~2-3 minutes (includes model download)
+- **Inference speed**: ~2-5 seconds per request
+- **Memory usage**: ~8GB RAM
 
 ## 🛠️ Development
 
@@ -122,7 +85,7 @@ This app is configured for deployment on Hugging Face Spaces using Docker SDK.
 pip install -r requirements.txt
 
 # Run locally
-uvicorn app.main:app --host 0.0.0.0 --port 8000
+uvicorn app.main:app --host 0.0.0.0 --port 7860
 ```
 
 ### Testing
@@ -136,13 +99,13 @@ pytest --cov=app
 
 ## 📝 Usage Examples
 
-### Python (Standard)
+### Python
 ```python
 import requests
 
 # Summarize text
 response = requests.post(
-    "https://huggingface.co/spaces/colin730/SummarizerApp/api/v1/summarize",
+    "https://your-space.hf.space/api/v1/summarize",
     json={
         "text": "Your long article or text here...",
         "max_tokens": 256
@@ -153,32 +116,9 @@ result = response.json()
 print(result["summary"])
 ```
 
-### Python (Streaming)
-```python
-import requests
-import json
-
-# Stream summarization
-response = requests.post(
-    "https://huggingface.co/spaces/colin730/SummarizerApp/api/v1/summarize/stream",
-    json={
-        "text": "Your long article or text here...",
-        "max_tokens": 256
-    },
-    stream=True
-)
-
-for line in response.iter_lines():
-    if line.startswith(b'data: '):
-        data = json.loads(line[6:])  # Remove 'data: ' prefix
-        print(data["content"], end='', flush=True)
-        if data["done"]:
-            break
-```
-
-### cURL (Standard)
+### cURL
 ```bash
-curl -X POST "https://huggingface.co/spaces/colin730/SummarizerApp/api/v1/summarize" \
+curl -X POST "https://your-space.hf.space/api/v1/summarize" \
   -H "Content-Type: application/json" \
   -d '{
     "text": "Your text to summarize...",
@@ -186,46 +126,6 @@ curl -X POST "https://huggingface.co/spaces/colin730/SummarizerApp/api/v1/summar
   }'
 ```
 
-### cURL (Streaming)
-```bash
-curl -N -X POST "https://huggingface.co/spaces/colin730/SummarizerApp/api/v1/summarize/stream" \
-  -H "Content-Type: application/json" \
-  -d '{
-    "text": "Your text to summarize...",
-    "max_tokens": 256
-  }'
-```
-
-### Android (Kotlin)
-```kotlin
-// Add to build.gradle
-implementation("com.squareup.okhttp3:okhttp-sse:4.12.0")
-
-// Usage
-val client = OkHttpClient()
-val request = Request.Builder()
-    .url("https://huggingface.co/spaces/colin730/SummarizerApp/api/v1/summarize/stream")
-    .post(/* JSON body */)
-    .build()
-
-val eventSource = EventSources.createFactory(client)
-    .newEventSource(request, object : EventSourceListener() {
-        override fun onEvent(eventSource: EventSource, id: String?, type: String?, data: String) {
-            val chunk = JSONObject(data)
-            val content = chunk.getString("content")
-            val done = chunk.getBoolean("done")
-
-            // Update UI with streaming content
-            runOnUiThread {
-                textView.append(content)
-                if (done) {
-                    // Streaming complete
-                }
-            }
-        }
-    })
-```
-
 ## 🔒 Security
 
 - Non-root user execution
@@ -246,7 +146,7 @@ The service includes:
 ### Common Issues
 
 1. **Model not loading**: Check if Ollama is running and model is pulled
-2. **Out of memory**: Ensure sufficient RAM (2GB+) for Llama 3.2 1B
+2. **Out of memory**: Ensure sufficient RAM (8GB+) for Mistral 7B
 3. **Slow startup**: Normal on first run due to model download
 4. **API errors**: Check logs via `/docs` endpoint
 
@@ -268,8 +168,3 @@ MIT License - see LICENSE file for details.
 ---
 
 **Deployed on Hugging Face Spaces** 🚀
-# Force restart Sat Oct 4 23:26:24 NZDT 2025
-# Restart trigger Sun Oct 5 00:17:11 NZDT 2025
-# Model update restart Sun Oct 5 01:10:33 NZDT 2025
-# Model restart Sun Oct 5 01:35:38 NZDT 2025
-# Force restart for 1B model Sun Oct 5 01:56:29 NZDT 2025
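The updated request body (`text`, `max_tokens`, and the new `temperature` field) can be sketched as a small client helper. This is a minimal sketch, not part of the app: the field names and defaults (256 tokens, temperature 0.7) follow the schema in the diff, while `build_summarize_payload`, `summarize`, the accepted temperature range, and the base URL are illustrative assumptions.

```python
def build_summarize_payload(text: str, max_tokens: int = 256,
                            temperature: float = 0.7) -> dict:
    """Validate inputs and return a JSON-serializable request body.

    Field names and defaults mirror the README's example request;
    the [0.0, 2.0] temperature bound is an assumption, not documented.
    """
    if not text.strip():
        raise ValueError("text must be a non-empty string")
    if max_tokens < 1:
        raise ValueError("max_tokens must be positive")
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be in [0.0, 2.0]")
    return {"text": text, "max_tokens": max_tokens, "temperature": temperature}


def summarize(base_url: str, text: str, **kwargs) -> str:
    """POST to /api/v1/summarize and return the summary field."""
    import requests  # assumes the requests package, as in the README examples
    resp = requests.post(f"{base_url}/api/v1/summarize",
                         json=build_summarize_payload(text, **kwargs),
                         timeout=30)
    resp.raise_for_status()
    return resp.json()["summary"]


payload = build_summarize_payload("Your long text to summarize here...")
print(payload["max_tokens"], payload["temperature"])  # 256 0.7
```

Usage would then be `summarize("https://your-space.hf.space", "Your long article...")`, with the placeholder base URL replaced by the deployed Space's host.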