Android V4 Local Testing Guide
Quick Start
Your V4 API is running on your Mac and accessible to your Android app on the same WiFi network.
Connection Details
- Base URL: http://192.168.88.12:7860
- V4 Endpoint: /api/v4/scrape-and-summarize/stream-ndjson (recommended)
- Alternative Endpoint: /api/v4/scrape-and-summarize/stream
- Model: Qwen/Qwen2.5-3B-Instruct (high quality, ~6-7GB RAM)
- Network: Both devices must be on the same WiFi network
Android App Configuration
Update Your Base URL
In your Android app's network configuration, change the base URL to:
// Development/Local Testing
const val BASE_URL = "http://192.168.88.12:7860"
// Production (HuggingFace Spaces)
const val BASE_URL_PROD = "https://your-hf-space.hf.space"
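One way to avoid mixing up the two base URLs is to resolve the full endpoint URL in a single place. The following is only a sketch: `V4_NDJSON_PATH`, `v4Endpoint`, and the `useLocalServer` flag are hypothetical names, and how your app decides between local and production (for example, a debug build flag) is an assumption.

```kotlin
const val BASE_URL = "http://192.168.88.12:7860"
const val BASE_URL_PROD = "https://your-hf-space.hf.space"

// Hypothetical constant: the recommended V4 endpoint path from this guide.
const val V4_NDJSON_PATH = "/api/v4/scrape-and-summarize/stream-ndjson"

// Joins the chosen base URL and an endpoint path with exactly one slash between them.
fun v4Endpoint(useLocalServer: Boolean, path: String = V4_NDJSON_PATH): String {
    val base = if (useLocalServer) BASE_URL else BASE_URL_PROD
    return base.trimEnd('/') + "/" + path.trimStart('/')
}
```

Passing the flag explicitly keeps the choice visible at the call site, for example `v4Endpoint(useLocalServer = true)` during local testing.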
Network Security Config
Add this to res/xml/network_security_config.xml to allow HTTP connections to your local server:
<?xml version="1.0" encoding="utf-8"?>
<network-security-config>
    <domain-config cleartextTrafficPermitted="true">
        <domain includeSubdomains="true">192.168.88.12</domain>
    </domain-config>
</network-security-config>
Update your AndroidManifest.xml:
<application
android:networkSecurityConfig="@xml/network_security_config"
...>
API Usage Examples
Endpoint 1: NDJSON Streaming (Recommended - 43% faster)
URL: http://192.168.88.12:7860/api/v4/scrape-and-summarize/stream-ndjson
Request Body (URL mode):
{
  "url": "https://example.com/article",
  "style": "executive",
  "max_tokens": 512
}
Request Body (Text mode):
{
  "text": "Your article text here (minimum 50 characters)...",
  "style": "executive",
  "max_tokens": 512
}
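The two request modes can be checked on the client before anything is sent. The sketch below assumes a request carries either a URL or a text, and that the 50-character minimum applies to the raw text length; `SummarizeInput`, `validate`, and `MIN_TEXT_LENGTH` are hypothetical names, and the server's actual validation rules may differ.

```kotlin
// Minimum text length stated in this guide for text mode.
const val MIN_TEXT_LENGTH = 50

// Hypothetical model of the two request modes.
sealed interface SummarizeInput {
    data class Url(val url: String) : SummarizeInput
    data class Text(val text: String) : SummarizeInput
}

// Returns null when the input looks acceptable, otherwise a readable error message.
fun validate(input: SummarizeInput): String? = when (input) {
    is SummarizeInput.Url ->
        if (input.url.startsWith("http://") || input.url.startsWith("https://")) null
        else "URL must start with http:// or https://"
    is SummarizeInput.Text ->
        if (input.text.length >= MIN_TEXT_LENGTH) null
        else "Text must be at least $MIN_TEXT_LENGTH characters (got ${input.text.length})"
}
```

Rejecting short text locally avoids waiting on a request that would come back empty.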
Response Format (NDJSON patches, one JSON patch per SSE data: line):
data: {"op":"replace","path":"/title","value":"Breaking News"}
data: {"op":"replace","path":"/main_summary","value":"This is the summary..."}
data: {"op":"add","path":"/key_points/0","value":"First key point"}
data: {"op":"add","path":"/key_points/1","value":"Second key point"}
data: {"op":"replace","path":"/category","value":"Technology"}
data: {"op":"replace","path":"/sentiment","value":"neutral"}
data: {"op":"replace","path":"/read_time_min","value":3}
Final JSON Structure:
{
  "title": "Breaking News",
  "main_summary": "This is the summary...",
  "key_points": [
    "First key point",
    "Second key point",
    "Third key point"
  ],
  "category": "Technology",
  "sentiment": "neutral",
  "read_time_min": 3
}
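The patches shown above can be folded into the final structure one at a time. This is a minimal sketch of such a helper, assuming only the two operations (replace, add) and the two path shapes (/field and /list/index) that appear in this guide; the real stream may use other operations or deeper paths.

```kotlin
// Applies one patch to the summary map. Unknown operations and paths are ignored.
fun applyPatch(summary: MutableMap<String, Any>, op: String, path: String, value: Any) {
    val segments = path.trim('/').split('/')
    val key = segments[0]
    if (key.isEmpty()) return

    when {
        // "/title", "/category", ...: set the top-level field.
        segments.size == 1 && (op == "replace" || op == "add") -> summary[key] = value

        // "/key_points/0": insert into a list, creating the list on first use.
        segments.size == 2 && op == "add" -> {
            @Suppress("UNCHECKED_CAST")
            val list = summary.getOrPut(key) { mutableListOf<Any>() } as MutableList<Any>
            val index = segments[1].toIntOrNull() ?: list.size
            if (index in 0 until list.size) list.add(index, value) else list.add(value)
        }
    }
}
```

Because every patch leaves the map in a usable state, the UI can be redrawn after each call to show partial results.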
Endpoint 2: Raw JSON Streaming
URL: http://192.168.88.12:7860/api/v4/scrape-and-summarize/stream
Request/Response: Same as above, but streams raw JSON tokens instead of NDJSON patches
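With raw token streaming, the client has to accumulate chunks until they form a complete JSON document. The sketch below assumes the chunks have already been extracted from the transport framing and that the response is a single top-level JSON object; `JsonStreamBuffer` is a hypothetical helper, not part of the API.

```kotlin
// Accumulates streamed text and tracks bracket depth to detect a finished JSON object.
class JsonStreamBuffer {
    private val buffer = StringBuilder()
    private var depth = 0
    private var inString = false
    private var escaped = false
    private var sawOpening = false

    // Appends one chunk; returns true once the top-level object has closed.
    fun append(chunk: String): Boolean {
        for (ch in chunk) {
            buffer.append(ch)
            when {
                escaped -> escaped = false                 // character after a backslash
                inString && ch == '\\' -> escaped = true
                ch == '"' -> inString = !inString
                inString -> Unit                           // brackets inside strings do not count
                ch == '{' || ch == '[' -> { depth++; sawOpening = true }
                ch == '}' || ch == ']' -> depth--
            }
        }
        return isComplete()
    }

    fun isComplete(): Boolean = sawOpening && depth == 0 && !inString

    fun text(): String = buffer.toString()
}
```

Once `isComplete()` returns true, `text()` can be handed to a JSON parser; until then it is only safe to display as raw progress.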
Summarization Styles
Choose the style that best fits your use case:
| Style | Description | Use Case |
|---|---|---|
| executive | Business-focused with key takeaways (default) | General articles, news |
| skimmer | Quick facts and highlights | Fast reading, headlines |
| eli5 | "Explain Like I'm 5" - simple explanations | Complex topics, education |
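In the Android app, the three style values can be modeled as an enum so that a typo cannot reach the API. This is a sketch; `SummaryStyle` and `fromApiValue` are hypothetical names, and falling back to executive mirrors the default noted in the table.

```kotlin
// The style strings accepted by the V4 endpoints, as listed in the table above.
enum class SummaryStyle(val apiValue: String) {
    EXECUTIVE("executive"),
    SKIMMER("skimmer"),
    ELI5("eli5");

    companion object {
        // Maps a stored or user-supplied string to a style, defaulting to EXECUTIVE.
        fun fromApiValue(value: String?): SummaryStyle =
            values().firstOrNull { it.apiValue.equals(value, ignoreCase = true) } ?: EXECUTIVE
    }
}
```

The request body would then use `style.apiValue` instead of a free-form string.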
cURL Testing Commands
Test with URL (Web Scraping)
curl -X POST http://192.168.88.12:7860/api/v4/scrape-and-summarize/stream-ndjson \
-H "Content-Type: application/json" \
-d '{
"url": "https://www.bbc.com/news/technology",
"style": "executive",
"max_tokens": 512
}'
Test with Direct Text
curl -X POST http://192.168.88.12:7860/api/v4/scrape-and-summarize/stream-ndjson \
-H "Content-Type: application/json" \
-d '{
"text": "Artificial intelligence is rapidly transforming the technology landscape. Companies are investing billions in AI research and development. Machine learning models are becoming more sophisticated and capable of handling complex tasks. From healthcare to finance, AI applications are revolutionizing industries and creating new opportunities for innovation.",
"style": "executive",
"max_tokens": 512
}'
Test from Your Android Device
# If you have Termux or similar on Android:
curl -X POST http://192.168.88.12:7860/api/v4/scrape-and-summarize/stream-ndjson \
-H "Content-Type: application/json" \
-d '{"text":"Test from Android: this sample sentence is long enough to pass the 50-character minimum.","style":"executive"}'
Kotlin/Android Example
Using OkHttp + SSE
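import okhttp3.MediaType.Companion.toMediaType
import okhttp3.OkHttpClient
import okhttp3.Request
import okhttp3.RequestBody.Companion.toRequestBody
import okhttp3.Response
import okhttp3.sse.EventSource
import okhttp3.sse.EventSourceListener
import okhttp3.sse.EventSources
import org.json.JSONObject
import java.util.concurrent.TimeUnit

class V4ApiClient {
    // Disable the read timeout: OkHttp's 10 s default can cut off a slow stream.
    private val client = OkHttpClient.Builder()
        .readTimeout(0, TimeUnit.MILLISECONDS)
        .build()

    fun summarizeUrl(
        url: String,
        style: String = "executive",
        maxTokens: Int = 512,
        onPatch: (String) -> Unit,
        onComplete: () -> Unit,
        onError: (Throwable) -> Unit
    ) {
        // Build the body with JSONObject so special characters in the URL are escaped.
        val body = JSONObject()
            .put("url", url)
            .put("style", style)
            .put("max_tokens", maxTokens)
            .toString()
            .toRequestBody("application/json".toMediaType())

        val request = Request.Builder()
            .url("http://192.168.88.12:7860/api/v4/scrape-and-summarize/stream-ndjson")
            .post(body)
            .build()

        // Callbacks run on an OkHttp background thread, not the main thread.
        val eventSourceListener = object : EventSourceListener() {
            override fun onEvent(
                eventSource: EventSource,
                id: String?,
                type: String?,
                data: String
            ) {
                onPatch(data) // One JSON patch per event
            }

            override fun onClosed(eventSource: EventSource) {
                onComplete()
            }

            override fun onFailure(
                eventSource: EventSource,
                t: Throwable?,
                response: Response?
            ) {
                onError(t ?: Exception("Unknown error"))
            }
        }

        EventSources.createFactory(client)
            .newEventSource(request, eventSourceListener)
    }
}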
// Usage:
// Requires: import android.util.Log and import org.json.JSONObject
// applyPatch() and updateUI() are app-specific helpers that you implement.
val apiClient = V4ApiClient()
val summary = mutableMapOf<String, Any>()
apiClient.summarizeUrl(
    url = "https://example.com/article",
    style = "executive",
    onPatch = { patch ->
        // Parse the JSON patch and update the summary object
        val jsonPatch = JSONObject(patch)
        val op = jsonPatch.getString("op")
        val path = jsonPatch.getString("path")
        val value = jsonPatch.get("value")
        // Apply patch to summary map
        applyPatch(summary, op, path, value)
        // Update UI with partial results. This callback runs on a background
        // thread, so updateUI() must switch to the main thread itself.
        updateUI(summary)
    },
    onComplete = {
        Log.d("V4", "Summary complete: $summary")
    },
    onError = { error ->
        Log.e("V4", "Error: ${error.message}")
    }
)
Performance Expectations
Qwen/Qwen2.5-3B-Instruct (Current Configuration)
- Memory: ~6-7GB unified memory on Mac
- Inference Time: 40-60 seconds per request
- Quality: High (coherent summaries)
- First Token: ~1-2 seconds (fast UI feedback)
- Device: CPU (MPS not detected in current run)
Optimization Tips
- Use NDJSON endpoint for 43% faster time-to-first-token
- Keep max_tokens at 512 for complete summaries
- Test with WiFi (Bluetooth/USB tethering may be slower)
- Monitor battery on Android during long sessions
Troubleshooting
Connection Refused
Problem: Failed to connect to /192.168.88.12:7860
Solutions:
- Check that both devices are on the same WiFi network
- Verify the server is running: lsof -i :7860
- Check the Mac's firewall settings (System Settings → Network → Firewall)
- Try pinging the Mac from Android: ping 192.168.88.12
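The same reachability check can be run from inside the app, which helps separate network problems from API problems. This is a sketch using a plain TCP connect; `canConnect` is a hypothetical helper, and on Android it has to be called off the main thread.

```kotlin
import java.io.IOException
import java.net.InetSocketAddress
import java.net.Socket

// Returns true if a TCP connection to host:port succeeds within timeoutMs.
// Connection refused, unreachable host, and timeout all surface as IOException.
fun canConnect(host: String, port: Int, timeoutMs: Int = 2000): Boolean =
    try {
        Socket().use { socket ->
            socket.connect(InetSocketAddress(host, port), timeoutMs)
            true
        }
    } catch (e: IOException) {
        false
    }
```

For this guide's setup the call would be `canConnect("192.168.88.12", 7860)`. A false result points to WiFi, firewall, or server startup rather than the request itself.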
Empty or Incomplete Summaries
Problem: Summary JSON is incomplete or empty
Solutions:
- Increase max_tokens to 512 or higher
- Ensure input text is at least 50 characters
- Check server logs: tail -f server.log
- Try switching from URL mode to text mode
Slow Response
Problem: Takes > 2 minutes to get results
Solutions:
- V4 with the 3B model is computationally intensive (40-60s is normal)
- Consider switching to the 1.5B model for faster responses (lower quality)
- Update .env: V4_MODEL_ID=Qwen/Qwen2.5-1.5B-Instruct
- Restart the server after changing the model
SSRF Protection Blocking URLs
Problem: "Invalid URL or SSRF protection triggered"
Solutions:
- Don't use localhost/127.0.0.1 URLs
- Don't use private IP ranges (10.x, 172.16.x-172.31.x, 192.168.x)
- Use public URLs only
- For testing, use text mode instead of URL mode
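The app can warn the user before submitting a URL that the server is likely to reject. The sketch below is a client-side approximation only: the server's actual SSRF rules are not documented here, `isLikelyRejected` is a hypothetical name, and hostnames are not resolved, so a public-looking name that points at a private address is not caught.

```kotlin
import java.net.InetAddress
import java.net.URI
import java.net.URISyntaxException
import java.net.UnknownHostException

// Returns true for URLs that are malformed or point at localhost / private IPv4 ranges.
fun isLikelyRejected(url: String): Boolean {
    val host = try { URI(url).host } catch (e: URISyntaxException) { null }
    if (host == null) return true
    if (host.equals("localhost", ignoreCase = true)) return true

    // Only literal IPv4 addresses are inspected, so no DNS lookup happens here.
    if (!Regex("""\d{1,3}(\.\d{1,3}){3}""").matches(host)) return false

    val address = try {
        InetAddress.getByName(host)
    } catch (e: UnknownHostException) {
        return true // not a valid address literal, e.g. an octet above 255
    }
    return address.isLoopbackAddress || address.isSiteLocalAddress ||
        address.isLinkLocalAddress || address.isAnyLocalAddress
}
```

When the check returns true, the app can suggest text mode instead of sending the request.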
Server Management
Start Server
# Option 1: Using conda environment
conda run -n summarizer python -m uvicorn app.main:app --host 0.0.0.0 --port 7860
# Option 2: Using the startup script
./start_v4_local.sh
Check Server Status
# Check if server is running
lsof -i :7860
# View real-time logs
tail -f server.log
# Check health endpoint
curl http://localhost:7860/health
Stop Server
# Find and kill the process
pkill -f "uvicorn app.main:app"
# Or kill by PID
lsof -ti :7860 | xargs kill
API Documentation
Health Check
GET http://192.168.88.12:7860/health
Response:
{
"status": "ok",
"service": "summarizer",
"version": "4.0.0"
}
Available Endpoints
- GET / - API documentation (Swagger UI)
- GET /health - Health check
- POST /api/v1/* - Ollama + Transformers (requires Ollama service)
- POST /api/v2/* - HuggingFace streaming (distilbart)
- POST /api/v3/* - Web scraping + V2 summarization
- POST /api/v4/* - Structured JSON summarization (Qwen model)
Security Notes
- HTTP Only: Local testing uses HTTP (not HTTPS)
- No Authentication: API is open on local network
- Rate Limiting: Not enabled by default for local testing
- SSRF Protection: Blocks localhost and private IPs in URL mode
- Production: Use HTTPS and authentication for production deployments
Next Steps
- Configure your Android app's base URL to http://192.168.88.12:7860
- Add network security config for cleartext HTTP
- Test connection with cURL before Android testing
- Implement SSE parsing for NDJSON patches
- Add error handling for network failures
- Monitor performance and adjust max_tokens as needed
Support
- Server Logs: /Users/ming/AndroidStudioProjects/SummerizerApp/server.log
- Configuration: /Users/ming/AndroidStudioProjects/SummerizerApp/.env
- Documentation: See V4_LOCAL_SETUP.md and V4_TESTING_LEARNINGS.md