metadata
title: SmolVLM2 Video Highlights
emoji: 🎬
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: apache-2.0
app_port: 7860
🎬 SmolVLM2 Video Highlights API
Generate intelligent video highlights using SmolVLM2 + Whisper AI
This is a FastAPI service that combines visual analysis (SmolVLM2) with audio transcription (Whisper) to automatically create highlight videos from longer content.
Features
- Visual Analysis: SmolVLM2-2.2B-Instruct analyzes video frames for interesting content
- Audio Processing: Whisper transcribes speech in 99+ languages
- Smart Scoring: Combines visual and audio analysis for intelligent highlights
- REST API: Upload videos and download processed highlights
- Background Processing: Non-blocking video processing with job tracking
API Endpoints
- POST /upload-video - Upload video for processing
- GET /job-status/{job_id} - Check processing status
- GET /download/{filename} - Download generated highlights
- GET /docs - Interactive API documentation
📱 Usage
Via API
# Upload video
curl -X POST -F "video=@your_video.mp4" https://avinashhuggingface108-smolvlm2-video-highlights.hf.space/upload-video
# Check status
curl https://avinashhuggingface108-smolvlm2-video-highlights.hf.space/job-status/YOUR_JOB_ID
# Download highlights
curl -O https://avinashhuggingface108-smolvlm2-video-highlights.hf.space/download/FILENAME.mp4
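The status and download steps above can also be scripted. The following is a minimal Python sketch using only the standard library; the multipart upload is left to curl. The helper names (`job_status_url`, `download_url`, `fetch_json`, `poll_until_done`) are illustrative and not part of this project. The shape of the status response is not documented here, so the caller supplies the `is_done` check, and `fetch_json` assumes the endpoint returns JSON.

```python
import json
import time
import urllib.request
from typing import Any, Callable
from urllib.parse import quote

BASE_URL = "https://avinashhuggingface108-smolvlm2-video-highlights.hf.space"


def job_status_url(job_id: str, base_url: str = BASE_URL) -> str:
    """Build the GET /job-status/{job_id} URL, percent-encoding the id."""
    return f"{base_url}/job-status/{quote(job_id, safe='')}"


def download_url(filename: str, base_url: str = BASE_URL) -> str:
    """Build the GET /download/{filename} URL, percent-encoding the name."""
    return f"{base_url}/download/{quote(filename, safe='')}"


def fetch_json(url: str, timeout_s: float = 30.0) -> Any:
    """GET a URL and decode the body as JSON (assumption: the API returns JSON)."""
    with urllib.request.urlopen(url, timeout=timeout_s) as response:
        return json.load(response)


def poll_until_done(
    fetch_status: Callable[[], Any],
    is_done: Callable[[Any], bool],
    interval_s: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call fetch_status repeatedly until is_done accepts the result.

    Raises TimeoutError if max_attempts checks pass without completion.
    """
    for _ in range(max_attempts):
        status = fetch_status()
        if is_done(status):
            return status
        sleep(interval_s)
    raise TimeoutError(f"job not finished after {max_attempts} status checks")
```

A call might look like `poll_until_done(lambda: fetch_json(job_status_url("YOUR_JOB_ID")), is_done=lambda s: s.get("status") == "completed")`, where the `status` field and the `completed` value are guesses to be checked against the interactive documentation at /docs.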
Via Android App
Use the provided Android client code to integrate with your mobile app.
⚙️ Configuration
Default settings:
- Interval: 20 seconds (analyze every 20s)
- Min Score: 6.5 (quality threshold)
- Max Highlights: 3 (maximum highlight segments)
- Whisper Model: base (a smaller model that trades some accuracy for speed)
- Timeout: 35 seconds per segment
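To make the defaults concrete, here is a rough sketch of how they could fit together. It is not the service's actual code: `HighlightConfig`, `sample_times`, and `select_highlights` are hypothetical names, the scores are assumed to be plain numbers per segment, and the threshold is assumed to be inclusive.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class HighlightConfig:
    """Defaults copied from the list above; field names are illustrative."""
    interval_s: float = 20.0      # analyze one segment every 20 seconds
    min_score: float = 6.5        # quality threshold
    max_highlights: int = 3       # maximum highlight segments
    whisper_model: str = "base"   # Whisper model size
    timeout_s: float = 35.0       # per-segment timeout


def sample_times(duration_s: float, config: HighlightConfig) -> List[float]:
    """Return the segment start times: 0, interval, 2 * interval, ..."""
    count = math.ceil(duration_s / config.interval_s)
    return [i * config.interval_s for i in range(count)]


def select_highlights(
    scored: List[Tuple[float, float]], config: HighlightConfig
) -> List[Tuple[float, float]]:
    """Pick highlights from (start_time_s, score) pairs.

    Keeps segments at or above min_score (inclusive threshold is an
    assumption), takes the best max_highlights by score, and returns
    them in chronological order.
    """
    passing = [seg for seg in scored if seg[1] >= config.min_score]
    best = sorted(passing, key=lambda seg: seg[1], reverse=True)
    best = best[: config.max_highlights]
    return sorted(best, key=lambda seg: seg[0])
```

With these defaults, a 65-second video would be sampled at 0, 20, 40, and 60 seconds, and at most three of those segments would survive selection.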
🛠️ Technology Stack
- SmolVLM2-2.2B-Instruct: Vision-language model for visual content analysis
- OpenAI Whisper: Speech-to-text in 99+ languages
- FastAPI: Modern web framework for APIs
- FFmpeg: Video processing and manipulation
- PyTorch: Deep learning framework with MPS (Apple Metal) acceleration
🎯 Perfect For
- Social media content creators
- Educational video processing
- Meeting/lecture summarization
- Sports highlight generation
- Entertainment content curation
License
Apache 2.0 - Free for commercial and personal use
🤝 Contributing
Built with ❤️ using Hugging Face Transformers and open-source AI models.