metadata
title: SmolVLM2 Video Highlights
emoji: 🎬
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: apache-2.0
app_port: 7860

🎬 SmolVLM2 Video Highlights API

Generate intelligent video highlights using SmolVLM2 + Whisper AI

This is a FastAPI service that combines visual analysis (SmolVLM2) with audio transcription (Whisper) to automatically create highlight videos from longer content.

🚀 Features

  • Visual Analysis: SmolVLM2-2.2B-Instruct analyzes video frames for interesting content
  • Audio Processing: Whisper transcribes speech in 99+ languages
  • Smart Scoring: Combines visual and audio analysis for intelligent highlights
  • REST API: Upload videos and download processed highlights
  • Background Processing: Non-blocking video processing with job tracking

🔗 API Endpoints

  • POST /upload-video - Upload video for processing
  • GET /job-status/{job_id} - Check processing status
  • GET /download/{filename} - Download generated highlights
  • GET /docs - Interactive API documentation

📱 Usage

Via API

# Upload video
curl -X POST -F "video=@your_video.mp4" https://avinashhuggingface108-smolvlm2-video-highlights.hf.space/upload-video

# Check status
curl https://avinashhuggingface108-smolvlm2-video-highlights.hf.space/job-status/YOUR_JOB_ID

# Download highlights
curl -O https://avinashhuggingface108-smolvlm2-video-highlights.hf.space/download/FILENAME.mp4
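
The same upload, poll, download flow can be scripted. The following is a minimal sketch using only the Python standard library, not the service's official client. The endpoint paths and the `video` form field come from the curl commands above. The `status` field, the `completed` and `failed` values, and all function names are assumptions made for illustration; check `/docs` for the actual response schema.

```python
import json
import os
import time
import urllib.request
import uuid

BASE_URL = "https://avinashhuggingface108-smolvlm2-video-highlights.hf.space"


def build_multipart(field, filename, payload, content_type="video/mp4"):
    """Encode one file as multipart/form-data; returns (body, Content-Type value)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def upload_video(path, base_url=BASE_URL):
    """POST the file to /upload-video and return the decoded JSON response."""
    with open(path, "rb") as fh:
        body, ctype = build_multipart("video", os.path.basename(path), fh.read())
    req = urllib.request.Request(
        f"{base_url}/upload-video",
        data=body,
        headers={"Content-Type": ctype},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def get_job_status(job_id, base_url=BASE_URL):
    """GET /job-status/{job_id} and return the decoded JSON response."""
    with urllib.request.urlopen(f"{base_url}/job-status/{job_id}") as resp:
        return json.load(resp)


def wait_for_job(job_id, fetch=get_job_status, poll_seconds=5.0, max_polls=120,
                 done_states=("completed", "failed")):
    """Poll until the job reports a terminal state.

    ASSUMPTION: the response carries a "status" field whose terminal values
    are "completed" or "failed". Adjust done_states to the real schema.
    """
    for _ in range(max_polls):
        info = fetch(job_id)
        if info.get("status") in done_states:
            return info
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} still running after {max_polls} polls")
```

A typical run would call `upload_video("your_video.mp4")`, read the job ID from the response (the key name is not documented here), and pass it to `wait_for_job`. The `fetch` parameter exists so the polling loop can be exercised without network access.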

Via Android App

Use the provided Android client code to integrate with your mobile app.

βš™οΈ Configuration

Default settings:

  • Interval: 20 seconds (the video is analyzed at 20-second intervals)
  • Min Score: 6.5 (minimum score a segment needs to qualify as a highlight)
  • Max Highlights: 3 (maximum number of highlight segments)
  • Whisper Model: base (a compromise between accuracy and speed)
  • Timeout: 35 seconds per segment
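
To illustrate how the score threshold and the highlight cap might interact, here is a small sketch. The default values are taken from the list above. The field names, the `HighlightSettings` class, and the selection policy (filter by threshold, keep the best few, return them in playback order) are assumptions for illustration and may differ from what the service actually does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HighlightSettings:
    # Values mirror the documented defaults; the field names are hypothetical.
    interval_seconds: float = 20.0
    min_score: float = 6.5
    max_highlights: int = 3
    whisper_model: str = "base"
    timeout_seconds: float = 35.0


def select_highlights(scored_segments, settings=HighlightSettings()):
    """Pick highlights from (start_seconds, score) pairs.

    ASSUMED policy: drop segments scoring below min_score, keep the
    max_highlights best-scoring ones, and return them in playback order.
    """
    eligible = [seg for seg in scored_segments if seg[1] >= settings.min_score]
    best = sorted(eligible, key=lambda seg: seg[1], reverse=True)
    best = best[: settings.max_highlights]
    return sorted(best, key=lambda seg: seg[0])


if __name__ == "__main__":
    segments = [(0, 5.0), (20, 7.2), (40, 6.5), (60, 9.1), (80, 8.0), (100, 6.9)]
    print(select_highlights(segments))
```

With the defaults, a segment scoring 5.0 is discarded outright, and segments that pass the threshold still compete for the three available slots.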

πŸ› οΈ Technology Stack

  • SmolVLM2-2.2B-Instruct: Vision-language model for visual content analysis
  • OpenAI Whisper: Speech-to-text in 99+ languages
  • FastAPI: Modern web framework for APIs
  • FFmpeg: Video processing and manipulation
  • PyTorch: Deep learning framework, with MPS (Apple Metal) acceleration where available

🎯 Perfect For

  • Social media content creators
  • Educational video processing
  • Meeting/lecture summarization
  • Sports highlight generation
  • Entertainment content curation

License

Apache 2.0 - Free for commercial and personal use

🤝 Contributing

Built with ❤️ using Hugging Face Transformers and open-source AI models.