{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Whisper API → Whisper.cpp Migration\n", "\n", "This notebook demonstrates how to migrate from OpenAI's Whisper API to Whisper.cpp for local speech-to-text processing.\n", "\n", "## Benefits of Whisper.cpp\n", "- **Local processing**: No API calls; audio never leaves your machine\n", "- **Cost savings**: No per-minute API charges\n", "- **Offline capability**: Works without an internet connection once the model is downloaded\n", "- **Customization**: Choose the model size (e.g. `tiny`, `base`, `small`) that fits your hardware and accuracy needs\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Installation\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Install a Python binding for whisper.cpp plus audio dependencies.\n", "# Assumption: this notebook uses the `whispercpp` binding, whose\n", "# `Whisper.from_pretrained` / `transcribe` calls match the code below;\n", "# other bindings use different module names and APIs.\n", "%pip install whispercpp\n", "%pip install librosa soundfile\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from whispercpp import Whisper\n", "import librosa\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Model Loading\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Load the Whisper model (downloaded automatically on first run)\n", "model = Whisper.from_pretrained(\"base\")\n", "print(\"Model loaded successfully!\")\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Audio Processing Function\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def transcribe_audio(audio_file_path):\n", "    \"\"\"\n", "    Transcribe an audio file using Whisper.cpp.\n", "\n", "    Args:\n", "        audio_file_path (str): Path to the audio file\n", "\n", "    Returns:\n", "        dict: Transcription result with text and metadata\n", "    \"\"\"\n", "    # Load audio as mono float32 resampled to 16 kHz, the rate Whisper expects\n", "    audio, sr = librosa.load(audio_file_path, sr=16000)\n", "\n", "    # Transcribe using Whisper.cpp\n", "    result = model.transcribe(audio)\n", "\n", "    # Assumption: some bindings return a plain string rather than a dict,\n", "    # so normalize to a dict before reading fields\n", "    if isinstance(result, str):\n", "        result = {\"text\": result}\n", "\n", "    return {\n", "        \"text\": result[\"text\"],\n", "        \"language\": result.get(\"language\", \"auto\"),\n", "        \"segments\": result.get(\"segments\", [])\n", "    }\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Interactive Demo\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#nbgradio name=\"whisper_api_to_whisper_cpp\"\n", "import gradio as gr\n", "\n", "def process_audio(audio_file):\n", "    # Gradio passes None when no file has been uploaded\n", "    if audio_file is None:\n", "        return \"Please upload an audio file.\"\n", "\n", "    try:\n", "        result = transcribe_audio(audio_file)\n", "        return f\"**Transcription:**\\n{result['text']}\\n\\n**Language:** {result['language']}\"\n", "    except Exception as e:\n", "        return f\"Error processing audio: {e}\"\n", "\n", "# Create Gradio interface\n", "demo = gr.Interface(\n", "    fn=process_audio,\n", "    inputs=gr.Audio(type=\"filepath\"),\n", "    outputs=gr.Markdown(),\n", "    title=\"Whisper.cpp Speech-to-Text\",\n", "    description=\"Upload an audio file to transcribe it using Whisper.cpp (local processing)\",\n", "    examples=[\n", "        # Add example audio files here\n", "    ]\n", ")\n", "\n", "demo.launch()\n" ] } ], "metadata": { "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 2 }