Whisper-TikTok-Demo / README.md
MatteoFasulo's picture
Fix README metadata app file
7d88991
---
title: Whisper TikTok Demo
emoji: πŸ“š
colorFrom: yellow
colorTo: purple
sdk: streamlit
sdk_version: 1.36.0
app_file: app.py
pinned: false
license: apache-2.0
---
# Introducing Whisper-TikTok πŸ€–πŸŽ₯
## Star History
[![Star History Chart](https://api.star-history.com/svg?repos=MatteoFasulo/Whisper-TikTok&type=Date)](https://star-history.com/#MatteoFasulo/Whisper-TikTok&Date)
## Table of Contents
- [Introduction](#introduction)
- [Video (demo)](#demo-video)
- [How it works?](#how-it-works)
- [Web App (Online)](#web-app-online)
- [Streamlit Web App](#streamlit-web-app)
- [Local Installation](#local-installation)
- [Dependencies](#dependencies)
- [Web-UI (Local)](#web-ui-local)
- [Command-Line](#command-line)
- [Usage Examples](#usage-examples)
- [Additional Resources](#additional-resources)
- [Code of Conduct](#code-of-conduct)
- [Contributing](#contributing)
- [Upcoming Features](#upcoming-features)
- [Acknowledgments](#acknowledgments)
- [License](#license)
## Introduction
Discover Whisper-TikTok, an innovative AI-powered tool that leverages the prowess of **Edge TTS**, **OpenAI-Whisper**, and **FFMPEG** to craft captivating TikTok videos. Harnessing the capabilities of OpenAI's Whisper model, Whisper-TikTok effortlessly generates an accurate **transcription** from provided audio files, laying the foundation for the creation of mesmerizing TikTok videos through the utilization of **FFMPEG**. Additionally, the program seamlessly integrates the **Microsoft Edge Cloud Text-to-Speech (TTS) API** to lend a vibrant **voiceover** to the video. Opting for Microsoft Edge Cloud TTS API's voiceover is a deliberate choice, as it delivers a remarkably **natural and authentic** auditory experience, setting it apart from the often monotonous and artificial voiceovers prevalent in numerous TikTok videos.
## Streamlit Web App
![Webui](docs/WebuiDemo.png)
## Demo Video
<https://github.com/MatteoFasulo/Whisper-TikTok/assets/74818541/68e25504-c305-4144-bd39-c9acc218c3a4>
## How it Works
Employing Whisper-TikTok is a breeze: simply modify the [clips.csv](clips.csv). The CSV file contains the following attributes:
- `series`: The name of the series.
- `part`: The part number of the video.
- `text`: The text to be spoken in the video.
- `tags`: The tags to be used for the video.
- `outro`: The outro text to be spoken in the video.
<details>
<summary>Details</summary>
The program conducts the **sequence of actions** outlined below:
1. Retrieve **environment variables** from the optional .env file.
2. Validate the presence of **PyTorch** with **CUDA** installation. If the requisite dependencies are **absent**, the **program will use the CPU instead of the GPU**.
3. Download a random video from platforms like YouTube, e.g., a Minecraft parkour gameplay clip.
4. Load the OpenAI Whisper model into memory.
5. Extract the video text from the provided JSON file and initiate a **Text-to-Speech** request to the Microsoft Edge Cloud TTS API, preserving the response as an .mp3 audio file.
6. Utilize the OpenAI Whisper model to generate a detailed **transcription** of the .mp3 file, available in .srt format.
7. Select a **random background** video from the dedicated folder.
8. Integrate the srt file into the chosen video using FFMPEG, creating a final .mp4 output.
9. Upload the video to TikTok using the TikTok session cookie. For this step it is required to have a TikTok account and to be logged in on your browser. Then the required `cookies.txt` file can be generated using [this guide available here](https://github.com/kairi003/Get-cookies.txt-LOCALLY). The `cookies.txt` file must be placed in the root folder of the project.
10. Voila! In a matter of minutes, you've crafted a captivating TikTok video while sipping your favorite coffee β˜•οΈ.
</details>
## Web App (Online)
There is a Web App hosted thanks to Streamlit which is public available in HuggingFace, just click on the link that will take you directly to the Web App.
> https://huggingface.co/spaces/MatteoFasulo/Whisper-TikTok-Demo
## Local Installation
Whisper-TikTok has undergone rigorous testing on Windows 10, Windows 11 and Ubuntu 23.04 systems equipped with **Python versions 3.8, 3.9 and 3.11**.
If you want to run Whisper-TikTok locally, you can clone the repository using the following command:
```bash
git clone https://github.com/MatteoFasulo/Whisper-TikTok.git
```
> However, there is also a Docker image available for Whisper-TikTok which can be used to run the program in a containerized environment.
# Dependencies
To streamline the installation of necessary dependencies, execute the following command within your terminal:
```python
pip install -U -r requirements.txt
```
It also requires the command-line tool [**FFMPEG**](https://ffmpeg.org/) to be installed on your system, which is available from most package managers:
```bash
# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg
# on Arch Linux
sudo pacman -S ffmpeg
# on MacOS using Homebrew (<https://brew.sh/>)
brew install ffmpeg
# on Windows using Chocolatey (<https://chocolatey.org/>)
choco install ffmpeg
# on Windows using Scoop (<https://scoop.sh/>)
scoop install ffmpeg
```
> Please note that for optimal performance, it's advisable to have a GPU when using the OpenAI Whisper model for Automatic Speech Recognition (ASR). However, the program will also work without a GPU, but it will run more slowly.
## Web-UI (Local)
To run the Web-UI locally, execute the following command within your terminal:
```bash
streamlit run app.py
```
## Command-Line
To run the program from the command-line, execute the following command within your terminal:
```bash
python main.py
```
### CLI Options
Whisper-TikTok supports the following command-line options:
```
python main.py [OPTIONS]
Options:
--model TEXT Model to use [tiny|base|small|medium|large] (Default: small)
--non_english Use general model, not the English one specifically. (Flag)
--url TEXT YouTube URL to download as background video. (Default: <https://www.youtube.com/watch?v=intRX7BRA90>)
--tts TEXT Voice to use for TTS (Default: en-US-ChristopherNeural)
--list-voices Use `edge-tts --list-voices` to list all voices.
--random_voice Random voice for TTS (Flag)
--gender TEXT Gender of the random TTS voice [Male|Female].
--language TEXT Language of the random TTS voice(e.g., en-US)
--sub_format TEXT Subtitle format to use [u|i|b] (Default: b) | b (Bold), u (Underline), i (Italic)
--sub_position INT Subtitle position to use [1-9] (Default: 5)
--font TEXT Font to use for subtitles (Default: Lexend Bold)
--font_color TEXT Font color to use for subtitles in HEX format (Default: #FFF000).
--font_size INT Font size to use for subtitles (Default: 21)
--max_characters INT Maximum number of characters per line (Default: 38)
--max_words INT Maximum number of words per segment (Default: 2)
--upload_tiktok Upload the video to TikTok (Flag)
-v, --verbose Verbose (Flag)
```
> If you use the --random_voice option, please specify both --gender and --language arguments. Also you will need to specify the --non_english argument if you want to use a non-English voice otherwise the program will use the English model. Whisper model will auto-detect the language of the audio file and use the corresponding model.
## Usage Examples
- Generate a TikTok video using a specific TTS model and voice:
```bash
python main.py --model medium --tts en-US-EricNeural
```
- Generate a TikTok video without using the English model:
```bash
python main.py --non_english --tts de-DE-KillianNeural
```
- Use a custom YouTube video as the background video:
```bash
python main.py --url https://www.youtube.com/watch?v=dQw4w9WgXcQ --tts en-US-JennyNeural
```
- Modify the font color of the subtitles:
```bash
python main.py --sub_format b --font_color #FFF000 --tts en-US-JennyNeural
```
- Generate a TikTok video with a random TTS voice:
```bash
python main.py --random_voice --gender Male --language en-US
```
- List all available voices:
```bash
edge-tts --list-voices
```
## Additional Resources
### Code of Conduct
Please review our [Code of Conduct](./CODE_OF_CONDUCT.md) before contributing to Whisper-TikTok.
### Contributing
We welcome contributions from the community! Please see our [Contributing Guidelines](./CONTRIBUTING.md) for more information.
### Upcoming Features
- Integration with the OpenAI API to generate more advanced responses.
- Generate content by extracting it from reddit <https://github.com/MatteoFasulo/Whisper-TikTok/issues/22>
### Acknowledgments
- We'd like to give a huge thanks to [@rany2](https://www.github.com/rany2) for their [edge-tts](https://github.com/rany2/edge-tts) package, which made it possible to use the Microsoft Edge Cloud TTS API with Whisper-TikTok.
- We also acknowledge the contributions of the Whisper model by [@OpenAI](https://github.com/openai/whisper) for robust speech recognition via large-scale weak supervision
- Also [@jianfch](https://github.com/jianfch/stable-ts) for the stable-ts package, which made it possible to use the OpenAI Whisper model with Whisper-TikTok in a stable manner with font color and subtitle format options.
### License
Whisper-TikTok is licensed under the [Apache License, Version 2.0](https://github.com/MatteoFasulo/Whisper-TikTok/blob/main/LICENSE).