# 🐢 Tortoise

Tortoise is a very expressive TTS system with impressive voice cloning capabilities. It has three components: a GPT-like autoregressive acoustic model that converts input text into discretized acoustic tokens, a diffusion model that converts these tokens into mel-spectrogram frames, and a UnivNet vocoder that converts the spectrograms into the final audio signal. Its main downside is speed. Tortoise is much slower than parallel TTS models such as VITS, because the autoregressive stage generates tokens one step at a time and the diffusion stage runs many denoising iterations.
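
The following toy, self-contained sketch in plain Python makes the three-stage data flow concrete. Every function name, constant and size in it is a hypothetical placeholder. None of it is the 🐸TTS API or Tortoise's real hyperparameters. The "diffusion" step is a deterministic refinement loop, not a trained denoiser. The sketch does show the shape of the pipeline (text → tokens → mel frames → audio samples) and why the first stage is sequential.

```python
import random
from typing import List

# Hypothetical placeholders for illustration only; NOT Tortoise's real values.
VOCAB_SIZE = 1024      # assumed size of the discrete acoustic token vocabulary
FRAMES_PER_TOKEN = 4   # assumed mel frames produced per acoustic token
N_MELS = 8             # assumed number of mel bins (kept tiny for readability)
HOP_LENGTH = 256       # assumed audio samples produced per mel frame


def autoregressive_stage(text: str, num_tokens: int, seed: int = 0) -> List[int]:
    """Stage 1 (toy): emit acoustic tokens one at a time.

    Each new token depends on the previously emitted token, so the loop
    cannot be parallelized across time steps.
    """
    rng = random.Random(seed)
    tokens: List[int] = []
    for _ in range(num_tokens):
        # A real GPT-style model would run a forward pass on (text, tokens) here.
        prev = tokens[-1] if tokens else len(text)
        tokens.append((prev * 31 + rng.randrange(VOCAB_SIZE)) % VOCAB_SIZE)
    return tokens


def diffusion_stage(tokens: List[int], iterations: int, seed: int = 0) -> List[List[float]]:
    """Stage 2 (toy): start from Gaussian noise and refine mel frames toward a
    token-conditioned target over `iterations` steps."""
    rng = random.Random(seed)
    num_frames = len(tokens) * FRAMES_PER_TOKEN
    # Toy conditioning: each token sets the value of the frames it covers.
    target = [[tokens[f // FRAMES_PER_TOKEN] / VOCAB_SIZE] * N_MELS for f in range(num_frames)]
    mel = [[rng.gauss(0.0, 1.0) for _ in range(N_MELS)] for _ in range(num_frames)]
    for step in range(iterations):
        # Close a growing fraction of the remaining gap; the last step reaches the target.
        alpha = 1.0 / (iterations - step)
        mel = [[m + alpha * (t - m) for m, t in zip(mrow, trow)]
               for mrow, trow in zip(mel, target)]
    return mel


def vocoder_stage(mel: List[List[float]]) -> List[float]:
    """Stage 3 (toy): turn every mel frame into HOP_LENGTH audio samples.
    Frames are independent here, so a real vocoder can process them in parallel."""
    audio: List[float] = []
    for frame in mel:
        level = sum(frame) / len(frame)
        audio.extend([level] * HOP_LENGTH)
    return audio


def synthesize_sketch(text: str, num_tokens: int = 20, diffusion_iterations: int = 10) -> List[float]:
    tokens = autoregressive_stage(text, num_tokens)
    mel = diffusion_stage(tokens, diffusion_iterations)
    return vocoder_stage(mel)


if __name__ == "__main__":
    wav = synthesize_sketch("Hello, my name is Manmay, how are you?")
    print(len(wav))  # 20480 (20 tokens * 4 frames * 256 samples)
```

The per-token loop in `autoregressive_stage` and the repeated passes in `diffusion_stage` are the costs that a parallel model such as VITS avoids.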

Big thanks to 👑[@manmay-nakhashi](https://github.com/manmay-nakhashi), who helped us implement Tortoise in 🐸TTS.

Example use:

```python
from TTS.tts.configs.tortoise_config import TortoiseConfig
from TTS.tts.models.tortoise import Tortoise

config = TortoiseConfig()
model = Tortoise.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir="paths/to/models_dir/", eval=True)

text = "This is an example."

# with a random speaker
output_dict = model.synthesize(text, config, speaker_id="random", extra_voice_dirs=None)

# cloning a speaker (extra inference settings can be passed as keyword arguments)
output_dict = model.synthesize(text, config, speaker_id="speaker_n", extra_voice_dirs="path/to/speaker_n/")
```

Using 🐸TTS API:

```python
from TTS.api import TTS

tts = TTS("tts_models/en/multi-dataset/tortoise-v2")

# cloning `lj` voice from `TTS/tts/utils/assets/tortoise/voices/lj`
# with custom inference settings overriding defaults.
tts.tts_to_file(text="Hello, my name is Manmay, how are you?",
                file_path="output.wav",
                voice_dir="path/to/tortoise/voices/dir/",
                speaker="lj",
                num_autoregressive_samples=1,
                diffusion_iterations=10)

# Using presets with the same voice
tts.tts_to_file(text="Hello, my name is Manmay, how are you?",
                file_path="output.wav",
                voice_dir="path/to/tortoise/voices/dir/",
                speaker="lj",
                preset="ultra_fast")

# Random voice generation
tts.tts_to_file(text="Hello, my name is Manmay, how are you?",
                file_path="output.wav")
```
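
The `voice_dir` and `speaker` arguments point at a folder of reference recordings, as the `TTS/tts/utils/assets/tortoise/voices/lj` comment suggests. A cloning run is slow, so checking that folder layout first can save time. The helper below is a hypothetical sketch and is not part of 🐸TTS. It assumes that the reference clips are `.wav` files stored directly in `<voice_dir>/<speaker>/`.

```python
import tempfile
from pathlib import Path
from typing import List

# Hypothetical helper, not part of 🐸TTS. Assumes reference clips are .wav
# files stored directly inside <voice_dir>/<speaker>/.
AUDIO_SUFFIXES = {".wav"}


def list_reference_clips(voice_dir: str, speaker: str) -> List[Path]:
    """Return the sorted reference clips for `speaker`, raising if none exist."""
    speaker_dir = Path(voice_dir) / speaker
    if not speaker_dir.is_dir():
        raise FileNotFoundError(f"No folder for speaker {speaker!r} in {voice_dir!r}")
    clips = sorted(
        p for p in speaker_dir.iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_SUFFIXES
    )
    if not clips:
        raise FileNotFoundError(f"No reference audio found in {speaker_dir}")
    return clips


if __name__ == "__main__":
    # Build a throwaway voices/lj layout and list its clips.
    with tempfile.TemporaryDirectory() as root:
        lj_dir = Path(root) / "lj"
        lj_dir.mkdir()
        for name in ("1.wav", "2.wav", "notes.txt"):
            (lj_dir / name).touch()
        print([p.name for p in list_reference_clips(root, "lj")])  # ['1.wav', '2.wav']
```

Non-audio files such as `notes.txt` are ignored, and a missing or empty speaker folder fails fast instead of partway through synthesis.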

Using 🐸TTS Command line:

```console
# cloning the `lj` voice
tts --model_name tts_models/en/multi-dataset/tortoise-v2 \
    --text "This is an example." \
    --out_path "output.wav" \
    --voice_dir path/to/tortoise/voices/dir/ \
    --speaker_idx "lj" \
    --progress_bar True

# Random voice generation
tts --model_name tts_models/en/multi-dataset/tortoise-v2 \
    --text "This is an example." \
    --out_path "output.wav" \
    --progress_bar True
```

## Important resources & papers

- Original Repo: https://github.com/neonbjb/tortoise-tts
- Faster implementation: https://github.com/152334H/tortoise-tts-fast
- UnivNet: https://arxiv.org/abs/2106.07889
- Latent Diffusion: https://arxiv.org/abs/2112.10752
- DALL-E: https://arxiv.org/abs/2102.12092

## TortoiseConfig

```{eval-rst}
.. autoclass:: TTS.tts.configs.tortoise_config.TortoiseConfig
    :members:
```

## TortoiseArgs

```{eval-rst}
.. autoclass:: TTS.tts.models.tortoise.TortoiseArgs
    :members:
```

## Tortoise Model

```{eval-rst}
.. autoclass:: TTS.tts.models.tortoise.Tortoise
    :members:
```