Update README.md
## Model description

**Matcha-TTS** is an encoder-decoder architecture designed for fast acoustic modelling in TTS.
The encoder consists of a text encoder and a phoneme duration predictor, which together predict averaged acoustic features.
The decoder has a U-Net backbone inspired by [Grad-TTS](https://arxiv.org/pdf/2105.06337.pdf) that combines 1D convolutions with Transformer blocks.
Replacing the 2D CNNs of Grad-TTS with 1D CNNs substantially reduces memory consumption and speeds up synthesis.

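
To make "averaged acoustic features" more concrete, here is a sketch in the Grad-TTS-style notation that Matcha-TTS builds on. The symbols are illustrative and are not defined elsewhere in this README. Assume the encoder outputs one average feature vector $\tilde{\mu}_i$ for each of the $N$ input phonemes, and that phoneme $i$ is assigned an integer number of frames $d_i$ (predicted by the duration predictor at inference). Repeating each vector $d_i$ times gives the frame-level sequence $\mu$ of length $T$ that conditions the decoder:

```latex
\mu = \bigl(\underbrace{\tilde{\mu}_1, \dots, \tilde{\mu}_1}_{d_1 \text{ frames}},\;
            \dots,\;
            \underbrace{\tilde{\mu}_N, \dots, \tilde{\mu}_N}_{d_N \text{ frames}}\bigr),
\qquad
T = \sum_{i=1}^{N} d_i
```

In this picture, each phoneme contributes a constant "average" acoustic target over its predicted duration. The decoder's job is to turn that piecewise-constant conditioning into a detailed mel-spectrogram.
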

**Matcha-TTS** is a non-autoregressive model trained with optimal-transport conditional flow matching (OT-CFM).
This yields an ODE-based decoder that can generate high-quality output in fewer synthesis steps than models trained using score matching.
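
The training objective and sampling procedure can be sketched with the standard OT-CFM formulation. The notation below is illustrative and is not taken from this README:

- $x_1$ is a target mel-spectrogram.
- $x_0 \sim \mathcal{N}(0, I)$ is Gaussian noise.
- $t \sim \mathcal{U}[0, 1]$.
- $\sigma_{\min}$ is a small constant.
- $\mu$ is the encoder's averaged acoustic features.
- $v_t(\cdot \mid \mu; \theta)$ is the vector field predicted by the decoder.

The block has three parts. The first line defines the straight-line conditional path and its target velocity. The second line is the regression loss. The third line is a fixed-step Euler solver with $K$ steps, which integrates the ODE from noise at $t = 0$ to a mel-spectrogram at $t = 1$:

```latex
\phi_t(x_0) = \bigl(1 - (1 - \sigma_{\min})\,t\bigr)\,x_0 + t\,x_1,
\qquad
u_t\bigl(\phi_t(x_0) \mid x_1\bigr) = x_1 - (1 - \sigma_{\min})\,x_0

\mathcal{L}(\theta) = \mathbb{E}_{t,\,x_0,\,x_1}
  \bigl\lVert u_t\bigl(\phi_t(x_0) \mid x_1\bigr) - v_t\bigl(\phi_t(x_0) \mid \mu;\,\theta\bigr) \bigr\rVert^2

x_{t + 1/K} = x_t + \tfrac{1}{K}\, v_t\bigl(x_t \mid \mu;\,\theta\bigr),
\qquad t = 0, \tfrac{1}{K}, \dots, \tfrac{K-1}{K}
```

Each conditional path is a straight line in time, so the learned flow tends to be easy to integrate with few Euler steps. That is the intuition behind needing fewer synthesis steps than score matching.
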

…

### Installation

This model was trained using the open-source espeak-ng text-to-speech software as its phonemizer.
The espeak-ng version containing the Catalan phonemizer is available [here](https://github.com/projecte-aina/espeak-ng).

Create and activate a virtual environment:

```bash
python -m venv /path/to/venv
source /path/to/venv/bin/activate
```
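
The later `pip install` steps should run inside this environment, so a small guard can catch a forgotten `source` step. The function below is a hypothetical helper, not part of the Matcha-TTS repository. It relies on the fact that the venv `activate` script exports `VIRTUAL_ENV`:

```bash
#!/usr/bin/env bash
# Hypothetical guard (not part of Matcha-TTS): fail early when no virtual
# environment is active. The venv "activate" script exports VIRTUAL_ENV.
require_venv() {
  if [ -z "${VIRTUAL_ENV:-}" ]; then
    echo "no virtual environment is active; run: source /path/to/venv/bin/activate" >&2
    return 1
  fi
  echo "using virtual environment: $VIRTUAL_ENV"
}
```

Calling `require_venv` before each `pip install` makes a setup script stop early instead of installing into the system Python.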

To train or run inference with Catalan Matcha-TTS, you need to compile the provided espeak-ng with the Catalan phonemizer:

```bash
git clone https://github.com/projecte-aina/espeak-ng.git
# …
pip install unidic-lite
```
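
After compiling, it can help to confirm that espeak-ng actually phonemizes Catalan before training or inference. The helper below is a hypothetical sketch, not part of espeak-ng or Matcha-TTS. It makes two assumptions:

- The compiled binary is reachable as `espeak-ng`, which can be overridden through a hypothetical `ESPEAK` variable.
- The fork keeps upstream's Catalan voice name `ca`.

The `-q`, `-v` and `--ipa` options are standard espeak-ng flags.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of espeak-ng or Matcha-TTS): print the IPA
# transcription of a Catalan sentence using the compiled espeak-ng.
# Assumes the binary is "espeak-ng" (override with ESPEAK=...) and that the
# Catalan voice is named "ca", as in upstream espeak-ng.
check_catalan_phonemizer() {
  local espeak="${ESPEAK:-espeak-ng}"
  if ! command -v "$espeak" >/dev/null 2>&1; then
    echo "$espeak not found on PATH" >&2
    return 1
  fi
  # -q: produce no audio; -v ca: Catalan voice; --ipa: write IPA phonemes to stdout
  "$espeak" -q -v ca --ipa "$1"
}
```

For example, `check_catalan_phonemizer "Bon dia a tothom"` should print an IPA transcription. If it reports a missing binary or a voice error instead, the build or install step probably did not complete.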

Install the repository:

```bash
pip install git+https://github.com/langtech-bsc/Matcha-TTS.git@dev-cat
```
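
Once the repository is installed, a quick check can confirm that the package is visible to the environment's interpreter. This is a hypothetical helper. It uses `importlib.util.find_spec` to locate a top-level module without importing it. The module name `matcha` in the usage line below assumes the fork keeps upstream Matcha-TTS's top-level package name:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Matcha-TTS): succeed if the given top-level
# Python module can be found by the interpreter, without importing it.
# Uses python3 by default (inside an active venv this is the venv's python3);
# override with PYTHON=...
check_python_module() {
  "${PYTHON:-python3}" -c 'import importlib.util, sys; sys.exit(0 if importlib.util.find_spec(sys.argv[1]) else 1)' "$1"
}
```

For example, `check_python_module matcha && echo "matcha is importable"`.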

### Generate