espnet/owsm_v3.1_ebf

Automatic Speech Recognition

speech-translation


pyf98 committed on Feb 6

Commit c40ab5d · verified · 1 Parent(s): e4880c3

Update README.md

Files changed (1)

README.md +4 -3


@@ -12,14 +12,15 @@ license: cc-by-4.0
 ## OWSM: Open Whisper-style Speech Model
-[OWSM](https://arxiv.org/abs/2309.13876) is an Open Whisper-style Speech Model from [CMU WAVLab](https://www.wavlab.org/). It reproduces Whisper-style training using publicly available data and an open-source toolkit [ESPnet](https://github.com/espnet/espnet).
-Our demo is available [here](https://huggingface.co/spaces/pyf98/OWSM_v3_demo). The [project page](https://www.wavlab.org/activities/2024/owsm/) contains various resources.
 **[OWSM v3.1](https://arxiv.org/abs/2401.16658) is an improved version of OWSM v3. It significantly outperforms OWSM v3 in almost all evaluation benchmarks.**
 We do not include any new training data. Instead, we utilize a state-of-the-art speech encoder, [E-Branchformer](https://arxiv.org/abs/2210.00077).
-OWSM v3.1 has 1.02B parameters in total and is trained on 180k hours of public speech data.
 Specifically, it supports the following speech-to-text tasks:
 - Speech recognition
 - Any-to-any-language speech translation

 ## OWSM: Open Whisper-style Speech Model
+OWSM aims to develop fully open speech foundation models using publicly available data and open-source toolkits, including [ESPnet](https://github.com/espnet/espnet).
+Inference examples can be found on our [project page](https://www.wavlab.org/activities/2024/owsm/).
+Our demo is available [here](https://huggingface.co/spaces/pyf98/OWSM_v3_demo).
 **[OWSM v3.1](https://arxiv.org/abs/2401.16658) is an improved version of OWSM v3. It significantly outperforms OWSM v3 in almost all evaluation benchmarks.**
 We do not include any new training data. Instead, we utilize a state-of-the-art speech encoder, [E-Branchformer](https://arxiv.org/abs/2210.00077).
+The model in this repo has 1.02B parameters in total and is trained on 180k hours of public speech data.
 Specifically, it supports the following speech-to-text tasks:
 - Speech recognition
 - Any-to-any-language speech translation