# 🫒 NextInnoMind/next_bemba_ai_medium

Multilingual Whisper ASR (Automatic Speech Recognition): a Whisper model fine-tuned for Bemba and English using language tokens. Developed and maintained by NextInnoMind, led by Chalwe Silas.
## 🧪 Model Type

- Architecture: `WhisperForConditionalGeneration`, fine-tuned from `openai/whisper-medium`
- Framework: Transformers
- Checkpoint format: Safetensors
- Languages: Bemba, English (with `<|bem|>` language-token support)
## 📝 Model Description

This model is a Whisper Medium variant fine-tuned for Bemba and English, enabling robust multilingual transcription. It supports language tokens (e.g., `<|bem|>`) to help guide decoding, particularly for low-resource languages like Bemba.
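Under the hood, a language token is simply forced at the start of the decoder sequence. The following is a minimal, hypothetical sketch of token-guided decoding with the low-level Transformers API; it assumes the fine-tuned tokenizer exposes `<|bem|>` as a regular vocabulary token, and the exact generation setup for this checkpoint may differ:

```python
import numpy as np
from transformers import WhisperForConditionalGeneration, WhisperProcessor

processor = WhisperProcessor.from_pretrained("NextInnoMind/next_bemba_ai_medium")
model = WhisperForConditionalGeneration.from_pretrained("NextInnoMind/next_bemba_ai_medium")

# Placeholder input: 1 second of 16 kHz silence; replace with real audio
# (e.g., loaded via soundfile or librosa).
audio_array = np.zeros(16000, dtype=np.float32)
inputs = processor(audio_array, sampling_rate=16000, return_tensors="pt")

# Force the Bemba token right after <|startoftranscript|> (decoder position 1).
# Assumes <|bem|> was added to the vocabulary during fine-tuning.
bem_id = processor.tokenizer.convert_tokens_to_ids("<|bem|>")
generated = model.generate(
    inputs.input_features,
    forced_decoder_ids=[(1, bem_id)],
)
print(processor.batch_decode(generated, skip_special_tokens=True)[0])
```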
## 📚 Training Details

- Base model: `openai/whisper-medium`
- Datasets:
  - BembaSpeech (curated dataset of Bemba audio + transcripts)
  - English subset of Common Voice
- Training time: 8 epochs (~55 hours on an A100 GPU)
- Learning rate: 1e-5
- Batch size: 16
- Framework: Transformers + Accelerate
- Tokenizer: `WhisperProcessor` with `language="<|bem|>"` and `task="transcribe"`
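As a rough illustration, the processor configuration above might look like the following. This is a sketch using the `language`/`task` keyword arguments of `WhisperProcessor.from_pretrained`, not the authors' exact training script:

```python
from transformers import WhisperProcessor

# Sketch of the tokenizer/processor setup described above; the actual
# training script may configure this differently.
processor = WhisperProcessor.from_pretrained(
    "NextInnoMind/next_bemba_ai_medium",
    language="<|bem|>",  # Bemba language token used during fine-tuning
    task="transcribe",
)
```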
## 🚀 Usage

```python
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="NextInnoMind/next_bemba_ai_medium",
    chunk_length_s=30,
    return_timestamps=True,
)

# Example
result = pipe("path_to_audio.wav")
print(result["text"])
```
📌 Tip: For Bemba, use the language token `<|bem|>` to improve transcription accuracy, as sketched below.
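One way to pass the token through the pipeline is via `generate_kwargs`. This is a hedged sketch: whether the `<|bem|>` shorthand is accepted here depends on how the token was registered in this checkpoint's generation config.

```python
# Sketch: requesting Bemba decoding through the pipeline's generate_kwargs.
result = pipe(
    "path_to_audio.wav",
    generate_kwargs={"language": "<|bem|>", "task": "transcribe"},
)
print(result["text"])
```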
## 🌍 Applications

- Multilingual Education: Bemba-English subtitles and transcription
- Broadcast & Media: transcribe bilingual radio or TV content
- Research: language preservation and Bantu-English linguistic studies
- Voice Accessibility: multilingual ASR tools and captioning
## ⚠️ Limitations & Biases

- Slight performance drop on highly noisy or code-switched audio
- Trained on formal, clean speech; informal speech may lower accuracy
- The `<|bem|>` token is required for optimal Bemba decoding
## 📊 Evaluation

| Language | WER (Word Error Rate) | Dataset |
|---|---|---|
| Bemba | ~15.2% | BembaSpeech Eval Set |
| English | ~10.5% | Common Voice EN |
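The card does not specify the exact evaluation tooling. As a reference point, WER is commonly computed with the `evaluate` library, roughly as follows (the strings are placeholders, not data from the eval sets):

```python
import evaluate

# Word Error Rate = (substitutions + insertions + deletions) / reference words
wer = evaluate.load("wer")
score = wer.compute(
    predictions=["transcribed hypothesis text"],
    references=["reference transcript text"],
)
print(f"WER: {score:.1%}")
```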
## 🌱 Environmental Impact

- Hardware: A100 40GB x1
- Training time: ~55 hours
- Carbon emissions: estimated ~25.8 kg CO₂ (via ML CO2 Impact)
## 📖 Citation

```bibtex
@misc{nextbembaai2025,
  title={NextInnoMind next_bemba_ai_medium: Multilingual Whisper ASR model for Bemba and English},
  author={Silas Chalwe and NextInnoMind},
  year={2025},
  howpublished={\url{https://huggingface.co/NextInnoMind/next_bemba_ai_medium}},
}
```
## 🧑‍💻 Maintainers

- Chalwe Silas (Lead Developer & Dataset Curator)
- Team NextInnoMind

💬 Contact:

🔗 GitHub: SilasChalwe
## 🔗 Related Resources

Fine-tuned in Zambia.