
🫒 NextInnoMind / next_bemba_ai_medium

Multilingual Whisper ASR (Automatic Speech Recognition): a Whisper model fine-tuned for Bemba and English using language tokens. Developed and maintained by NextInnoMind, led by Chalwe Silas.


πŸ§ͺ Model Type

  • Architecture: WhisperForConditionalGeneration, fine-tuned from openai/whisper-medium
  • Framework: Transformers
  • Checkpoint Format: Safetensors
  • Model Size: ~0.8B parameters (F32 tensors)
  • Languages: Bemba, English (with <|bem|> language token support)
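
For lower-level control than the pipeline shown under Usage, the checkpoint can also be loaded with the standard Transformers classes. A minimal sketch:

from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Load the fine-tuned weights and the matching processor
# (feature extractor + tokenizer) from the Hub.
model = WhisperForConditionalGeneration.from_pretrained(
    "NextInnoMind/next_bemba_ai_medium"
)
processor = WhisperProcessor.from_pretrained("NextInnoMind/next_bemba_ai_medium")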


πŸ“œ Model Description

This model is a Whisper Medium variant fine-tuned for Bemba and English, enabling robust multilingual transcription. It supports the use of language tokens (e.g., <|bem|>) to help guide decoding, particularly for low-resource languages like Bemba.


πŸ“š Training Details

  • Base Model: openai/whisper-medium

  • Dataset:

    • BembaSpeech (curated dataset of Bemba audio + transcripts)
    • English subset of Common Voice
  • Training Time: 8 epochs (~55 hours on a single A100 GPU)

  • Learning Rate: 1e-5

  • Batch Size: 16

  • Framework: Transformers + Accelerate

  • Tokenizer: WhisperProcessor with language="<|bem|>" and task="transcribe"
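
As a rough illustration, the hyperparameters above map onto a Transformers Seq2SeqTrainingArguments setup roughly as follows. This is a hedged sketch of the configuration, not the actual training script; output_dir is a placeholder, and the card's language="<|bem|>" setting presumes a tokenizer extended with the custom Bemba token.

from transformers import Seq2SeqTrainingArguments, WhisperProcessor

# Processor configured for transcription; add language="<|bem|>" only if
# the tokenizer has been extended with that token, as this checkpoint's was.
processor = WhisperProcessor.from_pretrained(
    "openai/whisper-medium", task="transcribe"
)

# Hyperparameters taken from the list above; everything else is assumed.
training_args = Seq2SeqTrainingArguments(
    output_dir="whisper-medium-bemba",  # hypothetical output path
    per_device_train_batch_size=16,     # Batch Size: 16
    learning_rate=1e-5,                 # Learning Rate: 1e-5
    num_train_epochs=8,                 # Training Time: 8 epochs
    predict_with_generate=True,
)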


πŸš€ Usage

from transformers import pipeline

# Load the fine-tuned checkpoint as an ASR pipeline.
pipe = pipeline(
    "automatic-speech-recognition",
    model="NextInnoMind/next_bemba_ai_medium",
    chunk_length_s=30,       # split long audio into 30-second chunks
    return_timestamps=True,
)

# Example
result = pipe("path_to_audio.wav")
print(result["text"])

πŸ“Œ Tip: For Bemba, use the language token <|bem|> to improve transcription accuracy.


πŸ” Applications

  • Multilingual Education: Bemba-English subtitles and transcription
  • Broadcast & Media: Transcribe bilingual radio or TV content
  • Research: Language preservation and Bantu-English linguistic studies
  • Voice Accessibility: Multilingual ASR tools and captioning

⚠️ Limitations & Biases

  • Slight performance drop with highly noisy or code-switched audio
  • Trained on formal and clean speech; informal speech may lower accuracy
  • <|bem|> is required for optimal Bemba decoding

πŸ“Š Evaluation

Language | WER (Word Error Rate) | Dataset
-------- | --------------------- | ---------------------
Bemba    | ~15.2%                | BembaSpeech eval set
English  | ~10.5%                | Common Voice EN
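
For context, WER counts the substitutions, deletions, and insertions needed to turn a hypothesis into the reference, divided by the number of reference words. Figures like those above can be computed with the evaluate library; a minimal sketch with placeholder transcripts (not the actual evaluation data):

import evaluate

# WER = (substitutions + deletions + insertions) / reference word count
wer_metric = evaluate.load("wer")

predictions = ["the weather is fine"]        # hypothetical model output
references = ["the weather is fine today"]   # hypothetical reference
print(wer_metric.compute(predictions=predictions, references=references))
# -> 0.2 (one deletion over five reference words)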

🌱 Environmental Impact

  • Hardware: A100 40GB x1
  • Training Time: ~55 hours
  • Carbon Emissions: ~25.8 kg CO₂ (estimated via the ML CO2 Impact calculator)

πŸ“„ Citation

@misc{nextbembaai2025,
  title={NextInnoMind next_bemba_ai_medium: Multilingual Whisper ASR model for Bemba and English},
  author={Silas Chalwe and NextInnoMind},
  year={2025},
  howpublished={\url{https://huggingface.co/NextInnoMind/next_bemba_ai_medium}},
}

πŸ§‘β€πŸ’» Maintainers

  • Chalwe Silas (Lead Developer & Dataset Curator)
  • Team NextInnoMind

πŸ“¬ Contact:

πŸ”— GitHub: SilasChalwe


πŸ“Œ Related Resources


Fine-tuned in Zambia.
