initial commit of model, readme.md, sample_audio, requirements

Files changed (4)

ECAPA_Acoustic_Domain_Classifier_README.md +85 -0
ECAPA_acoustic_domain_classifier.pkl +3 -0
example_audio.mp3 +0 -0
requirements.txt +2 -0

ECAPA_Acoustic_Domain_Classifier_README.md ADDED

	@@ -0,0 +1,85 @@

+# ECAPA Acoustic Domain Classifier
+### Speech, Music, and Noise Classification Using ECAPA-TDNN Embeddings
+---
+## 🧠 Overview
+This model classifies short audio clips into **Speech**, **Music**, or **Noise** domains.
+It uses **ECAPA-TDNN embeddings**, produced by a neural architecture originally designed for speaker verification whose embeddings also capture general acoustic characteristics.
+Although it was trained on a **small, human-curated dataset (5 samples per class)**, the model reached **100% accuracy on its small internal validation set**.
+This project serves as a **proof-of-concept** highlighting how ECAPA embeddings can generalize even in limited-data scenarios.
+---
+## 📦 Model Information
+- **Architecture:** ECAPA-TDNN
+- **Framework:** PyTorch (SpeechBrain-based)
+- **Input:** Mono audio waveform (16 kHz sampling rate)
+- **Output Classes:** Speech | Music | Noise
+- **Training Data:** 15 samples (5 per class), normalized and balanced
+- **Accuracy:** 100% on internal validation (small-scale)
+- **Author:** Khubaib Ahmad — AI/ML Engineer, Data Scientist
+---
+## ⚙️ Methodology
+1. Extract ECAPA-TDNN embeddings for all samples using SpeechBrain.
+2. Train a simple classifier (e.g., a linear model or a small dense network) on the embeddings.
+3. Validate predictions on held-out data.
+4. Export the trained model weights as a `.pkl` file.
+---
+## 🚀 Usage Example
+```python
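+import torch
+import torchaudio
+from speechbrain.pretrained import EncoderClassifier  # moved to speechbrain.inference in SpeechBrain 1.0
+
+# Load the pickled model. It is a full-object pickle, so weights_only=False is needed
+# on recent PyTorch versions; only unpickle files from a source you trust.
+model = torch.load("ECAPA_acoustic_domain_classifier.pkl", map_location="cpu", weights_only=False)
+
+# Load the audio as a mono, 16 kHz tensor of shape (1, num_samples)
+waveform, sample_rate = torchaudio.load("sample.wav")
+waveform = waveform.mean(dim=0, keepdim=True)  # downmix to mono
+audio_tensor = torchaudio.functional.resample(waveform, sample_rate, 16000)
+
+# Example inference (pseudocode: assumes the loaded object exposes these two methods)
+embedding = model.encode_batch(audio_tensor)
+prediction = model.classify(embedding)
+print(prediction)  # -> "speech", "music", or "noise"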
+```
+---
+## 📂 File Information
+| File | Description |
+|------|--------------|
+| `ECAPA_acoustic_domain_classifier.pkl` | Trained model weights |
+| `requirements.txt` | Dependencies for inference |
+| `ECAPA_Acoustic_Domain_Classifier_README.md` | Model documentation |
+| `example_audio.mp3` | Sample audio file |
+---
+## 📊 Applications
+- Acoustic scene classification
+- Pre-filtering for speech recognition pipelines
+- Smart audio event detection
+- Sound domain separation tasks
+---
+## 🔖 Suggested Citation
+```
+Muhammad Khubaib Ahmad (2025). ECAPA Acoustic Domain Classifier: Differentiating Speech, Music, and Noise using ECAPA-TDNN Embeddings. Hugging Face.
+```
+---
+## 🧾 License
+MIT License — free for research and educational use.
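
The Methodology steps in the README above (embed, train a simple classifier, validate on held-out data, export as `.pkl`) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's training code: `toy_embedding` is a hypothetical stand-in that generates synthetic vectors in place of real ECAPA-TDNN embeddings (which would come from something like SpeechBrain's `EncoderClassifier.encode_batch`), and a nearest-centroid classifier with cosine similarity is swapped in for the linear or dense classifier the README mentions, so that the example runs with the standard library alone.

```python
import math
import pickle
import random

LABELS = ["speech", "music", "noise"]

def normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def fit_centroids(embeddings, labels):
    """Average the normalized embeddings of each class into one centroid per label."""
    sums, counts = {}, {}
    for vec, label in zip(embeddings, labels):
        vec = normalize(vec)
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: normalize([x / counts[label] for x in acc])
            for label, acc in sums.items()}

def predict(centroids, vec):
    """Return the label whose centroid has the highest cosine similarity to vec."""
    vec = normalize(vec)
    return max(centroids,
               key=lambda label: sum(a * b for a, b in zip(centroids[label], vec)))

def toy_embedding(label, rng, dim=16):
    """Hypothetical stand-in for an ECAPA embedding: a class-specific direction plus noise."""
    base = [1.0 if i % len(LABELS) == LABELS.index(label) else 0.0 for i in range(dim)]
    return [b + rng.gauss(0.0, 0.1) for b in base]

rng = random.Random(0)

# Step 1 (stand-in): 5 training embeddings per class, mirroring the README's dataset size.
train = [(toy_embedding(label, rng), label) for label in LABELS for _ in range(5)]

# Step 2: fit the simple classifier on the embeddings.
centroids = fit_centroids([vec for vec, _ in train], [label for _, label in train])

# Step 3: validate on held-out embeddings that were not used for fitting.
held_out = [(toy_embedding(label, rng), label) for label in LABELS for _ in range(2)]
accuracy = sum(predict(centroids, vec) == label for vec, label in held_out) / len(held_out)

# Step 4: export the fitted classifier with pickle and check that it round-trips.
restored = pickle.loads(pickle.dumps(centroids))

print(f"held-out accuracy: {accuracy:.2f}")
```

Keeping the embedding extractor separate from the small classifier means only the classifier needs to be stored, which would be consistent with a `.pkl` file of a few kilobytes; whether the committed file is organized this way is not stated in the README.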

ECAPA_acoustic_domain_classifier.pkl ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:bb06868ea2c187c8c185c2b004e948ed0105dd8988da51622c90d680b64c58b0
+size 5551
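
The three lines above are a Git LFS pointer: they record the spec version, the SHA-256 digest of the real file, and its size in bytes. As a hedged sketch, a downloaded copy of the `.pkl` could be checked against the pointer before unpickling it. The helper names `parse_lfs_pointer` and `matches_pointer` are illustrative and not part of any library.

```python
import hashlib

# Pointer contents copied from the diff above.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:bb06868ea2c187c8c185c2b004e948ed0105dd8988da51622c90d680b64c58b0
size 5551
"""

def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }

def matches_pointer(data, pointer):
    """Check that raw file bytes have the size and SHA-256 digest the pointer records."""
    if pointer["algo"] != "sha256":
        raise ValueError(f"unsupported hash: {pointer['algo']}")
    return (len(data) == pointer["size"]
            and hashlib.sha256(data).hexdigest() == pointer["digest"])

pointer = parse_lfs_pointer(POINTER_TEXT)

# Usage with a downloaded file (not run here, since the file may not be present):
# with open("ECAPA_acoustic_domain_classifier.pkl", "rb") as f:
#     ok = matches_pointer(f.read(), pointer)
```

A check like this confirms only that the bytes match what was committed; it does not make an untrusted pickle safe to load.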

example_audio.mp3 ADDED

Binary file (48.9 kB).

requirements.txt ADDED

	@@ -0,0 +1,2 @@


+torch
+speechbrain