Khubaib01 committed on
Commit ef8dd76 · verified · 1 Parent(s): 8f89d33

initial commit of model, readme.md, sample_audio, requirements

![thumbnail](https://cdn-uploads.huggingface.co/production/uploads/685d281ebd8c51629778c12c/KKIP32HjIV-EFA_BclzZf.jpeg)

ECAPA_Acoustic_Domain_Classifier_README.md ADDED
@@ -0,0 +1,85 @@
# ECAPA Acoustic Domain Classifier

### Subtitle
**Speech, Music, and Noise Classification Using ECAPA-TDNN Embeddings**

---

## 🧠 Overview
This model classifies short audio clips into **Speech**, **Music**, or **Noise** domains.
It uses **ECAPA-TDNN embeddings**, produced by a neural architecture originally designed for speaker verification.

Although it was trained on a **small, human-curated dataset (5 samples per class)**, the model reached **100% accuracy on a small internal validation set**.
This project serves as a **proof-of-concept** showing how ECAPA embeddings can generalize even in limited-data scenarios.

---

## 📦 Model Information

- **Architecture:** ECAPA-TDNN
- **Framework:** PyTorch (SpeechBrain-based)
- **Input:** Mono audio waveform (16 kHz sampling rate)
- **Output Classes:** Speech | Music | Noise
- **Training Data:** 15 samples (5 per class), normalized and balanced
- **Accuracy:** 100% on internal validation (small-scale)
- **Author:** Khubaib Ahmad, AI/ML Engineer and Data Scientist

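The input format listed above (mono, 16 kHz) means that stereo clips, or clips recorded at another rate, need converting first. The following is a dependency-free sketch of that conversion; the function names are hypothetical, and the linear interpolation has no anti-aliasing filter, so in practice an audio library's resampler would be the better choice.

```python
def to_mono(channels):
    """Average equal-length channels sample by sample."""
    n = len(channels)
    return [sum(frame) / n for frame in zip(*channels)]


def resample_linear(samples, src_rate, dst_rate=16000):
    """Resample by linear interpolation (illustration only, no anti-aliasing)."""
    if src_rate == dst_rate or len(samples) < 2:
        return list(samples)
    out_len = int(len(samples) * dst_rate / src_rate)
    step = src_rate / dst_rate
    out = []
    for i in range(out_len):
        pos = i * step
        left = min(int(pos), len(samples) - 2)
        frac = min(pos - left, 1.0)  # clamp so the tail repeats the last sample
        out.append(samples[left] * (1 - frac) + samples[left + 1] * frac)
    return out


# One second of made-up stereo audio at 48 kHz
stereo = [[0.0, 0.5] * 24000, [1.0, 0.5] * 24000]
mono = to_mono(stereo)
mono_16k = resample_linear(mono, 48000)
print(len(mono), len(mono_16k))
```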
---

## ⚙️ Methodology

1. Extract ECAPA-TDNN embeddings for all samples using SpeechBrain.
2. Train a simple classifier (e.g., linear or small dense network) on the embeddings.
3. Validate predictions using held-out data.
4. Export the trained model weights as a `.pkl` file.

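Steps 2 to 4 can be sketched without SpeechBrain by substituting small made-up vectors for the ECAPA-TDNN embeddings. The classifier below is a nearest-centroid rule, chosen only because it fits in a few lines; it is a stand-in, not the classifier behind the released weights, and the function names and data are hypothetical. The export step pickles the learned centroids in memory rather than writing a file.

```python
import math
import pickle


def fit_centroids(vectors, labels):
    """Return {label: mean vector} for the training vectors."""
    grouped = {}
    for vec, label in zip(vectors, labels):
        grouped.setdefault(label, []).append(vec)
    return {label: [sum(col) / len(vecs) for col in zip(*vecs)]
            for label, vecs in grouped.items()}


def predict(centroids, vectors):
    """Label each vector with the class of its nearest centroid."""
    return [min(centroids, key=lambda c: math.dist(vec, centroids[c]))
            for vec in vectors]


# Made-up 3-dimensional "embeddings" standing in for ECAPA-TDNN output
train_x = [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1],   # speech
           [0.0, 1.0, 0.1], [0.1, 0.9, 0.0],   # music
           [0.0, 0.1, 1.0], [0.1, 0.0, 0.9]]   # noise
train_y = ["speech", "speech", "music", "music", "noise", "noise"]
held_out_x = [[0.8, 0.2, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.8]]
held_out_y = ["speech", "music", "noise"]

centroids = fit_centroids(train_x, train_y)            # step 2: train
predictions = predict(centroids, held_out_x)           # step 3: validate
accuracy = sum(p == t for p, t in zip(predictions, held_out_y)) / len(held_out_y)

blob = pickle.dumps(centroids)                         # step 4: export
restored = pickle.loads(blob)
print(predictions, accuracy, predict(restored, held_out_x) == predictions)
```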
---

## 🚀 Usage Example

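```python
import torch
import torchaudio
# Newer SpeechBrain releases expose this class under speechbrain.inference.
from speechbrain.pretrained import EncoderClassifier

# Pretrained ECAPA-TDNN encoder. The checkpoint name is an assumption;
# this README does not state which encoder was used.
encoder = EncoderClassifier.from_hparams(source="speechbrain/spkrec-ecapa-voxceleb")

# Load the trained classifier. It is assumed here to be a pickled object with a
# scikit-learn style predict(); adjust if the file holds something else.
clf = torch.load("ECAPA_acoustic_domain_classifier.pkl", map_location="cpu", weights_only=False)

# Load the audio, downmix to mono, and resample to the expected 16 kHz
waveform, sample_rate = torchaudio.load("sample.wav")
waveform = torchaudio.functional.resample(waveform.mean(dim=0, keepdim=True), sample_rate, 16000)

# encode_batch returns (batch, 1, emb_dim); drop the middle axis
embedding = encoder.encode_batch(waveform).squeeze(1)
prediction = clf.predict(embedding.cpu().numpy())
print(prediction)  # expected to be "speech", "music", or "noise"
```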

---

## 📂 File Information

| File | Description |
|------|-------------|
| `ECAPA_acoustic_domain_classifier.pkl` | Trained model weights |
| `requirements.txt` | Dependencies for inference |
| `README.md` | Model documentation |
| `example_audio.mp3` | Sample audio file |

---

## 📊 Applications

- Acoustic scene classification
- Pre-filtering for speech recognition pipelines
- Smart audio event detection
- Sound domain separation tasks

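As an illustration of the pre-filtering use case, the sketch below keeps only the clips labelled as speech before they would be handed to a recognizer. The `keep_speech` helper and the lookup table standing in for the classifier are hypothetical; a real pipeline would pass a function that runs the model.

```python
def keep_speech(clips, classify):
    """Yield only the clips that `classify` labels as speech."""
    for clip in clips:
        if classify(clip) == "speech":
            yield clip


# Stand-in classifier: looks up a precomputed label instead of running a model.
fake_labels = {"intro.wav": "music", "interview.wav": "speech", "street.wav": "noise"}
speech_only = list(keep_speech(fake_labels, fake_labels.get))
print(speech_only)
```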
---

## 🔖 Suggested Citation

```
Muhammad Khubaib Ahmad (2025). ECAPA Acoustic Domain Classifier: Differentiating Speech, Music, and Noise using ECAPA-TDNN Embeddings. Hugging Face.
```

---

## 🧾 License
MIT License: free for research and educational use.
ECAPA_acoustic_domain_classifier.pkl ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:bb06868ea2c187c8c185c2b004e948ed0105dd8988da51622c90d680b64c58b0
size 5551
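
The three lines above are a Git LFS pointer rather than the pickle itself: they record the spec version, the SHA-256 digest of the real file, and its size in bytes. A minimal sketch of checking downloaded bytes against such a pointer follows; the function names are hypothetical.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "algo": algo,
            "digest": digest, "size": int(fields["size"])}


def matches_pointer(data, pointer):
    """Check that raw bytes have the size and SHA-256 digest the pointer records."""
    return (len(data) == pointer["size"]
            and hashlib.sha256(data).hexdigest() == pointer["digest"])


pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:bb06868ea2c187c8c185c2b004e948ed0105dd8988da51622c90d680b64c58b0\n"
    "size 5551\n"
)
print(pointer["algo"], pointer["size"])
```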
example_audio.mp3 ADDED
Binary file (48.9 kB).
requirements.txt ADDED
@@ -0,0 +1,2 @@
torch
speechbrain