Adds an example of how to use the HearClassifier
README.md (CHANGED)

@@ -105,6 +105,8 @@ of audio, we recommend that you create a production version using [the Vertex
Model
Garden](https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/hear).

#### Audio representation with embeddings

```python
import torch
from fcv_detector.models.hear import (

@@ -143,6 +145,97 @@ inputs = fe(raw_audio_batch, return_tensors="pt")
output = model(**inputs)
```

```
You are using a model of type vit to instantiate a model of type hear. This is not supported for all configurations of models and can yield errors.
BaseModelOutputWithPooling(last_hidden_state=tensor([[[ 0.1638, 0.0311, -0.3071, ..., -0.1555, -0.0380, -0.3294],
         [ 0.1879, 0.9123, 0.3434, ..., 2.1157, -0.2212, -0.5031],
         [ 0.4474, -0.3095, 0.1068, ..., -1.1577, -0.1871, -1.1114],
         ...,
         [-1.1620, -0.6956, 0.0340, ..., 0.2741, -0.3230, -0.7366],
         [-0.2818, -0.1758, -0.1667, ..., 0.3051, -0.3197, -0.6817],
         [-0.5189, -0.3460, 0.0631, ..., 0.2027, -0.5678, -0.2382]],

        [[ 0.1788, 0.0652, -0.2803, ..., -0.1490, -0.0312, -0.2837],
         [-0.1547, 0.6340, 0.0806, ..., 2.1374, -0.3951, -0.5316],
         [-0.2770, 0.7531, 0.4323, ..., 0.9180, -0.3570, -0.1897],
         ...,
         [-1.3322, -0.0332, -0.2455, ..., 0.4821, -0.0645, -0.9346],
         [-1.3276, -0.6403, -0.0455, ..., 0.6166, -0.4472, -0.4335],
         [-1.0610, 0.2751, -0.2439, ..., 0.7873, -0.1567, -0.4248]],

        [[ 0.1755, 0.1288, -0.2913, ..., -0.1226, -0.0644, -0.3382],
         [ 0.1055, 1.1124, -0.2281, ..., 3.2376, -0.3979, -0.5840],
         [-0.6490, -0.3893, 0.4327, ..., 2.4446, -0.2480, -0.9221],
         ...,
         [-1.5817, -0.0733, -0.7567, ..., 1.0221, -0.4246, -0.9694],
         [ 0.1373, -0.0258, 0.2139, ..., 1.2905, -0.2469, -0.8213],
         [-1.2737, 0.2838, -0.1167, ..., 0.8610, -0.2919, -0.8152]],

        [[ 0.1398, 0.1110, -0.2897, ..., -0.1562, -0.0699, -0.3052],
         [-0.1940, 0.1297, 0.1607, ..., 3.2720, -0.0289, -1.0005],
         [-0.3104, 0.6009, -0.1392, ..., 2.7523, -0.0829, -0.6996],
         ...,
         [-0.9739, -0.4732, 0.0499, ..., 1.8665, -0.2438, -0.7332],
         [-0.3944, 0.1800, -0.0829, ..., 1.2693, -0.6084, -0.7625],
         [-1.5253, 0.4868, -0.3012, ..., 1.5606, -0.0050, -0.4669]]],
       grad_fn=<NativeLayerNormBackward0>), pooler_output=tensor([[-0.3807, 0.9901, 0.5437, ..., 1.0000, 0.5777, 0.9752],
        [-0.4004, 0.9932, 0.7021, ..., 1.0000, 0.7681, 0.9804],
        [-0.3874, 0.9964, 0.5076, ..., 1.0000, 0.8015, 0.9823],
        [-0.3838, 0.9970, 0.5793, ..., 1.0000, 0.8024, 0.9895]],
       grad_fn=<TanhBackward0>), hidden_states=None, attentions=None)
```

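If you only need one vector per clip, a common pattern is to take `pooler_output` (one row per input clip, as in the printed output above) as the clip-level embedding. The following is a minimal sketch, not part of the example above, assuming the pooled vector is the embedding you want; it L2-normalizes the four embeddings and compares them with cosine similarity:

```python
import torch.nn.functional as F

# One pooled vector per input clip, taken from the `output` printed above.
embeddings = output.pooler_output             # shape: (batch, hidden_size)
embeddings = F.normalize(embeddings, dim=-1)  # unit-length rows
similarity = embeddings @ embeddings.T        # (4, 4) cosine-similarity matrix
print(similarity)
```
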
#### Audio classification

```python
import torch
from fcv_detector.models.hear import (
    HearConfig,
    HearModel,
    HearForAudioClassification,
    HearFeatureExtractor,
)

from transformers import (
    AutoConfig,
    AutoModel,
    AutoModelForAudioClassification,
    AutoFeatureExtractor,
)

# Register the custom "hear" model type with the transformers Auto* classes.
AutoConfig.register("hear", HearConfig)
AutoModel.register(HearConfig, HearModel)
AutoModelForAudioClassification.register(HearConfig, HearForAudioClassification)
AutoFeatureExtractor.register(HearConfig, HearFeatureExtractor)

# Authenticate with the Hugging Face Hub if no token is stored locally.
from huggingface_hub.utils import HfFolder
from huggingface_hub import notebook_login

if HfFolder.get_token() is None:
    notebook_login()

model_id = "audiblehealthai/hear-pytorch"
classifier = HearForAudioClassification.from_pretrained(model_id)
fe = HearFeatureExtractor.from_pretrained(model_id)

# A batch of four random waveforms (32,000 samples each) as stand-in audio.
raw_audio_batch = torch.rand((4, 32000), dtype=torch.float32)
inputs = fe(raw_audio_batch, return_tensors="pt")
cls_output = classifier(**inputs)
print(cls_output)
```

```
You are using a model of type vit to instantiate a model of type hear. This is not supported for all configurations of models and can yield errors.
Some weights of HearForAudioClassification were not initialized from the model checkpoint at audiblehealthai/hear-pytorch and are newly initialized: ['classifier.layernorm.bias', 'classifier.dense.weight', 'classifier.dense.bias', 'classifier.layernorm.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
SequenceClassifierOutput(loss=None, logits=tensor([[-0.0135, 0.0895],
        [-0.0071, 0.1055],
        [-0.0082, 0.0801],
        [ 0.0145, 0.1028]], grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)
```

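The warnings above show that the classification head (`classifier.dense`, `classifier.layernorm`) is newly initialized, so these logits are only meaningful after the model has been fine-tuned on a labeled downstream task. Assuming a fine-tuned checkpoint with the two-class head shown above, a minimal sketch for turning the logits into class probabilities and predicted labels:

```python
# Convert the raw logits (batch, num_labels) into probabilities and top predictions.
# With an untrained head, as in the output above, these values are not meaningful yet.
probs = torch.softmax(cls_output.logits, dim=-1)
predicted_ids = probs.argmax(dim=-1)  # index of the highest-scoring class per clip
print(probs)
print(predicted_ids)
```
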
### Examples

See the following Colab notebooks for examples of how to use HeAR: