Rob Jaret committed · Commit 00e21da · Parent(s): 9c2487d
Updated layout, instructions, Read Me
README.md
CHANGED
@@ -1,26 +1,44 @@
 ---
-title:
+title: Rave Model Averager
 emoji: 🌖
 colorFrom: purple
 colorTo: green
 sdk: gradio
+python_version: 3.12
 sdk_version: 5.39.0
 app_file: app.py
+suggested_hardware: cpu-basic
 pinned: false
+short_description: Attempts to encode and then decode audio through an averaged version of two selected RAVE models.
+models: Intelligent-Instruments-Lab/rave-models, shuoyang-zheng/jaspers-rave-models
+preload_from_hub:
+- Intelligent-Instruments-Lab/rave-models
+- shuoyang-zheng/jaspers-rave-models
+tags: RAVE, Audio Model Manipulation, Encode, Decode
 ---
 
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 
 Built using:
 Mac OS Sequoia 15.5
 Python 3.12
 
+This app attempts to average two RAVE models and then encode and decode an audio file through the original and averaged models.
+
+Instructions:
+- Select the two models from the lists of pre-trained RAVE models.
+- Select an audio file from the ones available in the dropdown, or upload an audio file of up to 60 seconds. Click 'Submit' at the bottom of the page.
+
+Notes:
+- Generally, the audio encoded/decoded using the averaged model does not sound like an 'average' of the two models. One of the better examples comes from the default settings, where you can hear the sounds from model A (multi-timbral guitar) modulated somewhat by model B (water).
+- The versions encoded/decoded through the individual models do give interesting results, though.
+- In most cases not all parameters can be averaged: they may not exist in both models, or they may not have the same shape. The data sets in the output list which parameters were and weren't averaged, along with their shapes and any notes. (You can copy them into a spreadsheet by clicking the icon in the top right corner of each.)
+- The averaged model starts as a clone of Model A. Parameters that can't be averaged default to Model A's values.
+- If all the parameters can be averaged, the result is usually not good: a high-pitched squeal or a low rumble.
+
-Outstanding questions for
+Outstanding questions for interested parties:
-- Since it doesn't work well when all params are compatible,
+- Since it doesn't work well when all params are compatible, maybe there are some params that shouldn't be averaged?
 - Would it make logical sense to reshape the parameters that exist in both models but do not have the same shape so they can be averaged?
-- Anything
+- Any thoughts on what could make the results sonically more like an 'average' of two models?
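The averaging rules described in the notes above (result starts as a clone of Model A; only parameters present in both models with identical shapes are blended; the bias slider weights the mix toward A or B) can be sketched roughly as follows. This is a hypothetical illustration using NumPy-array state dicts and made-up names (`average_state_dicts`, `sd_a`, `sd_b`), not the Space's actual implementation, which presumably operates on PyTorch `state_dict`s.

```python
# Hypothetical sketch of the README's averaging rules; not the app's code.
import numpy as np

def average_state_dicts(sd_a, sd_b, bias=0.0):
    """Blend two parameter dicts.

    bias = 0 mixes the models equally; bias = +1 keeps only Model A;
    bias = -1 keeps only Model B (matching the README's bias slider).
    Returns (averaged_dict, averaged_keys, skipped_keys).
    """
    w_a = (1.0 + bias) / 2.0                       # weight on Model A
    averaged = {k: v.copy() for k, v in sd_a.items()}  # start as a clone of A
    averaged_keys, skipped_keys = [], []
    for k, v_a in sd_a.items():
        v_b = sd_b.get(k)
        if v_b is not None and v_b.shape == v_a.shape:
            averaged[k] = w_a * v_a + (1.0 - w_a) * v_b
            averaged_keys.append(k)
        else:
            # Missing in B or shape mismatch: keep Model A's value.
            skipped_keys.append(k)
    return averaged, averaged_keys, skipped_keys
```

Parameters that exist only in Model B are simply ignored here, which is one way to read "the averaged model starts as a clone of Model A".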
app.py
CHANGED
@@ -26,7 +26,7 @@ available_audio_files=[
 "FrenchChildren.wav",
 "Organ-ND.wav",
 "SpigotsOfChateauLEtoge.wav",
-"
+"GesturesPercStrings.wav",
 "SingingBowl-OmniMic.wav",
 "BirdCalls.mp3",
 ]
@@ -231,16 +231,16 @@ def GenerateRaveEncDecAudio(model_name_a, model_name_b, audio_file_name, audio_f
 averaged_audio = (sr_multiplied, audio_outputs[bias].detach().numpy().squeeze())
 
 df_averaged = pd.DataFrame(messages['keys_averaged']).transpose() #reset_index(names='Param Key')
-df_averaged.columns=['Param Name', 'Model A Shape', 'Model B Shape', '
+df_averaged.columns=['Param Name', 'Model A Shape', 'Model B Shape', 'Notes']
 
 df_not_averaged = pd.DataFrame(messages["keys_not_averaged"]).transpose()
 
 # case when all params are averaged
 if len(df_not_averaged.columns) == 0:
-data = {'Param Name': [], 'Modeal A Shape': [], 'Model B Shape': [], '
+data = {'Param Name': [], 'Model A Shape': [], 'Model B Shape': [], 'Notes': []}
 df_not_averaged = pd.DataFrame(data)
 
-df_not_averaged.columns=['Param Name', 'Model A Shape', 'Model B Shape', '
+df_not_averaged.columns=['Param Name', 'Model A Shape', 'Model B Shape', 'Notes']
 
 messages["stats"] = f"Model A: {model_name_a}\nModel B: {model_name_b}\nAudio file: {os.path.basename(audio_file)}\nSample Rate Multiple for Averaged Version: {sr_multiple}\n\n" + messages["stats"]
 
@@ -253,11 +253,11 @@ waveform_options = gr.WaveformOptions(waveform_color="#01C6FF",
 waveform_progress_color="#0066B4",
 skip_length=2,)
 
-description = "<p style='line-height:
+description = "<p style='line-height: 1'>This app attempts to average two RAVE models and then encode and decode an audio file through the original and averaged models.</p>" \
-"<ul
+"<ul style='padding-bottom: 0px'>Instructions:<li style='line-height: 1; padding-top: 5px'>Select the two models from the lists of pre-trained RAVE models.</li>" \
-"<li style='line-height: 1'>Select an audio file from the ones available in the dropdown, or upload an audio file of up to 60 seconds. Click 'Submit'
+"<li style='line-height: 1; padding-top: 0px'>Select an audio file from the ones available in the dropdown, or upload an audio file of up to 60 seconds. Click 'Submit' at the bottom of the page.</li></ul>"\
-
+"<p style='line-height: 1.2; padding-top: 0px; margin-top: 3px;'>Note that in most cases not all parameters can be averaged. They may not exist in both models or the two values may not have the same shape. The data sets in the output list which ones were and weren't averaged with their shapes and any notes. (You can copy them into a spreadsheet by clicking the icon at the top right corner of each.)</p>"
+
 "<!-- <li>Select a sample rate multiple for the averaged model. When there is a useful result, it sometimes sounds better at double the sample rate.</li>" \
 "<li>Select a bias towards one of the models. A bias of 0 will average the two models equally. A positive bias will favor Model A, and a negative bias will favor Model B.</li></ul>" \
 "-->"
@@ -271,10 +271,10 @@ AverageModels = gr.Interface(title="Process Audio Through the Average of Two Rav
 gr.Radio(model_path_config_keys, label="Select Model B", value="Water", container=True),
 gr.Dropdown(available_audio_files, label="Select from these audio files or upload your own below:", value="SilverCaneAbbey-Voices.wav",container=True),
 gr.Audio(label="Upload an audio file (wav)", type="filepath", sources=["upload", "microphone"], max_length=60,
-waveform_options=waveform_options, format='wav'),
+waveform_options=waveform_options, format='wav'),],
+additional_inputs=[
 gr.Radio([.2, .5, .75, 1, 2, 4], label="Sample Rate Multiple (Averaged version only)", value=1, container=True),
 gr.Slider(label="Bias towards Model A or B", minimum=-1, maximum=1, value=0, step=0.1, container=True),
-
 ],
 # if no way to pass dictionary, pass separate keys and values and zip them.
 outputs=[
@@ -283,8 +283,8 @@ AverageModels = gr.Interface(title="Process Audio Through the Average of Two Rav
 gr.Audio(label="Encoded/Decoded through Model B", sources=None, waveform_options=waveform_options,),
 gr.Audio(label="Encoded/Decoded through averaged model", sources=None, waveform_options=waveform_options,),
 gr.Textbox(label="Info:"),
-gr.Dataframe(label="Params Averaged", show_copy_button="True", scale=100, column_widths=column_widths, headers=['Param Name', 'Model A Shape', 'Model B Shape', '
+gr.Dataframe(label="Params Averaged", show_copy_button="True", scale=100, column_widths=column_widths, headers=['Param Name', 'Model A Shape', 'Model B Shape', 'Notes']),
-gr.Dataframe(label="Params Not Averaged", show_copy_button="True", scale=100, column_widths=column_widths, headers=['Param Name', 'Model A Shape', 'Model B Shape', '
+gr.Dataframe(label="Params Not Averaged", show_copy_button="True", scale=100, column_widths=column_widths, headers=['Param Name', 'Model A Shape', 'Model B Shape', 'Notes'])
 ]
 ,fill_width=True
 )
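The report-building pattern in the `@@ -231,16 +231,16 @@` hunk of `GenerateRaveEncDecAudio` (a dict of per-parameter records, turned into a DataFrame, transposed, then given the four headers) can be illustrated in isolation. The `keys_averaged` contents below are invented for the example; only the transpose-then-rename shape of the code mirrors the diff.

```python
# Hypothetical standalone illustration of the DataFrame pattern in the diff.
import pandas as pd

# Invented per-parameter records, keyed by parameter name; each value is
# [name, shape in Model A, shape in Model B, note].
keys_averaged = {
    "encoder.conv1.weight": ["encoder.conv1.weight", "(64, 1, 3)", "(64, 1, 3)", "averaged"],
    "decoder.out.bias":     ["decoder.out.bias", "(16,)", "(16,)", "averaged"],
}

# DataFrame(dict) makes one column per key; transpose() turns each record
# into a row, after which the four display headers are assigned.
df_averaged = pd.DataFrame(keys_averaged).transpose()
df_averaged.columns = ["Param Name", "Model A Shape", "Model B Shape", "Notes"]

# Empty-report case, mirroring the "all params are averaged" branch:
# an empty frame is built with the same headers so the UI table renders.
data = {"Param Name": [], "Model A Shape": [], "Model B Shape": [], "Notes": []}
df_not_averaged = pd.DataFrame(data)
```

Assigning `df.columns` only succeeds when the label count matches the column count, which is why the empty-report branch constructs the frame with all four headers up front.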