Show HN: I made a volumetric audio visualizer

(a-sumo.github.io)

90 points · rslice · 3y ago · 42 comments

I'm developing Hyperstep[0], a spatial language for music production. I find using existing DAWs frustrating because they don't allow me to navigate and operate intuitively on the latent spaces behind my musical ideas. This is why I've decided to build my own set of "seeing tools" (Bret Victor)[1]. I'm also convinced that by framing music as processes and interactions in the 3D world, spatialization and mixing should become fairly pain-free.

I'm still early in development and I would love to build this into an actual product that can be integrated into existing DAWs or even turn it into a musical framework itself for AR and VR experiences.

If you're interested in working on it or if you simply want to know more, feel free to contact me.

[0] https://github.com/a-sumo/hyperstep

[1] https://www.youtube.com/watch?v=klTjiXjqHrQ

42 comments

39 comments · 17 top-level

knaekhoved · 3y ago · 4 in thread

What is the point of making this 3D when it seems to be circularly symmetric? Just replace the circle with a line and make it 2D, yeah?

rslice (OP) · 3y ago

By building symmetries and lifting low-dimensional data into higher-dimensional spaces (here, 3D), you can better leverage the human perceptual system. You also unlock more intuitive modes of interaction that are closer to the real-world processes that originated the sounds.

knaekhoved · 3y ago

> lifting low-dimensional data into higher dimensional spaces

This is only useful if you actually do something with the extra dimensions, e.g. lifting with a kernel. Duplicating the exact same data into more dimensions is not helpful
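
To make the distinction concrete, here is a toy sketch. It is an illustration only: the data points, the function names, and the choice of a quadratic feature map are all assumptions made for this example and have nothing to do with the visualizer's code. It contrasts copying a 1-D value into a second axis with lifting it through phi(x) = (x, x^2), and checks whether any single-axis threshold separates the two classes afterwards.

```python
# Toy 1-D data: class 1 sits between the two clusters of class 0,
# so no single threshold on x separates the classes.
points = [(-2.0, 0), (-1.5, 0), (-0.5, 1), (0.0, 1), (0.5, 1), (1.5, 0), (2.0, 0)]


def duplicate_lift(x):
    """Copy the same value into a second axis: no new information."""
    return (x, x)


def quadratic_lift(x):
    """Polynomial feature map phi(x) = (x, x^2): the new axis is informative."""
    return (x, x * x)


def separable_on_axis(lifted, axis):
    """True if one threshold on the given coordinate splits the two classes."""
    ones = [p[axis] for p, label in lifted if label == 1]
    zeros = [p[axis] for p, label in lifted if label == 0]
    return max(ones) < min(zeros) or max(zeros) < min(ones)


dup = [(duplicate_lift(x), label) for x, label in points]
quad = [(quadratic_lift(x), label) for x, label in points]

# Check both coordinates of each lifted data set.
dup_ok = any(separable_on_axis(dup, axis) for axis in (0, 1))
quad_ok = any(separable_on_axis(quad, axis) for axis in (0, 1))
print(dup_ok, quad_ok)  # False True
```

In the duplicated version every point stays on the line y = x, so the extra axis cannot separate anything the original axis could not. In the quadratic version the x^2 axis puts class 1 below 0.25 and class 0 above 2.25.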

IAmGraydon · 3y ago

It’s ok to build things just because they’re fun to look at.

knaekhoved · 3y ago

It's ok to ask questions about claims that posts make

techbro92 · 3y ago · 3 in thread

I agree a 3d visual would be useful for spatialization and mixing but I don’t see how it would be useful for anything else.

rslice (OP) · 3y ago

In the repository https://github.com/a-sumo/hyperstep#organic-drums-through-ag..., I explain how one can generate drums by framing them as agent locomotion.

Moreover, music and dancing are two sides of the same coin and in styles such as hip hop and popping you have 'beat-killers'[0] who basically represent music through their motion.

If you build an appropriate spatial language, you can compose music from dancing or you can build a form of generalized dancing. I've explored this concept artistically[1],[2] and I'm currently working on formalizing it and making it computational.

[0] https://www.youtube.com/watch?v=IeNn4XG0QHo

[1] https://imgur.com/gallery/pfrvIot

[2] https://imgur.com/gallery/MoDVr6v

AlecSchueler · 3y ago

I mean mixing is a whole industry itself, why do you think that's an unsatisfactory target market?

techbro92 · 3y ago

I don't.

pvg · 3y ago · 2 in thread

You might want to include some audio (or maybe it's there and not loading in my browser) so people who try it can see what it is right away. People also don't have to wonder what happens to their audio file to check it out.

eimrine · 3y ago

And in some cases, like on a work PC, finding music can be challenging.

archontes · 3y ago

https://commons.wikimedia.org/wiki/Category:Audio_files_of_c...

dotancohen · 3y ago · 2 in thread

This is interesting. I wonder if it could be used in speech therapy, to help the speaker understand the difference between what a sound should sound like, and the sound coming out of their mouths.

I personally have problems saying two different letters differently, and I just recorded myself trying to say both letters. Sure enough, there is very little difference in the way they are both displayed. I will ask other people to say these letters, and see if I can spot the difference. Then I should be able to play around with differing lip, mouth, tongue, and larynx positions to see how close I can come.

Any tips for highlighting subtle differences in specific places would be appreciated!

rslice (OP) · 3y ago

I would love to work on something like that, I have a couple ideas regarding ways to implement it.

However I don't know if I have the time to do it, as there are so many concepts I'm trying to channel through code. Hopefully as I build more computational tools I will increase my bandwidth.

Providing visual tools for phonetic assistance is definitely something I've had in mind for a while and solving the dataset building problem should solve that along the way.

As of now, you can:

1) set the amount_sphere_tube slider to 1
2) decrease the playback_rate to 0.1 (in the GUI on the right)
3) on the media controller, go to options (the naming might vary depending on the browser) and set the playback rate to normal or x1.

This should give you maximal temporal resolution.

One way features can be further highlighted is by generating embeddings from Magenta DDSP[0]. Speech sounds are fairly complex, though, so I don't know how models built on music data would generalize to them. I think the tech to do this is there, but for the time being it seems to be scattered across fairly siloed fields. I also tried to use live voice recordings, but there are latency issues with the subset of the Web Audio API I'm currently using. However, there definitely is value in having live, spatial feedback on pronunciation.

[0] https://github.com/magenta/magenta-js/tree/master/music#ddsp

dotancohen · 3y ago

Thanks for the tips.

Jot my email address down anyway, and if you ever get around to working on those speech tools I'll be thrilled to help.

dvh · 3y ago · 2 in thread

Works exactly once but only on short files, then I have to close browser (all windows) and restart. Chrome, Ubuntu.

rslice (OP) · 3y ago

Thanks for the feedback, I'm investigating issues on Linux. Try opening the website on a smartphone.

dotancohen · 3y ago

I uploaded a short clip on Firefox, Kubuntu, and the site worked fine so far as I could tell. You are invited to contact me to help debug anything. My Gmail username is the same as my HN username.

packetlost · 3y ago · 2 in thread

Broke for me on FF and Chrome.

rslice (OP) · 3y ago

I am working on using audio worklets to perform the audio processing without blocking the main UI thread, but support on browsers other than Chrome is still limited.

Try loading smaller samples or audio snippets.

partomniscient · 3y ago

Should probably put "Works best in Chrome" in the project summary for the time being then. I did get it to work in Firefox somewhat with a small sample, but it'd be nice to know this stuff up front before trying. I imagine most people are going to try to load one of their full-length favourite songs.

remedan · 3y ago · 2 in thread

Went to pick a song from my music library to try, only to find that the visualizer doesn't support flac.

rslice (OP) · 3y ago

It should be fixed now.

remedan · 3y ago

It works! Thank you.

knaik94 · 3y ago · 1 in thread

I am not sure what is causing it, but it takes a solid 2 to 3 minutes on my computer before it does anything. I load a file and it feels like it freezes, and Firefox gave me a warning banner that the tab was causing all of Firefox to slow down. Same thing in Chrome, and I have an i7-10750H. Some people might mistake that for it not working since there is no UI feedback of anything happening. Windows 10.

I got two different tracks to work, and it's clear that one was a lot harder to process than the other. It took noticeably more time to start on the second one, and the CPU utilization was higher as well. They were both instrumental tracks in the same format and around the same length. The one simpler to process was the instrumental of Britney Spears's Baby One More Time. The harder one was Porter Robinson's Divinity.

Neither audio had an effect similar to the one from the demo video, but were interesting regardless. They both looked like how I imagine sound waves echo and bounce around if contained in a cube shape.

I appreciate the notebook writeup where you described the goals because the visualization wasn't inherently intuitive with the sound. I chose much more complex tones than your demo. I imagine the feature extraction is much easier on isolated sounds. This reminds me a lot of project milkdrop and so I was expecting it to be closer to that but in 3d. That was probably a misunderstanding on my part of the goals for this.

I think exposing more parameters about how features get mapped and scaled would be really helpful in making it feel more intuitive. Zooming the cube in and out is nice but didn't seem to help convey more information with the tracks I chose. If anything it got in the way because on my computer the zoom sensitivity was very very high.

I look forward to seeing where this goes.

rslice (OP) · 3y ago

Thank you for your thoughtful comment! This is by no means a product, it's more of a way for me to test an idea and share it with the community.

This demo is meant for small audio samples. My initial goal was to use it to visualize and compare drum samples by looking at their 'spatial signature'.

Right now, I'm using arbitrarily defined 'shapes' (sphere and tube), but the goal is to recover those from real-world data. Unfortunately, building the appropriate dataset and the model to go with it is currently out of my reach, but I know that's sort of how my brain learned these audio/spatial associations.

In order for this to work on complete tracks, I would need to add source separation, transcription and some form of information compression.

Additionally, I'm working on ways to deal with richer sounds, by laying them out in space or by splitting them into 'voices of unison'. Here's a demo of what it would look like: https://imgur.com/gallery/gkFPXXu

There are two directions I could take this, either making music from interacting with spatial representations or building spatial representations from music. I don't have the bandwidth to do both alone so I would love to work on it with other developers.

f_devd · 3y ago · 1 in thread

For those wanting a textual description: it seems to take an FFT of the audio and put it into rings (something like frequency = ring size). Those rings are then duplicated/projected to make spheres, and the rings above a certain size are truncated to a cube shape. There is an option to have the rings move over the length of the sphere/cube so you can see the FFT over time.

redox99 · 3y ago

More like rings, it's a spherical coordinate system where radial distance is frequency, and angles don't matter. Then the brightness of the color is based on the amplitude of each frequency. Finally, the function is truncated into a cube.
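
The mapping described above can be sketched roughly as follows. This is a guess at the idea based only on the two descriptions in this thread, not the project's actual code: the function names, the grid size, the naive DFT (standing in for a real FFT), and the choice to map the highest frequency bin to the cube's corner distance are all assumptions.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Naive DFT magnitudes for bins 0..N/2 (fine for tiny inputs only)."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(s * cmath.exp(-2j * math.pi * k * t / n)
                  for t, s in enumerate(samples))
        mags.append(abs(acc) / n)
    return mags


def radial_volume(mags, size):
    """Fill a size^3 grid. Each voxel's brightness is the normalized magnitude
    of the frequency bin selected by its distance from the cube centre, so the
    angles play no role and shells wider than the cube are cut by its faces."""
    centre = (size - 1) / 2
    max_radius = centre * math.sqrt(3)  # centre-to-corner distance (assumed scale)
    top = max(mags) or 1.0              # avoid dividing by zero on silence
    volume = [[[0.0] * size for _ in range(size)] for _ in range(size)]
    for x in range(size):
        for y in range(size):
            for z in range(size):
                r = math.sqrt((x - centre) ** 2 + (y - centre) ** 2 + (z - centre) ** 2)
                bin_index = min(int(r / max_radius * (len(mags) - 1) + 0.5),
                                len(mags) - 1)
                volume[x][y][z] = mags[bin_index] / top
    return volume


# A pure tone with 4 cycles in 32 samples: all energy lands in bin 4.
samples = [math.sin(2 * math.pi * 4 * t / 32) for t in range(32)]
mags = magnitude_spectrum(samples)
volume = radial_volume(mags, size=9)
```

With this input the volume is dark at the centre voxel and has a single bright shell about 1.7 voxels out, identical in every direction, which matches the "angles don't matter" description.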

tartoran · 3y ago · 1 in thread

Looks neat, it sort of reminds me of Winamp visualization plugins. It’s probably a good learning experience, but as far as utility in a recording studio goes, I fail to see how this would bring anything to the table besides being some cool visual thing.

warvariuc · 3y ago

I really miss Winamp visualisation plugins. I've seen some very good versions.

nsxwolf · 3y ago · 1 in thread

I tried it on my iPhone with an mp3. It displayed what looked like a blue aerogel. I could zoom in and rotate but I couldn’t tell what kind of information it was trying to convey. It wasn’t playing the audio or changing with time in any way either.

rslice (OP) · 3y ago

Just tested it on a Samsung Galaxy phone and there does seem to be an issue with longer audio files. I'm working on it; in the meantime, I've added settings to reduce computational load, particularly for smartphones.

naillo · 3y ago · 1 in thread

Worked for me :) Neat!

rslice (OP) · 3y ago

Thanks! Glad you're enjoying it!

EZ-Cheeze · 3y ago

It's really accurate to the music being played

I found a slightly-to-the-side, isometric, closely zoomed-in view to best show the finer waves and cadences being visualized

You should provide example track/s

10/10

stuntkite · 3y ago

I really like this. Thank you for posting it and sharing code. It's in line with my interests and I'm going to use the heck out of it.

bobsmooth · 3y ago

Some visual feedback to show the file is loading would help but this is neat.

eimrine · 3y ago

I tried to open mp3 but nothing happened. Chrome outdated on Windows x32. Will try on a decent pc later.

davbryn · 3y ago

Looks good - maybe supply a demo audio file for those of us who don't have any locally?
