Analysing bird songs with Wigner transform

(soundshader.github.io)

76 points · gbh444g · 5y ago · 22 comments

14 comments · 5 top-level

_Microft · 5y ago · 6 in thread

Was this work done on data stored in lossy formats? The appendix with the talk about format conversion makes it sound like it. Should this not have been the first thing to be avoided („garbage in, …“)?

gbh444g (OP) · 5y ago

Yeah, it was mostly mp3 and ogg files. If you're suggesting flac would be better, I don't think flac adds much clarity over a decent 320 kbps mp3.

IndySun · 5y ago

>Was this work done on data stored in lossy formats?

You're right to mention this. Lossless audio is preferred for analysis software. Even good MP3s tend to top out around 16 kHz.

The quality of the recording will also be dependent on the microphones used and their frequency range.

When we have analysed animal sounds, it has been useful to play the sample slower, pitched down. Having those higher frequencies, above 16 kHz, recorded well makes a huge difference to the signal's information.
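One rough way to check whether a recording still carries content above 16 kHz is to measure how much of its spectral energy sits above that cutoff. The sketch below is only an illustration: the function name `energy_fraction_above` is made up here, and the two synthetic signals stand in for a full-band recording and a band-limited one (simply leaving out the 19 kHz tone is a crude stand-in for a codec's low-pass, not a model of MP3).

```python
import numpy as np

def energy_fraction_above(samples, sample_rate, cutoff_hz):
    """Return the fraction of spectral energy at or above cutoff_hz."""
    spectrum = np.abs(np.fft.rfft(samples)) ** 2
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    total = spectrum.sum()
    if total == 0:
        return 0.0
    return float(spectrum[freqs >= cutoff_hz].sum() / total)

# Synthetic stand-in for a recording: equal tones at 4 kHz and 19 kHz.
sample_rate = 48_000
t = np.arange(sample_rate) / sample_rate
full_band = np.sin(2 * np.pi * 4_000 * t) + np.sin(2 * np.pi * 19_000 * t)

# Assumption: approximate a band-limited copy by leaving out the 19 kHz tone.
band_limited = np.sin(2 * np.pi * 4_000 * t)

frac_full = energy_fraction_above(full_band, sample_rate, 16_000)
frac_limited = energy_fraction_above(band_limited, sample_rate, 16_000)
print(frac_full, frac_limited)
```

On the slowed-down playback point: playing a recording at a quarter of its original speed divides every frequency by four, so a 20 kHz component lands at 5 kHz, but only if it was captured in the first place.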

Gravityloss · 5y ago

Getting data from the real world always results in noise and artefacts etc. If you've done any kind of work with data, a large part is preprocessing. MP3 seems like a very minor source of problems compared to the potential others. Road noise, other birds, lorries backing up, people talking, mislabeling etc.

TonyTrapp · 5y ago

The difference however is that formats like MP3 work on the principle of psycho-acoustics, which means they modify the original audio in a way that is meant to be indistinguishable by human hearing, but it could very well cause a difference in the spectrum in places where it matters for the analysis (it can add and remove frequencies).

slver · 5y ago

On the other hand it’s not as if bird songs are encoded binary information. They’re complex to our ears, but probably hold up pretty well under common audio compression algorithms.

_Microft · 5y ago

Why would one bother about what effects psychoacoustic models might or might not have on the data when one could simply circumvent the problem completely?

sdenton4 · 5y ago · 2 in thread

Couple points of gentle critique:

It's kinda hard to compare the different spectral representations when they're zoomed and cropped differently.

Spectrograms can be misleading, in a few different ways. Magnitude FFTs discard phase, which we can hear. And our eyes tend to fixate on the peaks, but the noise floor between harmonics in speech has a big impact on perceived quality. Choice of color scheme and gradient changes how we look at the spectrogram: they can emphasize mathematical or coding artifacts we wouldn't hear, or hide things which we can hear. At the end of the day, we don't hear with our eyes... So a spectrogram is a tool for looking at audio, but not always an 'honest' one. So I'm a bit suspicious of poring over slightly different spectrograms, and worrying about which ones look better aesthetically.

barbegal · 5y ago

In which cases can you hear phase? I know you can hear phase from interference where there are two sources but I didn't know you can hear phase of a single audio source.

willis936 · 5y ago

Phase changes over time of a single tone are audible. If there is a linearly changing tone and the phase is not tracking exactly (I believe it is quadratic in time, I'd need to brush up on my chirp math), this will color the chirp in an audible way.
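For reference, the chirp math works out as follows: the phase is the integral of the frequency, so a tone whose frequency rises linearly, f(t) = f0 + k·t, has phase 2π(f0·t + k·t²/2). A small sketch that builds such a chirp and recovers the frequency by differentiating the phase; the sweep range and sample rate are arbitrary values picked for illustration.

```python
import numpy as np

sample_rate = 48_000
duration = 1.0
f0, f1 = 2_000.0, 6_000.0        # illustrative start and end frequencies in Hz
k = (f1 - f0) / duration         # sweep rate in Hz per second

t = np.arange(int(sample_rate * duration)) / sample_rate

# Phase is the integral of 2*pi*f(t) with f(t) = f0 + k*t, hence the t**2 term.
phase = 2 * np.pi * (f0 * t + 0.5 * k * t**2)
chirp = np.sin(phase)

# Differentiating the phase numerically recovers the instantaneous frequency.
inst_freq = np.gradient(phase, t) / (2 * np.pi)
print(inst_freq[1], inst_freq[-2])
```

The interior samples of `inst_freq` follow the straight line f0 + k·t; the first and last samples are slightly off because `np.gradient` falls back to one-sided differences at the edges.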

Also the relative phase of multiple tones affects what the actual shape looks like. A classic example is a square wave. Yes, it needs all odd harmonics at a sinc(f) magnitude, but it also needs all of those harmonics at specific phases.
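A quick numerical illustration of the square-wave point, assuming NumPy: the two signals below use the same odd harmonics with the same 1/n amplitudes, and differ only in the phase given to each harmonic. The number of harmonics and the random seed are arbitrary choices.

```python
import numpy as np

n_samples = 4096
t = np.arange(n_samples) / n_samples      # exactly one period
harmonics = np.arange(1, 40, 2)           # odd harmonics 1, 3, ..., 39

# Truncated Fourier series of a square wave: sine terms, 1/n amplitudes, aligned phases.
square_like = sum(np.sin(2 * np.pi * n * t) / n for n in harmonics)

# Same amplitudes, but every harmonic gets an arbitrary (seeded) phase offset.
rng = np.random.default_rng(0)
offsets = rng.uniform(0, 2 * np.pi, size=len(harmonics))
scrambled = sum(
    np.sin(2 * np.pi * n * t + p) / n for n, p in zip(harmonics, offsets)
)

mag_a = np.abs(np.fft.rfft(square_like))
mag_b = np.abs(np.fft.rfft(scrambled))

same_magnitudes = np.allclose(mag_a, mag_b)
max_waveform_gap = float(np.max(np.abs(square_like - scrambled)))
print(same_magnitudes, max_waveform_gap)
```

The magnitude spectra match while the waveforms differ substantially, which is why a magnitude-only spectrogram cannot tell these two signals apart.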

gadders · 5y ago · 1 in thread

If you're ever out and wonder which bird is behind the song you can hear, this is a great app: https://birdnet.cornell.edu/ (Android and iOS)

You record a portion of the song, and it uses machine learning to analyse it and tell you the bird with a confidence figure. Works really well.

RobinL · 5y ago

Birdnet is great!

Once you've identified the bird, you can then listen to a variety of additional recordings on https://www.xeno-canto.org/ which I believe is one of the sources used to train the machine learning model.

joren- · 5y ago

As an FYI, if you are interested in the fundamental frequency of birdsong the GitHub repo below might be of interest. It is an STFT + interpolation to get an accurate (potentially quickly changing) frequency estimate: https://github.com/JorenSix/stft_freq
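To give a flavour of the general idea, here is a minimal sketch of one common way to refine an FFT peak between bins: fit a parabola through the log-magnitudes of the peak bin and its two neighbours. This is not taken from the linked repo and may differ from what it does; the function name `peak_frequency`, the Hann window, and the test tone are all assumptions made for illustration.

```python
import numpy as np

def peak_frequency(frame, sample_rate):
    """Estimate the dominant frequency of one frame via parabolic interpolation."""
    windowed = frame * np.hanning(len(frame))
    mag = np.abs(np.fft.rfft(windowed))
    # Skip the first and last bins so that both neighbours of the peak exist.
    k = int(np.argmax(mag[1:-1])) + 1
    # Fit a parabola through the log-magnitudes around the peak bin.
    a, b, c = np.log(mag[k - 1 : k + 2] + 1e-12)
    denom = a - 2 * b + c
    offset = 0.0 if denom == 0 else 0.5 * (a - c) / denom
    return (k + offset) * sample_rate / len(frame)

sample_rate = 44_100
frame_len = 2048
t = np.arange(frame_len) / sample_rate
true_freq = 3_217.0                        # deliberately between FFT bins
estimate = peak_frequency(np.sin(2 * np.pi * true_freq * t), sample_rate)
print(estimate)
```

With a 2048-sample frame at 44.1 kHz the bins are about 21.5 Hz apart, so the raw peak bin alone could be off by up to roughly 10 Hz; the interpolation step narrows that considerably. Running the function on successive short frames gives a frequency track over time.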

DemocracyFTW · 5y ago

Also see https://news.ycombinator.com/item?id=27073944
