Really nice, frequently accurate (it's not perfect :-).
I wonder how many paying customers Chordify has.
It's certainly as accurate as most.
I was trying various plugins on this song: https://www.youtube.com/watch?v=e__Z-UpU01U
But so far none of the Vamp plugins seem to be showing anything interesting. (I have only tried a random handful.)
EDIT: The spectrum view is fascinating. I can see a little bit of additional information at the edges in the FLAC vs. an MP3. Is there a plugin that can separate the instruments?
Could anyone familiar with this area recommend more tools like the one in the original post and these? Would really appreciate it.
For anyone else who had never heard of "cepstrum" before, this is what I found on Wikipedia:
"The cepstrum is the result of computing the inverse Fourier transform of the logarithm of the estimated signal spectrum. The method is a tool for investigating periodic structures in frequency spectra. The power cepstrum has applications in the analysis of human speech.
The term cepstrum was derived by reversing the first four letters of spectrum. Operations on cepstra are labelled quefrency analysis (or quefrency alanysis), liftering, or cepstral analysis."
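In case a concrete version of that definition helps, here's a minimal sketch of the "periodic structures in the spectrum" idea. It is not speech analysis: it uses a synthetic noise signal plus one echo, which puts a ripple in the log spectrum that shows up as a peak at the echo delay in the cepstrum. All names and numbers here (`real_cepstrum`, the 100-sample delay, the 0.5 gain, the search floor of 20 samples) are my own assumptions for illustration, and this computes the real cepstrum (log magnitude) rather than the power cepstrum.

```python
import numpy as np

def real_cepstrum(x):
    """Inverse FFT of the log magnitude spectrum of a real signal."""
    spectrum = np.fft.rfft(x)
    log_mag = np.log(np.abs(spectrum) + 1e-12)  # small offset avoids log(0)
    return np.fft.irfft(log_mag, n=len(x))

# Synthetic test signal (assumption): white noise plus one delayed, quieter copy.
rng = np.random.default_rng(0)
n, delay, gain = 4096, 100, 0.5
dry = rng.standard_normal(n)
wet = dry.copy()
wet[delay:] += gain * dry[:-delay]

ceps = real_cepstrum(wet)

# Skip the lowest quefrencies, then take the largest peak in the first half.
lo = 20
estimated_delay = lo + int(np.argmax(ceps[lo:n // 2]))
print(estimated_delay)  # should land on the echo delay in this setup
```

The x-axis of `ceps` is "quefrency", measured in samples here. For pitch detection on voiced speech the same peak-picking is applied, with the peak position read as the pitch period instead of an echo delay.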
The cepstrum is awesome. It comes from the source-filter model of human speech: by looking at the periodicities in the frequencies, it attempts to capture the resonance of the filter that models the vocal tract.
I pretty much used Wikipedia as my main resource when I was learning DSP at college.
I get like 80% of my EE information from Wikipedia nowadays, for a ton of different areas. A couple of days ago I was reading Meindl's paper on boron implantation in MOSFETs. I don't recall exactly what the topic was, but it was so obscure that my colleagues had a lot of trouble with it and ultimately couldn't find any more resources on it.
There was literally a whole Wikipedia section dedicated to the concept and it saved my goddamn ass in the presentation I had the next day.
I absolutely love Wikipedia and I owe a lot of my education to the contributors.
There's a bit of magic in the MFCC computation where you apply the discrete cosine transform (DCT). That's all about reducing correlation between components in the cepstrum and makes no sense unless you "get" the way the change of basis has high energy compaction (more information in fewer values).
However, this has nothing to do with a physiological understanding of human speech.
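To make the energy compaction point a bit more concrete, here's a rough sketch using plain NumPy. The input is a made-up smooth curve standing in for one frame of log filterbank energies (two bumps across 40 bands), not real audio, and the choice of 40 bands and 13 kept coefficients is just a common convention I'm assuming. `dct2_orthonormal` is a hand-rolled orthonormal DCT-II, written out so the change of basis is visible.

```python
import numpy as np

def dct2_orthonormal(x):
    """Orthonormal DCT-II, written as an explicit change of basis."""
    n = len(x)
    k = np.arange(n)
    # Row m of the basis is a cosine with m half-cycles across the input.
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * np.outer(k, 2 * k + 1) / (2 * n))
    basis[0] /= np.sqrt(2.0)  # rescale the DC row so the matrix is orthonormal
    return basis @ x

# Hypothetical smooth "log filterbank energies" for a single frame.
bands = np.arange(40)
log_energies = (np.exp(-((bands - 12) / 6.0) ** 2)
                + 0.5 * np.exp(-((bands - 28) / 8.0) ** 2))

coeffs = dct2_orthonormal(log_energies)

# Because the basis is orthonormal, total energy is the same in both domains,
# so we can ask what share of it sits in the first 13 coefficients.
total = np.sum(coeffs ** 2)
fraction = np.sum(coeffs[:13] ** 2) / total
print(f"{fraction:.6f}")
```

For a smooth input like this, nearly all of the energy ends up in the low-order coefficients, which is why truncating the DCT output loses little. Real frames are noisier, so the share will be lower than in this toy case.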
> Note: aubio is not MIT or BSD licensed. Contact the author if you need it in your commercial product.
Does anybody have experience using some of this code for realtime detection?
https://github.com/librosa/librosa/blob/main/librosa/beat.py
Track beats using time series input
>>> y, sr = librosa.load(librosa.ex('choice'), duration=10)
>>> tempo, beats = librosa.beat.beat_track(y=y, sr=sr)
>>> tempo
135.99917763157896
Also see Essentia: