Comparison of state-of-the-art music source separation models on Californication

(soundcloud.com)

6 points · lapink · 6y ago · 4 comments

4 comments · 2 top-level

ksaj · 6y ago · 1 in thread

I'm surprised at how much better Tasnet and especially Demucs are than Spleeter at pulling the bass guitar out.

Overall, it's obvious that this technology still has a long way to go. But it makes me wonder how close all this gets to how clearly we (humans) can home in on a single voice in a crowd.

lapink (OP) · 6y ago

Speech source separation has come a long way, thanks to Yi Luo's amazing work. With Dual Path RNN, he now achieves a signal-to-noise ratio of almost 20 dB for two-speaker separation, see [1]. This is a bit of an artificial setting though: there are only two speakers, and they are manually mixed together. I'm not sure if there is any good dataset for speech source separation in real environments (an airport, a restaurant, etc.).

[1]: https://arxiv.org/pdf/1910.06379.pdf
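For readers unfamiliar with the metric, here is a minimal sketch of how such a figure can be computed. It assumes the number quoted above is a scale-invariant signal-to-noise ratio (SI-SNR), a variant commonly reported in speech separation work; the comment itself only says "Signal to Noise Ratio". The function name `si_snr` and the toy sine/cosine signals are illustrative and are not taken from the linked paper or its code.

```python
import math


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length 1-D signals.

    Illustrative helper (an assumption, not the paper's implementation).
    """
    if len(estimate) != len(target):
        raise ValueError("estimate and target must have the same length")
    n = len(target)

    # Remove the mean of each signal so a DC offset is not counted.
    mean_e = sum(estimate) / n
    mean_t = sum(target) / n
    e = [x - mean_e for x in estimate]
    t = [x - mean_t for x in target]

    # Project the estimate onto the target: the part that "is" the target.
    dot = sum(a * b for a, b in zip(e, t))
    t_energy = sum(b * b for b in t) + eps
    scale = dot / t_energy
    s_target = [scale * b for b in t]

    # Whatever is left over is treated as noise / interference.
    e_noise = [a - s for a, s in zip(e, s_target)]

    num = sum(s * s for s in s_target)
    den = sum(x * x for x in e_noise) + eps
    return 10 * math.log10(num / den + eps)


# Toy example: a clean sine as the target, and an estimate that adds an
# orthogonal cosine at one tenth of the amplitude.
target = [math.sin(2 * math.pi * k / 100) for k in range(1000)]
error = [0.1 * math.cos(2 * math.pi * k / 100) for k in range(1000)]
estimate = [s + n for s, n in zip(target, error)]

# About 20 dB, since the error carries 1/100 of the target's energy.
print(round(si_snr(estimate, target), 2))
```

Because the estimate is projected onto the target before the ratio is taken, rescaling the estimate leaves the score unchanged, which is why the metric is called scale-invariant. Reading "almost 20" this way means the residual error carries roughly a hundredth of the energy of the recovered speaker.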

gnat · 6y ago · 1 in thread

Does anyone have a pointer to a write-up that says who did this and identifies the software being compared?

lapink (OP) · 6y ago

Author here. This is part of the release of Demucs; you can find more information in my repo: https://github.com/facebookresearch/demucs
