Spleeter – Music Source-Separation Engine

(deezer.io)

258 points · jph98 · 6y ago · 63 comments

61 comments · 26 top-level

roddylindsay6y ago· 5 in thread

Once this technology gets incorporated into DJ mixers / CDJs, this is going to make DJing much more creatively interesting.

Historically, blending between mixed stereo tracks has been limited to mixing EQ bands, but now DJs will be able to layer and mix the underlying stems themselves -- like putting the vocal from one track onto an instrumental section of another (even if no a cappella / instrumental versions were ever released).

It also opens up a previously unreachable world for amateur remixing in general; for instance, creating surround sound mixes from stereo or even mono recordings for playback in 3D audio environments like Envelop (https://envelop.us) [disclaimer: I am one of the co-founders of Envelop]

pgt6y ago

Disclaimer: my hobby is correcting people on the Internet when they say disclaimer but they really mean disclosure :)

hobofan6y ago

How are those windmills doing, Don Quijote?

Juliate6y ago

Wouldn't it be much more efficient for everyone (and even lucrative for the owners) to also provide the studio stems at a slightly higher/different price?

(not that some of these are not already available when you know where to search, but it's not very... structured)

marksomnian6y ago

Native Instruments tried that with their Stems [0] project. Didn't seem to get all too far though.

[0]: https://www.native-instruments.com/en/specials/stems/

NobodyNada6y ago

This is a thing specifically in the contemporary Christian music industry, so that churches can pick-and-choose parts from the original song to use as backing tracks for live performance. See e.g. https://www.multitracks.com/songs/Hillsong-Young-And-Free/Th...

svat6y ago· 5 in thread

I often have voice recordings with a lot of background noise (e.g. a public lecture in a room with poor acoustics, recorded from a phone in the audience — there's usually sounds of paper rustling, noises from the street, etc). Is this "source-separation" the sort of thing that could help, or does anyone have other tips? The best thing I have so far is based on this https://wiki.audacityteam.org/wiki/Sanitizing_speech_recordi... —

(1) Open the file in Audacity and switch to Spectrogram view, (2) set a high-pass filter with ~150 Hz, i.e. filter out frequencies lower than that (which tend to be loud anyway), (3) don’t remove the higher frequencies (which aren’t loud), because they are what make the consonants understandable (apparently), (4) look for specific noises, select the rectangle, and use “Spectral Edit Multi Tool”.

But if machine learning can help that would be really interesting! This Spleeter page does mention “active listening, educational purposes, […] transcription” so I'm excited.
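For anyone who wants to see what step (2) does to a signal, here is a minimal sketch of a high-pass filter with a 150 Hz cutoff. It is a plain first-order RC filter written for illustration, not Audacity's actual filter, and the function names and test tones are made up for the example.

```python
import math


def high_pass(samples, sample_rate, cutoff_hz=150.0):
    """First-order RC high-pass filter: attenuates content below cutoff_hz."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = []
    prev_in = 0.0
    prev_out = 0.0
    for x in samples:
        # Standard discrete RC high-pass recurrence.
        y = alpha * (prev_out + x - prev_in)
        out.append(y)
        prev_in, prev_out = x, y
    return out


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def sine(freq_hz, sample_rate, seconds=1.0):
    """Generate a test tone (hypothetical stand-in for real audio)."""
    n = int(sample_rate * seconds)
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


sample_rate = 8000
rumble = sine(30.0, sample_rate)        # low-frequency noise, below the cutoff
consonants = sine(2000.0, sample_rate)  # stands in for the consonant range

# Fraction of each tone's level that survives the filter.
rumble_kept = rms(high_pass(rumble, sample_rate)) / rms(rumble)
speech_kept = rms(high_pass(consonants, sample_rate)) / rms(consonants)
print(f"30 Hz kept: {rumble_kept:.2f}, 2000 Hz kept: {speech_kept:.2f}")
```

A first-order filter has a gentle slope, so a real tool would likely use a steeper one; the point is only that the low rumble drops a lot while the higher band is mostly left alone.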

SyneRyder6y ago

I'd generally try iZotope RX for cleaning up audio - Dialogue Isolate is probably the exact feature you would want (and I gather is often used in movies to clean up on location dialogue), but it's only in the most expensive Advanced version:

https://www.izotope.com/en/products/rx/features/dialogue-iso...

Cheaper versions of RX still have various noise reduction tools, de-verb for reducing reverb and room echo, and a range of spectral editing tools as well.

Guillaume866y ago

You could give a shot to the Nvidia RTX Voice plugin if you have one of the compatible cards. I'm not sure how it deals with low background noises, the youtube reviews mostly tested it with over the top cases like a vacuum cleaner next to the speaker.

Cactus20186y ago

Works on non-RTX cards too https://arstechnica.com/gaming/2020/04/you-can-get-nvidias-r...

nathan_f776y ago

https://krisp.ai uses machine learning to remove background noise. I've used them with Zoom calls and it works really well. I think they don't currently have an "upload audio" feature for existing recordings, but it would be awesome if they offered this in the future.

Sorry it's not something you can use now, but I just thought I would mention it! I also did a quick Google search but unfortunately I couldn't find any AI noise removal tools that might solve this problem.

LegitShady6y ago

Is the processing happening remotely? I can't use any software that sends data (especially communications) off premises.

SyneRyder6y ago· 4 in thread

For anyone who wants to try Spleeter in a version that "just works" without having to install TensorFlow and mess with offline processing, Spleeter has been built into a wave editor called Acoustica from Acon Digital. It's been working really well for me, and the whole package is solid competition to editors like iZotope RX:

https://acondigital.com/products/acoustica-audio-editor/

theobr6y ago

I've been trying for months to make redistributable Spleeter "binaries" that I can bundle with user-facing applications. Happy to see someone's succeeded where I've failed. Really sad they've chosen not to share their changes :(

I emailed them requesting more info on how their implementation works. I think this might be a violation of the MIT license?

SyneRyder6y ago

The MIT license isn't copyleft, there's no obligation to share modifications or provide source - just to acknowledge the copyright / credit.

But they seem friendly and proactive (from my experience on music forums anyway), so hopefully you'll get a helpful reply.

philipov6y ago

Does Acoustica offer full ability to rebind the hotkeys?

SyneRyder6y ago

It's something they're working on. You can already change most keyboard shortcuts, but there's a few corner cases that people have been asking for (shortcuts with arrow keys are a problem at the moment). The developers have been extremely responsive to feature requests on the Gearslutz forum though, I've seen some feature requests implemented in just a few days:

https://www.gearslutz.com/board/product-alerts-older-than-2-...

FraKtus6y ago· 4 in thread

It says it can be 100 times faster than in real-time.

So can it be run in real-time?

I am thinking about extracting features for music visualization but it could make a DJ happy also.

kleiba6y ago

Sometimes the distinction is made between "real-time" and "online" processing.

The first one refers to the speed of the processing in relation to the length of the recording - so, say, you can process a 10 minute recording in 1 minute then you're 10x real-time. However, your analysis might require the full track to be available for best outcomes, and so you cannot really start with the processing until the full source is available.

The latter is what "online" processing refers to: the ability to process on-the-fly, in parallel with the recording. Obviously, this cannot be faster than real-time ;-) but hopefully it is not slower, either. Oftentimes, though, you get a (somewhat constant and) hopefully small offset, i.e., you can process a 10 minute recording online in the same time but you need another 10 seconds on top of that.

This is, by the way, not restricted to source separation, it applies to other disciplines as well, say, automatic speech recognition.
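To put numbers on the two notions, here is a small sketch using the figures from the examples above. The function names are made up for illustration.

```python
def real_time_factor(audio_seconds, processing_seconds):
    """How many times faster than playback an offline run is."""
    return audio_seconds / processing_seconds


def online_finish_time(audio_seconds, offset_seconds):
    """When an online processor is done: it cannot finish before the
    recording ends, and it lags behind by a roughly constant offset."""
    return audio_seconds + offset_seconds


# Offline: a 10 minute recording processed in 1 minute.
rtf = real_time_factor(10 * 60, 1 * 60)

# Online: the same recording, processed as it arrives, with a 10 second offset.
finish = online_finish_time(10 * 60, 10)

print(f"real-time factor: {rtf:.0f}x, online finish after {finish} s")
```

A high real-time factor says nothing about the offset: a method can be 100x real-time offline and still need a large chunk of audio before it can emit anything.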

FraKtus6y ago

Exactly: however fast it is, if this method needs to parse the full track before it starts generating results, then it can't be used in real-time.

To be used with arbitrary audio in real-time, after initialization and setup you need an API that looks like:

ProcessAudio (samples, num_samples)

And it would return n packets of num_samples samples.
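A rough sketch of what that interface could look like in Python. Everything here is hypothetical: the class name, the method name, and especially the "separation" itself, which is a placeholder that just splits the block evenly so the shape of the call can be shown. It is not how Spleeter works.

```python
from typing import List, Sequence


class BlockSeparator:
    """Hypothetical block-based separator: one block in, n stems out."""

    def __init__(self, num_stems: int):
        if num_stems < 1:
            raise ValueError("num_stems must be at least 1")
        self.num_stems = num_stems

    def process_audio(self, samples: Sequence[float]) -> List[List[float]]:
        """Return num_stems packets, each with len(samples) samples.

        Placeholder logic: every stem gets an equal share of the input.
        A real model would go here.
        """
        share = 1.0 / self.num_stems
        return [[s * share for s in samples] for _ in range(self.num_stems)]


# Initialization and setup happen once, then blocks are fed as they arrive.
separator = BlockSeparator(num_stems=2)
block = [0.0, 0.5, -0.5, 1.0]
stems = separator.process_audio(block)
print(len(stems), [len(stem) for stem in stems])
```

The key property is that each call depends only on audio that has already arrived, so the caller never has to wait for the end of the track.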

nunja6y ago

I experimented with the spleeter architecture quite a bit and I would say this is not suitable for real time audio processing. The reason is that the model needs at least 512 frames of audio samples to produce an output usable for source separation. This adds a ton of latency. I tried with smaller windows but the results are very bad.
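To get a feel for the latency, here is a back-of-the-envelope sketch. It assumes the 512 frames are spectrogram frames, with a hop of 1024 samples between frames at a 44.1 kHz sample rate; those two values are assumptions for the sake of the arithmetic, so check them against the model configuration you are actually using.

```python
def window_latency_seconds(num_frames, hop_samples, sample_rate):
    """Audio duration that must be buffered to fill one model input window."""
    return num_frames * hop_samples / sample_rate


# 512 frames comes from the comment above; hop size and sample rate are assumed.
latency = window_latency_seconds(num_frames=512, hop_samples=1024, sample_rate=44100)
print(f"{latency:.1f} s of audio must be buffered before the first output")
```

Under those assumptions the buffer alone is on the order of ten seconds, before any compute time is added, which is far beyond what live mixing tolerates.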

LegitShady6y ago

This person https://github.com/diracdeltas/spleeter4max

created a Max for Live native version of Spleeter and demos it here:

https://www.youtube.com/watch?v=4pcJoI5CUOA&feature=youtu.be

It's way faster than real time; I'm not sure why slowing it down would be an advantage. As a DJ you still need to take the resulting data and do things with it, and faster is better.

Myce6y ago· 3 in thread

A local radiostation has a broadcast of four hours. They are required to play an x amount of music tracks by the station (about 6 per hour), but there has been demand to make the broadcast available as podcast without the music.

Could this make it possible to automatically remove the music from the MP3 file they have available? With 6 tracks per hour times 4 hours, manually removing the music is time consuming.

I doubt it, as it seems all vocals are output to a single file...

Is there any other tool someone can recommend?

lozf6y ago

Presumably they own the rights on broadcast material, so they'd have to be directly involved in the podcast production. That given, it would probably be more straightforward to take the microphone feeds from their broadcast desk (via "aux-out" perhaps) and record only the spoken output separately.

Sox etc. could be used for silence detection, probably best done in post (scriptable), but could be piped through after experimenting with settings. Otherwise, even old desks can trigger when a mic channel fader is raised, so that too is a possibility for pausing the recording during music.

TedDoesntTalk6y ago

> Is there any other tool someone can recommend?

Audacity. I can think of two ways.

0. Import into audacity

1. Play back the recording at 4x (1 hour of playback for 4 hours of real-time broadcast). Mark the edges where music stops and starts. You have to do that 12 times for 6 songs. You'll have to slow down near the changes in order to catch the precise time of an edge. Delete the music between the two edges. Repeat 5 more times.

There may be audacity plugins that do what you want or do something closer to it.

2. Use some combination of low pass and high pass filters to remove the music. It's not going to be perfect and you'll still need to edit out the filtered music anyway.
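If the edges from the first approach are written down as timestamps, the deleting can be scripted instead of done by hand. A minimal sketch, assuming the audio is already loaded as a list of samples; the function name and the toy numbers are made up for illustration.

```python
def remove_ranges(samples, sample_rate, ranges_seconds):
    """Return samples with every (start, end) range in seconds cut out."""
    kept = []
    cursor = 0  # index of the first sample not yet handled
    for start_s, end_s in sorted(ranges_seconds):
        # Clamp to the file and to what has already been cut (overlaps).
        start = max(int(start_s * sample_rate), cursor)
        end = min(int(end_s * sample_rate), len(samples))
        kept.extend(samples[cursor:start])
        cursor = max(cursor, end)
    kept.extend(samples[cursor:])
    return kept


# Toy data: 10 "seconds" at 10 samples per second, music at 2-4 s and 7-8 s.
sample_rate = 10
broadcast = list(range(100))
music_ranges = [(2.0, 4.0), (7.0, 8.0)]
speech_only = remove_ranges(broadcast, sample_rate, music_ranges)
print(len(speech_only))
```

Marking the edges is still manual here; this only saves the repeated select-and-delete.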

fao_6y ago

At this point it'd be easier to just duplicate the sources to an external recorder, right?

TheOtherHobbes6y ago· 3 in thread

Out of interest, and to put this in context - your brain can only do this for conversation, not music.

You routinely suppress background noise and room acoustics when listening to someone speaking. But you don't do the same thing when listening to music. At best you can focus on individual elements in a track, and you can parse them musically (and maybe lyrically).

But you don't suppress the rest to the point where you don't hear it.

maw6y ago

Maybe _your_ brain can, but mine can't.

To somebody with APD, it sounds like science fiction, although it does require more suspension of disbelief than faster than light travel or teleportation.

michaelcampbell6y ago

Is there some research behind this or are you opining on how YOUR brain works?

aasasd6y ago

https://en.wikipedia.org/wiki/Cocktail_party_effect

Not sure how this pertains to music, but this ability normally requires localizing different voices and noises.

iseanstevens6y ago· 2 in thread

There is a Max/Ableton live plugin version here, which makes it much easier to experiment with Spleeter artistically.

https://github.com/diracdeltas/spleeter4max/releases/

tomduncalf6y ago

Nice! I had this idea and was too lazy to do it haha, glad someone else wasn’t

tsukurimashou6y ago

needs a reaper version as well!

tomduncalf6y ago· 2 in thread

Another recent open source contender for source separation is Open Unmix: https://github.com/sigsep/open-unmix-pytorch/

I’ve not had time to try it yet but have read good things.

tomduncalf6y ago

Just tried this and it's really impressive, I'd say it does a nicer job on vocals than Spleeter. Less of the "underwater" effect compared to what I remember of Spleeter.

philipov6y ago

Unfortunately, it doesn't look like they've got out-of-the-box windows support.

voiper16y ago· 2 in thread

Very cool!

I was even able to run it on their notebook https://colab.research.google.com/github/deezer/spleeter/blo... without setting anything up locally.

The results of vocal separation were quite impressive.

antman6y ago

Can you share the outputs please?

voiper16y ago

Sorry, I tried it on a (typical) copyrighted song.

TedDoesntTalk6y ago· 2 in thread

Now we can create all-star bands that never existed. For example:

Neal Schon from Journey -- lead guitar

Heart sisters doing lead vocals and lead/rhythm guitar

Flea -- bass guitar from Chili Peppers

Neil Peart -- drummer from Rush

Tony Kay -- keys from Genesis

The only difficulty is they must all be playing the same song. Then we can extract, transpose if needed, and remix together.

floatrock6y ago

We can deep fake vocals and redraw your photos as if they were painted by van gogh... I'm sure someone has trained something that immortalizes different artists into their AI instrumental avatar.

If not, I'm sure if you ask nicely Amazon will give you a few credits to burn on a pandemic art project.

quattrofan6y ago

Tony Banks?

mwcampbell6y ago· 1 in thread

Previous discussion, where I posted a demo using a full song (legally under Creative Commons):

https://news.ycombinator.com/item?id=21431071

Note: I'm not affiliated with this project; I just think it's cool.

dang6y ago

I'm going to pretend that we didn't see this (otherwise extremely helpful) link to a major discussion from 6 months ago, so as not to have to mark the current post a dupe.

grawprog6y ago· 1 in thread

I couldn't find any examples so was wondering for anyone that's tried this are the results better than using a bandpass filter and an equalizer to isolate frequencies or one of those auto karaoke things?

Because the ability to separate any song into separate tracks would be amazing. The ability to remix any song or just play with any instrument or vocal track would be awesome. But does it have the same poor quality and limitations of most frequency based source separation?

tomduncalf6y ago

Yeah, the results are a lot better than filtering... deep learning has pushed the state of the art in source separation on quite a lot recently.

It isn’t magical and the results still have artefacts (mostly that kind of slightly underwater sound of a low bitrate MP3, I believe due to the way the audio is reconstructed from FFTs), and some songs trip it up entirely, but it’s definitely worth playing around with and I think it could potentially have applications for DJ/remix use if you added enough effects etc.

It’s fairly easy to install and runs quickly without a GPU, or you can try their Colab notebook, or it seems someone has hosted a version at https://ezstems.com/

fold_left6y ago· 1 in thread

Once you have obtained just the Guitar from a track, are there any tools out there which can work out the Tablature (eg. https://www.ultimate-guitar.com//top/tabs) so you can play along?

dspig6y ago

https://products.zplane.de/decoda can help

leoncvlt6y ago

Here's the sample output, for those who are curious:

- Sample track: https://files.catbox.moe/56op27.mp3

- Spleeted vocals: https://files.catbox.moe/4d9aru.wav

- Spleeted accompaniment: https://files.catbox.moe/y67g23.wav

jph98OP6y ago

Leveraging a state-of-the-art source separation algorithm for music information retrieval

https://www.youtube.com/watch?time_continue=42&v=JIR6HJISrtY...

marksomnian6y ago

Had a play with the Colab and it's quite good indeed. The authors claim "100x real time speed", which is mighty impressive, but I'd be more interested in seeing a "Try Really Hard" mode, trading off quality and speed. Is that a thing that can be done in the current code, I wonder?

mehrdadn6y ago

If you're trying to run it on Windows with Python 3.8, add numpy and cython to the dependencies, and change Tensorflow's requirement to be >= rather than ==.

Though then you'll run into compatibility errors like "No module named 'tensorflow.contrib'" which you'll have to fix.

mbushey6y ago

While this is awesome, it's trained on MUSDB18-HQ which as far as I can tell is proprietary. zenodo.org claims it is available, however I have filled out their "request access" page a half-dozen times. Does anyone know of a training data-set that's possible to obtain?

Here is the zenodo response:

Your access request has been rejected by the record owner.

Message from owner: no justification given

Record: MUSDB18-HQ - an uncompressed version of MUSDB18 https://zenodo.org/record/3338373

The decision to reject the request is solely under the responsibility of the record owner. Hence, please note that Zenodo staff are not involved in this decision.

pabs36y ago

This reminds me of this open source project (and its predecessor manyears and open hardware projects 8/16soundsusb).

https://github.com/introlab/odas https://github.com/introlab/manyears https://github.com/introlab/16SoundsUSB

Website of the team behind these:

https://introlab.3it.usherbrooke.ca/

InstaHeads6y ago

Well, it seems neural networks started to appear for vocal and instrumental track isolation^^ recently I've discovered https://www.lalal.ai and it works quite well

philipov6y ago

I tried using the 2 stem model to remove the music from an audio recording of two people talking. It kept sucking in some of the music whenever someone started talking, however. Is there a better model to use for that?

manceraio6y ago

You could try spleeter on the cloud here https://voxremover.com

philipov6y ago

The output appears to cut off after 10 minutes. How do you make it operate on longer files, like in the 100 minute range?

jbverschoor6y ago

Deezer is pretty useless if all supported hardware require your phone to stream.

They should spend dev time on something that matters

peterhookgen6y ago

This is very cool, I have started using it to experiment with creating hardstyle dance remixes of popular songs

fit2rule6y ago

This is ultra-cool .. I have a few terabytes of jam-session recordings that I'm going to throw at this. If it ends up being usable to the point that I can re-do vocals over some of the greatest moments in the archive, I'll be praising whatever Spleeter deity makes itself visible to me at the time, most highly ..
