
0 points · Eli_P · 7y ago · 0 comments

That's interesting. Are you going to make something like LabelImg[1]? I've been looking for something like that for audio, but I'm not sure about treating audio as images. I've heard of this trick, but I'd expect neural nets for audio to work better with recurrent architectures such as a plain RNN, a GRU[2], or maybe an LSTM, whereas images are processed with CNNs.

[1] https://github.com/tzutalin/labelImg [2] https://en.wikipedia.org/wiki/Gated_recurrent_unit



pizza · 7y ago

I was gonna do something involving about 3 different neural nets:

a source separator: taking one audio stream as input and producing a set of audio streams as output.

a segmentation regression neural net: takes an audio stream as input and returns the start and stop timestamps of individual samples as output, or alternatively, just trimmed copies of the audio stream

a sample classifier: takes an audio stream and returns a label such as “kick drum”, “snare drum”, “voice”, “guitar”, etc

then the pipeline would be like

source separator => segmenter => sample classifier
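To make the data flow concrete, here is a rough sketch of how the three stages could be wired together. Everything in it is an assumption for illustration: `run_pipeline`, `LabeledSample`, and the three stand-in stages are made-up names, and the stand-ins are trivial placeholders (a pass-through separator, an amplitude-threshold segmenter, a constant classifier) where the actual neural nets would go.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Audio = Sequence[float]  # mono samples; a hypothetical representation


@dataclass
class LabeledSample:
    source_index: int  # which separated stream the sample came from
    start: int         # first sample index (inclusive)
    stop: int          # last sample index (exclusive)
    label: str         # e.g. "kick drum", "voice"


def run_pipeline(
    mix: Audio,
    separate: Callable[[Audio], List[Audio]],
    segment: Callable[[Audio], List[Tuple[int, int]]],
    classify: Callable[[Audio], str],
) -> List[LabeledSample]:
    """source separator => segmenter => sample classifier."""
    results = []
    for index, stream in enumerate(separate(mix)):
        for start, stop in segment(stream):
            label = classify(stream[start:stop])
            results.append(LabeledSample(index, start, stop, label))
    return results


# Placeholder stages, NOT real models: swap each one for a trained network.
def passthrough_separate(mix: Audio) -> List[Audio]:
    return [mix]  # pretends the mix contains a single source


def threshold_segment(stream: Audio, threshold: float = 0.1) -> List[Tuple[int, int]]:
    """Return (start, stop) index pairs for runs louder than the threshold."""
    segments, start = [], None
    for i, x in enumerate(stream):
        loud = abs(x) > threshold
        if loud and start is None:
            start = i
        elif not loud and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(stream)))
    return segments


def constant_classify(clip: Audio) -> str:
    return "unknown"  # a real classifier would return an instrument label
```

Because each stage is just a callable, any one of them can be replaced by a trained model without touching the other two.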

Hopefully with this I would be able to decompose music into its constituent parts, useful for remixing and other kinds of musique concrète

I expect that the results with a deep pretrained generic image model + some tweaking with more niche training examples will be satisfactory, but if not it would be a good excuse to experiment with more traditionally sequence-oriented network architectures
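For anyone wondering what "audio as images" looks like in practice, the usual move is to turn the waveform into a spectrogram and hand that 2-D array to the image model. Below is a minimal sketch using only NumPy; the function name `log_spectrogram` and the frame and hop sizes are assumptions picked for illustration, not anything from an existing tool, and it expects at least `frame_len` samples of input.

```python
import numpy as np


def log_spectrogram(samples: np.ndarray, frame_len: int = 256, hop: int = 128) -> np.ndarray:
    """Return a log-magnitude spectrogram with shape (freq_bins, time_frames)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(samples) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack(
        [samples[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # One FFT per frame; rfft keeps only the non-negative frequencies.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # log1p compresses the dynamic range so quiet content stays visible.
    return np.log1p(magnitude).T


# Example: one second of a 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)
spec = log_spectrogram(tone)
```

The result can be treated like a single-channel image: frequency on one axis, time on the other. A pretrained image model would typically need it resized and repeated across three channels, which is left out here.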
