0 points · fnordpiglet · 2y ago · 0 comments

Generative models can certainly create midi, but no one has done it yet. Given the technique is making video, audio, images, and language, all you need to do is train and build a model with an appropriate architecture.

It’s easy to forget this is all pretty new stuff and it still costs a lot to make the base models. But the techniques are (more or less) well documented and implementable with open source tools.

11 comments · 4 top-level

radarsat1 · 2y ago · 5 in thread

> Generative models can certainly create midi, but no one has done it yet.

Note sequence generation from statistical models has a long history, at least as long as that of text generation, if not longer.

Have a look at section 2.1 of this survey paper [0] that cites a paper from 1957 as the first work that applies Markov models to music generation.

And, of course, plenty of follow-up work 6 decades later on GANs, LSTMs, and transformers.

[0]: https://www.researchgate.net/publication/345915209_A_Compreh...

fnordpiglet (OP) · 2y ago

Yes, in fact I think at some point everyone has written their own Markov generator or at least run Dissociated Press. But we’ve really only seen meaningfully high-quality output over the last few years.
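For anyone who hasn't written one, here is a rough sketch of the kind of first-order Markov generator being discussed. It is not taken from the survey or from any particular system; the training melody, function names, and seed are made up for illustration, and notes are assumed to be plain MIDI note numbers with no rhythm or velocity.

```python
import random
from collections import defaultdict

def train_markov(notes):
    """Map each note to the list of notes that followed it in the training melody."""
    transitions = defaultdict(list)
    for current, following in zip(notes, notes[1:]):
        transitions[current].append(following)
    return transitions

def generate(transitions, start, length, seed=None):
    """Walk the transition table, picking each next note at random from observed successors."""
    rng = random.Random(seed)
    sequence = [start]
    while len(sequence) < length:
        choices = transitions.get(sequence[-1])
        if not choices:
            break  # dead end: this note never had a successor in training
        sequence.append(rng.choice(choices))
    return sequence

# Hypothetical training melody as MIDI note numbers (60 = middle C).
melody = [60, 62, 64, 65, 67, 65, 64, 62, 60, 64, 67, 72, 67, 64, 60]
model = train_markov(melody)
generated = generate(model, start=60, length=16, seed=0)
print(generated)
```

Because successors are stored with repeats, more frequent transitions are picked more often. The output only ever reproduces note-to-note steps seen in training, which is roughly why this style of generator gets repetitive over longer spans.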

radarsat1 · 2y ago

I think it depends on how you define that. People were quite happy with HMM-based MIDI generators that could generate Beethoven- or Mozart-like sequences 10, maybe even 15 or 20 years ago. But of course other people pointed out the problems of it being boring eventually. Then LSTMs improved long-term dependencies and people were impressed by the improved quality of generating whole musical pieces. But still others thought it was not good enough. Then the goalposts moved again with transformers and neural vocoders and now we want top-40 direct audio generation. And these latest systems can kind of sort of do it! But still there are people who demand better. And so on, things will continue to improve.

Progress only moves as fast as expectations, and expectations move with technology. Music is not special in this respect. So you could say at any given time in the past that some people "see meaningfully high quality" and others are disappointed. You see exactly both these sides of the spectrum even now with text-to-image and text-to-audio technology.

bch · 2y ago

> cites a paper from 1957

By Fred Brooks no less…

https://en.m.wikipedia.org/wiki/Fred_Brooks

fassssst · 2y ago

Do you know if anyone has tried training a text-to-music or text-to-midi model where the training data includes things like emotion labels for each note interval or chord progression?

Jeff_Brown · 2y ago

That sounds expensive and inefficient. People's interpretations of music (and abstract art more generally) can be shockingly different; I suspect the model would not get a clear signal from the result.

But that makes me wonder to what extent labeling can be programmed -- extracting chord changes, dynamics changes, tempo, gross timbral characteristics, etc.
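As a rough illustration of what programmatic labeling could look like for the chord case, here is a minimal sketch that names a triad from a set of MIDI note numbers. It assumes symbolic input (audio would need transcription first), covers only four triad types, and the function and table names are invented for this example.

```python
# Interval patterns (semitones above the root) for a few common triads.
TRIADS = {
    (0, 4, 7): "major",
    (0, 3, 7): "minor",
    (0, 3, 6): "diminished",
    (0, 4, 8): "augmented",
}
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def label_chord(midi_notes):
    """Return a label like 'C major' for a set of MIDI notes, or None if no triad matches."""
    # Collapse octaves so inversions and doublings map to the same pitch classes.
    pitch_classes = sorted({n % 12 for n in midi_notes})
    # Try each pitch class as the root and compare intervals against the templates.
    # Augmented triads are symmetric, so the lowest pitch class is reported as the root.
    for root in pitch_classes:
        intervals = tuple(sorted((pc - root) % 12 for pc in pitch_classes))
        quality = TRIADS.get(intervals)
        if quality:
            return f"{NOTE_NAMES[root]} {quality}"
    return None

print(label_chord([60, 64, 67]))  # C, E, G
print(label_chord([57, 60, 64]))  # A, C, E
```

Running this over successive time windows of a MIDI file would give a crude chord-change track; dynamics, tempo, and timbre would each need their own extractor.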

TheActualWalko · 2y ago · 2 in thread

We’ve done it! wavtool.com

fnordpiglet (OP) · 2y ago

That’s really neat. How long have you been working on this?

TheActualWalko · 2y ago

Thanks! It grew out of an old side project. Been full time on it since December.

MrCheeze · 2y ago

It has been done - first by OpenAI (MuseNet, which is no longer available) and later by Stanford (Anticipatory Music Transformer): https://nitter.net/jwthickstun/status/1669726326956371971

jskherman · 2y ago

I believe Spotify's Basic Pitch[0] is already some work towards building something like this.

[0]: https://basicpitch.spotify.com/about
