I'd expect Transformer to produce slightly more structured musical gibberish.
Ages ago I was a sequencer geek (Impulse Tracker!) while also noodling around with guitar, and I noticed something strange: I made music I liked a lot more when I composed on guitar and transposed onto the sequencer afterwards. After a lot of experimentation, I realized that the constraints on what my hands could do on guitar were (of course) having a huge impact on what I tried to do when composing -- and struggling with the constraint was helping me make music I liked more.
I like a vision for practical machine learning where we spend less time on plumbing and more time thinking about the kinds of constraints (e.g. through input encoding) that enable "creativity" on the part of the machine.
I also think there's room for other creative encodings for music - possibly expanding these notewise/chordwise ideas, or possibly going in a totally new direction. It's fascinating to me how much the generations are affected by the encoding.
Of course, I had to "play" the guitar myself by writing song sequences in those terms... it would be terrific to see what an AI could do with a notation scheme representing, say, a 20-string guitar or a 30-foot-long flute.
"The more constraints one imposes, the more one frees one's self. And the arbitrariness of the constraint serves only to obtain precision of execution."
https://medium.com/capital-one-tech/why-you-dont-necessarily...
IMO this is eventually going to replace a lot of tasks. This for example, can dynamically generate elevator music (or music in an office). The system we built can generate synthetic data for testing and sharing samples of datasets. Eventually, we'll have entirely synthetically generated videos, advertisements, and more.
In 50 years, entire movies may be generated.
I hope you'll agree that you gotta find a better, more sympathetic example if you want to sell your generative algos...
Background music is actually quite difficult, commercially. Someone needs to write and arrange it, and they need to be paid - either royalties each time it is played which is why a lot of companies don’t use “known” music for telephone hold and so on - it’s too expensive. If it’s not on a royalty basis then the writer needs to be bought out - which can be expensive.
So having algorithmically generated music is actually really interesting because there is potentially no author to be paid. This is actually an emerging area of music copyright law. If an algorithm writes music who owns the copyright to that music? The computer? Probably not, not a legal person. The people who wrote the algorithms? Possibly - but did they actually create the music? Or does no one own it - meaning anyone can use it without payment? If a label commissions an algorithm to write hits who owns the music publishing?
Drunk walk around a key, with randomised reset locations whenever the walk went out of bounds. Very good for fake oriental music, acceptable for action/scifi, terrible for theme development or classical style.
Nothing special, except that I totally failed to know anything about any of the previous efforts until years later, so it was all wheel-reinvention.
And then Apple deprecated Java, so it became obsolete.
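A minimal sketch of the drunken-walk scheme described above (my reconstruction in Python, not the original Java; the scale, step sizes, and reset rule are all assumptions):

```python
import random

# MIDI pitches of one octave of C major -- the "key" the walk stays around.
C_MAJOR = [60, 62, 64, 65, 67, 69, 71, 72]

def drunk_walk(length=16, low=0, high=7, seed=None):
    """Random walk over scale degrees; whenever the walk leaves the
    [low, high] index range, reset to a randomised position."""
    rng = random.Random(seed)
    pos = rng.randint(low, high)
    melody = []
    for _ in range(length):
        pos += rng.choice([-2, -1, 1, 2])   # drunken step within the scale
        if pos < low or pos > high:         # out of bounds: randomised reset
            pos = rng.randint(low, high)
        melody.append(C_MAJOR[pos])
    return melody

print(drunk_walk(16, seed=1))
```

Because the walk favours small scale steps and only occasionally teleports, the output tends toward meandering pentatonic-ish lines rather than developed themes, which matches the commenter's experience.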
If there is some pattern in your data, and your algorithm, rather than replicating something similar to the pattern, just outputs the most likely value at any point in time, then it is never going to work as you hope. Rests are a symptom of this, and fixing them doesn't fix the underlying issue.
There are a bunch of solutions to this, but adversarial models do a good job of approximating the full probability distribution rather than just its mode.
The problem is that GANs on sequence data still stink compared to max-likelihood: they train far more slowly and less stably, and they still don't generate decent sequences compared to a char-rnn with a bit of temperature tuning & beam search. They should be better for precisely the reason you say, but they aren't.
Edit: apparently someone has already implemented music generation with the Transformer. Samples: https://storage.googleapis.com/music-transformer/index.html
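The temperature tuning mentioned above is just rescaling the model's logits before sampling; a minimal sketch (the logit values in the comment are made up for illustration):

```python
import math
import random

def sample_with_temperature(logits, temperature=1.0, rng=random):
    """Sample an index from softmax(logits / temperature).
    T < 1 sharpens the distribution toward the most likely token,
    T > 1 flattens it toward uniform randomness."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                           # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1

# e.g. sample_with_temperature([1.0, 3.0, 0.5], temperature=0.1)
# almost always returns the argmax (index 1) at low temperature.
```

For music, moderately high temperatures trade off repetitive "safest note" output against outright noise, which is exactly the mode-collapse symptom described in the parent comment.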
Why not have each time step contain all pitches that should sound during that time step (so starting a new quarter note and continuing a half note would both appear in the same time step)? Then at the end of generating the music, perform some post-processing to get the note lengths. Would the approach in the interview have any significant advantages over this approach? (I suppose you do lose the ability to rearticulate a pitch with my idea.)
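The post-processing step proposed here could look like this: a sketch assuming each time step is a set of sounding MIDI pitches, with runs of consecutive steps holding the same pitch merged into one note (which is exactly why rearticulation is lost):

```python
def piano_roll_to_notes(roll):
    """Convert a list of per-step pitch sets into (pitch, start, duration)
    tuples by merging consecutive steps that hold the same pitch."""
    active = {}  # pitch -> step at which the current run started
    notes = []
    for step, pitches in enumerate(roll):
        # close notes that stopped sounding at this step
        for pitch in [p for p in active if p not in pitches]:
            notes.append((pitch, active[pitch], step - active[pitch]))
            del active[pitch]
        # open notes that just started sounding
        for pitch in pitches:
            if pitch not in active:
                active[pitch] = step
    # close anything still sounding at the end
    for pitch, start in active.items():
        notes.append((pitch, start, len(roll) - start))
    return sorted(notes, key=lambda n: (n[1], n[0]))

# A quarter note (60) starting while a half note (64) continues:
roll = [{64}, {60, 64}, set(), set()]
print(piano_roll_to_notes(roll))  # -> [(64, 0, 2), (60, 1, 1)]
```

The decoder cannot tell a held note from a rearticulated one, so two repeated quarter notes come out as one half note, the limitation conceded at the end of the comment.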
Do you have any examples of jazz compositions by your software? Would be very interested in hearing that.
Reading the steps, it feels like it took 9 months to a year before you got to credible music.
What kept you going in the belief this would work? I can think of 20 reasons why this shouldn't work - hence it's "surprising" that it does. It's quite easily something you could have worked on for 5 years with no results.
Reading your background, it also sounds like your time would be tightly constrained, so in figuring out where to deploy it you needed some conviction you'd have success.
Also, I noticed your data format has a flag for instrument type. Have you considered generating for voice? Obviously a very different beast, but it seems the same principles could apply. It would be important to restrict the music to a model of what a human is capable of to make it singable. Adding physical constraints to the piano-generated music might also be interesting - fingers are only so long, and there are usually only ten.
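A singability model like the one suggested could start as simple rejection rules; the range and leap thresholds below are illustrative guesses, not vocal-pedagogy facts:

```python
def is_singable(melody, low=55, high=79, max_leap=9):
    """Reject melodies outside a rough soprano-ish MIDI range (G3..G5 by
    default) or containing leaps wider than a major sixth (9 semitones).
    All thresholds here are assumptions for the sake of the sketch."""
    if any(p < low or p > high for p in melody):
        return False
    return all(abs(b - a) <= max_leap for a, b in zip(melody, melody[1:]))

print(is_singable([60, 64, 67, 72]))  # True: small leaps, in range
print(is_singable([60, 84]))          # False: out of range
```

A filter like this could either reject generated samples after the fact or mask out impossible pitches during sampling; the latter bakes the constraint into generation the way the guitar comment above describes.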
Then, could you train a neural network (or a genetic algorithm, or whatever) to compose music that is assigned a high quality score by such a composition quality evaluator?
I used genetic algorithms to generate 4 measure melodies, using a long short-term memory (LSTM) neural network to determine the fitness of melodies. I trained the LSTM on snippets of music by J.S. Bach. It was able to distinguish between random noise notes and actual music quite well, and to a somewhat lesser degree between Bach and other composers.
The melodies it produced were... mixed in quality. I really liked some of them, but quite often it would get stuck at a local maximum of the fitness and couldn't mutate its way to something better.
[0] https://github.com/ThomasMatlak/is-software/tree/master/gene...
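The loop described above, with the LSTM fitness abstracted into a pluggable scoring function, might look roughly like this (a generic GA sketch, not the linked repo's code; the toy fitness is a stand-in for the trained LSTM):

```python
import random

def evolve(fitness, scale, length=16, pop_size=30, generations=50, seed=0):
    """Generic GA over fixed-length melodies: truncation selection,
    single-point crossover, per-note mutation. `fitness` scores a melody;
    in the project above this was an LSTM trained on Bach snippets."""
    rng = random.Random(seed)
    pop = [[rng.choice(scale) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]           # keep the fitter half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, length)       # single-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.3:               # mutate one note
                child[rng.randrange(length)] = rng.choice(scale)
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

# Toy fitness: prefer small melodic intervals.
smooth = lambda m: -sum(abs(b - a) for a, b in zip(m, m[1:]))
best = evolve(smooth, scale=[60, 62, 64, 65, 67, 69, 71])
```

The local-maximum problem mentioned above shows up here too: because the fitter half survives unchanged, the population can converge on one mediocre melody unless the mutation rate or fitness landscape pushes it out.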
I'm really curious - any early results to share on that? Attention really does make a big difference on a lot of things (including work I've done, so I know first hand). It should improve the coherence of the entire piece, at least in theory, right?
Some generated samples: https://storage.googleapis.com/music-transformer/index.html