I'd expect Transformer to produce slightly more structured musical gibberish.
Ages ago I was a sequencer geek (Impulse Tracker!) while also noodling around with guitar, and I noticed something strange: I made music I liked a lot more when I composed on guitar and transposed onto the sequencer afterwards. After a lot of experimentation, I realized that the constraints on what my hands could do on guitar were (of course) having a huge impact on what I tried to do when composing -- and struggling with the constraint was helping me make music I liked more.
I like a vision for practical machine learning where we spend less time on plumbing and more time thinking about the kinds of constraints (e.g. through input encoding) that enable "creativity" on the part of the machine.
I also think there's room for other creative encodings for music - possibly expanding these notewise/chordwise ideas, or possibly going in a totally new direction. It's fascinating to me how much the generations are affected by the encoding.
Of course, I had to "play" the guitar myself by writing song sequences in those terms... it would be terrific to see what an AI could do with a notation scheme representing, say, a 20-string guitar or a 30-foot-long flute.
"The more constraints one imposes, the more one frees one's self. And the arbitrariness of the constraint serves only to obtain precision of execution."
https://medium.com/capital-one-tech/why-you-dont-necessarily...
IMO this is eventually going to replace a lot of tasks. This for example, can dynamically generate elevator music (or music in an office). The system we built can generate synthetic data for testing and sharing samples of datasets. Eventually, we'll have entirely synthetically generated videos, advertisements, and more.
In 50 years, entire movies may be generated.
I hope you'll agree that you gotta find a better, more sympathetic example if you want to sell your generative algos...
Background music is actually quite difficult, commercially. Someone needs to write and arrange it, and they need to be paid - either royalties each time it is played which is why a lot of companies don’t use “known” music for telephone hold and so on - it’s too expensive. If it’s not on a royalty basis then the writer needs to be bought out - which can be expensive.
So having algorithmically generated music is actually really interesting because there is potentially no author to be paid. This is actually an emerging area of music copyright law. If an algorithm writes music who owns the copyright to that music? The computer? Probably not, not a legal person. The people who wrote the algorithms? Possibly - but did they actually create the music? Or does no one own it - meaning anyone can use it without payment? If a label commissions an algorithm to write hits who owns the music publishing?
Drunk walk around a key, with randomised reset locations whenever the walk went out of bounds. Very good for fake oriental music, acceptable for action/scifi, terrible for theme development or classical style.
Nothing special, except that I totally failed to know anything about any of the previous efforts until years later, so it was all wheel-reinvention.
And then Apple deprecated Java, so it became obsolete.
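A minimal sketch of the drunken-walk scheme described above (my reconstruction in Python, not the original Java; the scale, step sizes, and reset rule are all assumptions):

```python
import random

# MIDI pitches of one octave of C major -- the "key" the walk stays around.
C_MAJOR = [60, 62, 64, 65, 67, 69, 71, 72]

def drunk_walk(length=16, low=0, high=7, seed=None):
    """Random walk over scale degrees; whenever the walk leaves the
    [low, high] index range, reset to a randomised position."""
    rng = random.Random(seed)
    pos = rng.randint(low, high)
    melody = []
    for _ in range(length):
        pos += rng.choice([-2, -1, 1, 2])   # drunken step within the scale
        if pos < low or pos > high:         # out of bounds: randomised reset
            pos = rng.randint(low, high)
        melody.append(C_MAJOR[pos])
    return melody

print(drunk_walk(16, seed=1))
```

Because the walk favours small scale steps and only occasionally teleports, the output tends toward meandering pentatonic-ish lines rather than developed themes, which matches the commenter's experience.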
If there is some pattern in your data, and your algorithm, rather than replicating something similar to the pattern, just outputs the most likely value at any point in time, then it is never going to work as you hope. Rests are a symptom of this, and fixing them doesn't fix the underlying issue.
There are a bunch of solutions to this, but adversarial models do a good job of approximating the full probability distribution rather than just its mode.
The problem is that GANs on sequence data still stink compared to max-likelihood: they train far more slowly and less stably, and they still don't generate decent sequences compared to a char-rnn with a bit of temperature tuning & beam search. They should be better for precisely the reason you say, but they aren't.
Edit: apparently someone has already implemented music generation with the Transformer. Samples: https://storage.googleapis.com/music-transformer/index.html
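The temperature tuning mentioned above is just rescaling the model's logits before sampling; a minimal sketch (the logit values in the comment are made up for illustration):

```python
import math
import random

def sample_with_temperature(logits, temperature=1.0, rng=random):
    """Sample an index from softmax(logits / temperature).
    T < 1 sharpens the distribution toward the most likely token,
    T > 1 flattens it toward uniform randomness."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                           # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1

# e.g. sample_with_temperature([1.0, 3.0, 0.5], temperature=0.1)
# almost always returns the argmax (index 1) at low temperature.
```

For music, moderately high temperatures trade off repetitive "safest note" output against outright noise, which is exactly the mode-collapse symptom described in the parent comment.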
Why not have each time step contain all pitches that should sound during that time step (so starting a new quarter note and continuing a half note would both appear in the same time step)? Then at the end of generating the music, perform some post-processing to get the note lengths. Would the approach in the interview have any significant advantages over this approach? (I suppose you do lose the ability to rearticulate a pitch with my idea.)
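The post-processing step proposed here could look like this: a sketch assuming each time step is a set of sounding MIDI pitches, with runs of consecutive steps holding the same pitch merged into one note (which is exactly why rearticulation is lost):

```python
def piano_roll_to_notes(roll):
    """Convert a list of per-step pitch sets into (pitch, start, duration)
    tuples by merging consecutive steps that hold the same pitch."""
    active = {}  # pitch -> step at which the current run started
    notes = []
    for step, pitches in enumerate(roll):
        # close notes that stopped sounding at this step
        for pitch in [p for p in active if p not in pitches]:
            notes.append((pitch, active[pitch], step - active[pitch]))
            del active[pitch]
        # open notes that just started sounding
        for pitch in pitches:
            if pitch not in active:
                active[pitch] = step
    # close anything still sounding at the end
    for pitch, start in active.items():
        notes.append((pitch, start, len(roll) - start))
    return sorted(notes, key=lambda n: (n[1], n[0]))

# A quarter note (60) starting while a half note (64) continues:
roll = [{64}, {60, 64}, set(), set()]
print(piano_roll_to_notes(roll))  # -> [(64, 0, 2), (60, 1, 1)]
```

The decoder cannot tell a held note from a rearticulated one, so two repeated quarter notes come out as one half note, the limitation conceded at the end of the comment.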
Do you have any examples of jazz compositions by your software? Would be very interested in hearing that.
Reading the steps, it feels like it took 9 months to a year before you got to credible music.
What kept you going in the belief this would work? I can think of 20 reasons why this shouldn't work - hence it's "surprising" that it does. It's quite easily something you could have worked on for 5 years with no results.
Reading your background, it also sounds like your time would be tightly constrained, so in figuring out where to deploy it you needed some conviction you'd have success.
Also, I noticed your data format has a flag for instrument type. Have you considered generating for voice? Obviously a very different beast, but it seems the same principles could apply. It would be important to restrict the music to a model of what a human is capable of to make it singable. Adding physical constraints to the piano-generated music might also be interesting - fingers are only so long, and there are usually only ten.
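A singability model like the one suggested could start as simple rejection rules; the range and leap thresholds below are illustrative guesses, not vocal-pedagogy facts:

```python
def is_singable(melody, low=55, high=79, max_leap=9):
    """Reject melodies outside a rough soprano-ish MIDI range (G3..G5 by
    default) or containing leaps wider than a major sixth (9 semitones).
    All thresholds here are assumptions for the sake of the sketch."""
    if any(p < low or p > high for p in melody):
        return False
    return all(abs(b - a) <= max_leap for a, b in zip(melody, melody[1:]))

print(is_singable([60, 64, 67, 72]))  # True: small leaps, in range
print(is_singable([60, 84]))          # False: out of range
```

A filter like this could either reject generated samples after the fact or mask out impossible pitches during sampling; the latter bakes the constraint into generation the way the guitar comment above describes.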
Then, could you train a neural network (or a genetic algorithm, or whatever) to compose music that is assigned a high quality score by such a composition quality evaluator?
I used genetic algorithms to generate 4 measure melodies, using a long short-term memory (LSTM) neural network to determine the fitness of melodies. I trained the LSTM on snippets of music by J.S. Bach. It was able to distinguish between random noise notes and actual music quite well, and to a somewhat lesser degree between Bach and other composers.
The melodies it produced were... mixed in quality. I really liked some of them, but quite often it would get stuck at a local maximum of the fitness and couldn't mutate its way to something better.
[0] https://github.com/ThomasMatlak/is-software/tree/master/gene...
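The loop described above, with the LSTM fitness abstracted into a pluggable scoring function, might look roughly like this (a generic GA sketch, not the linked repo's code; the toy fitness is a stand-in for the trained LSTM):

```python
import random

def evolve(fitness, scale, length=16, pop_size=30, generations=50, seed=0):
    """Generic GA over fixed-length melodies: truncation selection,
    single-point crossover, per-note mutation. `fitness` scores a melody;
    in the project above this was an LSTM trained on Bach snippets."""
    rng = random.Random(seed)
    pop = [[rng.choice(scale) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]           # keep the fitter half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, length)       # single-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.3:               # mutate one note
                child[rng.randrange(length)] = rng.choice(scale)
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

# Toy fitness: prefer small melodic intervals.
smooth = lambda m: -sum(abs(b - a) for a, b in zip(m, m[1:]))
best = evolve(smooth, scale=[60, 62, 64, 65, 67, 69, 71])
```

The local-maximum problem mentioned above shows up here too: because the fitter half survives unchanged, the population can converge on one mediocre melody unless the mutation rate or fitness landscape pushes it out.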
I'm really curious - any early results to share on that? Attention really does make a big difference on a lot of things (including work I've done, so I know first hand). It should improve the coherence of the entire piece, at least in theory, right?
Some generated samples: https://storage.googleapis.com/music-transformer/index.html