Variations on Fragments Generated by a Recurrent Neural Network

Project Year: 2017

 
The Variations above all follow the same general structure: the fragment is first stated alone, and it is then developed, augmented, shrunk, repeated, and transposed as it grows into the rest of the piece. The RNN code is available on GitHub.
 
I. Goal: Endless Source of Musical Ideas
 
A recent BBC article reported AlphaGo’s defeat of Go prodigy Ke Jie and included a thought-provoking quote from DeepMind CEO Demis Hassabis: “it was interesting for us to see him using moves from AlphaGo's previous games.” AlphaGo became the teacher. A human learning from an algorithm that was itself first trained on how humans play the game is a fascinating concept, and a worthy subject to explore through art: could an AI algorithm give me an idea I wasn’t able to think of before, and, more importantly, could that idea then help me explore a world I otherwise wouldn’t have?
 
There’s a long history of composers developing folk melodies or building on other composers’ ideas. Messiaen incorporated Hindu rhythms, the deci-talas, into many of his pieces (Messiaen 271). He was also deeply inspired by non-human melodies, birdsong above all, and often created pieces which developed, varied, and expanded on birds’ incredible natural melodies. I wanted to explore another source of musical ideas: a statistical model trained on compositions by the masters.
 
Spending time in the musical-idea-generation phase is an important part of my process, with or without AI’s help. I often start by sketching many short musical fragments, and when I come across an idea I really like, I develop it further. Seed, my string orchestra piece, grew out of a four-note theme that was just one of over a hundred ideas. Throughout the piece, the theme is transformed into the very material the piece’s evolving textures are made of.
 
I recently started studying Deep Learning, a subset of Machine Learning, through Udacity’s wonderful Deep Learning Foundations Nanodegree. Variations on Fragments Generated by a Recurrent Neural Network is my first application of Deep Learning in music. Here, I trained a Recurrent Neural Network (RNN) on piano music composed by Beethoven, Chopin, Haydn, Mozart, Prokofiev, Ravel, Scarlatti, Schubert, and Scriabin. I sampled (generated) musical fragments from the trained model, picked the ones I found captivating, and developed them into short pieces.
 
This project integrates and builds on the work, code, and ideas of many brilliant people. I’m so grateful to the open-source community around AI, and I hope that the approach to art-with-AI described here contributes to the ongoing dialogue. For inspiring work done by others in this area, please refer to the FlowMachines project (their DeepBach paper is fascinating) and check out Google’s Magenta project.
 
II. The Context
 
Though this is my first application of Deep Learning, it isn’t my first application of AI in music. In 2017, I built AEN, a rule-based algorithm for music generation which incorporated the musical theory and building blocks at the core of my earlier works. It takes musical ideas inspired by Messiaen’s Treatise on Rhythm, Color and Ornithology and encodes some of their “rules” directly into the generative process.
 
In the AEN code, the rules are clear: I know how the algorithm makes each decision, where each command came from, and how the MIDI signals were generated. Though the randomness is applied in a creative way, and though the ear doesn’t notice the rules in action, the rules are there, explicitly coded. The algorithm’s decisions can be traced in the code.
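
To make that contrast with the statistical model concrete, here is a toy sketch of what an explicit, traceable rule looks like. This is my own illustration, not the actual AEN code: the next pitch is drawn from one of Messiaen’s modes with hand-picked weights, so every decision maps back to a specific line.

```python
import random

# Toy illustration of a rule-based choice (not the actual AEN code): pitches
# come from Messiaen's second mode of limited transposition, and the weights
# are arbitrary values chosen purely for demonstration.
MODE_2 = [60, 61, 63, 64, 66, 67, 69, 70]  # MIDI numbers, starting on middle C
WEIGHTS = [4, 1, 2, 3, 1, 2, 3, 1]         # hand-picked, fully traceable preferences

def next_note():
    """Pick the next pitch; the 'rule' is entirely visible in this function."""
    return random.choices(MODE_2, weights=WEIGHTS, k=1)[0]

print([next_note() for _ in range(8)])     # e.g. a short generated line
```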
 
There isn’t a single explicit rule in the RNN model; the algorithm learns the underlying patterns statistically, by looking at a lot of data. This is important because, with the RNN model, the generated music's personality can be greatly affected by changing the type of music, the data, it looks at during the training phase. AEN is much more narrow in its musical style, even if its output is more “complete.”
 
III. Approach: Character-Level Language RNN Model
 
The original Recurrent Neural Network (LSTM) implementation I trained, written by Udacity instructor Mat Leonard, can be found here. It provides an implementation of Andrej Karpathy’s character-level language model in TensorFlow, details of which are provided in Andrej’s phenomenal post The Unreasonable Effectiveness of Recurrent Neural Networks.
 
The data fed into the neural network is represented in **kern format (please make sure to check out this guide to how music is represented in **kern). The format seemed promising because each character in the text file carries significant musical information. The use of the **kern format, as well as some of my data-processing code, expanded on the ideas and code snippets provided in the following blog post. The author, Joshua Bloom, does a wonderful job exploring the **kern format and points to a sizeable dataset of musical scores.
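
As a rough sketch of what “character-level” means here (my own illustration, not the project’s actual preprocessing code), every character of the **kern file, including the tabs and newlines, is mapped to an integer before being fed to the network:

```python
# Minimal sketch of character-level encoding of a **kern score. The fragment
# below is a made-up example for illustration, not taken from the training data.
kern_fragment = (
    "**kern\t**kern\n"
    "*clefF4\t*clefG2\n"
    "4C 4E\t8cc\n"
    "4G\t8b\n"
    "*-\t*-\n"
)

vocab = sorted(set(kern_fragment))                 # every distinct character
char_to_int = {c: i for i, c in enumerate(vocab)}
int_to_char = {i: c for c, i in char_to_int.items()}

encoded = [char_to_int[c] for c in kern_fragment]  # what the RNN actually sees
decoded = "".join(int_to_char[i] for i in encoded) # round-trips back to text
assert decoded == kern_fragment
print(encoded[:15])
```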
 
In the **kern text-based, character-level musical representation there is certainly some inefficiency: after each musical event there’s a newline character, and after each note, a whitespace character, so part of the network’s capacity goes toward formatting rather than music. Nonetheless, I was emboldened by Karpathy’s visualizations of the tasks and responsibilities that different neurons in the RNN take on (see the Visualizing the predictions and the “neuron” firings in the RNN section of the post as well as the paper itself). There should be certain neurons in the trained network which fire when a newline character should be generated next.
 
My intuition was that if I created a large enough network, plenty of neurons would be free to focus on melodic and even harmonic sequences, while others handled the more “mundane” task of generating whitespace and newline characters.
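
For completeness, here is a hedged sketch of the sampling step used to pull fragments out of a trained character-level model. The `predict_next` helper below is hypothetical and stands in for a forward pass of the trained LSTM; it is not a function from the repository.

```python
import numpy as np

def sample_fragment(predict_next, int_to_char, seed_ids, length=400, temperature=1.0):
    """Generate `length` characters one at a time from a character-level model.

    `predict_next` is a hypothetical callable that takes the ids generated so
    far and returns a probability distribution over the vocabulary.
    """
    generated = list(seed_ids)
    for _ in range(length):
        probs = np.asarray(predict_next(generated), dtype=np.float64)
        logits = np.log(probs + 1e-9) / temperature   # lower temperature = more conservative output
        probs = np.exp(logits) / np.sum(np.exp(logits))
        next_id = np.random.choice(len(probs), p=probs)
        generated.append(next_id)
    return "".join(int_to_char[i] for i in generated)
```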
 
Training the model took about 23 hours on a FloydHub GPU instance. I tried out many different hyper-parameters and found it difficult to stop, as even the poorly trained models (ones which ended up with relatively high loss) output some very interesting fragments. I originally trained on the works of just Mozart and Beethoven, but soon expanded the dataset to include many more pieces.
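
The exact settings varied from run to run and aren’t reproduced here; the values below are only illustrative of the kind of hyper-parameters a char-RNN like this exposes (the names mirror common char-RNN implementations, not necessarily the repository’s exact variables).

```python
# Illustrative hyper-parameters for a character-level LSTM; example values,
# not a record of the settings behind any of the trained models.
hyperparams = {
    "lstm_size": 512,       # hidden units per LSTM layer
    "num_layers": 2,        # stacked LSTM layers
    "seq_length": 100,      # characters of context per training step
    "batch_size": 64,
    "learning_rate": 0.001,
    "keep_prob": 0.7,       # dropout keep probability
    "epochs": 50,
}
```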
 
Please check out the code for the implementation details.
 
IV. Conclusion
 
As I was reading through the comment section of The Unreasonable Effectiveness of Recurrent Neural Networks post, one commenter responded in very negative terms to another’s use of the character-level language model for music generation. Though I’m not convinced that this argument applies to all use cases, I definitely look forward to learning more advanced methods for training statistical models.
 
It’s important to remember that the model doesn’t actually know it’s learning “music”; it’s learning patterns in a sequence of integer values. The **kern format attaches a lot of musical information to each character, and as a result the model even manages to learn harmonies. I think model 24, the one I’ve provided in the source code, could be improved further.
 
That being said, music generated by this model is not close to the human-level compositions achieved by a refined model such as DeepBach. However, I feel that for the purposes laid out at the beginning, my use case is not only tolerant of the “imperfect” music the model generates; the “mistakes” are actually beneficial.
 
I love the mistakes, the loss of focus, in the samples: it’s like someone sitting at a piano starting a beautiful idea, then completely abandoning it, churning out musical idea after musical idea, jumping from one to the next. For example, the thematic idea which inspired Variation on Fragment 34 didn’t appear until about halfway through the generated sample.
 
That’s exactly what I try to do in the musical-idea-generation phase: move from idea to idea. Though the output is far from perfect from a “composing classical pieces using AI” perspective, I’ve found the RNN to be a great companion in my brainstorming stage, a simple but effective instance of the Endless Source of Musical Ideas.
 
V. Resources