Best Practices — Artificial Musical Composition Design
The best way to produce a system that learns musical composition is to architect a solution containing the basic components of composition proficiency:
- Analysis of prior work
- Music theory
- Creative application of those two
- Transcription or MIDI representation
The learning occurs in the first and third items; the second is a model of musical scales, harmony, melody, and rhythm. For the learning, a GAN (Generative Adversarial Network) is the best choice among the machine learning approaches that are well developed as of this writing.
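To make the GAN idea concrete in this setting, the following is a minimal sketch in which the "pieces" are reduced to single MIDI pitch values drawn from a distribution, the generator is a linear map from noise, and the discriminator is logistic regression. Every name, hyperparameter, and the scalar data model here are illustrative assumptions; a real system would generate full note matrices with deep networks on both sides.

```python
import numpy as np

# Toy GAN sketch: "real" data are MIDI pitch values clustered around
# middle C (note 60). Generator and discriminator are deliberately
# tiny stand-ins for the deep networks a real composer would need.
rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

w, b = 1.0, 0.0      # generator:      fake = w * z + b
a, c = 0.01, 0.0     # discriminator:  D(x) = sigmoid(a * x + c)
lr = 0.01

for step in range(2000):
    real = rng.normal(60.0, 5.0, size=32)   # "real" pitches
    z = rng.normal(0.0, 1.0, size=32)       # noise input
    fake = w * z + b

    # Discriminator step: descend -[log D(real) + log(1 - D(fake))]
    s_r = sigmoid(a * real + c)
    s_f = sigmoid(a * fake + c)
    a -= lr * np.mean(-(1 - s_r) * real + s_f * fake)
    c -= lr * np.mean(-(1 - s_r) + s_f)

    # Generator step: descend -log D(fake)
    s_f = sigmoid(a * fake + c)
    dfake = -(1 - s_f) * a                  # d(-log D(fake)) / d(fake)
    w -= lr * np.mean(dfake * z)
    b -= lr * np.mean(dfake)

# With enough steps and tuning, the generated distribution drifts
# toward the real one; this toy only illustrates the training loop.
samples = w * rng.normal(0.0, 1.0, size=100) + b
print(float(samples.mean()))
```

The two alternating gradient steps are the essential GAN structure; the fitness metric discussed later in this document would replace (or augment) the discriminator's judgment.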
Critique of Other Approaches
Although DeepMind Technologies Limited (London) has some well-designed solutions to offer, their WaveNet raw audio generation software produces impressive voice synthesis; its music generation, however, leaves much to be desired. That is understandable, since their goal was better speech synthesis.
Although generative models are the way to go, as mentioned above, DeepMind is unlikely to provide you with the source to its proprietary software. A non-proprietary implementation exists, but it lacks a comprehensive model of either traditional or progressive music theory.
CNNs and LSTMs are designed for recognition, which makes them a poor choice for analyzing prior compositions when many compositions are available in MIDI or other digital transcription formats. Even files representing 19th-century player piano rolls are adequate as training examples.
It is not the composer's responsibility to play instruments well or sing with grace and power. The composer specifies a piece as a matrix of notes with the following features:
- Tone (frequency)
- Intensity (amplitude)
- Timbre (as from a specific instrument)
- Start time (relative)
- Duration
- Modifications during the note
The start time and duration are expressed as a function of time signature.
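The note matrix described above can be sketched as a pair of data classes. The field and class names here are choices made for illustration, not part of any existing library; the features themselves mirror the list above.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Note:
    """One cell of the composition matrix, mirroring the feature list above."""
    tone: float          # frequency in Hz (or a MIDI pitch number)
    intensity: float     # amplitude, e.g. 0.0-1.0 or a MIDI velocity
    timbre: str          # instrument identifier
    start: float         # start time in beats, relative to the time signature
    duration: float      # length in beats
    modifications: List[str] = field(default_factory=list)  # e.g. "vibrato"

@dataclass
class Piece:
    """A composition: a collection of notes plus its time signature."""
    beats_per_bar: int
    beat_unit: int
    notes: List[Note] = field(default_factory=list)

middle_c = Note(tone=261.63, intensity=0.8, timbre="piano",
                start=0.0, duration=1.0)
piece = Piece(beats_per_bar=4, beat_unit=4, notes=[middle_c])
print(len(piece.notes))  # → 1
```

Expressing start and duration in beats keeps them a function of the time signature, as described above, rather than of wall-clock time.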
A musical composition is a representation of this matrix. An artificial orchestra, band, vocalist, choir, or DJ can be constructed to provide the sound, but that is a separate art. There is no reason why an artificial composer can't provide a piece to human performers either.
This is an important distinction: an artificial composer does not produce sound; it produces the matrix of notes.
Music theory is best represented as an object-oriented software library, providing all musical options for notes, chords, melodies, and rhythms as abstractions.
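As a minimal sketch of such a library, the classes below model a major scale and a triad built on one of its degrees. The class and method names are assumptions for this example, not an existing package; a full library would cover modes, harmony, voice leading, and rhythm the same way.

```python
# Illustrative object-oriented music-theory abstractions.
class Scale:
    NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F",
                  "F#", "G", "G#", "A", "A#", "B"]
    MAJOR_STEPS = [2, 2, 1, 2, 2, 2, 1]  # whole/half steps of a major scale

    def __init__(self, root: str):
        self.root = self.NOTE_NAMES.index(root)

    def degrees(self):
        """Note names of the major scale built on the root."""
        pc, out = self.root, [self.root]
        for step in self.MAJOR_STEPS[:-1]:
            pc = (pc + step) % 12
            out.append(pc)
        return [self.NOTE_NAMES[p] for p in out]

class Chord:
    def __init__(self, scale: Scale, degree: int):
        notes = scale.degrees()
        # Triad: the 1st, 3rd, and 5th scale degrees above the given degree.
        self.notes = [notes[(degree - 1 + i) % 7] for i in (0, 2, 4)]

c_major = Scale("C")
print(c_major.degrees())        # ['C', 'D', 'E', 'F', 'G', 'A', 'B']
print(Chord(c_major, 1).notes)  # ['C', 'E', 'G']
```

A generative network can then select among these abstractions rather than among raw pitches, which constrains its output to theoretically coherent material.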
This is where deep networks may prove effective: applying the patterns learned (via a GAN or some other generative sub-system) to the music theory, and producing a musical expression that listeners, buyers, and/or critics will perceive as having merit, may be a complex function best developed in an artificial network.
The disparity (error) function in various capacities in this system's design is key.
The GAN must converge based on a metric expressing the fitness of the piece, and the deep network must converge on a selection of generated pieces based on the perception of merit mentioned above. Ultimately, the primary feedback will be from listening events, downloads, purchases, and reviews. How those aggregate is also key. If the goal is money, then purchases may outweigh the others in the aggregation of feedback. If the goal is fame, performances and reviews may matter most. If the goal is reach, then listening events on YouTube or downloads of an mp3 of a performance of the composition may be the primary indicator.
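The goal-dependent aggregation described above can be sketched as a weighted sum over feedback signals. The weight tables and signal names here are illustrative assumptions; a real system would tune (or learn) the weights against its actual objective, and would normalize the raw counts, which otherwise differ in scale by orders of magnitude.

```python
# Goal-dependent aggregation of raw feedback counts into one fitness
# scalar. Weights are illustrative assumptions, not tuned values.
FEEDBACK_WEIGHTS = {
    "money": {"purchases": 0.70, "downloads": 0.10,
              "listens": 0.10, "reviews": 0.10},
    "fame":  {"purchases": 0.10, "downloads": 0.10,
              "listens": 0.20, "reviews": 0.60},
    "reach": {"purchases": 0.05, "downloads": 0.35,
              "listens": 0.55, "reviews": 0.05},
}

def fitness(signals: dict, goal: str) -> float:
    """Aggregate feedback signals into one scalar for the chosen goal."""
    weights = FEEDBACK_WEIGHTS[goal]
    return sum(weights[k] * signals.get(k, 0) for k in weights)

signals = {"purchases": 10, "downloads": 200, "listens": 5000, "reviews": 3}
print(fitness(signals, "money"))
print(fitness(signals, "reach"))
```

The same `fitness` value can serve both roles named above: as the GAN's convergence metric and as the selection criterion for the evaluating network.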
How the GAN and the deep network that evaluates GAN output and may also market the best results interact with one another presents an interesting problem, which is largely a problem in probability, balance, and connection topology.
There are many MIDI libraries, and there are software packages that will render notes on a staff in PDF or print form. Producing the output of the composition is the easiest of the engineering tasks involved in this system's design.
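To show how little is involved, the sketch below writes a Type-0 Standard MIDI File by hand, with no library at all: a `MThd` header chunk, one `MTrk` track chunk, and note-on/note-off events with variable-length delta times. In practice one of the existing MIDI libraries would be used; the helper names here are this example's own.

```python
import struct

def vlq(n: int) -> bytes:
    """Encode an integer as a MIDI variable-length quantity."""
    out = [n & 0x7F]
    n >>= 7
    while n:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    return bytes(reversed(out))

def midi_bytes(pitches, ticks_per_beat=480) -> bytes:
    """Minimal Type-0 Standard MIDI File: one quarter note per pitch."""
    events = b""
    for p in pitches:
        events += vlq(0) + bytes([0x90, p, 64])              # note on, ch. 0
        events += vlq(ticks_per_beat) + bytes([0x80, p, 0])  # note off
    events += vlq(0) + bytes([0xFF, 0x2F, 0x00])             # end of track
    header = b"MThd" + struct.pack(">IHHH", 6, 0, 1, ticks_per_beat)
    track = b"MTrk" + struct.pack(">I", len(events)) + events
    return header + track

data = midi_bytes([60, 64, 67])  # C major arpeggio
print(data[:4])  # → b'MThd'
```

Writing `data` to a `.mid` file yields something any sequencer or notation package can open, which is why this end of the pipeline is the easy part.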