We have a large dataset of MIDI files, which may or may not have good audio data. We have good features for a million songs as part of the Million Song Dataset (MSD). We'd like to match the MIDI files to their corresponding entries in the MSD and extract useful ground truth information from the MIDI files. The difficulty is scale: we have huge numbers of MIDI files and huge numbers of possible matches in the MSD, so we need to do things efficiently.
We have about 150,000 MIDI files downloaded from the internet. Can we also find any large collections of MusicXML or other symbolic formats?
The MIDI files are sometimes named according to the song they transcribe, but not always. The MIDI format also supports embedded annotations (meta-events) that can specify an artist, song name, etc. We need good ways to extract all of the metadata we can find in each MIDI file so that we can get a head start on possible matches. EchoNest has some methods for extracting this kind of information; see e.g. http://static.echonest.com/visualizations/years-active/find-the-artist.html . See also Google Refine https://code.google.com/p/google-refine/ and Freebase http://www.freebase.com/ . For each MIDI file, we may be able to get metadata from:
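However the metadata is obtained, we'll need to compare noisy strings against MSD titles and artists to generate candidate matches. A minimal sketch using only the Python standard library (the normalization rules here are illustrative, not a fixed recipe):

```python
import difflib
import re

def normalize(title):
    """Normalize a song title for comparison: lowercase, drop
    parenthesized qualifiers like "(live)", strip punctuation."""
    title = re.sub(r'\(.*?\)', '', title.lower())
    title = re.sub(r'[^a-z0-9 ]', ' ', title)
    return ' '.join(title.split())

def title_similarity(a, b):
    """Similarity in [0, 1] between two titles after normalization."""
    return difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()
```

In practice we'd threshold this score and keep only the top few MSD entries per MIDI file as candidates for the more expensive feature-based comparison below.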
Once we have a list of candidates for some song in the MSD, we can try to determine whether each one is actually a match using the EchoNest features. This is closely related to cover song detection, which is often done by computing beat-synchronous chromagrams: two performances of the same song may differ in instrumentation or style, but they should share the same underlying melodic/harmonic structure. The same is true of our MIDI files. So, given a MIDI file, we need to be able to generate beat-synchronous chromagrams. This will involve
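On the MIDI side, a beat-synchronous chromagram can be built directly from note events. A minimal NumPy sketch, assuming notes are available as (start, end, pitch) triples and beat times are already known (these input representations are assumptions for illustration, not what any particular package provides):

```python
import numpy as np

def beat_sync_chroma(notes, beats):
    """Beat-synchronous chromagram from symbolic note data.

    notes: iterable of (start_sec, end_sec, midi_pitch) triples
    beats: sorted array of beat times in seconds
    Returns a (12, n_beats - 1) array; each column is the duration-weighted
    pitch-class content of one beat interval, L2-normalized.
    """
    chroma = np.zeros((12, len(beats) - 1))
    for start, end, pitch in notes:
        for i in range(len(beats) - 1):
            # Duration of this note's overlap with beat interval i
            overlap = min(end, beats[i + 1]) - max(start, beats[i])
            if overlap > 0:
                chroma[pitch % 12, i] += overlap
    # Normalize each column, guarding against silent beats
    norms = np.linalg.norm(chroma, axis=0)
    norms[norms == 0] = 1
    return chroma / norms
```

The audio-side chromagram would come from the EchoNest features (or be computed from audio directly), after which the two sequences can be compared beat-by-beat.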
If we have a MIDI file, we know which song it is, and we have the audio (or features), how can we get useful information from the MIDI? The first step is to time-align the MIDI and audio. Code exists for this, which may be fast and accurate enough, but may not be. Once the MIDI file is time-aligned to the audio, we can extract useful ground truth information. Packages for this exist, e.g. http://www.link.cs.cmu.edu/music-analysis/ , but there may be more information we can get, or other formats to get it in:
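For the alignment step, a standard approach is dynamic time warping (DTW) over framewise features such as chroma. A self-contained sketch of the core routine (the function name and Euclidean local cost are illustrative; real alignment systems use more refined costs, step patterns, and path constraints):

```python
import numpy as np

def dtw_align(X, Y):
    """Align two feature sequences X (d, n) and Y (d, m) with DTW.

    Returns the optimal warping path as a list of (i, j) index pairs,
    from which a time mapping between MIDI and audio can be read off.
    """
    n, m = X.shape[1], Y.shape[1]
    # Pairwise Euclidean distances between frames
    cost = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            cost[i, j] = np.linalg.norm(X[:, i] - Y[:, j])
    # Accumulated cost with the usual three-step recursion
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = cost[i - 1, j - 1] + min(D[i - 1, j],
                                               D[i, j - 1],
                                               D[i - 1, j - 1])
    # Backtrack the lowest-cost path (stops at the matrix edge)
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]
```

This brute-force version is O(nm) in time and memory, which is exactly the efficiency concern raised above; at our scale we'd want banded or multi-resolution variants.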