|Authors||Thierry Bertin-Mahieux, Ron J. Weiss and Daniel P. W. Ellis|
|Publication info||Retrieved here|
Attempts to summarize and classify genre/artist based on chromagrams of bars of representative pieces of music, using a large database. Another summary is here.
Seeks to determine a common structure to a large class of similar music, using the harmonic content (beat-synchronous chromagrams). Codebooks are generated from a large set of examples using short beat-chroma patches. The entries of the codebook could be of interest.
Per-“segment” chromagrams come from the Echo Nest API; segments are averaged over each beat and normalized, which introduces some error, though not much. The Echo Nest bar segmentation was used to cut the beat-chroma stream into “patches” 4 or 8 beats long (bars that were not 4 beats long were resampled to fit). Each patch is normalized for transposition by rotating the matrix so that the first row contains the most energy, making patches invariant to key and meter. An online clustering algorithm (similar to K-means), which updates the clusters one data point at a time, was used to build the codebook.
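The transposition normalization above can be sketched in a few lines. This is a minimal illustration, not the authors' code; the 12 × beats patch layout and the function name are assumptions:

```python
import numpy as np

def normalize_patch(patch):
    """Rotate a 12 x T beat-chroma patch along the pitch axis so the
    row with the most total energy ends up first (transposition/key
    invariance, as described in the summary above)."""
    strongest = int(np.argmax(patch.sum(axis=1)))  # dominant pitch class
    return np.roll(patch, -strongest, axis=0)      # bring it to row 0
```

Because every patch is rotated so its dominant pitch class sits in row 0, two patches that differ only by transposition map to the same codeword.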
Used 43,300 user-uploaded songs for training and the uspop2002 dataset for testing, with a learning rate of 0.01 for 200 iterations and Euclidean distance. Encoding performance improves with the amount of training data; distortion improvements plateau at about 1,000 samples per codeword (for their 100-entry codebook). Encoding performance also improves with codebook size, and larger patterns require larger codebooks.
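A single step of the per-data-point (online, K-means-like) update with a learning rate might look like the following sketch; the exact update rule used in the paper is paraphrased here, and the function name and flattened-patch layout are assumptions:

```python
import numpy as np

def online_update(codebook, patch, lr=0.01):
    """One online clustering step: find the nearest codeword by
    Euclidean distance and move it a fraction lr toward the data point.
    codebook is an (n_codewords, dim) array of flattened patches."""
    dists = np.linalg.norm(codebook - patch, axis=1)
    k = int(np.argmin(dists))                 # winning codeword
    codebook[k] += lr * (patch - codebook[k]) # nudge it toward the patch
    return k
```

Updating one point at a time keeps memory constant, which matters when training on tens of thousands of songs.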
Common codewords include sustained single notes, perfect fifths, perfect fourths, and similar intervals; transitions from one chord to another and “noisy” patterns appear less often. Most patterns have low variance across time. With longer patches, larger structures were observed (1-2-1-2 chord alternations, etc.). The actual patterns matched to a given cluster center were not always intuitive, but were generally close, with randomly (non-systematically) distributed error.
Bar alignment is possible by comparing encoding distortion for segments at each candidate beat offset into the song; this identified the downbeat with 62% accuracy. Artist classification was tested on the artist20 dataset with relatively poor results.
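The offset-distortion test for bar alignment can be sketched as follows: encode the beat-chroma matrix at every candidate offset and keep the offset whose patches are, on average, closest to the codebook. This is an illustrative reconstruction under assumed shapes (12 × n_beats chroma, flattened 12·bar_len codewords), not the authors' implementation:

```python
import numpy as np

def best_offset(chroma, codebook, bar_len=4):
    """Try each beat offset, encode the resulting bar-length patches
    against the codebook, and return the offset with lowest average
    distortion (a proxy for the downbeat position)."""
    n_beats = chroma.shape[1]
    best, best_cost = 0, np.inf
    for off in range(bar_len):
        cost, count = 0.0, 0
        for start in range(off, n_beats - bar_len + 1, bar_len):
            patch = chroma[:, start:start + bar_len].ravel()
            cost += float(np.min(np.linalg.norm(codebook - patch, axis=1)))
            count += 1
        if count and cost / count < best_cost:
            best_cost, best = cost / count, off
    return best
```

The intuition is that patches cut at the true downbeat resemble the learned codewords more closely than patches cut mid-bar, so the correct offset yields the lowest distortion.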