Presents a technique for extracting high-level features from music that describe the sections of a song.
Ideally, a segmenter should find the verses, choruses, and other sections of a song. A simple approach is to look for regularly spaced peaks in the weighted distance between successive feature vectors. The autocorrelation of this weighted distance signal can reveal periodicities ranging from short time divisions (beats) to longer ones (phrases, verses). Sections can be found either by grouping shorter segments or by dividing the song into a string of segments. This approach builds a peak list, uses a guessed peak period to find candidate segment-defining peaks, and then compares the resulting candidate segment lists using heuristic criteria.
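The distance-and-autocorrelation idea can be illustrated with a minimal sketch. It assumes a NumPy array of per-frame feature vectors and a weighted Euclidean distance; the function names, the choice of distance, and the `min_lag`/`max_lag` parameters are hypothetical and are not taken from the paper.

```python
import numpy as np

def weighted_distance(features, weights):
    """Weighted Euclidean distance between each pair of successive frames.

    features: array of shape (n_frames, n_features)
    weights:  array of shape (n_features,), assumed non-negative
    """
    diffs = np.diff(features, axis=0)  # shape (n_frames - 1, n_features)
    return np.sqrt((weights * diffs ** 2).sum(axis=1))

def autocorrelation(signal):
    """Autocorrelation for non-negative lags, normalized so lag 0 equals 1."""
    centered = signal - signal.mean()
    full = np.correlate(centered, centered, mode="full")
    acf = full[full.size // 2:]  # keep lags 0, 1, 2, ...
    return acf / acf[0] if acf[0] > 0 else acf

def dominant_period(signal, min_lag=2, max_lag=None):
    """Lag (in frames) of the strongest autocorrelation peak in [min_lag, max_lag)."""
    acf = autocorrelation(signal)
    max_lag = max_lag or len(acf) // 2
    return min_lag + int(np.argmax(acf[min_lag:max_lag]))
```

Restricting the lag range is one way to steer the estimate toward beat-length or section-length periods; the guessed period could then be used to select which distance peaks are treated as candidate segment boundaries.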
Sometimes the segmentation points are all that is needed, but they can also provide more semantic insight. The first and last sections can be evaluated to see whether they are fade-in/fade-out sections or are distinct from the other sections (intro/outro). The “typical” segment can be found from the per-segment feature averages, which in turn can be used to find “solo” segments. If solo and non-solo segments are found, the spectral signature of the lead instrument can also be surmised. All of these features can be normalized or averaged and used as high-level features for tasks such as musical similarity. Based on a thresholded confidence metric, the segmenter fails on only 3% of a test set of 15,000 songs. Segmenter failure can also be indicated by too many segments or by extreme solo-ratio features.
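One way to picture the “typical” segment and solo detection is the sketch below. The summary does not say how the typical segment is defined or how solos are flagged, so the median of the per-segment means and the fixed distance `threshold` are assumptions made for illustration, as are the function names.

```python
import numpy as np

def segment_means(features, boundaries):
    """Average feature vector of each segment.

    boundaries: sorted interior frame indices where a new segment starts.
    """
    edges = [0, *boundaries, len(features)]
    return np.array([features[a:b].mean(axis=0)
                     for a, b in zip(edges[:-1], edges[1:])])

def find_solo_segments(features, boundaries, threshold=2.0):
    """Flag segments whose mean features lie far from the typical segment.

    Assumption: "typical" is the per-feature median of the segment means,
    and a segment counts as a solo when its Euclidean distance to that
    median exceeds `threshold`.
    """
    means = segment_means(features, boundaries)
    typical = np.median(means, axis=0)
    distances = np.linalg.norm(means - typical, axis=1)
    return typical, distances, distances > threshold
```

In practice the features would likely need to be normalized before a single threshold is meaningful across songs.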
To test the importance of these new features, principal component analysis was performed on them together with a set of features typically used in MIR. Adding the segmentation features results in more components with higher weights, and the segmentation features contribute heavily to the second and higher components. PART decision lists built on the features can produce rules that correspond to genre. With a similarity metric, the segmentation features can reliably create content-derived playlists. The segmenter might be improved by using the number and length of segments and/or the distance peak-matching tolerance and the percentage of peaks accounted for to better calculate the confidence score.
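The kind of PCA comparison described here can be sketched as follows. This is not the paper's analysis: it assumes a song-by-feature matrix with standardized columns, computes PCA through an SVD, and measures how much of each component's loading falls on a chosen subset of columns (for example, the segmentation features). The function names are hypothetical.

```python
import numpy as np

def pca(X):
    """PCA via SVD on standardized columns.

    Returns (explained_variance_ratio, components), where each row of
    `components` is one principal axis with unit length.
    """
    std = X.std(axis=0)
    Z = (X - X.mean(axis=0)) / np.where(std > 0, std, 1.0)
    _, s, vt = np.linalg.svd(Z, full_matrices=False)
    var = s ** 2
    return var / var.sum(), vt

def feature_share(components, columns):
    """Fraction of each component's squared loading carried by `columns`."""
    return (components[:, columns] ** 2).sum(axis=1)
```

Running `pca` once on the baseline MIR features and once on the combined matrix, then comparing the explained-variance ratios and the `feature_share` of the segmentation columns, would give a rough view of how much the new features contribute to each component.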