# Alignment of Monophonic and Polyphonic Music to a Score

| Field | Value |
|---|---|
| Authors | Nicola Orio, Diemo Schwarz |
| Publication Info | Retrieved here |
| Retrieval date | 6/6/13 |

Proposes a method for aligning scores to an audio recording using dynamic time warping to match spectral peaks.

## Background

Score alignment associates musical events in a score with the points in time at which they occur in a performance of that score. A time-aligned score is useful as ground-truth symbolic information. Related sequence-to-signal alignment problems arise in speech recognition and biology. Hidden Markov models (HMMs) can also be used and extend to on-line score alignment (automatic accompaniment).

## Method

This method aligns sequences of features. Features for the score are synthesized using a harmonic model of each note: rectangular windows with a bandwidth of a half tone are placed at the frequencies of the $h = 8$ harmonics of each note. The resulting mask $S$ is multiplied by the squared Fourier magnitude spectrum $P^2$ of each frame, summed over frequency bins $i$, and divided by the total spectral energy of the frame. Subtracting this ratio from one gives the local distance (PSD), $\mathrm{PSD} = 1 - \frac{\sum_i S_i P_i^2}{\sum_i P_i^2}$, which is 0 when all of the frame's energy falls inside the expected harmonic bands and 1 when none does. Better results were found when using the half-wave rectified first-order difference of PSD values. For silent frames, an alternative distance is used: the energy of the frame minus a threshold.
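The mask-and-ratio computation described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `midi_to_hz`, `harmonic_mask`, and `peak_distance` are hypothetical, notes are assumed to be given as MIDI numbers with A4 = 440 Hz, each band is assumed to be centred on its harmonic and to extend a quarter tone to either side, and frames with no energy are simply assigned distance 1 rather than the separate silence distance mentioned above.

```python
import numpy as np


def midi_to_hz(note):
    """Convert a MIDI note number to frequency in Hz (assumes A4 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def harmonic_mask(midi_notes, freqs, n_harmonics=8):
    """Binary mask S over FFT bin frequencies for the notes expected in a frame.

    Each harmonic gets a rectangular band one half tone wide, i.e. a quarter
    tone on either side of the harmonic (an assumption about band placement).
    """
    mask = np.zeros_like(freqs, dtype=float)
    quarter_tone = 2.0 ** (1.0 / 24.0)  # frequency ratio of a quarter tone
    for note in midi_notes:
        f0 = midi_to_hz(note)
        for k in range(1, n_harmonics + 1):
            centre = k * f0
            in_band = (freqs >= centre / quarter_tone) & (freqs <= centre * quarter_tone)
            mask[in_band] = 1.0
    return mask


def peak_distance(mask, power):
    """1 - (energy inside the mask) / (total energy) for one frame.

    `power` is the squared magnitude spectrum P^2 of the frame.
    """
    energy = power.sum()
    if energy <= 0.0:
        # Assumption: treat an empty frame as a complete mismatch.
        return 1.0
    return 1.0 - float((mask * power).sum() / energy)
```

With `freqs = np.fft.rfftfreq(n_fft, 1.0 / sample_rate)` and `power = np.abs(np.fft.rfft(frame, n_fft)) ** 2`, evaluating `peak_distance` for every (score event, audio frame) pair yields the local distance matrix that the alignment step operates on.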

Dynamic time warping finds the path through a pair of sequences that minimizes the accumulated local distance, subject to two constraints: the first and last entries of both sequences must be aligned, and the path must advance monotonically. The accumulated cost at each point is the minimum, over all paths leading up to that point, of the accumulated cost of the path plus a cost for the step from that path to the current point. Different distances are used for attacks, silences, and steady-state frames. Memory requirements can be reduced by storing only the “shortcut paths”, which are reduced to the first and last frames for each note.
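For reference, the recursion and backtracking can be sketched as textbook dynamic time warping. This is a generic version and only an illustration: it assumes the three standard steps (diagonal, vertical, horizontal) with no extra step cost, it stores the full accumulated-cost matrix rather than the “shortcut paths”, and it uses a single local distance matrix instead of separate attack, silence, and steady-state distances. The function name `dtw_path` is hypothetical.

```python
import numpy as np


def dtw_path(dist):
    """Textbook DTW over a local distance matrix.

    dist[i, j] is the local distance between score event i and audio frame j.
    Returns the minimum-cost monotonic path from (0, 0) to (n-1, m-1) and its
    accumulated cost.
    """
    n, m = dist.shape
    acc = np.full((n, m), np.inf)
    acc[0, 0] = dist[0, 0]  # endpoint constraint: first entries are aligned

    # Forward pass: cheapest accumulated cost of reaching each cell.
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best = np.inf
            if i > 0:
                best = min(best, acc[i - 1, j])
            if j > 0:
                best = min(best, acc[i, j - 1])
            if i > 0 and j > 0:
                best = min(best, acc[i - 1, j - 1])
            acc[i, j] = dist[i, j] + best

    # Backward pass: start at the last cell and follow cheapest predecessors.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((acc[i - 1, j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((acc[i - 1, j], i - 1, j))
        if j > 0:
            candidates.append((acc[i, j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return path, float(acc[-1, -1])
```

The full matrix costs memory proportional to the product of the two sequence lengths, which is the cost the shortcut-path storage described above is meant to reduce.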

## Experiment

A variety of performance styles were synthesized at various transpositions and aligned to their corresponding scores.