Measuring the Evolution of Contemporary Western Popular Music

Authors Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, Josep Ll. Arcos
Publication Info Retrieved here, Supplementary Materials
Retrieval date 1/29/13

Discusses a model intended to measure the evolution of short-term variations in popular music on a large dataset.


Music is a human universal that exhibits repetition and contrast in order to convey artistic meaning. How these patterns and variations have evolved over time has not been studied in great detail. Recently, the Million Song Dataset was released; it contains beat-resolution pitch, timbre, and loudness analyses and year annotations for about half a million songs spanning a wide variety of genres. This paper uses concepts from statistical physics and complex networks to model patterns and metrics characterizing the use of these features over time. The findings suggest that certain characteristics have remained stable, but that the variety of pitch transitions has narrowed, the overall loudness has grown, and the timbral palette has become more restricted.

Experimental Method

Pitch was provided as 12-dimensional vectors giving the relative energy of each pitch class, normalized to values between 0 and 1. Timbre was provided as 12-dimensional vectors whose values are the weights of 12 bivariate basis functions corresponding to “high level abstractions of the spectral surface, ordered by degree of importance”. The first timbre dimension corresponds to loudness. Music “codewords” were generated by discretizing the feature values in the dataset so that only a small, finite number of values is possible for each feature. Pitch vectors were binarized by thresholding at 0.5 and circularly shifted to a common key. Timbre vectors were quantized with a ternary, equal-frequency encoding whose thresholds are the 33rd and 66th percentiles of one million timbre vectors. Loudness values were shifted to the range -60 to 0 decibels and quantized in 300 steps. Songs were randomly sampled in sliding windows of 5 years so that one million beats were compiled for each window position.
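A minimal Python sketch of the three quantization steps described above. It is an illustration, not the authors' code: the function names, the `key_shift` argument (assumed to come from a separate key estimate that the summary does not describe), mapping values exactly at 0.5 to 1, and clipping loudness to the -60 to 0 dB range before binning are all assumptions.

```python
from statistics import quantiles

def pitch_codeword(chroma, key_shift):
    """Binarize a 12-bin pitch vector at 0.5, then rotate it by key_shift bins.

    key_shift is a hypothetical input, assumed known from a separate key estimate."""
    bits = [1 if v >= 0.5 else 0 for v in chroma]  # assumption: values equal to 0.5 map to 1
    return tuple(bits[key_shift:] + bits[:key_shift])

def tertile_thresholds(samples):
    """Per-dimension 33rd/66th percentile cut points from a sample of timbre vectors."""
    dims = len(samples[0])
    return [tuple(quantiles([vec[d] for vec in samples], n=3)) for d in range(dims)]

def timbre_codeword(vector, thresholds):
    """Ternary equal-frequency code: 0 below the first cut, 1 below the second, else 2."""
    return tuple(0 if v < lo else (1 if v < hi else 2)
                 for v, (lo, hi) in zip(vector, thresholds))

def loudness_codeword(db, lo=-60.0, hi=0.0, steps=300):
    """Clip loudness to [lo, hi] dB and map it to one of `steps` equal-width bins."""
    clipped = min(max(db, lo), hi)
    return min(int((clipped - lo) * steps / (hi - lo)), steps - 1)
```

With 300 steps over 60 dB, each loudness bin is 0.2 dB wide; equal-frequency (tertile) thresholds make each timbre symbol roughly equally likely in the sample used to compute them.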

For each codeword type, the distribution of codeword occurrences was modeled. For pitch and timbre, the codeword frequency counts were modeled with discrete power-law distributions, where the probability of a value <math>z</math> is <math>P(z) = \frac{1}{\zeta(\beta, c + z_{min})(c + z)^\beta}</math> for <math>z = z_{min}, z_{min} + 1, \ldots</math>, with parameters <math>c</math> and <math>\beta</math> (<math>c = 0</math> for the pitch codeword distribution) and the Hurwitz zeta function <math>\zeta(\beta, q) = \sum_{n = 0}^{\infty} \frac{1}{(q + n)^\beta}</math> as normalization. Loudness values <math>x</math> (in dB, hence non-positive) were described by a continuous truncated log-normal probability density of <math>z = -x</math>, <math>z_{min} \le z \le z_{max}</math>, given by <math>P(z) = \sqrt{\frac{2}{\pi\sigma^2}} \left[ \mathrm{erf}\left(\frac{\log z_{max} - \mu}{\sqrt{2}\sigma}\right) - \mathrm{erf}\left(\frac{\log z_{min} - \mu}{\sqrt{2}\sigma}\right) \right]^{-1} \frac{1}{z}e^{-\frac{(\log z - \mu)^2}{2\sigma^2}}</math>, where the bracketed factor renormalizes the log-normal density to the truncation interval and <math>\mathrm{erf}(y) = \frac{2}{\sqrt{\pi}} \int_0^y e^{-u^2}du</math>.
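The two model densities above can be evaluated with the Python standard library alone. This is a sketch under stated assumptions: the Hurwitz zeta function is approximated by a partial sum plus an Euler-Maclaurin tail estimate (a numerical choice made here, not something the summary specifies), and all function names are hypothetical.

```python
import math

def hurwitz_zeta(beta, q, n_terms=1000):
    """zeta(beta, q) = sum_{n>=0} (q + n)^-beta for beta > 1, q > 0.

    Partial sum plus an Euler-Maclaurin estimate of the remaining tail."""
    partial = sum((q + n) ** -beta for n in range(n_terms))
    x = q + n_terms
    return partial + x ** (1 - beta) / (beta - 1) + 0.5 * x ** -beta + beta * x ** (-beta - 1) / 12

def power_law_pmf(z, beta, c=0.0, z_min=1):
    """P(z) = 1 / (zeta(beta, c + z_min) (c + z)^beta) for integers z >= z_min."""
    if z < z_min:
        return 0.0
    return 1.0 / (hurwitz_zeta(beta, c + z_min) * (c + z) ** beta)

def truncated_lognormal_pdf(z, mu, sigma, z_min, z_max):
    """Log-normal density of z = -x (x = loudness in dB), renormalized to [z_min, z_max], z_min > 0."""
    if not z_min <= z <= z_max:
        return 0.0
    s = math.sqrt(2.0) * sigma
    mass = math.erf((math.log(z_max) - mu) / s) - math.erf((math.log(z_min) - mu) / s)
    return (math.sqrt(2.0 / (math.pi * sigma ** 2)) / mass
            * math.exp(-(math.log(z) - mu) ** 2 / (2 * sigma ** 2)) / z)
```

As a sanity check, `hurwitz_zeta(2.0, 1.0)` approximates the Riemann value pi^2/6, so `power_law_pmf(1, 2.0)` is about 6/pi^2.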

To fit the codeword distributions to the models described above, the fitting range is first fixed and the parameters are estimated by maximum likelihood (maximizing the log-likelihood of the data under the model over the parameter values). The discrepancy between fit and data is measured with the Kolmogorov-Smirnov (KS) distance, and the KS distance of the data is compared with the KS distances obtained for synthetic datasets generated from the fitted model, which yields a p-value. The fitting range is chosen by repeating this procedure over many candidate ranges and selecting the one that contains the largest amount of data while still giving an acceptable p-value.
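The fitting procedure can be sketched in Python for the pitch case (c = 0), in the style of common power-law fitting recipes. This is an illustration under assumptions, not the authors' code: the grid-search MLE, the β grid, truncating the synthetic support at `z_cap`, the number of synthetic datasets, the acceptance threshold `p_accept = 0.1`, and varying only the lower bound `z_min` of the range are all hypothetical choices.

```python
import math
import random

def hurwitz_zeta(beta, q, n_terms=100):
    """sum_{n>=0} (q + n)^-beta for beta > 1, q > 0 (partial sum + Euler-Maclaurin tail)."""
    partial = sum((q + n) ** -beta for n in range(n_terms))
    x = q + n_terms
    return partial + x ** (1 - beta) / (beta - 1) + 0.5 * x ** -beta + beta * x ** (-beta - 1) / 12

BETA_GRID = [1.05 + 0.01 * i for i in range(296)]  # hypothetical search grid, beta in [1.05, 4.00]

def fit_beta(data, z_min):
    """Grid-search MLE of beta for P(z) = z^-beta / zeta(beta, z_min), z >= z_min."""
    tail = [z for z in data if z >= z_min]
    n, sum_log = len(tail), sum(math.log(z) for z in tail)
    return max(BETA_GRID, key=lambda b: -n * math.log(hurwitz_zeta(b, z_min)) - b * sum_log)

def ks_distance(data, z_min, beta):
    """Largest gap between the empirical and model CDFs over the fitted range."""
    tail = sorted(z for z in data if z >= z_min)
    n, norm = len(tail), hurwitz_zeta(beta, z_min)
    d, i = 0.0, 0
    while i < n:
        z, j = tail[i], i
        while j < n and tail[j] == z:
            j += 1
        below = 1.0 - hurwitz_zeta(beta, z) / norm   # model P(Z <= z - 1)
        at = 1.0 - hurwitz_zeta(beta, z + 1) / norm  # model P(Z <= z)
        d = max(d, abs(i / n - below), abs(j / n - at))
        i = j
    return d

def sample_power_law(n, z_min, beta, rng, z_cap=10_000):
    """Draw n values from the model; the support is truncated at z_cap (an assumption)."""
    support = range(z_min, z_cap + 1)
    return rng.choices(support, weights=[z ** -beta for z in support], k=n)

def goodness_of_fit(data, z_min, n_synthetic=30, seed=0):
    """Return (beta, KS distance, p-value); p = share of synthetic sets with KS >= the data's."""
    rng = random.Random(seed)
    beta = fit_beta(data, z_min)
    d_obs = ks_distance(data, z_min, beta)
    n = sum(1 for z in data if z >= z_min)
    worse = 0
    for _ in range(n_synthetic):
        synthetic = sample_power_law(n, z_min, beta, rng)
        if ks_distance(synthetic, z_min, fit_beta(synthetic, z_min)) >= d_obs:
            worse += 1
    return beta, d_obs, worse / n_synthetic

def choose_z_min(data, candidates, p_accept=0.1):
    """Smallest z_min (i.e. the most data) whose fit has an acceptable p-value, else None."""
    for z_min in sorted(candidates):
        beta, _, p = goodness_of_fit(data, z_min)
        if p >= p_accept:
            return z_min, beta, p
    return None
```

Each synthetic dataset is refitted before its KS distance is computed, so the p-value accounts for the fact that the model parameters were estimated from the data.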

For each codeword type, complex weighted directed networks were constructed with one node per codeword and a directed link for each observed beat-to-beat codeword transition, weighted by how often that transition occurs. Because the link weights were roughly symmetric (for pitch networks, the correlation between the weights of reciprocal links was close to 1), undirected networks were used instead. Network topology was studied with the following metrics:

  • The strength of each node <math>s_i = \sum_{j = 1}^{N} w_{ij}</math> (the total weight associated with the connections to neighbors)
  • The degree distribution <math>P(k) \propto k^{-\gamma}</math> (the probability that a randomly chosen node has <math>k</math> neighbors), characterized by its average or median
  • The assortativity coefficient normalized with respect to a randomized network <math>\Gamma = \frac{\langle kk^\prime\rangle}{\langle kk^\prime \rangle_{rand}}</math> with <math>\langle kk^\prime \rangle = \sum_{k, k^\prime}kk^\prime P(k, k^\prime)</math> where <math>P(k, k^\prime)</math> is the probability that a randomly chosen link connects two nodes of degrees <math>k</math> and <math>k^\prime</math>. The randomized network is constructed by swapping pairs of links at random (without multiple links or self-connections). <math>\Gamma > 1</math> indicates a tendency towards connecting nodes with similar degrees; <math>\Gamma < 1</math> indicates high-degree nodes connecting to low-degree ones.
  • The clustering coefficient <math>c</math>, which averages the local clustering coefficients <math>c_i = \frac{2T_i}{k_i(k_i-1)}</math> of all nodes with degree above 1 where <math>T_i</math> is the number of triangles attached to node <math>i</math>
  • The average shortest path length <math>l = \frac{2}{N_{cc}(N_{cc} - 1)}\sum_{i < j}l_{ij}</math>, where <math>l_{ij}</math> is the length of the shortest path between nodes <math>i</math> and <math>j</math> and the sum runs over the <math>N_{cc}</math> nodes of the largest connected component of the network.
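The network construction and the metrics listed above can be sketched without external libraries. Summing both transition directions into one undirected weight, dropping self-transitions (the same codeword on consecutive beats), using unweighted hop counts for path lengths, and the default of 10 swap attempts per link are assumptions made for illustration; the function names are hypothetical.

```python
import random
from collections import defaultdict, deque

def transition_network(sequences):
    """Undirected weighted network from beat-to-beat codeword transitions (w[a][b] = count)."""
    w = defaultdict(dict)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            if a != b:  # assumption: self-transitions are dropped
                w[a][b] = w[a].get(b, 0) + 1
                w[b][a] = w[b].get(a, 0) + 1
    return dict(w)

def strengths(w):
    return {i: sum(nbrs.values()) for i, nbrs in w.items()}

def degrees(w):
    return {i: len(nbrs) for i, nbrs in w.items()}

def clustering(w):
    """Mean of c_i = 2 T_i / (k_i (k_i - 1)) over nodes with degree > 1."""
    values = []
    for nbrs in w.values():
        nb, k = list(nbrs), len(nbrs)
        if k > 1:
            t = sum(1 for x in range(k) for y in range(x + 1, k) if nb[y] in w[nb[x]])
            values.append(2 * t / (k * (k - 1)))
    return sum(values) / len(values) if values else 0.0

def _bfs(w, src):
    """Hop distances from src to every node reachable from it."""
    dist, queue = {src: 0}, deque([src])
    while queue:
        u = queue.popleft()
        for v in w[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def average_shortest_path(w):
    """Mean hop distance over all node pairs of the largest connected component."""
    seen, largest = set(), set()
    for node in w:
        if node not in seen:
            comp = set(_bfs(w, node))
            seen |= comp
            largest = max(largest, comp, key=len)
    n = len(largest)
    total = sum(sum(_bfs(w, src).values()) for src in largest)  # counts each pair twice
    return total / (n * (n - 1)) if n > 1 else 0.0

def normalized_assortativity(w, n_swaps=None, seed=0):
    """Gamma = <k k'> / <k k'>_rand, the null model built by degree-preserving link swaps."""
    rng = random.Random(seed)
    k = degrees(w)
    edges = sorted({tuple(sorted((a, b))) for a in w for b in w[a]})  # nodes must be orderable

    def mean_kk(es):
        return sum(k[a] * k[b] for a, b in es) / len(es)

    observed, edge_set = mean_kk(edges), set(edges)
    for _ in range(n_swaps or 10 * len(edges)):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:
            c, d = d, c
        new1, new2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
        if a == d or c == b or new1 in edge_set or new2 in edge_set:
            continue  # reject swaps that would create self-connections or multiple links
        edge_set -= {edges[i], edges[j]}
        edge_set |= {new1, new2}
        edges[i], edges[j] = new1, new2
    return observed / mean_kk(edges)
```

The median degree mentioned in the results would then be `statistics.median(degrees(w).values())`.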


The frequencies of pitch codewords roughly follow a power law (Zipf's law) of the form <math>z \propto r^{-\alpha}</math>, where <math>z</math> is the frequency count of a codeword and <math>r</math> its rank; equivalently, the counts follow a distribution <math>P(z) \propto (c + z)^{-\beta}</math> with <math>\beta = 1 + 1/\alpha</math> and <math>c</math> a constant, so the most common codewords are far more frequent than rare ones. This does not change much across decades. The transition networks are sparse (“the number of links connecting codewords is of the same order of magnitude as the number of codewords”), so only a limited number of transitions between codewords occurs. The median degree (number of links to other codewords) is the same across all years. Over time, the average shortest path length increases slightly and the clustering coefficient decreases significantly (the “small-worldness” of the networks decreases), and the networks become increasingly less assortative than random (well-connected nodes are less likely to be connected to one another), showing a progressive restriction of pitch transitions.

The complementary cumulative distribution function of node strength, <math>P_C(s) = \sum_{s^\prime \ge s} P(s^\prime)</math>, can be fitted with a shifted power law <math>P_C(s) = (1 + s/c)^{1-\beta}</math> with <math>\beta = 2.1</math>, and the strength of a node is super-linearly correlated with its degree, so a disparity filter was used to “reduce the number of links and keep only the backbone of the system”. The null model of the disparity filter assumes that, for a node of degree <math>k</math>, the probability density that one link carries a fraction <math>u</math> of the node's strength is <math>p(u) = (k - 1)(1 - u)^{k-2}</math>; a link is preserved if the probability under this null model of a fraction at least as large as its own is smaller than a significance level <math>\alpha</math> (unrelated to the Zipf exponent above). This filter does not alter the strength distribution but does affect clustering and assortativity, so both the original network and the filtered network without its 10 most connected nodes were analyzed, confirming the clustered and disassortative character of the original network.
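The disparity filter can be sketched as follows. Integrating the null density p(u) from u to 1 gives the probability that a link carries a fraction of at least u of its node's strength, (1 - u)^(k-1). Keeping a link when this probability is below α at either endpoint, the default α = 0.05, and ranking the "most connected" nodes by degree in the filtered network are assumptions; both function names are hypothetical.

```python
def disparity_filter(w, alpha=0.05):
    """Backbone links of an undirected weighted network given as w[i][j] = weight.

    A link (i, j) is kept if (1 - w_ij / s_i)^(k_i - 1) < alpha for at least one
    endpoint i (assumed rule), where s_i is the strength and k_i the degree of i."""
    kept = set()
    for i, nbrs in w.items():
        k, s = len(nbrs), sum(nbrs.values())
        if k < 2:
            continue  # null model needs k >= 2; such links can still pass via the other endpoint
        for j, wij in nbrs.items():
            if (1 - wij / s) ** (k - 1) < alpha:
                kept.add(tuple(sorted((i, j))))
    return kept

def filtered_network(w, kept, drop_top=10):
    """Network restricted to the kept links, minus its drop_top highest-degree nodes."""
    g = {}
    for a, b in kept:
        g.setdefault(a, {})[b] = w[a][b]
        g.setdefault(b, {})[a] = w[b][a]
    top = set(sorted(g, key=lambda n: len(g[n]), reverse=True)[:drop_top])
    return {a: {b: x for b, x in nbrs.items() if b not in top}
            for a, nbrs in g.items() if a not in top}
```

For example, in a star where one link of the hub carries weight 100 and nine others carry weight 1, only the heavy link survives: (1 - 100/109)^9 is far below 0.05, while (1 - 1/109)^9 is about 0.92.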

The distribution of timbre codeword frequencies also follows a power law, but the exponent <math>\beta</math> peaks around 1965 and decreases afterwards, which implies less timbral variety (frequent codewords become more frequent and infrequent ones less frequent). For the transition networks, similar median degrees and degree distributions were observed for all years; the networks were more assortative than random, with average shortest path lengths and clustering coefficients similar to those of random networks, suggesting that there may be less “meaning” in timbre transitions. The average degree was 12 and most link weights were small.

The distribution of loudness values was well fit by a reversed log-normal function, with the median growing linearly by about 0.13 dB per year while the interquartile range (the difference between the first and third quartiles) remained mostly constant, indicating that songs are getting louder but their dynamic variability has been conserved. In the loudness transition networks, the average shortest path length and clustering coefficient are well above the values for a random network, indicating that no extreme loudness transitions occur. There is little variability across years. A disparity filter was also applied to the loudness networks.
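A sketch of how the reported loudness trend could be measured: take the median and interquartile range of beat loudness for each year, then fit an ordinary least-squares line to the medians. Grouping by single years rather than 5-year sliding windows and the function name are assumptions. At the reported rate of about 0.13 dB per year, the median rises by roughly 6.5 dB over 50 years.

```python
from statistics import median, quantiles

def loudness_trend(loudness_by_year):
    """Return (slope of the yearly median in dB/year, list of yearly interquartile ranges).

    loudness_by_year is a hypothetical input mapping a year to its beat loudness values in dB."""
    years = sorted(loudness_by_year)
    medians, iqrs = [], []
    for y in years:
        values = loudness_by_year[y]
        q1, _, q3 = quantiles(values, n=4)
        medians.append(median(values))
        iqrs.append(q3 - q1)
    # Ordinary least-squares slope of the medians against the year.
    my, mm = sum(years) / len(years), sum(medians) / len(medians)
    num = sum((y - my) * (m - mm) for y, m in zip(years, medians))
    den = sum((y - my) ** 2 for y in years)
    return num / den, iqrs
```

A positive slope together with a flat list of interquartile ranges corresponds to the "louder but equally dynamic" reading given above.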

Based on these findings, the authors suggest that what is perceived as “new” in modern music is rooted in the variety of pitch transitions, the “size” of the timbre palette, and the overall loudness.

measuring_the_evolution_of_contemporary_western_popular_music.txt · Last modified: 2015/12/17 14:59 (external edit)