Presents a generative, example-driven spectral model which recovers frequency components in bandlimited signals.
Many systems limit the bandwidth of audio signals (for example, telephones filter out frequencies outside of 300-3500 Hz), and the result is generally perceived as low quality. Bandwidth expansion attempts to recover some of the frequency content, which has been lost completely and cannot be inferred directly. Applying a simple memoryless nonlinearity can have some desirable effects in the frequency domain because the bandwidth is extended by a factor corresponding to the highest power in the nonlinearity's Taylor series. Example-driven techniques use, e.g., codebooks, HMMs, or GMMs to learn statistical dependencies between the frequencies observed in bandlimited and non-bandlimited signals. Most previous methods are meant for speech and thus rely on the fact that in any given frame, the “true” frequencies will be harmonics of a single fundamental frequency with some spectral envelope (formants) applied; as a result, they cannot deal with more complex signals like music.
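The effect of a memoryless nonlinearity can be checked numerically. The following is a minimal sketch, not taken from the paper: the sample rate, tone frequency, cubic nonlinearity, and detection threshold are all illustrative assumptions. Cubing a 500 Hz sinusoid produces energy at 1500 Hz, three times the original frequency, because sin^3(x) = (3 sin(x) - sin(3x)) / 4.

```python
import numpy as np

fs = 8000                      # sample rate in Hz (illustrative assumption)
f0 = 500                       # tone frequency in Hz (illustrative assumption)
t = np.arange(fs) / fs         # exactly one second, so FFT bins are 1 Hz apart
x = np.sin(2 * np.pi * f0 * t)

# Memoryless cubic nonlinearity: highest power 3, so bandwidth grows up to 3x.
y = x ** 3

# One-sided magnitude spectrum, scaled so a unit sinusoid has magnitude 0.5.
spectrum = np.abs(np.fft.rfft(y)) / len(y)
freqs = np.fft.rfftfreq(len(y), d=1 / fs)

# Frequencies carrying non-negligible energy (threshold is an assumption).
peaks = freqs[spectrum > 0.01]
print(peaks)                   # components near 500 Hz and 1500 Hz
```

The new component is a fixed function of the input, which is why this kind of extension adds plausible-sounding content but cannot recover what was actually removed.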
LCA models the magnitude spectrogram as a weighted mixture of marginal distributions over frequency and over time, which together describe the allocation of frequencies across time; the mixture weights act as priors on the components. Expectation-maximization is used to estimate the unknown probability distributions of the magnitude spectrum and spectral envelopes. LCA can be seen as a probabilistic specialization of SVD where the inputs are 2D histograms, and it is numerically equivalent to NMF. In musical content, the frequency marginals can be a set of magnitude spectra which characterize the harmonic series in the signal.
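As a rough illustration of the EM estimation, here is a minimal NumPy sketch under the assumption that the model has the form P(f, t) = sum over z of P(z) P(f|z) P(t|z), with the spectrogram treated as a 2D histogram. The function name `plca`, its parameters, and the toy input are hypothetical and not taken from the paper.

```python
import numpy as np

def plca(V, n_components, n_iter=100, eps=1e-12, seed=0):
    """EM for P(f,t) = sum_z P(z) P(f|z) P(t|z) on a nonnegative matrix V (F x T)."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    Pz = np.full(n_components, 1.0 / n_components)   # priors P(z)
    Pf = rng.random((F, n_components))               # frequency marginals P(f|z)
    Pf /= Pf.sum(axis=0)
    Pt = rng.random((n_components, T))               # time marginals P(t|z)
    Pt /= Pt.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Current model and the ratio V / model, shared by all updates.
        R = (Pf * Pz) @ Pt
        Q = V / np.maximum(R, eps)
        # E-step folded into the M-step: posterior-weighted counts.
        Nf = Pf * Pz * (Q @ Pt.T)                    # sum_t V(f,t) P(z|f,t)
        Nt = Pt * Pz[:, None] * (Pf.T @ Q)           # sum_f V(f,t) P(z|f,t)
        counts = np.maximum(Nf.sum(axis=0), eps)     # total count per component
        Pf = Nf / counts
        Pt = Nt / counts[:, None]
        Pz = counts / counts.sum()
    return Pz, Pf, Pt

# Toy "spectrogram": two spectra with disjoint support, active at different times.
W = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
H = np.array([[1.0, 2.0, 0.0, 0.0, 1.0], [0.0, 0.0, 3.0, 1.0, 1.0]])
V = W @ H
Pz, Pf, Pt = plca(V, n_components=2, n_iter=200)
V_model = V.sum() * (Pf * Pz) @ Pt                   # rescaled model of V
```

Folding P(z) into either marginal gives two nonnegative factors, which is one way to see the correspondence with NMF mentioned above.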
LCA can be used for bandwidth expansion by first extracting a set of frequency marginals from a signal which is spectrally close to the desired result, and then using these marginals to estimate time marginals and priors for the frequencies where the bandlimited signal is significant. The estimated time marginals, priors, and “high-quality” frequency marginals can then be used to construct an estimate of a higher-quality version of the bandlimited signal. Choosing a high-quality reference signal which represents the desired frequency range is important. Normally around 300 frequency marginal states are extracted. Assuming that the desired high-quality estimate is composed of similar frequency marginals, it follows that the bandlimited signal is approximately composed of the same frequency marginals restricted to the frequencies present. We can calculate time marginals and priors using the same EM algorithm, keeping the frequency marginals fixed to those extracted from the high-quality reference signal and calculating only over the frequencies present in the bandlimited signal. Then, the magnitude spectrum of the high-quality estimate can be calculated using the estimated priors and time marginals together with the high-quality reference frequency marginals. This spectrum can be transformed into the time domain by combining it with an estimated phase spectrum and performing an inverse STFT. The main strength of this approach is its ability to handle overlapping notes; other models cannot use multiple dictionary elements to reconstruct a spectral frame.
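The expansion step can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the paper's implementation: the function name `expand_bandwidth`, the boolean mask `band` marking observed frequency bins, the renormalization of the in-band marginals, and the energy rescaling are all choices made here for the example. The reference frequency marginals `Pf_ref` are assumed to have been learned beforehand; phase estimation and the inverse STFT are not covered.

```python
import numpy as np

def expand_bandwidth(V_bl, Pf_ref, band, n_iter=200, eps=1e-12, seed=0):
    """Estimate a full-band magnitude spectrogram from a bandlimited one.

    V_bl   : (F, T) bandlimited magnitude spectrogram
    Pf_ref : (F, Z) fixed frequency marginals from a high-quality reference
    band   : (F,) boolean mask of the frequency bins present in V_bl
    """
    rng = np.random.default_rng(seed)
    Z, T = Pf_ref.shape[1], V_bl.shape[1]
    Vb = V_bl[band]                                  # observed bins only
    mass = np.maximum(Pf_ref[band].sum(axis=0), eps) # in-band mass per marginal
    Pf_b = Pf_ref[band] / mass                       # renormalized in-band marginals
    Pz = np.full(Z, 1.0 / Z)
    Pt = rng.random((Z, T))
    Pt /= Pt.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Same EM as for full LCA, but Pf_b is never updated.
        R = (Pf_b * Pz) @ Pt
        Q = Vb / np.maximum(R, eps)
        Nt = Pt * Pz[:, None] * (Pf_b.T @ Q)
        counts = np.maximum(Nt.sum(axis=1), eps)
        Pt = Nt / counts[:, None]
        Pz = counts / counts.sum()
    # Undo the in-band renormalization, then rebuild with full-band marginals.
    weights = Pz / mass
    return Vb.sum() * ((Pf_ref * weights) @ Pt)

# Toy check: two reference marginals over 6 bins, only the lowest 3 observed.
Pf_ref = np.array([[0.5, 0.0], [0.0, 0.4], [0.1, 0.2],
                   [0.3, 0.0], [0.0, 0.3], [0.1, 0.1]])
H_true = np.array([[2.0, 0.0, 1.0], [0.0, 3.0, 1.0]])
V_full = Pf_ref @ H_true                             # ground-truth full-band data
band = np.array([True, True, True, False, False, False])
V_bl = V_full * band[:, None]                        # zero out the missing bins
V_hat = expand_bandwidth(V_bl, Pf_ref, band)
```

In the third toy frame both components are active at once, and the estimate still recovers the missing bins because each frame is explained as a mixture of marginals rather than by a single dictionary entry.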