Recent Advances in Real-Time Musical Effects, Synthesis, and Virtual Analog Models

Authors | Jyri Pakarinen, Vesa Välimäki, Federico Fontana, Victor Lazzarini, and Jonathan S. Abel |

Publication Info | Retrieved here |

Retrieval date | 4/6/11 |

A review paper of some recent advances in real-time digital audio effects and synthesis.

Digital audio effects and synthesis are widely used and often must run in real time. This paper covers adaptive effects, vintage delay and reverberator simulation, tube amplifier simulation, voltage-controlled filter (VCF) simulation, sound synthesis, and processing languages.

An effect is “adaptive” if its parameters are controlled by features of the input signal. One example is “self-modulating FM”, where the modulator frequency is set from the pitch of the input signal, or the (low-pass filtered) input signal itself is used as the modulating signal. “Adaptive FM” is similar, but uses the (low-pass filtered) input signal as the carrier and the input signal's pitch to control the modulation frequency; the modulation is done by varying the delay time of a fractional delay line, with the width of the variation set by the modulation index. Other variations on FM include splitting the FM sidebands into groups, using asymmetric spectra, and variants based on Bessel coefficients. FM can also be done by varying an all-pass filter coefficient with a (low-pass filtered) input signal, which produces a distortion effect. The structure of the all-pass filter changes the effect a great deal; the form <math>y(n) = x(n-1) - a[x(n)-y(n-1)]</math> was found to give the best results. The “spectral delay filter” is similar: the signal is sent through a chain of all-pass filters whose coefficients are modulated at the fundamental frequency of the input signal. A “brassifier” effect models nonlinearities in brass instruments by scaling the input signal and using it to control a fractional delay, which phase modulates the (low-pass filtered) input signal; this is a fairly common technique for creating passive nonlinearities.
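The coefficient-modulated all-pass filter described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the function names, the one-pole smoothing filter, the test tone, and the modulation depth of 0.8 are all assumptions made for the example. Only the difference equation itself comes from the text.

```python
import math

def one_pole_lowpass(x, alpha):
    """Smoothing filter s(n) = s(n-1) + alpha * (x(n) - s(n-1)) (assumed helper)."""
    out, s = [], 0.0
    for xn in x:
        s += alpha * (xn - s)
        out.append(s)
    return out

def modulated_allpass(x, a):
    """Apply y(n) = x(n-1) - a(n) * (x(n) - y(n-1)) with a per-sample coefficient a(n)."""
    out = []
    x_prev = y_prev = 0.0
    for xn, an in zip(x, a):
        yn = x_prev - an * (xn - y_prev)
        out.append(yn)
        x_prev, y_prev = xn, yn
    return out

# Illustrative use: the low-pass filtered input drives the coefficient.
fs = 8000                      # sample rate in Hz (assumed)
tone = [math.sin(2 * math.pi * 220 * n / fs) for n in range(fs // 10)]
depth = 0.8                    # assumed modulation depth; keeps |a(n)| below 1
coeff = [depth * v for v in one_pole_lowpass(tone, 0.2)]
distorted = modulated_allpass(tone, coeff)
```

With a constant coefficient the filter is an ordinary all-pass, so its impulse response has unit energy. The distortion only appears once the coefficient varies at audio rate.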

It's very useful to have a real-time effect which can emulate real-world reverb. A simple early method involved placing a speaker and a microphone in a reverberant room. This effect can be replicated by convolution with the room's impulse response, but (frequency domain-based) convolution with long impulse responses causes latency and is computationally intensive. Techniques presented to limit latency at the cost of more CPU usage generally involve splitting the impulse response into sections (either with recursively increasing or equal length) and performing shorter frequency transforms. Reverb can also be modeled as early reflections followed by exponentially decaying Gaussian noise (the late field); the late field is often generated by a series of delay lines whose inputs and outputs are connected by a mixing matrix (a feedback delay network, or FDN). Hybrid structures use convolution to generate the direct-path reflections and an FDN to generate the late field, which is equalized to sound right. To properly place the cross-fade between the two, the “normalized echo density” (Gaussianness) of the impulse response is used. The cross-fade can also be made smoother by unrolling the FDN loop or by subtracting the unwanted portion of the FDN response from the convolutional response. Convolution+FDN can also be used to model plate reverbs (where sound waves reflect inside a big metal plate). Comb filter structures need less memory; the coloration caused by their periodicity can be partially removed by convolving with a noise sequence that is changed dramatically over time (switched convolution). The noise sequence can be based on the “true” impulse response or on velvet noise (a sequence of 1, 0, or -1).
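As a rough illustration of the feedback delay network idea, the sketch below connects four delay lines through a mixing matrix. The delay lengths, the feedback gain, and the choice of a scaled 4x4 Hadamard matrix as the mixing matrix are assumptions made for this example and are not taken from the paper.

```python
def fdn_reverb(x, delays=(149, 211, 263, 293), gain=0.97, n_out=None):
    """Four-line feedback delay network (a sketch with assumed parameters).

    Each output sample is the sum of the delay-line outputs. Those outputs
    are mixed by an orthogonal matrix, scaled by `gain`, and fed back to the
    delay-line inputs together with the dry input sample.
    """
    hadamard = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]
    mix = [[0.5 * h for h in row] for row in hadamard]   # orthogonal after the 1/2 scaling
    lines = [[0.0] * d for d in delays]                  # circular buffers
    idx = [0] * len(delays)
    n_out = len(x) if n_out is None else n_out
    out = []
    for n in range(n_out):
        xn = x[n] if n < len(x) else 0.0
        # Oldest sample in each buffer is that line's current output.
        taps = [lines[i][idx[i]] for i in range(4)]
        out.append(sum(taps))
        for i in range(4):
            feedback = gain * sum(mix[i][j] * taps[j] for j in range(4))
            lines[i][idx[i]] = xn + feedback             # overwrite the sample just read
            idx[i] = (idx[i] + 1) % delays[i]
    return out

# Impulse response of the network: silent until the shortest delay has elapsed,
# then an increasingly dense, decaying tail.
impulse_response = fdn_reverb([1.0], n_out=20000)
```

Because the mixing matrix is orthogonal, the decay rate is set by `gain` alone. In this sketch a gain below 1 guarantees a decaying tail.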

Old processors which mimic real-world reverberation have certain characteristic sounds which can also be useful to model. Spring reverbs create artificial reverberation by torsionally vibrating coupled springs, which propagate low frequencies faster than higher ones. The springs can be modeled with finite difference schemes based on helical coils or, more efficiently, with dispersive filters (bidirectional waveguides). The dispersive waveguide approach has also been used to model the Slinky. Spring imperfections cause a noise-like “wash”, which can be modeled by a time-varying waveguide delay. The Leslie speaker exploits the Doppler effect by rotating a speaker horn in a reverberant chamber, which varies the timbre and shifts spectral components. Leslies have been modeled by a delay line with modulated input taps based on the reflection position, and by dynamically generating an FIR filter whose taps are drawn from measured impulse responses. Tape delays produce long echo patterns by recording and playing back a signal on magnetic tape; they are difficult to model because of the non-ideal tape transport mechanism and moveable record heads. Bucket brigade devices also produce echo patterns by sampling and delaying the signal; the sampling and storage process produces nonlinearities which can be modeled digitally with filters, compression, and polynomial nonlinearities.
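The dispersion that gives a spring its chirp-like echoes can be illustrated with a cascade of identical first-order all-pass filters, which delays high frequencies more than low ones when the coefficient is positive. This is a deliberately simplified sketch of the dispersive-filter idea, not any specific published spring model; the coefficient 0.6 and the 100 stages are assumed values.

```python
import math

def allpass_stage(x, a):
    """First-order all-pass: y(n) = a*x(n) + x(n-1) - a*y(n-1)."""
    out = []
    x_prev = y_prev = 0.0
    for xn in x:
        yn = a * xn + x_prev - a * y_prev
        out.append(yn)
        x_prev, y_prev = xn, yn
    return out

def dispersive_chain(x, a, n_stages):
    """Pass the signal through n_stages identical all-pass stages."""
    for _ in range(n_stages):
        x = allpass_stage(x, a)
    return x

def group_delay(a, omega, n_stages):
    """Group delay in samples of the chain at normalized frequency omega (rad/sample)."""
    return n_stages * (1.0 - a * a) / (1.0 + 2.0 * a * math.cos(omega) + a * a)

# An impulse is smeared into a chirp: low frequencies arrive first.
impulse = [1.0] + [0.0] * 1999
chirp = dispersive_chain(impulse, 0.6, 100)
```

The chain leaves the magnitude spectrum untouched and only rearranges arrival times. That is why it is cheap compared with a finite difference simulation of the coil.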

The characteristic sound of tube amplifiers is highly regarded, so it is desirable to model it digitally, ideally efficiently enough to run multiple instances in real time. One technique involves modulating an IIR filter's coefficients based on its delayed output or on a solution to the implicit nonlinearity. Ordinary differential equations can also be solved in real time to model the entire amplifier, with nonlinearities modeled according to Koren's equations. Modeling coupling effects between amplifier stages is also important but hard to do efficiently in real time. One attempt to model coupling involves pairing the cascaded triode stages separately. A state-space based amplifier simulation method involves automatically generating and solving models with discretized elements. This method can use look-up tables, provided that parameters are fixed. It can also be made more efficient by deriving the system equations manually, but then the model can't be generated automatically. Another state-space-based technique uses a dynamic triode model which includes the effect of capacitance between plate and grid. Wave digital filters have also been used for amplifier simulation, but coping with multiple nonlinearities and global feedback requires compromises, such as sacrificing modularity. One triode model inserts unit delays, which doesn't compromise modeling accuracy very much. This method is also used for the output chain (power amplifier, transformer, speaker). Some objective distortion analysis techniques have been presented. Typical methods include exponential sweep, dynamic intermodulation distortion analysis, and complex spectral phase evolution.
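For reference, the static triode nonlinearity usually attributed to Koren can be written as a short function. This is a sketch under assumptions: the formula is the commonly quoted form of Koren's triode plate-current equation, and the default parameters are values often cited for a 12AX7; both should be checked against the original source before being relied on. The function name is hypothetical.

```python
import math

def koren_plate_current(v_gk, v_pk, mu=100.0, ex=1.4, kg1=1060.0, kp=600.0, kvb=300.0):
    """Plate current in amperes from grid-cathode and plate-cathode voltages.

    Commonly quoted form of Koren's triode model (assumed, see lead-in):
      E1 = (Vpk / kp) * ln(1 + exp(kp * (1/mu + Vgk / sqrt(kvb + Vpk^2))))
      Ip = (E1^ex / kg1) * (1 + sgn(E1))
    """
    arg = kp * (1.0 / mu + v_gk / math.sqrt(kvb + v_pk * v_pk))
    # Numerically stable ln(1 + exp(arg)) that avoids overflow for large arg.
    softplus = max(arg, 0.0) + math.log1p(math.exp(-abs(arg)))
    e1 = (v_pk / kp) * softplus
    if e1 <= 0.0:
        return 0.0              # (1 + sgn(E1)) is zero or E1 is zero: no current
    return 2.0 * e1 ** ex / kg1  # (1 + sgn(E1)) equals 2 for positive E1

# Sweep the grid voltage at a fixed plate voltage to see the curved transfer characteristic.
curve = [(vg, koren_plate_current(vg, 250.0)) for vg in (-4.0, -3.0, -2.0, -1.0, 0.0)]
```

A static curve like this captures only the memoryless part of a stage. The coupling and capacitance effects mentioned above need the surrounding circuit equations.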

The voltage-controlled filter (VCF) consists of a transistor-capacitor ladder with variable feedback via a high-impedance amplifier. The transistor bias current can be used to adjust the cutoff frequency. The transistors also introduce nonlinear effects. A simple linear model uses the transfer function <math>H(s) = \frac{1}{k+(1+s/\omega_c)^4}</math>, where <math>\omega_c</math> is the cutoff frequency and <math>k</math> is the feedback gain. Discretizing this function is difficult, but with a specific procedure (serializing four transfer functions) for computing the delay-free loop, <math>\omega_c</math> and <math>k</math> can be kept decoupled. Modeling the nonlinear effects can be done with a number of techniques, often resulting in a nonlinear differential state-space representation for the whole system. Paradigms for modeling the nonlinearities involve the Volterra expansion of the nonlinearity and modeling the circuit; both require numerical integration. The approach can be simplified by inserting a unit delay, whose resulting inaccuracies can be compensated for with polynomial correction functions, for resonance and cutoff, that depend on the frequency. Each nonlinear element can be cheaply modeled with a tanh or cubic nonlinearity. When Volterra kernels are used, they must be adjusted to manage instability in the presence of large distortion. VCFs can also be modeled and solved as differential equations. No model has yet dealt with the interplay between the bias and input signal or the coupling of the feedback circuit and the RC ladder.
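The unit-delay simplification can be sketched as four one-pole low-pass stages in series, with the previous output sample fed back through the gain k. This is a minimal sketch under assumptions: the one-pole coefficient formula, the single tanh placed at the ladder input (rather than one per stage), and the function name are illustrative choices, and the polynomial correction functions mentioned above are omitted.

```python
import math

def ladder_filter(x, fc, k, fs, nonlinear=True):
    """Four-stage ladder low-pass with a unit delay in the feedback path.

    fc: cutoff in Hz, k: feedback gain, fs: sample rate in Hz.
    With nonlinear=False this approximates H(s) = 1 / (k + (1 + s/w_c)^4).
    """
    g = 1.0 - math.exp(-2.0 * math.pi * fc / fs)   # assumed one-pole coefficient
    stages = [0.0, 0.0, 0.0, 0.0]
    y_prev = 0.0
    out = []
    for xn in x:
        u = xn - k * y_prev            # feedback uses the previous output (unit delay)
        if nonlinear:
            u = math.tanh(u)           # cheap saturating nonlinearity at the input
        for i in range(4):
            stages[i] += g * (u - stages[i])
            u = stages[i]
        y_prev = stages[3]
        out.append(y_prev)
    return out

# A constant input settles to the DC gain of the linear model, 1 / (1 + k).
step_response = ladder_filter([0.1] * 5000, fc=1000.0, k=2.0, fs=48000.0, nonlinear=False)
```

In the linear case the settled output equals the input divided by 1 + k, which matches H(0) in the transfer function above. The unit delay shifts the resonance slightly, which is what the correction polynomials are meant to repair.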

Early languages included the MUSIC series, with MUSIC IV introducing unit generators (UGs, modular synthesis components) and MUSIC V providing a high-level programming environment. Most modern systems can operate in real time and provide separate rates for audio generation and control. SuperCollider combines a language/interpreter with a synthesizer, with communication done over OSC. Pure Data is a graphical programming language where blocks (UGs) are connected to generate control and audio signals. Csound is a multilingual programming library with a text language for the creation of UGs, a compiler, and a score language. FAUST is a functional signal processing programming language with a modular approach, and can be compiled to many different plugin formats and audio processing software environments. Some languages have provisions for multiprocessor architectures.
