Presents an instance-based learning approach for determining the control parameters of a physical trumpet model.
Any model of an acoustic instrument requires careful tuning of its parameters to produce an accurate sound, and setting those parameters by hand is usually infeasible. Parameter estimation can instead treat the model as a black box, observing only the output it produces for different inputs. Previous approaches have also used learning to control synthesis parameters.
Given a system controlled by a set of parameters that produces a signal, we wish to find a set of parameters for a different system that produces a similar-sounding signal. The problem can be cast as modeling a multi-dimensional function which takes the “real” output signal as input and returns control parameters for the model. To judge whether two sounds are similar, the fundamental frequency and discrete cepstrum are used, which summarize the pitch and spectral envelope of the signal.
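The pitch-plus-envelope summary above can be sketched as follows. This is an illustrative stand-in, not the paper's implementation: it uses autocorrelation pitch tracking and a low-order real cepstrum rather than the discrete cepstrum, and the function name and frame size are invented here.

```python
import numpy as np

def sound_features(frame, sr=44100, n_ceps=20):
    """Summarize one audio frame as (f0, cepstral coefficients).

    f0 comes from the strongest autocorrelation peak in a plausible
    pitch range; the spectral envelope is summarized by the first
    n_ceps real-cepstrum coefficients (a simple stand-in for the
    paper's discrete cepstrum).
    """
    # Autocorrelation; keep non-negative lags only.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    # Search lags corresponding to 80-1000 Hz (trumpet-like range).
    lo, hi = int(sr / 1000), int(sr / 80)
    lag = lo + int(np.argmax(ac[lo:hi]))
    f0 = sr / lag

    # Real cepstrum: inverse FFT of the log magnitude spectrum; the
    # low-order coefficients describe the spectral envelope.
    spectrum = np.abs(np.fft.rfft(frame)) + 1e-12
    cepstrum = np.fft.irfft(np.log(spectrum))
    return f0, cepstrum[:n_ceps]

# Demo: a 440 Hz sine frame should yield an f0 estimate near 440 Hz.
sr = 44100
frame = np.sin(2 * np.pi * 440 * np.arange(2048) / sr)
f0, ceps = sound_features(frame, sr)
```

Concatenating f0 with the cepstral coefficients gives the fixed-length feature vector that the distance computations below operate on.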
Modeling the function which maps a query signal to a set of parameters can be done with a parametric (functional), non-parametric (e.g. k-NN), or semi-parametric (e.g. neural net) approach. Here k-NN was used to choose the set of parameters that most closely generated a given signal, with a branch and bound algorithm to reduce the number of distance computations. The synthesis model is a trumpet model with four important control parameters: the pressure in the mouth, the frequency of the lips, the viscosity of the lips, and the length of the tube. Two data sets were created: one consisting of all chromatic notes played at all possible intensities, with and without vibrato, and another consisting of samples over the entire parameter space. The training set was constructed using the sound characteristics as input and the control parameters as output. The sound characteristics of a recorded trumpet were extracted once every 100 ms, and the parameter settings yielding the closest sound were chosen and used to resynthesize a sound meant to be close to the recording.
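The instance-based lookup can be sketched as follows. Everything here is a toy assumption: `synth_features` is a made-up smooth stand-in for the trumpet model's parameters-to-features mapping, the table is built by uniform random sampling of the four parameters, and a brute-force nearest-neighbor search replaces the paper's branch and bound pruning.

```python
import numpy as np

def synth_features(params):
    """Hypothetical stand-in for the trumpet model: maps (pressure,
    lip frequency, lip viscosity, tube length) to a feature vector."""
    p = np.asarray(params, dtype=float)
    return np.array([p @ [1.0, 2.0, 0.5, -1.0],
                     np.sin(p).sum(),
                     np.cos(p).sum()])

# Training set: sample the parameter space and record, for each
# setting, the features of the sound it synthesizes.
rng = np.random.default_rng(0)
param_table = rng.uniform(0.0, 1.0, size=(2000, 4))
feature_table = np.array([synth_features(p) for p in param_table])

def estimate_params(query_features):
    """1-NN lookup: return the stored parameter setting whose
    synthesized features are closest (Euclidean) to the query."""
    d = np.linalg.norm(feature_table - query_features, axis=1)
    return param_table[np.argmin(d)]

# Demo: features of a known setting should map back to parameters
# that resynthesize nearly the same features.
true_params = np.array([0.3, 0.7, 0.2, 0.5])
est_params = estimate_params(synth_features(true_params))
err = np.linalg.norm(synth_features(est_params) - synth_features(true_params))
```

In the paper's pipeline this lookup would run once per 100 ms frame of the recorded trumpet; the branch and bound search serves only to skip distance computations that cannot beat the current best match, leaving the result identical to the brute-force version above.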
One difficulty was that similar input sounds would map to identical control parameters, particularly when the input sound was not close to any sound in the training set. Because the training set only contained datapoints for notes on the chromatic scale, the pitch feature of the input audio was adjusted to compensate. When an input sound with amplitude or vibrato modulation was encountered, the system would suggest adjusting the tube length; since this is impossible on a real instrument, the tube length was instead chosen once per note and held fixed while the other parameters were estimated. A further difficulty was the implicit assumption that the relationship between control parameters and sound characteristics is time-independent, so the estimated parameters could vary a great deal across very similar input characteristics. Transients also proved difficult because the chosen signal characteristics were not meant to model them; they were handled by treating transients explicitly. Iterative optimization or linear interpolation could further improve the results.
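The per-note tube-length fix can be sketched as a two-stage lookup. The training table, linear feature model, and column layout (tube length as the fourth parameter) are all illustrative assumptions, not the paper's actual data.

```python
import numpy as np

# Illustrative training table: 200 random settings of (pressure,
# lip frequency, lip viscosity, tube length), with the tube length
# discretized to a handful of values, and features that depend
# linearly on the parameters through a random mixing matrix.
rng = np.random.default_rng(1)
param_table = rng.uniform(0.0, 1.0, size=(200, 4))
param_table[:, 3] = np.round(param_table[:, 3], 1)  # few tube lengths
mix = rng.normal(size=(4, 3))
feature_table = param_table @ mix

def estimate_two_stage(query_frames):
    """Choose the tube length once per note from the best overall
    match to the first frame, hold it fixed, and re-estimate the
    remaining parameters frame by frame from the matching subset."""
    # Stage 1: pick the tube length from the single best match.
    d0 = np.linalg.norm(feature_table - query_frames[0], axis=1)
    tube_length = param_table[np.argmin(d0), 3]

    # Stage 2: restrict candidates to that tube length, then 1-NN.
    mask = np.isclose(param_table[:, 3], tube_length)
    sub_feats, sub_params = feature_table[mask], param_table[mask]
    frames = []
    for q in query_frames:
        d = np.linalg.norm(sub_feats - q, axis=1)
        frames.append(sub_params[np.argmin(d)])
    return tube_length, np.array(frames)

# Demo "note": five slightly perturbed frames of one known setting.
queries = param_table[7] @ mix + rng.normal(scale=0.01, size=(5, 3))
tube, est = estimate_two_stage(queries)
```

Fixing the tube length this way guarantees that every frame of a note receives parameters realizable on a single instrument, at the cost of a slightly worse match per frame.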