Describes the design and usage of the LibXtract library, which extracts a variety of low-level features from audio signals.
An audio feature is any measurable aspect of a sound, such as melodic shape, rhythm, texture, or timbre. Most feature extraction libraries focus on numerically quantifiable features. Existing libraries include Aubio, which extracts pitch, beats, and onsets; jAudio, a customizable and optimized feature extractor written in Java; CLAM, a larger framework for audio applications that includes feature extraction; and Marsyas and Maaate, which provide functionality such as file and audio I/O in addition to feature extraction.
LibXtract aims to extract all of the MPEG-7 and Cuidado features efficiently. Extraction is hierarchical: features are computed from shared building blocks, so intermediate results are not recomputed and new features can be composed easily. Calls to LibXtract go through an array of function pointers; each function takes an array of input data, its size, a feature-specific argument vector, and an output array as arguments. A feature may return a scalar or a vector, or may be time-based (delta features). Depending on the feature extractor, the input data may be time-domain audio samples, spectral data, or something else. FFTW is used for FFTs.
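The calling convention and the hierarchical reuse described above can be sketched in plain C. This is a minimal illustration, not LibXtract's actual API: every name prefixed with `sketch_` or `SKETCH_` is hypothetical, and the choice of mean and variance as the two chained features is an assumption made for the example. The variance function receives the precomputed mean through its argument vector, so the mean is calculated only once.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes and feature indices (not LibXtract's own). */
enum { SKETCH_OK = 0, SKETCH_BAD_ARG = 1 };
enum { SKETCH_MEAN = 0, SKETCH_VARIANCE = 1, SKETCH_FEATURE_COUNT = 2 };

/* Shared signature: input array, its length, a feature-specific
   argument vector, and an output array. */
typedef int (*sketch_fn)(const double *data, int n,
                         const void *argv, double *result);

/* Scalar feature that needs no extra arguments. */
int sketch_mean(const double *data, int n, const void *argv, double *result)
{
    (void)argv;
    if (data == NULL || result == NULL || n <= 0)
        return SKETCH_BAD_ARG;
    double sum = 0.0;
    for (int i = 0; i < n; ++i)
        sum += data[i];
    *result = sum / n;
    return SKETCH_OK;
}

/* Sample variance. argv must point to the precomputed mean, so the
   mean is never calculated twice. */
int sketch_variance(const double *data, int n, const void *argv, double *result)
{
    if (data == NULL || argv == NULL || result == NULL || n <= 1)
        return SKETCH_BAD_ARG;
    const double mean = *(const double *)argv;
    double acc = 0.0;
    for (int i = 0; i < n; ++i) {
        const double d = data[i] - mean;
        acc += d * d;
    }
    *result = acc / (n - 1);
    return SKETCH_OK;
}

/* Table of extractors, indexed by the feature enum above. */
static const sketch_fn sketch_table[SKETCH_FEATURE_COUNT] = {
    sketch_mean,
    sketch_variance
};

/* Cascade: out[0] receives the mean, which is then passed as the
   argument vector for the variance written to out[1]. */
int sketch_mean_and_variance(const double *data, int n, double out[2])
{
    if (out == NULL)
        return SKETCH_BAD_ARG;
    int rc = sketch_table[SKETCH_MEAN](data, n, NULL, &out[0]);
    if (rc != SKETCH_OK)
        return rc;
    return sketch_table[SKETCH_VARIANCE](data, n, &out[0], &out[1]);
}

/* Usage example on a four-sample frame. */
void sketch_demo(void)
{
    const double frame[4] = { 1.0, 2.0, 3.0, 4.0 };
    double out[2];
    int rc = sketch_mean_and_variance(frame, 4, out);
    assert(rc == SKETCH_OK);
    assert(out[0] == 2.5);
    (void)rc;
}
```

Because every extractor shares one signature, a host program can select features by index at run time, and a new feature only needs to accept its prerequisites through the argument vector.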
Programs using LibXtract include Sonic Visualiser, through its Vamp plugin API, which provides a UI for extracting and visualizing information in audio signals; Pure Data and Max/MSP, which allow real-time feature extraction in audio processing and analysis patches; and SuperCollider, which has individual unit generators for LibXtract's features.
LibXtract is meant for low-level features, but these can serve as a basis for higher-level features such as roughness, sharpness, loudness, instrument, and “danceability” by mapping the low-level features through a (typically machine-learning-based) dimensionality reduction. This mapping can be done in real time within a composition, so that processing can be applied based on high-level features. The high-level classification can also be a mapping to a continuous, multidimensional feature space.
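One simple form such a mapping could take is a linear projection from a low-level feature vector to a smaller continuous space. The sketch below assumes three input features and two output dimensions. The function name, the sizes, and the weights are all hypothetical. In practice the weights would come from a trained model or a fitted reduction, and the mapping need not be linear at all.

```c
#include <stddef.h>

/* Hypothetical sizes: three low-level features in, two dimensions out. */
enum { SKETCH_LOW_DIMS = 3, SKETCH_HIGH_DIMS = 2 };

/* Computes out = W * features + bias, where W is stored row-major as
   SKETCH_HIGH_DIMS rows of SKETCH_LOW_DIMS weights.
   Returns 0 on success, 1 if any pointer is NULL. */
int sketch_project(const double *weights, const double *bias,
                   const double *features, double *out)
{
    if (weights == NULL || bias == NULL || features == NULL || out == NULL)
        return 1;
    for (int r = 0; r < SKETCH_HIGH_DIMS; ++r) {
        double acc = bias[r];
        for (int c = 0; c < SKETCH_LOW_DIMS; ++c)
            acc += weights[r * SKETCH_LOW_DIMS + c] * features[c];
        out[r] = acc;
    }
    return 0;
}
```

A projection this small costs only a few multiply-adds per frame, so it can run alongside extraction in a real-time patch.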
LibXtract is fairly efficient computationally, though how efficient depends on the feature extraction task. Its modular design can add overhead when only a single feature is needed and that feature takes many steps to compute.