Handling Blocking for Real-Time Spectral Processing

This article describes how to do spectral processing inside an audio callback function, where the callback buffer size is unknown (and possibly variable) and need not have any particular relationship to the window size or hop size.

How it's done in non-real-time

In Python with NumPy (or essentially equivalently in MATLAB or Octave), you might do something like this:

# Assume audioSignal is a 1d array of floats representing an audio signal at some sampling rate
import numpy as np
# Window/FFT size
M = 1024
# Hop size
R = 512
# Create window
W = np.hanning(M) 
# Number of frames
nFrames = int(np.floor((audioSignal.shape[0]-M)/R))
# Output signal
outputSignal = np.zeros(audioSignal.shape[0])
for n in range(nFrames):
  # Extract and window the frame
  frame = W*audioSignal[n*R:n*R+M]
  # Take FFT
  Frame = np.fft.rfft(frame)
  # Do some processing on Frame
  # Overlap-add the IFFT of the (processed) Frame into the output
  outputSignal[n*R:n*R+M] += np.fft.irfft(Frame)

Note that it doesn't really matter what R and M are, as long as the window W is approximately constant overlap-add (COLA) at hop R. Also, depending on the spectral processing, the result of the IFFT might need to be windowed too (meaning you'd want to use a window whose square is COLA for that R and M). At any rate, the point is that it's fairly easy to do if you have access to the entire audio file.
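
If you want to sanity-check the COLA property numerically, a minimal standalone sketch like the following works (written in C to match the real-time code later in this article; it uses the same M and R as the Python example above, the window formula matches np.hanning, and the 8*M test length is an arbitrary choice):

#include <math.h>
#include <stdio.h>

// Same M and R as the Python example above
#define M 1024
#define R 512
#define LEN (8*M)

int main(void)
{
  // Symmetric Hann window, same definition as np.hanning(M)
  double w[M];
  for (int n = 0; n < M; n++)
  {
    w[n] = 0.5*(1.0 - cos(2.0*3.14159265358979323846*n/(M - 1)));
  }
  // Overlap-add shifted copies of the window at hop R
  static double sum[LEN];
  for (int start = 0; start + M <= LEN; start += R)
  {
    for (int n = 0; n < M; n++)
    {
      sum[start + n] += w[n];
    }
  }
  // Only the middle of the test region is fully overlapped, so check there
  double lo = sum[M];
  double hi = sum[M];
  for (int n = M; n < LEN - M; n++)
  {
    if (sum[n] < lo) lo = sum[n];
    if (sum[n] > hi) hi = sum[n];
  }
  printf("overlap-add sum stays between %f and %f\n", lo, hi);
  return 0;
}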

How it should be done in real-time

In the case of an audio plugin, for example, you generally don't have control over the size of the buffer (call it N) which is passed to the plugin. So, we can't assume that N > M, N > R, N < M, N < R, N % R = 0, or N % M = 0, and we can't even guarantee that N is fixed. The spectral processor therefore has to be able to take in an arbitrary number of samples, do processing on them, and return the appropriate number of samples. In pseudo-C-code, this might look like:

// Window size M and hop size R are taken to be compile-time constants here, e.g.
#define M 1024
#define R 256
// Global or instance variables for frames going into and returning from the spectral processor
float spectralInputBuffer[M] = {0.0f};
float spectralOutputBuffer[M] = {0.0f};
// Where we are in terms of storing data in the input/output buffers
int inputBufferLocation = 0;
// Need two different locations for the output buffer
int outputBufferWriteLocation = 0;
int outputBufferReadLocation = 0;
// The callback (assuming it's just one channel)
void callback( float callbackBuffer[], int callbackBufferSize )
{
  // Loop through the input buffer
  for( int n = 0; n < callbackBufferSize; n++ )
  {
    // Store the current input sample
    spectralInputBuffer[inputBufferLocation] = callbackBuffer[n];
    // Increment our store location
    inputBufferLocation++;
 
    // Store the current output sample - note that this needs to be done BEFORE any spectral processing/OLA, to avoid overlap errors.
    callbackBuffer[n] = spectralOutputBuffer[outputBufferReadLocation];
    // Need to zero-out read values so that OLA works
    spectralOutputBuffer[outputBufferReadLocation] = 0.0;
    // Increment read location
    outputBufferReadLocation++;
    // Reset if necessary
    if ( outputBufferReadLocation == M )
    {
      outputBufferReadLocation = 0;
    }
 
    // If we have a whole frame of samples, do processing
    if (inputBufferLocation == M)
    {
      // Where we're going to output the processing - can't be the same as the actual output buffer
      float tempSpectralOutputBuffer[M];
      // This is some function which does the spectral processing - the result is written to the second param, here tempSpectralOutputBuffer
      doSpectralProcessing( spectralInputBuffer, tempSpectralOutputBuffer, M );
      // Overlap-add into the output buffer
      for (int m = 0; m < M; m++)
      {
        spectralOutputBuffer[outputBufferWriteLocation] += tempSpectralOutputBuffer[m];
        outputBufferWriteLocation++;
        if (outputBufferWriteLocation == M)
        {
          outputBufferWriteLocation = 0;
        }
      }
      // Now we need to shift the spectral input buffer by a "hop"
      for (int m = 0; m < M-R; m++)
      {
        spectralInputBuffer[m] = spectralInputBuffer[m+R];
      }
      // Also, need to "hop back" the input and output buffer locations
      inputBufferLocation -= R;
      outputBufferWriteLocation -= (M-R);
      // This can certainly make outputBufferWriteLocation < 0, which would be bad!
      while (outputBufferWriteLocation < 0)
      {
        // Wrap if necessary
        outputBufferWriteLocation += M;
      }
    }
  }
}
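
For concreteness, here is one possible sketch of doSpectralProcessing, written against FFTW's single-precision API (FFTW is just an assumption here, since the article doesn't prescribe an FFT library; the initSpectralProcessing helper is also made up, and would be called once by the host before audio starts). It applies the Hann window, takes the forward FFT, leaves a placeholder where the actual spectral manipulation goes, and inverse-transforms, using the M defined above:

#include <math.h>
#include <fftw3.h>

// Scratch frame and FFT state - set up once by the host, never inside the callback
static float windowedFrame[M];
static fftwf_complex* spectrum;   // M/2 + 1 bins
static fftwf_plan forwardPlan;
static fftwf_plan inversePlan;

void initSpectralProcessing(void)
{
  spectrum = fftwf_malloc(sizeof(fftwf_complex)*(M/2 + 1));
  forwardPlan = fftwf_plan_dft_r2c_1d(M, windowedFrame, spectrum, FFTW_ESTIMATE);
  inversePlan = fftwf_plan_dft_c2r_1d(M, spectrum, windowedFrame, FFTW_ESTIMATE);
}

void doSpectralProcessing(const float* input, float* output, int frameSize)
{
  // Window the frame (same symmetric Hann as np.hanning); frameSize is M here
  for (int n = 0; n < frameSize; n++)
  {
    windowedFrame[n] = input[n]*0.5f*(1.0f - cosf(2.0f*3.14159265f*n/(frameSize - 1)));
  }
  fftwf_execute(forwardPlan);
  // ... do some processing on spectrum[0 .. frameSize/2] here ...
  fftwf_execute(inversePlan);   // FFTW's inverse transform is unnormalized
  // Undo the FFT scaling; depending on the window and hop, an extra gain
  // correction for the window's overlap-add sum may also be needed
  for (int n = 0; n < frameSize; n++)
  {
    output[n] = windowedFrame[n]/(float)frameSize;
  }
}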

Example scenario 1: N < R < M, N % R = N % M = 0

As an example, say M = 1024, R = 256, and N = 128. After 8 callback buffers are received, spectralInputBuffer will be filled with 1024 samples, and doSpectralProcessing will be called. Its output will be copied directly into spectralOutputBuffer, because tempSpectralOutputBuffer is added into the initially all-zero array. Then, the most recent M-R = 768 samples in spectralInputBuffer are moved to the beginning and the write location is moved to R samples before the end. In this way, once R more samples are received, the buffer will be full again, and will hold the 1024 most recent samples. The writing pointer for spectralOutputBuffer is moved to M-(M-R) = 256. In this way, as the next R samples needed for a full spectralInputBuffer are received, R samples from spectralOutputBuffer are read and zeroed out. Once spectralInputBuffer is full again, doSpectralProcessing is called again and the output is added into spectralOutputBuffer, wrapping around and filling in the 256 samples at the beginning that have been zeroed out. This whole process repeats every time two callback buffers (2*128 = 256 samples) are received.
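
To convince yourself of the bookkeeping, a tiny standalone sketch of just the index arithmetic (no audio, no FFTs, and no claim to being the real callback) reproduces these numbers:

#include <stdio.h>

int main(void)
{
  const int M = 1024, R = 256, N = 128;
  int inputBufferLocation = 0;
  int outputBufferWriteLocation = 0;
  for (int buffer = 1; buffer <= 16; buffer++)
  {
    for (int n = 0; n < N; n++)
    {
      inputBufferLocation++;
      if (inputBufferLocation == M)
      {
        // The OLA loop advances the write location by M, which (with wrapping)
        // leaves it where it started; then it hops back by M - R
        outputBufferWriteLocation -= (M - R);
        while (outputBufferWriteLocation < 0)
        {
          outputBufferWriteLocation += M;
        }
        inputBufferLocation -= R;
        printf("callback %2d, sample %3d: frame processed, write location now %4d\n",
               buffer, n, outputBufferWriteLocation);
      }
    }
  }
  return 0;
}

The first line printed is for callback 8, with the write location at 256, and a frame is processed every two callbacks thereafter, as described above.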

Example scenario 2: N > M, N % M ≠ 0

Now suppose M and R are 1024 and 256 as above, but N = 13527. Now, each time a callback buffer is received, inputBufferLocation will reach M multiple times before the end of the buffer. Each time this happens, doSpectralProcessing will be called and inputBufferLocation will be moved back R samples, so that R more samples from the callback buffer are read before doSpectralProcessing is called again. All of the resulting frames from doSpectralProcessing will be overlapped and added into spectralOutputBuffer as above. Eventually, fewer than 256 unread samples will remain in the callback buffer - say, 215 samples. These last 215 samples will be copied into spectralInputBuffer, and the next time the callback occurs, doSpectralProcessing will be called after 256-215 = 41 samples have been read.
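
The leftover-sample arithmetic can be checked with the same kind of index-only sketch:

#include <stdio.h>

int main(void)
{
  const int M = 1024, R = 256, N = 13527;
  int inputBufferLocation = 0;
  int frames = 0;
  for (int n = 0; n < N; n++)
  {
    inputBufferLocation++;
    if (inputBufferLocation == M)
    {
      frames++;
      inputBufferLocation -= R;
    }
  }
  // For these values this prints 49 frames, 215 new samples, and 41 samples to go
  printf("%d frames processed; %d new samples since the last frame; %d more samples will trigger the next frame\n",
         frames, inputBufferLocation - (M - R), M - inputBufferLocation);
  return 0;
}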

Example scenario 3: N = M, R = 1

This is mostly a silly example, because it would be incredibly computationally expensive (even though a hop of 1 is COLA for any window) and because, when this is the case, a lot of the code above isn't needed - but it's worth testing. On the first callback, the entire callback buffer is copied into spectralInputBuffer. When the final sample is copied, doSpectralProcessing is called and the result is stored outright in spectralOutputBuffer, with inputBufferLocation set to M-1 and outputBufferWriteLocation set to 1. In this way, once the next callback happens, doSpectralProcessing is called once for each sample that is read in, because inputBufferLocation stays at M-1. outputBufferWriteLocation increments M times during each overlap-add, only to have M-1 subtracted, so that each time an input sample is grabbed from the callback buffer, it advances by a net of 1.

Some notes/caveats

This code assumes that the windowing is being done by doSpectralProcessing; make sure the window is ~COLA for the R and M you're using. Also, whenever N changes, all of the instantiation code should probably be run again (setting the read/write locations to 0 and zeroing out the buffers). Actual changes to N shouldn't happen very often, but the code needs to be able to run in a variety of different callback environments.
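
A minimal sketch of that re-initialization, using the variable names from the code above (the function name is just a placeholder):

#include <string.h>

void resetSpectralBuffers(void)
{
  memset(spectralInputBuffer, 0, sizeof(spectralInputBuffer));
  memset(spectralOutputBuffer, 0, sizeof(spectralOutputBuffer));
  inputBufferLocation = 0;
  outputBufferWriteLocation = 0;
  outputBufferReadLocation = 0;
}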
