music_320

Notes from CCRMA's Music 320, “Introduction to Digital Audio Signal Processing”. More information is at the course website. These notes are from Fall 2009 and may or may not reflect future years; they're also in chronological order (as opposed to logical). Much of the content here likely follows pretty closely what is in Mathematics of the Discrete Fourier Transform (DFT) and Introduction to Digital Filters.

Digital audio signal processing means representing sound as a sequence of numbers and manipulating, synthesizing, etc. the sound with numerical/mathematical operations. We can plot these values against time to see the time-varying amplitude of the signal and find out various things, like the speed at which the sound decays or the distance between a speaker and a microphone (emit an impulse from the speaker and look at the time difference). If we take an impulse response and compute 20 times the log (base 10) of the absolute value of the signal, we get decibels, which show a linear rather than exponential decay (the rate at which energy is absorbed is constant). We can measure the t60 (the time it takes the level to drop by 60 dB), here about 60ms, which is pretty typical. The point is that by looking at the numbers vs time you can tell a lot of things. Still, looking at the time-domain signal alone you can't determine much about the musical content, e.g. compared with a music score. But looking at frequency vs amplitude vs time (a 3d graph) you can see it more like a piece of music - based on the frequency components over a little window, via Fourier. In our ears, after the sound vibrates our ear bones and enters our cochlea, it is broken up into higher, middle, and lower frequency parts in a similar way.
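As a rough sketch of the decibel idea above, the following Python snippet builds a hypothetical exponentially decaying envelope and converts it to decibels. The sample rate `fs` and time constant `tau` are made-up illustrative values, not measurements from the course. It shows that the level in dB falls along a straight line, from which a T60 can be read off.

```python
import math

fs = 1000.0   # assumed sample rate in Hz (illustrative)
tau = 0.010   # assumed decay time constant in seconds (illustrative)

def envelope(t):
    """Exponentially decaying amplitude envelope e^(-t/tau)."""
    return math.exp(-t / tau)

def to_db(x):
    """Level in decibels: 20 * log10 of the absolute value."""
    return 20.0 * math.log10(abs(x))

times = [n / fs for n in range(100)]
levels_db = [to_db(envelope(t)) for t in times]

# In dB the decay is a straight line: the drop per sample is constant.
slope_db_per_s = (levels_db[1] - levels_db[0]) * fs

# T60 is the time needed for the level to fall by 60 dB.
t60 = -60.0 / slope_db_per_s
```

For a pure exponential, this T60 works out to `tau * ln(1000)`, a little under seven time constants.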

There are lots of vibrating systems which behave linearly: the signal is delayed, amplified, differentiated, or integrated, and audio systems work like that. The frequency is not changing. By analyzing such a system with sinusoids, we can understand all there is to know about it.

A sinusoid is a function of the form <math>A\sin(\om t+\phi)</math>. <math>\om</math> is omega, the frequency, in radians per second. Imagine a penny spinning around on a turntable (with radius <math>A</math>) - sine plots the height of the penny at any time. <math>\phi</math>, phi, is the phase, an angle offset in radians (equivalent to a time offset of <math>\frac{\phi}{\om}</math>). Frequency in hertz is <math>\frac{\om}{2\pi}</math>.
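A minimal sketch of the sinusoid described above, in Python. The function names and the 440 Hz example frequency are illustrative choices, not from the course. It evaluates the sinusoid at a given time and converts radians per second to hertz.

```python
import math

def sinusoid(A, omega, phi, t):
    """Value of A*sin(omega*t + phi) at time t (in seconds)."""
    return A * math.sin(omega * t + phi)

def radians_per_sec_to_hz(omega):
    """Convert angular frequency (rad/s) to frequency in hertz."""
    return omega / (2.0 * math.pi)

omega = 2.0 * math.pi * 440.0       # illustrative: 440 cycles per second
f_hz = radians_per_sec_to_hz(omega)
period = 1.0 / f_hz

# With zero phase, the sine reaches its peak A a quarter period after t = 0,
# like the penny reaching the top of its circle.
peak = sinusoid(0.5, omega, 0.0, period / 4.0)
```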

An exponential function is of the form <math>y = e^{-at}</math>, where the amplitude drops by the same percentage in every fixed interval of time (by a factor of <math>e</math> every <math>\frac{1}{a}</math> seconds). If we set <math>a = j\om</math>, it turns into a complex sinusoid at frequency <math>\om</math>! Exponentials are solutions to linear systems (and thus so are sinusoids) because they are still exponentials when you differentiate, integrate, delay, or amplify them.
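To check numerically what happens to the exponential when the decay constant is made imaginary, here is a small Python sketch using the standard `cmath` module. The helper name, frequency and time values are arbitrary illustrations. With a real constant the function decays; with a purely imaginary one it stays on the unit circle and traces out a cosine and a sine.

```python
import cmath
import math

def complex_exponential(a, t):
    """Evaluate e^(-a*t) for a real or complex constant a."""
    return cmath.exp(-a * t)

omega = 2.0 * math.pi * 3.0   # illustrative: 3 Hz in rad/s
t = 0.1                       # illustrative time in seconds

# Real a: plain decay. With a = 2 and t = 0.5 this is e^(-1).
decay = complex_exponential(2.0, 0.5)

# Imaginary a = j*omega: magnitude stays 1, only the angle changes.
z = complex_exponential(1j * omega, t)

# e^(-j*omega*t) = cos(omega*t) - j*sin(omega*t)
expected = complex(math.cos(omega * t), -math.sin(omega * t))
```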

Sine waves are really special because if you are able to emit a perfect sine wave, everyone will hear a perfect sine wave - just perhaps louder or quieter (and possibly shifted in time). With a more complex waveform, the waveform can change depending on the room. But if the sine wave is perfect to begin with, it will remain intact after it travels through the air, etc. So if we can break down a signal into sine waves, this is very helpful. With Fourier you can construct different waveforms by adding together series of sine waves. To determine how well a signal correlates with a sine wave at a given frequency, we multiply the sine wave by the actual signal at each sample and sum these products. After we know the frequency content of a signal, we can manipulate (filter) it.
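The multiply-and-sum correlation described above can be sketched in Python as follows. The signal here is a made-up test case (a sine with 5 cycles over 64 samples), and the function names are illustrative. The sum is large when the probe frequency matches the signal and essentially zero when it does not.

```python
import math

N = 64  # number of samples (illustrative)

def sine(k, n):
    """Sample n of a sine wave with k whole cycles over N samples."""
    return math.sin(2.0 * math.pi * k * n / N)

# Hypothetical test signal: a sine with 5 cycles in the window.
signal = [sine(5, n) for n in range(N)]

def correlate(sig, k):
    """Multiply the signal by a probe sine sample by sample and sum the products."""
    return sum(sig[n] * sine(k, n) for n in range(N))

match = correlate(signal, 5)      # probe at the signal's own frequency
mismatch = correlate(signal, 7)   # probe at a different frequency
```

Here `match` comes out at N/2 (the sum of sine squared over whole cycles), while `mismatch` is zero up to rounding error.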

Think of a bunch of keys (or any weight) on the end of a string. When moving your hand really slowly, the movement of the keys is basically the same as the movement of your hand. As you move faster, the keys move more and more, until a certain critical frequency. After that, they move less and less, until they basically don't move at all. This is a resonant lowpass filter, because it's 1:1 at lower frequencies, with a resonant peak, then it tapers down. At low frequencies, the phase is also basically 1:1, i.e., there is no time delay between the input and output. At the resonant peak, the output is about 90 degrees out of phase, and at high frequencies, it's completely (180 degrees) out of phase.
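As a sketch of this behaviour, assume the keys on a string act like a textbook second-order resonator with response 1 / (1 - (ω/ω0)² + j(ω/ω0)/Q). The notes do not give a transfer function, so this model and the quality factor Q = 5 are assumptions for illustration. The Python below evaluates gain and phase well below, at, and well above the resonant frequency.

```python
import cmath
import math

def response(ratio, Q=5.0):
    """Assumed second-order lowpass response; ratio = omega / omega0."""
    return 1.0 / complex(1.0 - ratio ** 2, ratio / Q)

def gain_and_phase_deg(ratio, Q=5.0):
    """Return (gain, phase in degrees); negative phase means the output lags."""
    h = response(ratio, Q)
    return abs(h), math.degrees(cmath.phase(h))

low = gain_and_phase_deg(0.01)     # far below resonance: gain ~1, phase ~0
peak = gain_and_phase_deg(1.0)     # at resonance: gain Q, phase -90 degrees
high = gain_and_phase_deg(100.0)   # far above: tiny gain, phase near -180
```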

The DFT takes a sequence of <math>N</math> samples (<math>N</math> is finite, not infinite), each with some value <math>x(n)</math>. We then get out <math>X(k)</math>, which gives the frequency content at each frequency index <math>k</math>.

<math>X(k) = \sum_{n=0}^{N-1}{x(n)e^{\frac{-j2\pi kn}{N}} } </math>

The <math>e</math> terms are complex sinusoids at different frequencies. The sum measures how well the signal lines up with each of these frequencies. The inverse DFT takes <math>X(k)</math> and gets back to <math>x(n)</math>.
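A direct (slow, O(N²)) Python sketch of the DFT and its inverse, using the standard kernel e^{-j2πkn/N}. The function names and the 8-sample cosine test signal are illustrative. In practice an FFT would be used instead; this version only mirrors the sum literally.

```python
import cmath
import math

def dft(x):
    """Direct DFT: X(k) = sum over n of x(n) * e^(-j*2*pi*k*n/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """Inverse DFT: x(n) = (1/N) * sum over k of X(k) * e^(+j*2*pi*k*n/N)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

# Illustrative test signal: a cosine with 2 cycles over 8 samples.
N = 8
x = [math.cos(2.0 * math.pi * 2 * n / N) for n in range(N)]

X = dft(x)
magnitudes = [abs(v) for v in X]   # energy shows up only in bins 2 and N-2
x_back = idft(X)                   # recovers x up to rounding error
```

The cosine lines up with bin 2 and its mirror image bin 6, each with magnitude N/2; every other bin is zero up to rounding.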

Given an equation <math>p(x) = ax + b</math>, <math>p(x) = 0</math> when <math>x = -\frac{b}{a}</math>. Say we have a parabola <math>y = x^2 - 1</math>, and we want to know when <math>y = 0</math>: <math>0 = x^2 - 1</math> or <math>x^2 = 1</math> or <math>x = \pm 1</math>. But if we have <math>y = x^2 + 1</math>, and we want to know when it's 0, we get <math>-1 = x^2</math>. Rather than give up, we give this quantity a name, <math>j = \sqrt{-1}</math>, so <math>x = \pm j</math>. We can solve quadratics <math>ax^2 + bx + c = 0</math> in general with

<math>x = \frac{-b \pm \sqrt{b^{2}-4ac} }{2a} </math>

Then we have <math>p(x) = a(x-p_+)(x-p_-)</math> where <math>p_+</math> is the formula with a + and <math>p_-</math> is the formula with a -. Given the square root of a negative number <math>-z</math> (with <math>z > 0</math>), we can say

<math>\sqrt{-z} = \sqrt{z} j</math>

With <math>b^2 > 4ac</math>, we have two real roots, while if <math>b^2 < 4ac</math>, we have two complex roots, one with a positive imaginary part and one with a negative (if <math>b^2 = 4ac</math>, there is a single repeated real root). This is equivalent to whether or not the parabola crosses the x axis. According to the fundamental theorem of algebra, complex numbers are all we need to describe the roots of any polynomial (of any order). You don't necessarily have a nice formula for them though, like the quadratic formula.
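The quadratic formula and the role of the discriminant can be tried out with a short Python sketch. The function name and the example coefficients are illustrative. Using `cmath.sqrt` instead of `math.sqrt` lets the same code handle a negative discriminant.

```python
import cmath

def quadratic_roots(a, b, c):
    """Roots of a*x^2 + b*x + c = 0, returned as complex numbers."""
    disc = cmath.sqrt(b * b - 4 * a * c)   # works for negative discriminants too
    return (-b + disc) / (2 * a), (-b - disc) / (2 * a)

real_roots = quadratic_roots(1, 0, -1)     # y = x^2 - 1: roots +1 and -1
imag_roots = quadratic_roots(1, 0, 1)      # y = x^2 + 1: roots +j and -j
complex_roots = quadratic_roots(1, 2, 5)   # b^2 < 4ac: roots -1 + 2j and -1 - 2j
```

When the discriminant is negative, the two roots share a real part and have imaginary parts of opposite sign.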

We can define a complex number <math>z = x + jy</math>, where <math>x</math> and <math>y</math> are real. <math>x</math> is referred to as the real part and can be written as <math>Re(z)</math>, <math>y</math> is the imaginary part and is written <math>Im(z)</math>. We can plot this on a two dimensional plane with <math>x</math> as the real axis and <math>y</math> as the imaginary axis. <math>z_1 + z_2 = (x_1 + x_2) + j(y_1 + y_2)</math> and

<math>z_{1}z_{2} = (x_{1}x_{2}-y_{1}y_{2}) + j(x_{1}y_{2} + y_{1}x_{2})</math>
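The sum and product rules can be checked against Python's built-in complex type with a small sketch. The helper names and example values are illustrative.

```python
def add(x1, y1, x2, y2):
    """Sum of (x1 + j*y1) and (x2 + j*y2) as a (real, imaginary) pair."""
    return (x1 + x2, y1 + y2)

def multiply(x1, y1, x2, y2):
    """Product of (x1 + j*y1) and (x2 + j*y2) as a (real, imaginary) pair."""
    return (x1 * x2 - y1 * y2, x1 * y2 + y1 * x2)

# Illustrative values: z1 = 1 + 2j, z2 = 3 + 4j
total = add(1, 2, 3, 4)
product = multiply(1, 2, 3, 4)

# Python's own complex arithmetic, for comparison
builtin_total = complex(1, 2) + complex(3, 4)
builtin_product = complex(1, 2) * complex(3, 4)
```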

Define the length of the line from the origin to <math>z</math> in the complex plane as the magnitude, absolute value, or modulus <math>r</math>, where

<math>r = \sqrt{ x^{2} + y^{2}}</math>

and the angle of that line, measured from the positive real axis, as the phase <math>\theta</math>, where

<math>\theta = \tan^{-1}\left(\frac{y}{x}\right)</math>

We can subsequently have

<math>y = r\sin(\theta)</math> and <math>x = r\cos(\theta)</math>

leading to Euler's formula

<math>e^{j\theta} = \cos(\theta) + j\sin(\theta)</math>

which allows us to write the complex number <math>z</math> as

<math>z = re^{j\theta}</math>.
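A short Python sketch of the rectangular-to-polar conversion, with arbitrary example values. It uses `math.atan2(y, x)` rather than a bare arctangent of y/x, since `atan2` picks the correct quadrant when x is negative or zero. It also shows that multiplying complex numbers multiplies magnitudes and adds angles.

```python
import cmath
import math

z = complex(3.0, 4.0)                # illustrative value: x = 3, y = 4

r = abs(z)                           # sqrt(x^2 + y^2)
theta = math.atan2(z.imag, z.real)   # quadrant-aware angle in radians

# Rebuild z from polar form: z = r * e^(j*theta)
rebuilt = r * cmath.exp(1j * theta)

# Multiplication in polar terms: magnitudes multiply, angles add.
w = complex(0.0, 2.0)                # magnitude 2, angle 90 degrees
product = z * w
product_r = abs(product)
product_theta = cmath.phase(product)
```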

Addition and multiplication check out in the predictable way. We also define the complex conjugate of <math>z</math> as

<math>\bar{z}=x - jy\:\:where\:\: z = x+jy</math>

or in “Euler's form”,

<math>\bar{z} = re^{-j\theta}</math>

This leads to

<math>z\bar{z} = r^{2}</math>

which is equivalent to

<math>z\bar{z} = x^2+y^2</math>

If you multiply the two roots of an order-two polynomial <math>ax^2 + bx + c</math>, you'll always find that the product is <math>\frac{c}{a}</math>.
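Both facts can be checked numerically with a small Python sketch: a complex number times its conjugate gives the squared magnitude (the sum of the squares of the real and imaginary parts), and the two roots of a quadratic multiply to c/a. The values and the helper name are illustrative.

```python
import cmath

# A complex number times its conjugate is real and equals r^2.
z = complex(3.0, -4.0)
z_bar = z.conjugate()
product = z * z_bar                   # 3^2 + 4^2 = 25

def quadratic_roots(a, b, c):
    """Roots of a*x^2 + b*x + c = 0 via the quadratic formula."""
    disc = cmath.sqrt(b * b - 4 * a * c)
    return (-b + disc) / (2 * a), (-b - disc) / (2 * a)

# Illustrative coefficients with complex-conjugate roots (b^2 < 4ac).
a, b, c = 2.0, 3.0, 5.0
p_plus, p_minus = quadratic_roots(a, b, c)
roots_product = p_plus * p_minus      # equals c / a = 2.5
```

With real coefficients and a negative discriminant the roots are conjugates of each other, so their product is also the squared magnitude of either root.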

music_320.txt · Last modified: 2015/12/17 21:59 (external edit)