Audio programming for beginners

Sound is just an oscillation of some medium such as air or a solid body. For example, the membrane of a speaker vibrates because of the magnetic field created by the current through the driver coil, and this sets the surrounding air in motion. The vibration then travels through the air and, for example, makes your eardrums vibrate. To be audible to humans, the oscillation needs a frequency roughly between 20 Hz and 20 kHz.

Mathematically, the oscillation of the medium is a function of time f(t). For audio processing, this function needs to be recorded, e.g. with a microphone. It converts the air vibrations into oscillations of an electrical voltage, which can in turn be stored, for instance on magnetic tape. That is the analog way, and it does not need to care much about resolution. To store the signal digitally, however, it needs to be sampled: the analog signal is continuous and has practically infinite resolution, while the hard drive of a computer only has finite storage. Therefore the signal is measured at discrete, evenly spaced points in time, and the number of measurements per second is called the sample rate. Common values are e.g. 44100 Hz or 48000 Hz, meaning that 44100 values are stored per second of audio material.

This (now discrete) signal can be stored in an array structure. Each value is called one sample of the signal.
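As a small illustration (plain C++, no audio library involved, and the 440 Hz frequency is just an arbitrary choice), the following sketch turns the continuous function f(t) into such an array of samples at a sample rate of 44100 Hz:

    #include <cmath>
    #include <cstddef>
    #include <vector>

    int main()
    {
        const double sampleRate = 44100.0; // samples per second
        const double frequency  = 440.0;   // pitch of the sine wave in Hz
        const double pi         = 3.14159265358979323846;

        // One second of audio at 44100 Hz means 44100 samples.
        const std::size_t numSamples = static_cast<std::size_t>(sampleRate);
        std::vector<float> signal(numSamples);

        // Sample n corresponds to the point in time t = n / sampleRate.
        for (std::size_t n = 0; n < numSamples; ++n)
        {
            const double t = static_cast<double>(n) / sampleRate;
            signal[n] = static_cast<float>(std::sin(2.0 * pi * frequency * t));
        }

        // 'signal' now holds the discrete version of f(t) = sin(2*pi*440*t).
        return 0;
    }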

When playing the audio from disk through the computer speakers, the data needs to be read and then fed (often through the OS) to the audio driver (we will not care about the audio driver itself here). This has to happen in real time: once we have started playing the audio file, we need to send a constant, uninterrupted stream of data, because any interruption results in an audio dropout.
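Most audio APIs express this as a callback: the driver repeatedly asks the program for the next block of samples, and that function has to deliver them in time, every time. The following sketch is only schematic; the callback signature is made up here for illustration and does not belong to any particular API:

    #include <cstddef>

    // Hypothetical callback, invoked by the audio driver whenever it needs
    // the next block of output samples. It must return before the block is
    // due for playback -- it must never block on I/O, allocate memory, or
    // wait on locks, otherwise the stream is interrupted.
    void audioCallback(float* output, std::size_t numSamples)
    {
        for (std::size_t n = 0; n < numSamples; ++n)
        {
            output[n] = 0.0f; // compute the next sample here (silence in this sketch)
        }
    }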

The first rule of audio programming is: ONE MUST NOT CAUSE AN AUDIO DROPOUT!

Unlike in visuals, where hardly anyone notices one or even a few dropped frames in a game, every SINGLE sample matters. This is partly because the speaker membrane follows the signal and, at a dropout, jumps as abruptly as it can to the undefined value of the missing samples. This discontinuity is heard as a loud crack, which can damage the speakers as well as the hearing of anyone present.
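To see why such a gap sounds like a crack, the following sketch (with arbitrary numbers) simulates a dropout in a sine wave: the signal jumps from a value near 1 straight to 0, and it is exactly this discontinuity that the speaker membrane tries to follow:

    #include <cmath>
    #include <cstddef>
    #include <cstdio>
    #include <vector>

    int main()
    {
        const double sampleRate = 44100.0;
        const double pi         = 3.14159265358979323846;

        // A smooth 440 Hz sine wave...
        std::vector<float> signal(256);
        for (std::size_t n = 0; n < signal.size(); ++n)
            signal[n] = static_cast<float>(std::sin(2.0 * pi * 440.0 * n / sampleRate));

        // ...with a simulated dropout: from sample 128 on, no data is delivered
        // and the output falls back to zero.
        for (std::size_t n = 128; n < signal.size(); ++n)
            signal[n] = 0.0f;

        // The jump from signal[127] to signal[128] is the discontinuity that
        // is heard as a click or crack.
        std::printf("jump at the dropout: %f -> %f\n", signal[127], signal[128]);
        return 0;
    }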

The audio data has to pass through several interfaces in a computer before it is played back. Invoking all the callbacks for every single sample would be highly inefficient, so several samples are bundled into a buffer of e.g. 64 to 1024 samples. Each processing stage can then process all samples in the buffer at once and pass it on to the next stage. The buffer should not be chosen too large, though, because it introduces latency: for 1024 samples per buffer at 44100 Hz this is 1024 / 44100 ≈ 0.023 s ≈ 23 ms. That is fine for playing back and even for producing or mixing music, but probably too slow for e.g. a guitar amp simulation that is played in real time: the time until the first buffer (and every following buffer) is ready and played back to the musician is noticeably delayed compared to the note being played on the guitar. All in all, a balance has to be found. The average plugin developer does not need to worry too much about this, because the host DAW mostly deals with these issues. Just keep in mind that there is a buffer and that it has a certain size, which might change.
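How much latency a given buffer size adds can be estimated directly from the sample rate. A small sketch, using the example values from above:

    #include <cstdio>

    int main()
    {
        const double sampleRate = 44100.0; // samples per second

        // Latency added by one buffer: bufferSize / sampleRate seconds.
        const int bufferSizes[] = { 64, 256, 1024 };
        for (int bufferSize : bufferSizes)
        {
            const double latencyMs = 1000.0 * bufferSize / sampleRate;
            std::printf("%4d samples -> %5.1f ms\n", bufferSize, latencyMs);
        }
        return 0;
    }

At 44100 Hz this prints roughly 1.5 ms for 64 samples, 5.8 ms for 256 and 23.2 ms for 1024, which is why small buffers are preferred for live playing while larger ones are acceptable for mixing.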