Understanding AurioTouch

I have been playing around with CoreAudio on iOS of late. The trouble with media APIs is that they are necessarily complex, and CoreAudio is no exception.

While trying to figure out how to read data coming from the microphone and visually render the samples to the screen, I came across the aurioTouch example provided by Apple. It looked great… until I tried to work out what the code was doing!

There are so many aspects of the code that I struggled to make sense of, from arbitrary scaling factors to the odd bit of inline assembly, but here I will mention just one. In hindsight, it doesn’t seem so obscure now. But hindsight is a wonderful thing.

After having obtained a buffer full of PCM audio data, the following code is used to fill an array of values that is used to draw the data:

SInt8 *data_ptr = (SInt8 *)(ioData->mBuffers[0].mData);
for (int i = 0; i < numFrames; i++)
{
    if ((i + drawBufferIdx) >= drawBufferLen)
    {
        drawBufferIdx = -i;
    }
    drawBuffers[0][i + drawBufferIdx] = data_ptr[2];
    data_ptr += 4;
}

ioData->mBuffers[0].mData contains an array of SInt32 values. These are PCM samples in 8.24 fixed-point format. This means that nominally, 8 of the 32 bits are used to contain the whole-number part, and the remaining 24 bits are used to contain the fractional part.

I could not understand why the code was iterating through it using an SInt8 pointer and why, when the actual value was extracted, it was using data_ptr[2], i.e. the third byte of the 32-bit (4-byte) 8.24 fixed-point sample, chucking away the rest. I was so confused that I turned to Stack Overflow for help. The answer given is spot on the money… but perhaps not all that clear if you are an idiot like me.

After printing out the binary contents of each sample I finally understood.

The code is using an SInt8 pointer because, at the end of the day, it is only interested in a single byte of data in each sample. Once this byte of data has been extracted, data_ptr is advanced by 4 bytes to move it to the beginning of the next complete sample (32-bit, 8.24 fixed-point format).

The reason it extracts data_ptr[2] becomes apparent when you look at the binary. What I was failing to appreciate (a school boy error on my part) was that the samples are in little-endian format. This is what a typical sample might look like in memory:

data_ptr[0]     data_ptr[1]     data_ptr[2]    data_ptr[3]
 01001100    |   01000000    |   11001111    |  11111111

The data is little-endian, meaning the least significant byte has the lowest memory address and, conversely, the most significant byte has the highest memory address. In CoreAudio 8.24 fixed-point LPCM data, the first (most significant) 8 bits are used to indicate the sign: they are either all set to zero or all set to one. The sample code ignores this and looks at the most significant byte of the remaining 24 bits… which is data_ptr[2].

It is safe to throw the rest away as it is of little consequence to the display of the signal; throwing the rest of the data away still gives you a ‘good enough’ representation of the sample.

Later on in the sample code (not shown above), this value is divided by 128 to give a value between -1 and 1. It is divided by 128 because an SInt8 can hold a value ranging from -128 to +127.

Like I said, this is just one of many confusing bits of code in the sample app. CoreAudio is not for the faint-hearted. If you are a novice, like me, then perhaps the aurioTouch sample is not the best place to start!


  • Tester

    If aurioTouch is not the best place to start, then where should I start implementing a real-time audio frequency graph? Any comments?

    • Thomas Momas

      Apple’s new API AVAudioEngine provides some real-time audio capability coupled with a few basic effects such as reverb and distortion; however, if your audio processing requires anything more complex, such as pitch-shifting, it looks like we are still limited to the low-level APIs :(