# Converting 8.24 bit samples in CoreAudio on iOS When working with CoreAudio on iOS many of the sample applications use the iPhones canonical audio format which is 32 bit 8.24 fixed-point audio. This is because it is the hardwares ‘native’ format.

You end up with a buffer of fixed point data, which is a bit of a pain to deal with.

Other libraries and source code tend to work with floating point samples between 0 and +/-1.0 or signed 16 bit integer samples…so this fixed point stuff is a bit of a pain. You could force CoreAudio to give you 16 bit integer samples to start with (which means it does the conversion for you before giving you the audio buffer) or you could do the conversion yourself, as and when you need to. This can be a more efficient way of doing things, depending on your needs.

In this post I want to show you how you can convert the native 8.24 fixed point sample data into 16 bit integer and/or floating point sample data…and give you an explanation of how it works. But first, I need to de-mystify some stuff to do with bits and bytes.

Bit Order != Byte Order

In Objective-C you can think of the bits of a binary number going from left to right. Just as in base 10, the most significant digit is the left most digit

```128| 64| 32| 16| 8 | 4 | 2 | 1
-------------------------------
0 | 1 | 0 | 0 | 0 | 0 | 1 | 0
-------------------------------```

The above binary number may represent the integer 66. We can apply bit shift operations to binary numbers such that if I shifted all the bits right ( >>) by 1 place I would have:

```128| 64| 32| 16| 8 | 4 | 2 | 1
-------------------------------
0 | 0 | 1 | 0 | 0 | 0 | 0 | 1
-------------------------------```

This might represent an integer value of 33. The left most bit has been newly introduced or padded with 0.

So. When you are thinking about bits, and bit shifting operations, think left to right in terms of significance. Got that? Right, now lets move onto bytes.

The above examples dealt with a single byte (8 bits). When a multi-byte number is represented in a byte array, it can be either little endian or big endian. On Intel, and in terms of CoreAudio, little endian is used. This means the BYTE with the most significance has the highest memory address and the BYTE with the least significance has the lowest memory address (little-end-first = little-endian).

See this post for why this is important when dealing with raw sample data in CoreAudio, and this post on codeproject for a more in-depth explanation. The most important thing to realise is that Bit order and Byte order significance are different beasts. Dont get confused.

For the rest of this post, we are dealing with the representation of the binary digits from the perspective of the language, not the architecture. i.e Think in terms of bit order and not byte order.

Converting 8.24 bit samples to 16 bit integer samples

What does this mean? It means we are going to:

• Preserve the sign of the sample data (+/- bit)
• Throw away 8 bits of the 24 bit sample. We assume these bits contain extra precision that we just dont need or are not interested in.
• Be left with a signed 16 bit sample. A signed 16 bit integer can range from -32,768 to 32,767. This will be the resulting range of our sample.

Remember, we are thinking in terms of bit order; the most significant bit (or the ‘high order’ bit) is the left-most bit. Here is an example of a 32 bit (4 byte), 8.24 fixed point sample:

```  8 bits  |         24 bit sample
----------------------------------------------
11111111 | 01101010 | 00011101 | 11001011
----------------------------------------------```

In 8.24 fixed point samples, the first 8 bits represent the sign. They are either all 0 or all 1. The next 24 bits represent the sample data. We want to preserve the sign, but chuck away 8 bits of the sample data to give us a signed 16 bit integer sample.

The trick is to shift the bits 9 places to the right. It’s a crafty move. This is what happens to our 32 bits of data if we shift them right 9 places: 9 bits fall of the end, the sign bits get shunted up and the new bits get padded with zeros such that we get left with:

```  new bits  |sign bits|                         gone
------------------------------------------
00000000 | 0111111 | 10110101 | 00001110    111001011
------------------------------------------
|   first 16 bits    |```

We still have 32 bits of data with the bits shunted up. We are only interested in the first 16 bits of data (the right most bits) that now contain the most significant bits of the 24 bit sample data. A brilliant side effect is that the first (left-most) bit of the first 16 bits represent the sign!

By casting the resulting 32 bits to a 16 bit signed integer we take the first 16 bits, which are the bits we want, and we have a signed 16 bit sample that ranges from -32,768 to 32,767. If we want this as a floating point value between 0 and 1 we can now simply divide by 32,768. Walla.

The code is thus:

```SInt16 sampleInt16 = (SInt16)(originalSample >> 9);
float sampleFloat = sampleInt16 / 32768.0;```

Simple when you know how. And why!

# Understanding AurioTouch I have been playing around with CoreAudio on iOS of late. The trouble with media API’s is that they are necessarily complex and CoreAudio is no exception.

While trying to figure out how to read data coming from the microphone and visually render the samples to the screen, I came across the aurioTouch example provided by Apple. It looked great..until I tried to work out what the code was doing!

There are so many aspects of the code that I struggled to make sense of, from arbitrary scaling factors to the odd bit of inline assembly, but here I will mention just one. In hindsight, it doesn’t seem so obscure now. But hindsight is a wonderful thing.

After having obtained a buffer full of PCM audio data, the following code is used to fill an array of values that is used to draw the data:

``````SInt8 *data_ptr = (SInt8 *)(ioData->mBuffers.mData);
for (int i=0; i<numFrames; i++)
{
if ((i+drawBufferIdx) >= drawBufferLen)
{
cycleOscilloscopeLines();
drawBufferIdx = -i;
}

drawBuffers[i + drawBufferIdx] = data_ptr;
data_ptr += 4;
}``````

m_Buffers.mData contains an array of SInt32 values. These are PCM samples in 8.24 fixed-point format. This means that nominally, 8 bits of the 32 bits are used to contain the whole number part, and the remaining 24 bits are used to contain the fractional part.

I could not understand why the code was iterating through it using an SInt8 pointer and why, when the actual value was extracted, it was using data_ptr. i.e It was using the third byte of the 32 bit (4 byte) 8.24 fixed point sample and chucking away the rest. I was so confused that I turned to stackoverflow for help. The answer given is spot on the money…but perhaps not all that clear if you are an idiot like me.

After printing out the binary contents of each sample I finally understood.

The code is using an SInt8 pointer because, at the end of the day, it is only interested in a single byte of data in each sample. Once this byte of data has been extracted, data_ptr is advanced by 4 bytes to move it to the beginning of the next complete sample (32 bit, 8.24 fixed point format)

The reason it extracts data_ptr becomes apparent when you look at the binary. What I was failing to appreciate (a school boy error on my part) was that the samples are in little-endian format. This is what a typical sample might look like in memory:

```data_ptr     data_ptr     data_ptr    data_ptr
----------------------------------------------------------
01001100    |   01000000    |   11001111    |  11111111
----------------------------------------------------------```

The data is little-endian, meaning the least significant byte has the lowest memory address, and conversely, the most significant byte has the highest memory address. In CoreAudio 8.24 fixed point LPCM data, the first (most significant) 8 bits is used to indicate the sign. They are either all set to zero or all set to one. The sample code ignores this and looks at the most significant byte of the remaining 24 bits…which is data_ptr

It is safe to throw the rest away as it is of little consequence to the display of the signal; throwing the rest of the data away still gives you a ‘good enough’ representation of the sample.

Later on in the sample code (not shown above), this value is divided by 128 to give a value between -1 and 1. It is divided by 128 because an SInt8 can hold a value ranging from -128 to +127

Like I said, this is just one of many confusing bits of code in the sample app. CoreAudio is not for the feint hearted. If you are a novice, like me, then perhaps the aurioTouch sample is not the best place to start!