Converting 8.24 bit samples in CoreAudio on iOS

When working with CoreAudio on iOS, many of the sample applications use the iPhone's canonical audio format, which is 32 bit 8.24 fixed-point audio. This is because it is the hardware's ‘native’ format.

You end up with a buffer of fixed point data, which is a bit of a pain to deal with.

Other libraries and source code tend to work with floating point samples between -1.0 and +1.0, or with signed 16 bit integer samples…so this fixed point format doesn't fit in neatly. You could force CoreAudio to give you 16 bit integer samples to start with (which means it does the conversion for you before giving you the audio buffer), or you could do the conversion yourself, as and when you need to. Depending on your needs, the latter can be the more efficient way of doing things.
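For reference, here's a rough sketch of the two AudioStreamBasicDescriptions involved: the 8.24 canonical format versus a plain 16 bit integer format you could ask for instead. The sample rate and channel count here are assumptions for illustration, not requirements:

```c
#include <AudioToolbox/AudioToolbox.h>

// The iOS canonical audio unit format: 32 bit containers holding 8.24 fixed point samples.
// kAudioFormatFlagsAudioUnitCanonical means signed, packed, non-interleaved, 24 fraction bits.
static const AudioStreamBasicDescription fixedPointFormat = {
    .mSampleRate       = 44100.0,                             // assumed sample rate
    .mFormatID         = kAudioFormatLinearPCM,
    .mFormatFlags      = kAudioFormatFlagsAudioUnitCanonical,
    .mChannelsPerFrame = 1,                                   // assumed mono
    .mBitsPerChannel   = 32,
    .mBytesPerFrame    = sizeof(AudioUnitSampleType),         // 4 bytes per sample
    .mFramesPerPacket  = 1,
    .mBytesPerPacket   = sizeof(AudioUnitSampleType),
};

// What you might specify instead to have CoreAudio hand you 16 bit signed integer samples.
static const AudioStreamBasicDescription int16Format = {
    .mSampleRate       = 44100.0,
    .mFormatID         = kAudioFormatLinearPCM,
    .mFormatFlags      = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked,
    .mChannelsPerFrame = 1,
    .mBitsPerChannel   = 16,
    .mBytesPerFrame    = sizeof(SInt16),                      // 2 bytes per sample
    .mFramesPerPacket  = 1,
    .mBytesPerPacket   = sizeof(SInt16),
};
```

You would set one of these as the stream format on your audio unit in the usual way; the rest of this post assumes you stuck with the 8.24 format.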

In this post I want to show you how you can convert the native 8.24 fixed point sample data into 16 bit integer and/or floating point sample data…and give you an explanation of how it works. But first, I need to de-mystify some stuff to do with bits and bytes.

Bit Order != Byte Order

In Objective-C you can think of the bits of a binary number running from left to right. Just as in base 10, the most significant digit is the left-most digit:

128| 64| 32| 16| 8 | 4 | 2 | 1
-------------------------------
 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0
-------------------------------

The above binary number represents the integer 66. We can apply bit shift operations to binary numbers, so if I shift all the bits right (>>) by one place I get:

128| 64| 32| 16| 8 | 4 | 2 | 1
-------------------------------
 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1
-------------------------------

This represents the integer value 33. The left-most bit is newly introduced, padded with a 0.
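If you want to check this for yourself, a couple of lines of C will do it:

```c
#include <stdio.h>

int main(void) {
    unsigned char value = 66;            // 01000010 in binary
    unsigned char shifted = value >> 1;  // 00100001 in binary
    printf("%d\n", shifted);             // prints 33
    return 0;
}
```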

So. When you are thinking about bits, and bit shifting operations, think left to right in terms of significance. Got that? Right, now let's move on to bytes.

The above examples dealt with a single byte (8 bits). When a multi-byte number is stored in memory, its bytes can be arranged either little endian or big endian. On Intel, and on the ARM chips in iOS devices (and therefore in terms of CoreAudio), little endian is used. This means the BYTE with the most significance has the highest memory address and the BYTE with the least significance has the lowest memory address (little-end-first = little-endian).
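You can see this for yourself by looking at the individual bytes of a multi-byte value. A small standalone C sketch, nothing CoreAudio-specific:

```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t value = 0x12345678;
    uint8_t *bytes = (uint8_t *)&value;   // view the same 4 bytes of memory byte-by-byte

    // On a little endian machine (Intel, and the ARM chips in iOS devices)
    // this prints "78 56 34 12": the least significant byte is at the lowest address.
    printf("%02x %02x %02x %02x\n", bytes[0], bytes[1], bytes[2], bytes[3]);
    return 0;
}
```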

See this post for why this is important when dealing with raw sample data in CoreAudio, and this post on codeproject for a more in-depth explanation. The most important thing to realise is that bit order and byte order significance are different beasts. Don't get confused.

For the rest of this post, we are dealing with the representation of the binary digits from the perspective of the language, not the architecture. i.e. think in terms of bit order and not byte order.

Converting 8.24 bit samples to 16 bit integer samples

What does this mean? It means we are going to:

  • Preserve the sign of the sample data (+/- bit)
  • Throw away the bottom 9 bits of the 24 bit sample. We assume these bits contain extra precision that we just don't need or are not interested in.
  • Be left with a signed 16 bit sample. A signed 16 bit integer can range from -32,768 to 32,767. This will be the resulting range of our sample.

Remember, we are thinking in terms of bit order; the most significant bit (or the ‘high order’ bit) is the left-most bit. Here is an example of a 32 bit (4 byte), 8.24 fixed point sample:

  8 bits  |         24 bit sample
----------------------------------------------
 11111111 | 01101010 | 00011101 | 11001011
----------------------------------------------

In 8.24 fixed point samples, the first 8 bits are the integer part. For audio in the normal ±1.0 range (which is what you will have in practice) they are simply copies of the sign bit, so they are either all 0 or all 1. The next 24 bits represent the sample data. We want to preserve the sign, but chuck away the low-order bits of the sample data to give us a signed 16 bit integer sample.
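To make that concrete, here are a couple of example values (my own illustrations, not taken from any particular buffer):

```c
#include <stdint.h>

// 8.24 fixed point: the value is the raw 32 bit integer divided by 2^24 (16,777,216).
// For samples between -1.0 and +1.0 the top 8 bits are just copies of the sign bit.
int32_t plusHalf  = 0x00800000;           //  8,388,608 / 2^24 =  0.5  (top 8 bits all 0)
int32_t minusHalf = (int32_t)0xFF800000;  // -8,388,608 / 2^24 = -0.5  (top 8 bits all 1)
```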

The trick is to shift the bits 9 places to the right. It’s a crafty move. Because the sample is a signed integer, this is an arithmetic shift: 9 bits fall off the right-hand end, everything else gets shunted along, and the newly vacated left-most bits are filled with copies of the sign bit. (It wouldn't matter if they were padded with zeros instead, because we are about to throw the top 16 bits away anyway.) We are left with:

 new bits and old sign bits      |              gone
------------------------------------------
 11111111 | 11111111 | 10110101 | 00001110    111001011
------------------------------------------
                      |     low 16 bits     |

We still have 32 bits of data, with everything shunted along. We are only interested in the low 16 bits (the right-most bits), which now contain the most significant bits of the 24 bit sample data. A brilliant side effect is that the left-most bit of those 16 bits is the sign!

By casting the resulting 32 bits to a 16 bit signed integer we keep just the low 16 bits, which are the bits we want, and we have a signed 16 bit sample that ranges from -32,768 to 32,767. If we want this as a floating point value between -1.0 and +1.0 we can now simply divide by 32,768. Voilà.

The code is thus:

SInt16 sampleInt16 = (SInt16)(originalSample >> 9); // shift right 9 places, keep the low 16 bits
float sampleFloat = sampleInt16 / 32768.0;          // scale into the -1.0 to +1.0 range
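If you're converting a whole buffer at a time (in a render callback, say), the same two lines just go inside a loop. Here is a rough sketch; the function and parameter names are made up for illustration, so substitute whatever your callback actually gives you:

```c
#include <AudioToolbox/AudioToolbox.h>

// Hypothetical helper: convert a buffer of 8.24 fixed point samples to
// 16 bit integer and floating point samples in one pass.
static void ConvertFixedPointSamples(const SInt32 *fixedPointSamples,
                                     SInt16       *int16Samples,
                                     float        *floatSamples,
                                     UInt32        frameCount)
{
    for (UInt32 i = 0; i < frameCount; i++) {
        SInt16 sampleInt16 = (SInt16)(fixedPointSamples[i] >> 9); // shift right 9, keep low 16 bits
        int16Samples[i]    = sampleInt16;
        floatSamples[i]    = sampleInt16 / 32768.0f;              // scale into -1.0 to +1.0
    }
}
```

As a sanity check against the example above: 0xFF6A1DCB >> 9 gives 0xFFFFB50E, whose low 16 bits are 0xB50E; as an SInt16 that is -19,186, and dividing by 32,768 gives roughly -0.586.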

Simple when you know how. And why!