Converting 8.24 bit samples in CoreAudio on iOS

When working with CoreAudio on iOS, many of the sample applications use the iPhone's canonical audio format, which is 32 bit 8.24 fixed-point audio. This is because it is the hardware's ‘native’ format.

You end up with a buffer of fixed point data, which is a bit of a pain to deal with.

Other libraries and source code tend to work with floating point samples between -1.0 and +1.0 or signed 16 bit integer samples, so the fixed point data usually has to be converted at some point. You could force CoreAudio to give you 16 bit integer samples to start with (which means it does the conversion for you before giving you the audio buffer) or you could do the conversion yourself, as and when you need to. Doing it yourself can be the more efficient option, depending on your needs.

In this post I want to show you how you can convert the native 8.24 fixed point sample data into 16 bit integer and/or floating point sample data…and give you an explanation of how it works. But first, I need to de-mystify some stuff to do with bits and bytes.

Bit Order != Byte Order

In Objective-C you can think of the bits of a binary number going from left to right. Just as in base 10, the most significant digit is the leftmost digit.

128| 64| 32| 16| 8 | 4 | 2 | 1
-------------------------------
 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0
-------------------------------

The above binary number may represent the integer 66. We can apply bit shift operations to binary numbers such that if I shifted all the bits right ( >>) by 1 place I would have:

128| 64| 32| 16| 8 | 4 | 2 | 1
-------------------------------
 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1
-------------------------------

This might represent an integer value of 33. The left most bit has been newly introduced or padded with 0.
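As a quick sanity check, here is a minimal sketch of that shift in plain C (which also compiles as Objective-C). The helper name shift_right_u8 is made up for this illustration. It uses an unsigned type, which is why the vacated leftmost bit is filled with 0.

```c
#include <stdint.h>

/* Shift an 8-bit unsigned value right by `places` (0 to 7).
   Because the value is unsigned, the bits that open up on the
   left are filled with 0. */
static uint8_t shift_right_u8(uint8_t value, unsigned places)
{
    return (uint8_t)(value >> places);
}
```

Calling shift_right_u8(66, 1) takes 01000010 to 00100001, the same move shown in the two tables above.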

So. When you are thinking about bits, and bit shifting operations, think left to right in terms of significance. Got that? Right, now let's move onto bytes.

The above examples dealt with a single byte (8 bits). When a multi-byte number is represented in a byte array, it can be either little endian or big endian. On Intel, and in terms of CoreAudio, little endian is used. This means the BYTE with the most significance has the highest memory address and the BYTE with the least significance has the lowest memory address (little-end-first = little-endian).
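To make the byte order concrete, here is a small sketch in plain C that rebuilds a 32 bit value from four bytes stored least significant byte first. The function name read_le32 is hypothetical and not part of CoreAudio. It is only meant to show which byte ends up where.

```c
#include <stdint.h>

/* Assemble a 32-bit value from four bytes laid out in little-endian
   order: bytes[0] (lowest address) is the least significant byte and
   bytes[3] (highest address) is the most significant byte.
   The shifts operate on the value, so this gives the same result on
   any host, whatever its own byte order is. */
static uint32_t read_le32(const uint8_t bytes[4])
{
    return  (uint32_t)bytes[0]
         | ((uint32_t)bytes[1] << 8)
         | ((uint32_t)bytes[2] << 16)
         | ((uint32_t)bytes[3] << 24);
}
```

So the bytes CB 1D 6A FF, read from the lowest address upwards, make up the value 0xFF6A1DCB. Once the value is sitting in an integer variable, shifts and masks see it most significant bit first, regardless of how it was stored.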

See this post for why this is important when dealing with raw sample data in CoreAudio, and this post on codeproject for a more in-depth explanation. The most important thing to realise is that bit order and byte order significance are different beasts. Don't get confused.

For the rest of this post, we are dealing with the representation of the binary digits from the perspective of the language, not the architecture, i.e. think in terms of bit order and not byte order.

Converting 8.24 bit samples to 16 bit integer samples

What does this mean? It means we are going to:

  • Preserve the sign of the sample data (+/- bit)
  • Throw away 8 bits of the 24 bit sample. We assume these bits contain extra precision that we just don't need or are not interested in.
  • Be left with a signed 16 bit sample. A signed 16 bit integer can range from -32,768 to 32,767. This will be the resulting range of our sample.

Remember, we are thinking in terms of bit order; the most significant bit (or the ‘high order’ bit) is the left-most bit. Here is an example of a 32 bit (4 byte), 8.24 fixed point sample:

  8 bits  |         24 bit sample
----------------------------------------------
 11111111 | 01101010 | 00011101 | 11001011
----------------------------------------------

In 8.24 fixed point samples, the first 8 bits represent the sign. They are either all 0 or all 1. The next 24 bits represent the sample data. We want to preserve the sign, but chuck away 8 bits of the sample data to give us a signed 16 bit integer sample.

The trick is to shift the bits 9 places to the right. It’s a crafty move. This is what happens to our 32 bits of data if we shift them right 9 places: 9 bits fall off the end, the sign bits get shunted up and the new bits get padded with zeros such that we get left with:

  new bits  |sign bits|                         gone
------------------------------------------
 00000000 | 01111111 | 10110101 | 00001110    111001011
------------------------------------------
                    |   first 16 bits    |

We still have 32 bits of data with the bits shunted up. We are only interested in the first 16 bits of data (the rightmost bits), which now contain the most significant bits of the 24 bit sample data. A brilliant side effect is that the first (leftmost) bit of those 16 bits represents the sign!

By casting the resulting 32 bits to a 16 bit signed integer we take the first 16 bits, which are the bits we want, and we have a signed 16 bit sample that ranges from -32,768 to 32,767. If we want this as a floating point value between -1 and 1 we can now simply divide by 32,768. Voilà.

The code is thus:

SInt16 sampleInt16 = (SInt16)(originalSample >> 9);
float sampleFloat = sampleInt16 / 32768.0;
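Here is the same idea as a self-contained sketch you can compile outside of an Xcode project. The function names are made up for illustration, and the standard int16_t and int32_t types stand in for CoreAudio's SInt16 and SInt32. One assumption is worth calling out: shifting a negative signed value to the right is implementation-defined in C, so this version shifts the bits as an unsigned value and then keeps the low 16 bits. Like the two-liner above, it only gives sensible results for samples from -1.0 up to just under +1.0.

```c
#include <stdint.h>

/* Convert one 8.24 fixed-point sample to a signed 16-bit sample.
   After shifting right by 9, the low 16 bits hold one sign bit
   followed by the top 15 bits of the 24-bit fraction.
   Assumes the sample lies in -1.0 .. just under +1.0. */
static int16_t fixed824_to_int16(int32_t sample)
{
    /* Shift as unsigned so the result does not depend on how the
       compiler treats right shifts of negative numbers. */
    uint32_t shifted = (uint32_t)sample >> 9;

    /* Keep the low 16 bits and reinterpret them as signed. */
    return (int16_t)(uint16_t)shifted;
}

/* Scale a signed 16-bit sample to a float in -1.0 .. just under +1.0. */
static float int16_to_float(int16_t sample)
{
    return (float)sample / 32768.0f;
}
```

Feeding in the example sample from the diagrams, 0xFF6A1DCB (which is -9822773 as a signed 32 bit integer), leaves the low 16 bits 10110101 00001110, which is -19186 as a signed 16 bit value.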

Simple when you know how. And why!

 

 

  • http://people.virginia.edu/~chg5w Chris Gregg

    Hi there –
      I’ve been working my way through the aurioTouch app myself, and I stumbled upon your StackExchange post about the 8.24 bit LPCM.  Then I made it to your post here.  I was curious by what you meant by “In 8.24 fixed point samples, the first 8 bits represent the sign.”  I did some more digging, and what you wrote isn’t actually correct — 8.24 fixed point means that the first 8 bits denote the integer portion of the number (between -128 and 127), and the remaining 24 bits denote the fractional portion of the number.  (see here: http://lists.apple.com/archives/coreaudio-api/2011/Feb/msg00083.html).

    The reason you only see all 1s or all 0s in the most significant byte is because the iPhone only produces audio within a small range of the full 32-bit values, from -1 to +1 (generally — it can go to a bigger range, and some of my values are indeed -2 for the integer portion).  All zeros in two’s complement is 0, and all 1s in two’s complement is -1.  So, you’re really only getting values from -1 to +1 for your audio output, because you add either 0 or -1 to the result of the 24-bit fractional part (in base 2…), which is always just that — a fraction.  But, as I mentioned above, you can’t be guaranteed that this is the case, as the iPhone will produce values that go beyond the -1 to 1 range.  Cheers!

    • http://people.virginia.edu/~chg5w Chris Gregg

      Blah — the link didn’t come out.  Here it is:

      http://lists.apple.com/archives/coreaudio-api/2011/Feb/msg00083.html

    • http://www.kevatron.co.uk/ Kevin Smith

      Thank you for this. You are right. I did come across that post myself but I don't think I fully appreciated what it was saying at the time.

      While the iPhone will produce values greater than +/-1.0 I believe these values are considered to be clipped and should be scaled down accordingly prior to further processing? For non-clipped values, the conversion to 16 bit samples will work as expected. Alas, it seems like this trick is not guaranteed to work for samples with an integer part greater than 1.
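One way to guard against those out-of-range samples is to clamp them before shifting. The sketch below is an assumption about what a reasonable policy looks like (hard clipping to full scale) rather than anything CoreAudio prescribes, and the names FIXED824_ONE and fixed824_to_int16_clipped are invented for the example.

```c
#include <stdint.h>

/* 1.0 expressed in 8.24 fixed point (hypothetical helper constant). */
#define FIXED824_ONE ((int32_t)1 << 24)

/* Clamp an 8.24 sample to -1.0 .. just under +1.0, then keep the sign
   bit and the top 15 fraction bits as a signed 16-bit sample.
   Anything outside that range saturates at -32768 or 32767 instead of
   wrapping around. */
static int16_t fixed824_to_int16_clipped(int32_t sample)
{
    if (sample > FIXED824_ONE - 1) {
        sample = FIXED824_ONE - 1;
    }
    if (sample < -FIXED824_ONE) {
        sample = -FIXED824_ONE;
    }

    /* Same shift-by-9 trick as before, done on the unsigned bits. */
    return (int16_t)(uint16_t)((uint32_t)sample >> 9);
}
```

With this in place a sample of +2.0 comes out as 32767 and a sample of -2.0 as -32768, while in-range samples convert exactly as they did without the clamp.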

      • http://people.virginia.edu/~chg5w Chris Gregg

        Thanks Kevin (great blog, btw).  I finally figured out how to extract the correct PWM data from the 32-bit value: add the value of the most significant byte (the integer value, which is generally 0 or -1, sometimes -2 or 1) to the quantity of the unsigned value of the next three bytes divided by 2^24:
        MSB + ((next three bytes) / 2^24)

        The tricky part is seeing that the MSB is a two’s complement number, and the 24-bit fraction is not.  If you get the 32-bit unsigned int value, you can parse it as follows:

        unsigned int audioVal = *(unsigned int *)(data_ptr);
        float pwmValue = (SInt8)(audioVal >> 24)+(audioVal & 0xFFFFFF)/(float)(1<<24);

        Cheers!
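For anyone who wants to try this outside of an audio callback, here is a standalone sketch of the formula. The function names are invented for the example and it uses double so the arithmetic is exact. It also shows a shortcut that is not in the comment above: since the whole 32 bit word is a two's complement number scaled by 2^24, dividing the signed value by 2^24 should give the same answer in one step.

```c
#include <stdint.h>

/* Integer part (signed top byte) plus the unsigned 24-bit fraction
   divided by 2^24, as described in the comment above. */
static double fixed824_to_double_parts(uint32_t raw)
{
    int integer_part = (int8_t)(raw >> 24);   /* usually 0 or -1 */
    uint32_t fraction_bits = raw & 0xFFFFFFu; /* 0 .. 2^24 - 1   */
    return integer_part + fraction_bits / 16777216.0;
}

/* Shortcut (an assumption of this sketch): treat the whole word as a
   signed integer and scale it by 2^24 in one go. */
static double fixed824_to_double(int32_t sample)
{
    return sample / 16777216.0;
}
```

For the example word 0xFF6A1DCB the integer part is -1 and the fraction is 6954443 / 16777216, so both functions give roughly -0.5855.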

        • http://www.kevatron.co.uk/ Kevin Smith

          Thanks. Yes, that makes sense…I think :)

          To be clear:

          - The whole number part is obtained with (SInt8)(audioVal >> 24)

          - The last 24 bits are extracted with (audioVal & 0xFFFFFF)

          - The floating point value of the 24 bits is obtained by dividing by the number of values 24 bits can represent when treated as unsigned (i.e. not two's complement). This is 2^24, which can be calculated by a shift operation: 1<<24

          You then add the two values.

          If you are doing this in a loop then I guess it's best to calculate 1.0/(1<<24) and multiply by this value as it is presumably faster? So you would have:

          float N = 1.0/(float)(1<<24);
          float pwmValue = (SInt8)(audioVal >> 24)+(audioVal & 0xFFFFFF)*N;

          Did I get that right?
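As a rough sketch of what that loop could look like over a whole buffer: the function name and signature below are hypothetical, and it treats each sample as a signed 32 bit integer and multiplies by a precomputed 2^-24 rather than splitting out the integer and fractional parts. Whether the multiply is actually faster than a divide depends on the compiler and the device, so it is worth measuring.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert `count` 8.24 fixed-point samples to floats.
   The scale factor 1 / 2^24 is computed once, outside the loop.
   Values outside -1.0 .. +1.0 are passed through, not clipped. */
static void fixed824_buffer_to_float(const int32_t *in, float *out, size_t count)
{
    const float scale = 1.0f / (float)(1 << 24); /* exactly 2^-24 */

    for (size_t i = 0; i < count; i++) {
        out[i] = (float)in[i] * scale;
    }
}
```

A float has only 24 bits of precision, so the lowest bits of a sample can be rounded away when its magnitude goes beyond 1.0.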

  • lp

    Hi, thanks for this. I’m trying to convert from 32-bit integer non-interleaved to 8.24-bit integer non-interleaved and vice versa. Going by the logic being taught here, would the conversion be to & the 8.24 integer with 10000000|11111111|11111111|11111111 ?

  • Chris

    Thanks a ton for this article. Could not make heads or tails of the format before.

  • matt.a

    Beautiful explanation of the 8.24 bit format, and a super clear explanation of the conversion process!