# 7 / Text / Computing Trailing Zeros HOWTOA bit-twiddling hack

Given the binary representation of some integer, a typical problem is to determine the number of trailing zeros. This HOWTO will show you how to accomplish the task in constant time. The reader is assumed to have experience with the C programming language, though the methods presented throughout this document can easily be applied to other languages as well.

Citing from the README of a mathematical subroutine package by R. Freund:

For all intent and purpose, any description of what the codes are doing should be construed as being a note of what we thought the codes did on our machine on a particular Tuesday of last year. If you're really lucky, they might do the same for you someday. Then again, do you really feel *that* lucky?

## Background Knowledge

Computing the number of trailing zeros in constant time is sweet and simple, but in order to understand the trick, we will have to look at some math first. This may sound scary, but all we have to do is understand the notion of a so called de Bruijn sequence. Once we know about de Bruijn sequences, we'll have a short reminder of Two's Complement, and off we go...

### de Bruijn sequences

As we are used to do from math classes, let's start with a definition:

Definition: A de Bruijn sequence of length n = 2k is a bitstring consisting of n digits such that every 0-1-sequence of length k occurs exactly once.

What? Let's have an example: a binary number of length 8 = 23 is a de Bruijn sequence, if it contains every possible bitstring of length 3, i.e. 000, 001, 010, 011, and so on. If you try to find such a de Bruijn sequence, you'll notice that our 8 digit binary has only room for 6 bitstrings of length 3, even if the bitstrings are allowed to overlap. So if you reach the end of a de Bruijn sequence, simply restart from the beginning. That's why they are called sequences and not just numbers.

Have you already found some de Bruijn sequence of length 8 (i.e., 23)? Of course you have. Here's another one: 00010111. Don't believe me? I'll show you:

```00010111
000             0
001            1
010           2
101          5
011         3
111        7
110       6
100      4
```

Look at how the bitstrings overlap and wrap around at the end. If you match them together, you'll get our original 00010111, or as we're used to call it: 23.

Now, in order to compute the number of trailing zeros of a binary, we need a de Bruijn sequence of exactly the size of our binary, i.e. CHAR_BITS*sizeof(int). The value of this expression may vary from system to system, and who says that we shall restrict ourselves to `int`s anyway, but for the purpose of this HOWTO, let's simply assume that this is 32. So how do we find a de Bruijn sequence of length 32? After some scribbling on the cover of "Math in 21 days", I ended up with 0x077cb531, which is 00000111 01111100 10110101 00110001 in binary representation. As specified above, its length is exactly 32=25, so this time it even contains all possible 5-tuples.

Just for fun, let's write down the offset for each 5-tuple in the sequence:

5-tupledecimal valueoffset
`00000`000
`00001`011
`00010`0228
`00011`032
`00100`0429
`00101`0514
`00110`0624
`00111`073
`01000`0830
`01001`0922
`01010`1020
`01011`1115
`01100`1225
`01101`1317
`01110`144
`01111`158
`10000`1631
`10001`1727
`10010`1813
`10011`1923
`10100`2021
`10101`2119
`10110`2216
`10111`237
`11000`2426
`11001`2512
`11010`2618
`11011`276
`11100`2811
`11101`295
`11110`3010
`11111`319

The table tells us that e.g. the 5-tuple `11010` (decimal 26) appears at the 18th position in the binary representation of our de Bruijn sequence. We will need this table later, but let's forget about math for now and learn some fundamental facts about our computers.

### Two's Complement

As we're programmers, I assume that you are already familiar with Two's Complement (if not, see http://en.wikipedia.org/wiki/Two%27s_complement). Actually, I just want to show you a nice trick. This would probably have been a better name for this section, but frankly speaking, it doesn't sound as important.

So. Have you ever wondered how to set all bits of an `int` to zero, except for the last 1 that appears in its binary representation? Well, neither did I, but we need it in order to compute the number of trailing zeros. But before I tell you how to do it, let's first play around a bit. Remember how to change the sign in Two's Complement? We have to do bitwise negation and add 1. Let's do a simple example: changing the sign of 12.

```00001100       | that's a binary 12
11110011       | bitwise negation
11110100       | add 1, get -12
```

Start with -12 and repeat this procedure, and you'll end up with 12 again. Astonishing, isn't it? Now look at the binary representations of 12 and -12 respectively. You'll notice that they have the same number of trailing zeros. Aside from that, they differ in every digit, except for the last 1. And what's more important, this is also true for every other Two's Complement number .

Now it's pretty clear how to isolate the last 1. Simply use bitwise AND on an `int x` and its bitwise negation:

`x & -x`

Too simple. The result is a number whose binary representation consists of all zeros, except for the last 1 in both `x` and `-x`, so for `x=12`, the result is a binary `00000100`.

### Bit shifts

In school, one of the most delightful arithmetic operations is multiplication by ten, because there's no real work to do: simply append a zero and that's it. Same goes for multiplication by 100, 1000, and every other power of ten with a natural exponent. But astonishingly, this doesn't have to do anything with 10n ending with zeros, but with the fact that 10 is the base of our numeral system. Multiplying by 7 in base 7 works exactly the same way: append a zero. More generally, multiplying by `b` in base `b` works by appending a zero. If you don't already believe it, have a short look at how numeral systems are used to represent numbers. For the purpose of this HOWTO, it completely suffices to know that for binaries, multiplication by 2n is really the same as left-shifting by `n`. Actually, this is how left-shifting is formally defined:

```        x << n  :=  x * 2n
```

Of course, no sane chip designer will use multiplication in order to implement bit shifts on a new CPU, but this trick will help us determine the number of trailing zeros in a binary. Now, we are finally ready to have a look at the actual algorithm.

We know some simple facts about Two's Complement arithmetic, and we are mere experts on de Bruijn sequences, so let's see what we can do with it.

### The algorithm

The algorithm for determining the number of trailing zeros of a binary number `x` works as follows. First, we create an array which tells us for each possible bitstring of length 5 its offset in our de Bruijn sequence `DEBRUIJN`. That is, for a given 5-tuple t, how many times do we need to shift `DEBRUIJN` to the left in order to have t appear at the front. Then we set all bits of `x` to zero, except for the last 1. Let's call this number `n`. We multiply `n` with `DEBRUIJN`, but remember: the digit 1 appears exactly once in the binary representation of `n` (i.e. `n` is a power of two), so multiplying `DEBRUIJN` with `n` is really the same as shifting `DEBRUIJN` to the left as many times as there are trailing zeros in `n`. Now we take the first 5 digits of the shifted de Bruijn sequence and look it up in our array. This will tell us how many times we actually shifted `DEBRUIJN` to the left, i.e. the number of trailing zeros of `n`, which is the same as the number of trailing zeros of `x`, which is the solution to our original problem.

Here's the algorithm again:

1. Create array `table` for `DEBRUIJN`

2. `n = x & -x`

3. `shifted = DEBRUIJN * n`

4. `offset = table[shifted]`

5. return `offset`

### An Example

Looks easy enough, but let's have an example anyway: suppose we want to determine the number of trailing zeros in the binary representation of `x = 26,784`, which is `00000000 00000000 01101000 10100000`. Let's have a look at the de Bruijn sequence:

`DEBRUIJN = 00000111 01111100 10110101 00110001`

Note that I emphasized the substring `11101`. Now, let's first first isolate the last 1: `n = x & -x`. The binary representation of `n` is `00000000 00000000 00000000 00100000`. We shift the de Bruijn sequence by the number of trailing zeros in `n`, i.e. multiply `DEBRUIJN` with `n`: `shifted = DEBRUIJN * n`.

`shifted = 11101111 10010110 10100110 00100000`

We immediately right-shift the result by 27:

`shifted = 00000000 00000000 00000000 00011101`

Remember our table? It tells us for each bitstring of length 5 its offset in the de Bruijn sequence. For `11101`, it correctly reports 5, which is indeed the number of trailing zeros in `x`.

I'd love to claim that you don't need to understand this, but you need to. If it isn't already clear to you, read this chapter again. Maybe it helps to have a look at the previous chapter, too. Use a sheet of paper and try to reproduce the examples yourself. It's easy, go ahead! In the meantime, I'll continue with the actual source code.

### Populating the table

As mentioned in the previous section, we need an array which tells us for every possible binary 5-tuple the corresponding offset in a given de Bruijn sequence. Of course we might simply hardcode the array, and if you're happy with this solution, you can safely skip the current section and proceed to the next one without any further ado. For aesthetical completeness, this is what we have to do in order to populate the array `table` on the basis of a given de Bruijn sequence `debruijn`: we iterate over `i = 0..31` and for each step, we left-shift `debruijn` by `i` and immediately right-shift back by 27. This way, we retrieve all 5-tuples from our de Bruijn number, i.e. all numbers between 0 and 31, and use them as array subscripts to store the current value of `i` in `table`. When finished, `table` tells us the offset of each 5-tuple in `debruijn`.

Here's the code:

```void init(int table, unsigned int debruijn)
{
int i;

for(i = 0; i < 32; i++) {
table[debruijn >> 27] = i;
debruijn = debruijn << 1;
}
}
```

### Looking up the number of trailing zeros

Computing the number of trailing zeros for a given `int x` is now a mere child's play. Set all bits of `x` to zero, except for the last 1. Multiply the result by `debruijn` and right-shift by 27. Then use this value to look up the number of trailing zeros in the `table`.

```int ntz(int x)
{
size_t i = ((x & -x) * debruijn) >> 27;
return table[i];
}
```

Here, `debruijn` is a global variable, of course. If you don't like global variables, consider the function above to be pseudocode.

### Putting it all together

The following program is a complete implementation of the algorithm described above.

```#include <stdio.h>
#include <stdlib.h>

#define DEBRUIJN 0x077CB531UL

static int table;
static unsigned int debruijn;

static void init(int table, unsigned int db);
int ntz(int x);

int main(int argc, char *argv[])
{
int x;

if(argc != 2) {
fprintf(stderr, "USAGE: %s <num>\n", argv);
return EXIT_FAILURE;
}

x = atoi(argv);

/* initialize de Bruijn table */
init(table, DEBRUIJN);

printf("number of trailing zeros: %d\n", ntz(x));

return EXIT_SUCCESS;
}

/* compute number of trailing zeros */
int ntz(int x)
{
size_t i = ((x & -x) * debruijn) >> 27;
return table[i];
}

/* initialize table with de Bruijn sequence db */
static void init(int table, unsigned int db)
{
int i;

debruijn = db;

for(i = 0; i < 32; i++) {
table[db >> 27] = i;
db = db << 1;
}
}
```

In a real program, you probably wouldn't want to initialize the table at runtime, and you don't even need the `ntz()` function call. So, for the sake of completeness, here's another version which I have shamelessly taken over from Sean Eron Anderson. This version is highly tuned for efficiency, but at first sight it looks like voodoo. If you are bored at work, hand it out to your colleagues and see how long it will take them to figure out what it does:

```unsigned int v;  // find the number of trailing zeros in 32-bit v
int r;           // result goes here
static const int MultiplyDeBruijnBitPosition =
{
0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
};
r = MultiplyDeBruijnBitPosition[((v & -v) * 0x077CB531UL) >> 27];
```

### Caveats

Now you may think: "This was soo easy... now where's the catch?". And you are right, there are some aspects to be aware of when you implement the algorithm.

Most notably, not all de Bruijn sequences are equally suitable (aside from the fact that they need to be of proper size). In the beginning, I told you that we wrap to the start of the sequence once we've reached its end. But in the implementation, we only do left shifting, i.e. fill in with zeros. But that would yield widely inaccurate results if the de Bruijn sequence of length 2k doesn't start with k zeros, so make sure it does or simply stick to the one I supplied.

Furthermore, the implementation presented here makes possibly inaccurate assumptions about the number of bits in an `int`. Remember the 10th commandment from His prophet Henry Spencer: Thou shalt foreswear, renounce, and abjure the vile heresy which claimeth that ``All the world's a VAX'', and have no commerce with the benighted heathens who cling to this barbarous belief, that the days of thy program may be long even though the days of thy current machine be short. I sacrificied this wisdom for the sake of readability, but in order to avoid surprises, you better replace every occurrence of the number 32 (and 31 and 27 accordingly) with `CHAR_BITS*sizeof(int)`, which fortunately happens to be a compile-time constant.

On a sidenote, it has been suggested to add something along the lines of `if(x%2) return 0;` at the beginning of `ntz()`, based on the observation that odd numbers don't have trailing zeros in their binary representation. Separate from issues concerning branch prediction, this doesn't change anything in the average case: for odd numbers, you save two machine instructions, but for even numbers, you get two more.

 Except `INT_MIN`, because `abs(INT_MIN) == INT_MIN`.