Finding the index of a 1 in a bitmap

Posted on 2015-04-21

I stumbled on a beautiful and very readable paper by Charles Leiserson and others on a nice technique to find the index of a 1 in a bit map (or a bit pattern). i.e. Suppose you have a bit pattern 00100100, a bit pattern with 8 bits, we would like to say that there is a 1 in the 2nd position and in the 5th position (from the right).

Before we proceed to the beautiful algorithm, I would suggest reading the paper itself rather than this blog post. The paper is very accessible and does not use any complicated math. I thought the algorithm ought to be more popular and should be part of any programmer’s toolset and hence this blog post. Plus it is a nice way to feed an otherwise starved brain of a software engineer like me whose life is extremely mundane. Well, I won’t get started on that.

So, some processors provide nice instructions to find the position of 1’s. Or something equivalent. I remember the wonderful TI C64x had single cycle instruction to find leading zeros in a 32-bit word. That’s kind of equivalent. Or one could find the leading 1’s position from that cheaply. This algorithm is for those processors which does not provide such instructions. Even if they do, it could be that the libraries don’t use them.

So, let us go straight to the algorithm steps through an example. The algorithm is named after the Dutch mathematician Nicolaas Govert de Bruijn, also known for many other great contributions. According to wikipedia, his last name sounds something like duh bruyn.

1. Given any bit pattern, orthogonalize them into patterns which contain only one 1.

For instance: 01101000 is 01000000 + 00100000 + 00001000.

How do we do that? There is a very nice and simple technique.

Let us say:

x = 01101000

then,

y = x & (-x) /* & implies bitwise AND operation. */

produces a bit pattern with only the left most 1 in the bit pattern in x, intact. In the above instance of x, that would be:

(-x) = 10011000

and so x and -x has a 1 in the same position only at one place – the place where they both have the right-most 1. Recall that you calculate -x from x by taking 2’s compliment. i.e. invert all the bits of x and add a 1 to it. One easy way to do that is to write down the bits from the right most position of x, till you find the first 1 and inverting all the bits left of the 1.

Now, we have made one component of the original bit pattern. We can subtract this component from x and repeat the above proces to find the other components, each bit patterns with only one 1 in them.

2. Given these components, we can ideally hash them to find the index corresponding to the 1.

i.e. if the number of bits is small and if we can find a hash function that does not produce collisions and if the hash function is easily computable, then, given the component calculated from step #1, y, h(y) directly gives us the bit position and we get the answer. We would also like to have the smallest hash table possible. If the bit map is n bits wide, then there are n possible positions a 1 can take, so ideally the size of the hash table should be n.

The question is, are there any easily computable hash functions that does not produce collisions.

3. And yes, there is one such function that can be computed from what is called de Bruijn sequence.

de Bruijn sequences of length n where n is a power of 2, is a cyclic sequences of bits such that every log₂(n) occurs exactly once as a contiguous substring of the sequence. Hmm, that is probably a bit confusing to some. So, let me show an example (straight from the paper but if you want more examples, look at “The art of computer programming, Vol 4A” by Don Knuth, page 302 - 303. I should be quoting section numbers instead of page numbers, but I don’t have the book handy at the moment and I very clearly remember the page numbers, since I referred to the sections on de Bruijn sequences a few times): Take the sequence of length 8: 00011101. Let us split this into 3 bit substrings (why 3 bits? Because log₂8 = 3): 000, 001, 011, 111, 110, 101 and now wrap around to get 010 and 100. We have all the possible 3-bit patterns occuring exactly once. Now, let us write them in the order of occurance in the sequence and record the order it occured in the sequence. i.e. 000 has an index of 0, 001 has an index of 1, 011 has an index of 2 .. and so on.

y index
000 0
001 1
011 2
111 3
110 4
101 5
010 6
100 7

We now compute the hash function as follows. The hash function h(x) is defined as:

h(x) = (x * DEBRUIJN) >> (n - log₂n)

where * denote multiplication and >> denote right shift. The multiplication is performed modulo 2ⁿ, i.e. After multiplication of two n-bit numbers which produce 2*n bits, we throw away the higher order n bits. The constant denoted by the term DEBRUIJN above is the n-bit de Bruijn sequence which starts with log₂n 0’s. We described one such sequence above, which is 00011101.

So, to recap, we are in our quest for a hash function, whose input is an n-bit length bit pattern which has only one of its bits set to 1 and hash function should give us the index of the bit set to 1.

According to the above statement, h(x) should give us that value and if we lookup the h(x) as the input y in the above table, we should get the index. Let us work out an example:

And lo and behold, our input bit pattern x had its 4th bit set.

What are we doing here? x can have any of its bit positions (x is an n bit wide word) set. Multiplying x by DEBRUIJN would cause DEBRUIJN to be shifted up (i.e. left) by i posisions, where i is the position at which a bit is set in the input x. If we now have a small sliding window where the window has only 3 bit positions open and is at the right-most part, then we can one of the patterns in the table which is the so called de Bruijn index and we then map them into a normal index via the table.

efficiency

The index calculation needs:

The algorithm can be trivially implemented using any programming language.

I think this is one of the coolest bit manipulation algorithm I have seen in recent times and that it is worth sharing. Hope you like it!