Hamming Distance

Jared Grubb jared.grubb at gmail.com
Fri Jun 27 19:54:45 CEST 2008


Matimus,
I was surprised that "lazy" was the algorithm that won your time tests, and
I saw a way to improve it even better (algorithm is O(# ones in number)
rather than O(# bits in number))
def lazy2(a, b, bits=32):
    x = (a ^ b) & ((1 << bits) - 1)
    tot = 0
    while x:
        tot += 1
        x &= x-1
    return tot

# times on my system (run a few times just to check for sure)
python -mtimeit -s"from ham import *" "test(lazy)"
10000 loops, best of 3: 121 usec per loop

python -mtimeit -s"from ham import *" "test(lazy2)"
10000 loops, best of 3: 62.4 usec per loop

Check my math, but I think that's correct. Here's my derivation (but it's
been a while since my Boolean algebra days, so I may've made a mistake!) It
sounds right, though, since subtracting one in two's complement flips the
rightmost one and inverts the zeros to the right of it. So, x & (x-1) would
remove the rightmost one. Right?

Long Derivation: The "trick" to finding the rightmost one in a number:
pos = x ^ (x-1)
It has to do with how two's-complement works. In our algorithm above, we are
trying to count them, so we want to flip off the bits one by one from the
right. So in each loop:
x = x & ~pos
But, then you notice you can simplify it even more (let y=x-1) and use
mult/add syntax for & and | and use X=~x and Y=~y
x * ~(x ^ y)
x * ~(xY+Xy)    [ def of ^ ]
x * (~(xY)*~(Xy)) [ DeMoires Law ]
x * ( (X+y)*(x+Y) ) [ inversion]
x * (X+y) * (x+Y)  [ associative]
(xX+xy)*(x+Y)  [ distributive ]
xy*(x+Y)  [ xX = 0 ]
xy+xyY   [ distrib ]
xy        [yY = 0]

So,
x &= x-1

On 19 Jun 2008, at 17:37, Matimus wrote:

On Jun 19, 4:27 pm, godavemon <davefow... at gmail.com> wrote:

I need to calculate the Hamming Distance of two integers.  The hamming

distance is the number of bits in two integers that don't match.  I

thought there'd be a function in math or scipy but i haven't been able

to find one.  This is my function but it seems like there should be a

faster way.  I do this computation many times and speed up is

important.


def hamdist( a, b , bits = 32):

    def _hamdist( x, bits):

        if bits:

            return (x & 1) + _hamdist(x >> 1, bits-1)

        return x & 1

    return _hamdist( a ^ b, bits)


Another alternative would be to convert the XOR to a binary string and

count the # of 1's.


Which would be fastest?  Are there better alternatives?


Thanks!


I see no good reason to use recursion for this type of thing. Here are
some of my attempts:

[code]
from math import log

def yours(a, b , bits = 32):
    def _hamdist( x, bits):
        if bits:
            return (x & 1) + _hamdist(x >> 1, bits-1)
        return x & 1
    return _hamdist(a ^ b, bits)


def simple(a, b, bits=32):
   x = a ^ b
   return sum((x >> i & 1) for i in xrange(bits))

def lazy(a, b, bits=32):
   x = (a ^ b) & ((1 << bits) - 1)
   tot = 0
   while x:
       tot += x & 1
       x >>= 1
   return tot

def fancy(a, b, bits=32):
   x = (a ^ b) & ((1 << bits) - 1)
   tot = 0
   while x:
       tot += 1
       x ^= 1 << int(log(x, 2))
   return tot

test_vals = (
       ((0xffffffff, 0), 32),
       ((0,0), 0),
       ((1,0), 1),
       ((0x80000000, 0), 1),
       ((0x55555555, 0), 16)
       )

def test(f):
   test_vals = (
           ((0xffffffff, 0), 32), # ALL
           ((0,0), 0), # None
           ((1,0), 1), # First
           ((0x80000000, 0), 1), # Last
           ((0x55555555, 0), 16), # Every Other
           ((0xffff, 0), 16), # First Half
           ((0xffff0000, 0), 16), # Last Half
           )
   for i, (args, exp) in enumerate(test_vals):
       if f(*args) != exp:
           return 0
   return 1

if __name__ == "__main__":
   for f in (yours, simple, lazy, fancy):
       if not test(f):
           print "%s failed"%f.__name__
[/code]

The python module `timeit` is handy for testing speed:

python -mtimeit -s"from hamdist import *" "test(yours)"
10000 loops, best of 3: 95.1 usec per loop

python -mtimeit -s"from hamdist import *" "test(simple)"
10000 loops, best of 3: 65.3 usec per loop

python -mtimeit -s"from hamdist import *" "test(lazy)"
10000 loops, best of 3: 59.8 usec per loop

python -mtimeit -s"from hamdist import *" "test(fancy)"
10000 loops, best of 3: 77.2 usec per loop

Even the ridiculous `fancy` version beat the recursive version.

Matt
--
http://mail.python.org/mailman/listinfo/python-list
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-list/attachments/20080627/5d78919e/attachment.html>


More information about the Python-list mailing list