Hamming Distance
godavemon
davefowler at gmail.com
Thu Jun 19 23:08:51 EDT 2008
Great thanks!
On Jun 19, 5:37 pm, Matimus <mccre... at gmail.com> wrote:
> On Jun 19, 4:27 pm, godavemon <davefow... at gmail.com> wrote:
>
>
>
> > I need to calculate the Hamming Distance of two integers. The hamming
> > distance is the number of bits in two integers that don't match. I
> > thought there'd be a function in math or scipy but i haven't been able
> > to find one. This is my function but it seems like there should be a
> > faster way. I do this computation many times and speed up is
> > important.
>
> > def hamdist( a, b , bits = 32):
> > def _hamdist( x, bits):
> > if bits:
> > return (x & 1) + _hamdist(x >> 1, bits-1)
> > return x & 1
> > return _hamdist( a ^ b, bits)
>
> > Another alternative would be to convert the XOR to a binary string and
> > count the # of 1's.
>
> > Which would be fastest? Are there better alternatives?
>
> > Thanks!
>
> I see no good reason to use recursion for this type of thing. Here are
> some of my attempts:
>
> [code]
> from math import log
>
> def yours(a, b , bits = 32):
> def _hamdist( x, bits):
> if bits:
> return (x & 1) + _hamdist(x >> 1, bits-1)
> return x & 1
> return _hamdist(a ^ b, bits)
>
> def simple(a, b, bits=32):
> x = a ^ b
> return sum((x >> i & 1) for i in xrange(bits))
>
> def lazy(a, b, bits=32):
> x = (a ^ b) & ((1 << bits) - 1)
> tot = 0
> while x:
> tot += x & 1
> x >>= 1
> return tot
>
> def fancy(a, b, bits=32):
> x = (a ^ b) & ((1 << bits) - 1)
> tot = 0
> while x:
> tot += 1
> x ^= 1 << int(log(x, 2))
> return tot
>
> test_vals = (
> ((0xffffffff, 0), 32),
> ((0,0), 0),
> ((1,0), 1),
> ((0x80000000, 0), 1),
> ((0x55555555, 0), 16)
> )
>
> def test(f):
> test_vals = (
> ((0xffffffff, 0), 32), # ALL
> ((0,0), 0), # None
> ((1,0), 1), # First
> ((0x80000000, 0), 1), # Last
> ((0x55555555, 0), 16), # Every Other
> ((0xffff, 0), 16), # First Half
> ((0xffff0000, 0), 16), # Last Half
> )
> for i, (args, exp) in enumerate(test_vals):
> if f(*args) != exp:
> return 0
> return 1
>
> if __name__ == "__main__":
> for f in (yours, simple, lazy, fancy):
> if not test(f):
> print "%s failed"%f.__name__
> [/code]
>
> The python module `timeit` is handy for testing speed:
>
> python -mtimeit -s"from hamdist import *" "test(yours)"
> 10000 loops, best of 3: 95.1 usec per loop
>
> python -mtimeit -s"from hamdist import *" "test(simple)"
> 10000 loops, best of 3: 65.3 usec per loop
>
> python -mtimeit -s"from hamdist import *" "test(lazy)"
> 10000 loops, best of 3: 59.8 usec per loop
>
> python -mtimeit -s"from hamdist import *" "test(fancy)"
> 10000 loops, best of 3: 77.2 usec per loop
>
> Even the ridiculous `fancy` version beat the recursive version.
>
> Matt
More information about the Python-list
mailing list