<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
<br>
If the Python community is just noticing the Hsieh hash, that implies
that the Bob Jenkins hashes are probably unknown as well. Behold:<br>
<a class="moz-txt-link-freetext" href="http://burtleburtle.net/bob/hash/doobs.html">http://burtleburtle.net/bob/hash/doobs.html</a><br>
To save you a little head-scratching, the functions you want to play
with are hashlittle()/hashlittle2() in "lookup3.c":<br>
<a class="moz-txt-link-freetext" href="http://burtleburtle.net/bob/c/lookup3.c">http://burtleburtle.net/bob/c/lookup3.c</a><br>
hashlittle() returns a 32-bit hash; hashlittle2() returns two 32-bit
hashes of the same input (in effect a 64-bit hash). The "little"
means that the function is faster on little-endian machines. (There
is a hashbig() for big-endian machines, but no hashbig2(); that is left
as an exercise for the reader.)<br>
<br>
In our testing (at Facebook, for memcached), hashlittle2() was faster than
the Hsieh hash. That testing was done a year ago (before I joined), so I
don't have numbers for you.<br>
<br>
One goal of Jenkins's hashes is uniform distribution, so these functions
presumably lack the serendipitous "similar inputs hash to similar
values" behavior of Python's current hash function. But why is that a
feature? (Not that I doubt Tim Peters!)<br>
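<br>
For anyone who has not seen that behavior, here is a minimal sketch of it
using small integers. It assumes CPython, where small non-negative ints
hash to themselves; the 8-slot table and the "hash modulo table size" slot
choice are my simplification for illustration, not the actual dict code.<br>
<pre>
```python
# Assumption: CPython, where small non-negative ints hash to themselves.
keys = list(range(8))
hashes = [hash(k) for k in keys]

# Simplified model of a hash table with 8 slots: the slot is the hash
# modulo the table size. A real dict also probes and resizes.
table_size = 8
slots = [h % table_size for h in hashes]

# Count how many different slots the 8 consecutive keys landed in.
distinct_slots = len(set(slots))

print("hashes:", hashes)
print("distinct slots used:", distinct_slots)
```
</pre>
In this toy model, consecutive keys land in consecutive slots, so they do
not collide with each other. A uniformly distributed hash makes no such
promise for neighboring inputs.<br>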
<br>
Oh, and all the Jenkins code is in the public domain.<br>
<br>
Cheers,<br>
<br>
<br>
<i>larry</i><br>
</body>
</html>