<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
On 12/28/16 12:27 PM, <a class="moz-txt-link-abbreviated" href="mailto:jab@math.brown.edu">jab@math.brown.edu</a> wrote:<br>
<blockquote
cite="mid:CANwREeXNg__CPxzXBG6YB25CDnJHYSm_-ugOcMo6en-_9gLWkg@mail.gmail.com"
type="cite">On Wed, Dec 28, 2016 at 11:48 AM, Ned Batchelder <span
dir="ltr"><<a moz-do-not-send="true"
href="mailto:ned@nedbatchelder.com" target="_blank">ned@nedbatchelder.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF">You can write a simple function to use
hash iteratively to hash the entire stream in constant space
and linear time:<br>
<br>
def hash_stream(them):<br>
val = 0<br>
for it in them:<br>
val = hash((val, it))<br>
return val<br>
<br>
Although this creates N 2-tuples, they come and go, so the
memory use won't grow. Adjust the code as needed to achieve
canonicalization before iterating.<br>
<br>
Or maybe I am misunderstanding the requirements?</div>
</blockquote>
<div><br>
</div>
<div>This is better than solutions like <a moz-do-not-send="true"
href="http://stackoverflow.com/a/27952689/161642"
target="_blank">http://stackoverflow.com/a/<wbr>27952689/161642</a>
in the sense that it's a little higher level (no bit shifting or
magic numbers).</div>
<div><br>
</div>
<div>But it's not clear that it's any better in the sense that
you're still rolling your own incremental hash algorithm out of
a lower-level primitive that doesn't document support for this,
and therefore taking responsibility yourself for how well it
distributes values into buckets.</div>
<div><br>
</div>
<div>Are you confident this results in good hash performance? Is
this better than a solution built on top of a hash function with
an explicit API for calculating a hash incrementally, such as
the crc32 example I included? (And again, this would ideally be
a <span style="font-size:12.8px">sys.hash_info.hash_bits -bit
algorithm.)</span></div>
</blockquote>
<br>
I don't have the theoretical background to defend this function.
But it seems to me that if we believe that hash((int, thing))
distributes well, then how could this function not distribute well?<br>
<br>
--Ned.<br>
</body>
</html>