<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
On 12/27/16 10:13 PM, <a class="moz-txt-link-abbreviated" href="mailto:jab@math.brown.edu">jab@math.brown.edu</a> wrote:<br>
<blockquote
cite="mid:CANwREeWNHSwmnmTA6DKpcgSY7sobG08F+oQefCu9ox0eroJFsQ@mail.gmail.com"
type="cite">
<pre wrap="">Applying this advice to the use cases above would require creating an
arbitrarily large tuple in memory before passing it to hash(), which
is then just thrown away. It would be preferable if there were a way
to pass multiple values to hash() in a streaming fashion, such that
the overall hash were computed incrementally, without building up a
large object in memory first.
Should there be better support for this use case? Perhaps hash() could
support an alternative signature, allowing it to accept a stream of
values whose combined hash would be computed incrementally in
<b class="moz-txt-star"><span class="moz-txt-tag">*</span>constant<span class="moz-txt-tag">*</span></b> space and linear time, e.g. "hash(items=iter(self))".
</pre>
</blockquote>
You can write a simple function to use hash iteratively to hash the
entire stream in constant space and linear time:<br>
<br>
def hash_stream(them):<br>
val = 0<br>
for it in them:<br>
val = hash((val, it))<br>
return val<br>
<br>
Although this creates N 2-tuples, they come and go, so the memory
use won't grow. Adjust the code as needed to achieve
canonicalization before iterating.<br>
<br>
Or maybe I am misunderstanding the requirements?<br>
<br>
--Ned.<br>
</body>
</html>