<html>

  <head>

    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    On 12/28/16 12:27 PM, <a class="moz-txt-link-abbreviated" href="mailto:jab@math.brown.edu">jab@math.brown.edu</a> wrote:<br>

    <blockquote

cite="mid:CANwREeXNg__CPxzXBG6YB25CDnJHYSm_-ugOcMo6en-_9gLWkg@mail.gmail.com"

      type="cite">On Wed, Dec 28, 2016 at 11:48 AM, Ned Batchelder <span

        dir="ltr"><<a moz-do-not-send="true"

          href="mailto:ned@nedbatchelder.com" target="_blank">ned@nedbatchelder.com</a>></span>

      wrote:<br>

      <blockquote class="gmail_quote" style="margin:0px 0px 0px

        0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

        <div bgcolor="#FFFFFF">You can write a simple function to use

          hash iteratively to hash the entire stream in constant space

          and linear time:<br>

          <br>

              def hash_stream(them):<br>

                  val = 0<br>

                  for it in them:<br>

                      val = hash((val, it))<br>

                  return val<br>

          <br>

          Although this creates N 2-tuples, they come and go, so the

          memory use won't grow.  Adjust the code as needed to achieve

          canonicalization before iterating.<br>

          <br>

          Or maybe I am misunderstanding the requirements?</div>

      </blockquote>

      <div><br>

      </div>

      <div>This is better than solutions like <a moz-do-not-send="true"

          href="http://stackoverflow.com/a/27952689/161642"

          target="_blank">http://stackoverflow.com/a/<wbr>27952689/161642</a>

        in the sense that it's a little higher level (no bit shifting or

        magic numbers).</div>

      <div><br>

      </div>

      <div>But it's not clear that it's any better in the sense that

        you're still rolling your own incremental hash algorithm out of

        a lower-level primitive that doesn't document support for this,

        and therefore taking responsibility yourself for how well it

        distributes values into buckets.</div>

      <div><br>

      </div>

      <div>Are you confident this results in good hash performance? Is

        this better than a solution built on top of a hash function with

        an explicit API for calculating a hash incrementally, such as

        the crc32 example I included? (And again, this would ideally be

        a <span style="font-size:12.8px">sys.hash_info.hash_bits -bit

          algorithm.)</span></div>

    </blockquote>

    <br>

    I don't have the theoretical background to defend this function. 

    But it seems to me that if we believe that hash((int, thing))

    distributes well, then how could this function not distribute well?<br>

    <br>

    --Ned.<br>

  </body>

</html>