[Python-Dev] Py_ssize_t

Tim Peters tim.peters at gmail.com
Tue Feb 20 19:28:17 CET 2007


[Raymond Hettinger]
> After thinking more about Py_ssize_t, I'm surprised that we're not hearing about
> 64 bit users having a couple of major problems.
>
> If I'm understanding what was done for dictionaries, the hash table can grow
> larger than the range of hash values.  Accordingly, I would expect large
> dictionaries to have an unacceptably large number of collisions.  OTOH, we
> haven't heard a single complaint, so perhaps my understanding is off.
> ...

As others have noted, it would require a truly gigantic dict for
anyone to notice, and nobody yet has enough RAM to build something
that large.  I added this comment to dictobject.c for 2.5:

    Theoretical Python 2.5 headache:  hash codes are only C "long", but
    sizeof(Py_ssize_t) > sizeof(long) may be possible.  In that case, and if a
    dict is genuinely huge, then only the slots directly reachable via indexing
    by a C long can be the first slot in a probe sequence.  The probe sequence
    will still eventually reach every slot in the table, but the collision rate
    on initial probes may be much higher than this scheme was designed for.
    Getting a hash code as fat as Py_ssize_t is the only real cure.  But in
    practice, this probably won't make a lick of difference for many years (at
    which point everyone will have terabytes of RAM on 64-bit boxes).
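
To make the comment concrete, here is a condensed sketch of the probe
recurrence lookdict() uses.  PERTURB_SHIFT and the i = 5*i + perturb + 1
recurrence are taken from the 2.5 source; the surrounding function is a
toy written for illustration, not the real lookup code:

    #include <stddef.h>
    #include <stdio.h>

    #define PERTURB_SHIFT 5    /* same value as in dictobject.c */

    /* Record the first nprobes slot indices a dict would probe for a
       given hash.  The hash is a C long, exactly as in the C API; the
       mask is size_t here to keep the sketch self-contained.  When
       sizeof(size_t) > sizeof(long), the initial slot can only land in
       the part of the table a long can index; the perturb recurrence
       has to reach the rest. */
    static void
    probe_sequence(long hash, size_t mask, size_t *slots, int nprobes)
    {
        size_t i = (size_t)hash & mask;    /* initial probe */
        size_t perturb = (size_t)hash;
        int n;

        for (n = 0; n < nprobes; n++) {
            slots[n] = i & mask;
            /* lookdict()'s recurrence:  i = 5*i + perturb + 1 */
            i = (i << 2) + i + perturb + 1;
            perturb >>= PERTURB_SHIFT;
        }
    }

    int main(void)
    {
        size_t slots[8];
        size_t mask = ((size_t)1 << 20) - 1;   /* a 2**20-slot table */
        int n;

        probe_sequence(0x12345678L, mask, slots, 8);
        for (n = 0; n < 8; n++)
            printf("probe %d: slot %lu\n", n, (unsigned long)slots[n]);
        return 0;
    }

Once perturb shifts down to zero, the recurrence degenerates to
i = 5*i + 1 (mod 2**k), which has full period for any power-of-2
modulus, so every slot is eventually visited; only the distribution of
the *initial* probes suffers.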

Ironically, IIRC we /have/ had a complaint in the other direction:
someone on SF claims to have a box where sizeof(Py_ssize_t) <
sizeof(long), and something else breaks as a result.  I think I always
implicitly assumed that sizeof(Py_ssize_t) >= sizeof(long) would hold.
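
That assumption is easy to make explicit.  A hypothetical sanity check
(nothing like it is in the tree) would flag such a box at a glance:

    #include <stdio.h>
    #include "Python.h"    /* defines Py_ssize_t */

    int main(void)
    {
        printf("sizeof(long)       = %lu\n", (unsigned long)sizeof(long));
        printf("sizeof(Py_ssize_t) = %lu\n",
               (unsigned long)sizeof(Py_ssize_t));

        /* the relation the core code leans on implicitly */
        if (sizeof(Py_ssize_t) < sizeof(long))
            printf("surprise: Py_ssize_t is narrower than long\n");
        return 0;
    }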

In any case, hash codes are defined to be of type "long" in the C API,
so there appears to be no painless way to widen them on boxes where
sizeof(Py_ssize_t) > sizeof(long).
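
The constraint is visible right in the headers; from memory, the 2.x
declarations look like this:

    typedef long (*hashfunc)(PyObject *);       /* the tp_hash slot type
                                                   in PyTypeObject */

    PyAPI_FUNC(long) PyObject_Hash(PyObject *); /* the public entry
                                                   point */

    /* Both bake the result type into the ABI:  widening hashes to
       Py_ssize_t would break every extension module that fills in
       tp_hash or calls PyObject_Hash. */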

