[Cython] Hash-based vtables

Dag Sverre Seljebotn d.s.seljebotn at astro.uio.no
Tue Jun 12 19:21:48 CEST 2012


On 06/12/2012 01:01 PM, Dag Sverre Seljebotn wrote:
> On 06/10/2012 11:53 AM, Robert Bradshaw wrote:
>> On Sun, Jun 10, 2012 at 1:43 AM, Dag Sverre Seljebotn
>>> About signatures, a problem I see with following the C typing is that
>>> the signature "ill" wouldn't hash the same as "iii" on 32-bit Windows
>>> and "iqq" on 64-bit Linux, and so on. I think that would be really bad.
>>
>> This is why I suggested promotion for scalars (divide ints into
>> <= sizeof(long) and sizeof(long) < x <= sizeof(long long)), checked at
>> C compile time, though I guess you consider that evil. I don't
>> consider not matching really bad, just kind of bad.
>
> Right. At least a convention for promotion of scalars would be good anyway.
>
> Even MSVC supports stdint.h these days; basing ourselves on the random
> behaviour of "long" seems a bit outdated to me. "ssize_t" would be
> better motivated I feel.
>
> Many linear algebra libraries use 32-bit matrix indices by default, but
> can be swapped to 64-bit indices (this holds for many LAPACK
> implementations and most sparse linear algebra). So often there will at
> least be one typedef that is either 32 bits or 64 bits without the
> Cython compiler knowing.
>
> Promoting to a single type "[u]int64" is the only one that removes
> possible combinatorial explosion if you have multiple external typedefs
> that you don't know the size of (although I guess that's rather
> theoretical).
>
> Anyway, runtime table generation is quite fast, see below.
>
>>
>>> "l" must be banished -- but then one might as well do "i4i8i8".
>>>
>>> Designing a signature hash where you select between these at
>>> compile-time is
>>> perhaps doable but does generate a lot of code and makes everything
>>> complicated.
>>
>> It especially gets messy when you're trying to pre-compute tables.
>>
>>> I think we should just start off with hashing at module load
>>> time when sizes are known, and then work with heuristics and/or build
>>> system
>>> integration to improve on that afterwards.
>>
>> Finding 10,000 optimal tables at runtime had better be really cheap
>> then, for Sage's sake :).
>
> The code is highly unpolished as I write this, but it works so here's
> some preliminary benchmarks.
>
> Assuming the 64-bit pre-hashes are already computed, finding a perfect
> hash for a 64-slot table takes between 5 and 10 us (microseconds),
> depending on the set of hashes.
>
> Computing md5's with C code from ulib (not hashlib/OpenSSL) takes ~400ns
> per hash, so 26 us for the 64-slot table => it dominates!
>
> The crapwow64 hash takes ~10-20 ns, for ~1 us per 64-slot table.
> Admittedly, that's with hand-written non-portable assembly for the
> crapwow64.
>
> Assuming 10 000 64-slot tables we're looking at something like 0.3-0.4
> seconds for loading Sage using md5, or 0.1 seconds using crapwow64.
>
> https://github.com/dagss/pyextensibletype/blob/master/include/perfecthash.h
>
> http://www.team5150.com/~andrew/noncryptohashzoo/CrapWow64.html
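
To make the promotion idea from the quoted discussion concrete, here's a
rough C sketch (not code from the branch, and the names are made up) of how
a signature like "ill" could be normalized to fixed-width codes such as
"i4i8i8" once the C compiler knows the actual sizes:

/* Rough sketch, not code from the branch: normalize a signature at C
   compile time by promoting integer types to fixed-width codes, so the
   result no longer depends on what "long" happens to be. */
#include <stdio.h>

/* Map an integer width in bytes to a fixed-width code such as "i4" or "i8". */
static const char *int_code(size_t size)
{
    switch (size) {
    case 4: return "i4";
    case 8: return "i8";
    default: return "i?";  /* unexpected width; real code would bail out */
    }
}

/* Normalized signature for e.g. "int f(long, long long)". */
static void build_signature(char *buf, size_t buflen)
{
    snprintf(buf, buflen, "%s%s%s",
             int_code(sizeof(int)),        /* return type */
             int_code(sizeof(long)),       /* first arg   */
             int_code(sizeof(long long))); /* second arg  */
}

int main(void)
{
    char sig[16];
    build_signature(sig, sizeof sig);
    /* Prints "i4i4i8" on 32-bit Windows and "i4i8i8" on 64-bit Linux, but
       any two modules built for the same platform always agree. */
    printf("%s\n", sig);
    return 0;
}

The 64-bit prehash of that normalized string is then what goes into the
table.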

Look: a big advantage of the hash-based vtables is that subclasses stay 
ABI-compatible with their superclasses, and don't need recompilation when 
a superclass adds or removes methods.

=> Finding the hash table must happen at run-time in a lot of cases 
anyway, so I feel Robert's push for compile-time table building is moot.
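
Roughly, what the call site boils down to once a table exists is sketched
below. This is not the actual layout in perfecthash.h (which is more
involved), and the names are invented; it's only meant to show why nothing
depends on slot numbers fixed at compile time:

/* Sketch only: each slot stores the full 64-bit prehash of a signature
   plus a function pointer; the caller masks its prehash to pick a slot
   and verifies the stored prehash before calling. */
#include <stdint.h>

typedef struct {
    uint64_t id;            /* 64-bit prehash of the method signature */
    void (*funcptr)(void);  /* cast to the real signature before calling */
} hashvtable_entry_t;

typedef struct {
    uint64_t mask;               /* number of slots minus one (power of two) */
    hashvtable_entry_t *entries;
} hashvtable_t;

/* Return the function pointer for `prehash`, or NULL if this class doesn't
   provide that signature. With a perfect table this is one masked index
   and one compare, and it keeps working when a superclass adds methods,
   because no compile-time slot numbers are involved. */
static void (*hashvtable_lookup(const hashvtable_t *table,
                                uint64_t prehash))(void)
{
    const hashvtable_entry_t *entry = &table->entries[prehash & table->mask];
    return (entry->id == prehash) ? entry->funcptr : NULL;
}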

I feel this would also need to automatically fall back to heap-allocated 
tables when the statically allocated one doesn't suffice. That fallback is 
also good to have in the very few cases where a perfect table can't be 
found.

One thing that makes me feel uneasy about the relatively unexplored 
crapwow64 is that we really don't want collisions in the 64-bit prehashes 
within a single table. A collision would raise an exception, which I think 
is OK from a security perspective: you can always get a MemoryError at any 
point too, so programmers should not expose class creation to attackers 
without being able to deal with it failing.

For the record, I found another md5 implementation that's a bit faster; 
the first timing below is "sphlib" and the second is "ulib":

In [2]: %timeit extensibletype.extensibletype.md5bench2(10**3)
1000 loops, best of 3: 237 us per loop

In [3]: %timeit extensibletype.extensibletype.md5bench(10**3)
1000 loops, best of 3: 374 us per loop

http://www.saphir2.com/sphlib/

It's really only in extremely large projects like Sage that this will be 
noticeable at all.

Dag

