[Cython] Hash-based vtables

Dag Sverre Seljebotn d.s.seljebotn at astro.uio.no
Thu Jun 7 12:45:52 CEST 2012


On 06/06/2012 11:00 PM, Robert Bradshaw wrote:
> On Tue, Jun 5, 2012 at 2:41 PM, Dag Sverre Seljebotn
> <d.s.seljebotn at astro.uio.no>  wrote:
>> Is the goal then to avoid having to have an interning registry?
>
> Yes, and to avoid invoking an expensive hash function at runtime in
> order to achieve good distribution.

I don't understand. Compilation of call-sites would always generate a 
hash. You also need them while initializing/composing the hash table.

But the storage and comparison of the hash rather than an interned
string seems orthogonal to that.

If it weren't for the security concern I would agree with you. But I think
Mark and Stefan make a good point. Since you could hand a JIT-ed vtable
(potentially the result of "trusted and verified user input") to a
Cython function, *all* call-sites should use the full 160 bits.

Interning solves this in a better way, and saves vtable memory to boot.
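
To illustrate what interning buys at a call-site, here is a minimal
sketch. The `InternRegistry` class and its `intern` method are
hypothetical names invented for this example, not anything that exists
in Cython. Each distinct signature string gets exactly one small id, so
a call-site compares ids instead of full-width hashes, and two different
signatures can never compare equal.

```python
class InternRegistry:
    """Hypothetical process-wide registry mapping signature strings to ids."""

    def __init__(self):
        self._ids = {}

    def intern(self, signature: str) -> int:
        # Return the existing id, or assign the next free one.
        # len() is evaluated before the insertion, so ids are 0, 1, 2, ...
        return self._ids.setdefault(signature, len(self._ids))


registry = InternRegistry()
caller_id = registry.intern("mymethod:iiZd")   # done once per call-site
callee_id = registry.intern("mymethod:iiZd")   # done once per vtable entry
other_id = registry.intern("mymethod:iiZf")

# The runtime check is a plain integer comparison.
match = caller_id == callee_id
mismatch = caller_id == other_id
```

The cost is the one named in this thread: every module has to reach the
same registry at load time.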

A collision registry would work against a security breach but still 
allow a DoS attack.

Our dependencies are already:

  - md5
  - Pagh99 algorithm

Why not throw in an interning registry as well ;-)

But then the end-result is pretty cool.

>> Something that hasn't come up so far is that Cython doesn't know the exact
>> types of external typedefs, so it can't generate the hash at Cythonize-time.
>> I guess some support for build systems to probe for type sizes and compute
>> the signature hashes in a separate header file would solve this -- with a
>> fallback to computing them at runtime at module loading, if you're not using a
>> supported build system. (But suddenly an interning registry doesn't look so
>> horrible..)
>
> It all depends on how strict you want to be. It may be acceptable to
> let f(int) and f(long) not hash to the same value even if sizeof(int)
> == sizeof(long). We could also promote all int types to long or long
> long, including extern types (assuming, with a c-compile-time check,
> external types declared up to "long" are <= sizeof(long)). Another

Please no, I don't like any of those. We should not make the trouble 
with external typedefs worse than it already is. (Part of me wants to 
just declare that Cython is like Go with no implicit conversions to 
avoid inheriting the ugly coercion rules of C anyway...)

> option is to let the hash be md5(sig) + hashN(sizeof(extern_arg1),
> sizeof(extern_argN)) where hashN is a macro.

Good idea. Would the following destroy all the nice properties of md5? I 
guess I wouldn't use it for crypto any longer...:

hash("mymethod:iiZd") =
md5("mymethod") ^ md5("i\x1") ^ md5("i\x2") ^ md5("Z\x3") ^ md5("d\x4")
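
A minimal Python sketch of the composition above, assuming the digests
are treated as 128-bit integers and each argument code is suffixed with
its 1-based position as a single byte. The helper names `md5_int` and
`signature_hash` are made up for this example.

```python
import hashlib


def md5_int(data: bytes) -> int:
    """md5 digest of data as a 128-bit integer."""
    return int.from_bytes(hashlib.md5(data).digest(), "big")


def signature_hash(name: str, arg_codes: str) -> int:
    """XOR md5(name) with md5(code + position byte) for each argument."""
    h = md5_int(name.encode("ascii"))
    for position, code in enumerate(arg_codes, start=1):
        # One byte for the position limits this sketch to 255 arguments.
        h ^= md5_int(code.encode("ascii") + bytes([position]))
    return h


h = signature_hash("mymethod", "iiZd")
```

XOR is commutative, so without the position byte f(int, double) and
f(double, int) would hash the same, and two identical argument codes
would cancel out. The per-argument terms are what would let an external
type's term be computed separately, for example at C compile time.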

Dag

