[Cython] Hash-based vtables

Dag Sverre Seljebotn d.s.seljebotn at astro.uio.no
Thu Jun 7 12:47:39 CEST 2012


On 06/07/2012 12:45 PM, Dag Sverre Seljebotn wrote:
> On 06/06/2012 11:00 PM, Robert Bradshaw wrote:
>> On Tue, Jun 5, 2012 at 2:41 PM, Dag Sverre Seljebotn
>> <d.s.seljebotn at astro.uio.no> wrote:
>>> Is the goal then to avoid having to have an interning registry?
>>
>> Yes, and to avoid invoking an expensive hash function at runtime in
>> order to achieve good distribution.
>
> I don't understand. Compilation of call-sites would always generate a
> hash. You also need them while initializing/composing the hash table.
>
> But the storage and comparison of the hash rather than an interned
> string seems orthogonal to that.
>
> If it weren't for the security concern I would agree with you. But I think
> Mark and Stefan make a good point. Since you could hand a JIT-ed vtable
> (potentially the result of "trusted and verified user input") to a
> Cython function, *all* call-sites should use the full 160 bits.
>
> Interning solves this in a better way, and preserves vtable memory to boot.

No, it's not necessarily *better* -- I meant, it's going to be faster
than the 160 bit compare.
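
A minimal sketch of that trade-off, in Python for illustration only. The
InternRegistry class is hypothetical, and SHA-1 is an assumption used here
just to get a 160-bit digest; neither is an existing Cython API.

```python
import hashlib


class InternRegistry:
    """Hypothetical process-wide registry: one canonical object per signature."""

    def __init__(self):
        self._canonical = {}

    def intern(self, signature: bytes) -> bytes:
        # Return the first object ever registered for this signature value.
        return self._canonical.setdefault(signature, signature)


registry = InternRegistry()
callee_key = registry.intern(b"mymethod:iiZd")
# Build an equal bytes object separately, as an independently compiled
# caller would, then intern it.
caller_key = registry.intern(bytes(bytearray(b"mymethod:iiZd")))

# Interned keys: the lookup needs only one identity (pointer) check.
same_by_identity = caller_key is callee_key

# Hash keys: all 20 bytes (160 bits) of the two digests must be compared.
callee_digest = hashlib.sha1(b"mymethod:iiZd").digest()
caller_digest = hashlib.sha1(bytes(bytearray(b"mymethod:iiZd"))).digest()
same_by_digest = caller_digest == callee_digest
```

The identity check only works because both sides went through the same
registry, which is the extra dependency being weighed here.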

And I think throwing in a user option that anybody actually needs to 
care about would be a failure here.

Dag

>
> A collision registry would work against a security breach but still
> allow a DoS attack.
>
> Our dependencies are already:
>
> - md5
> - Pagh99 algorithm
>
> Why not throw in an interning registry as well ;-)
>
> But then the end-result is pretty cool.
>
>>> Something that hasn't come up so far is that Cython doesn't know the
>>> exact types of external typedefs, so it can't generate the hash at
>>> Cythonize-time. I guess some support for build systems to probe for
>>> type sizes and compute the signature hashes in a separate header file
>>> would solve this -- with a fallback to computing them at runtime, at
>>> module loading, if you're not using a supported build system. (But
>>> suddenly an interning registry doesn't look so horrible..)
>>
>> It all depends on how strict you want to be. It may be acceptable to
>> let f(int) and f(long) not hash to the same value even if sizeof(int)
>> == sizeof(long). We could also promote all int types to long or long
>> long, including extern types (assuming, with a c-compile-time check,
>> external types declared up to "long" are <= sizeof(long)). Another
>
> Please no, I don't like any of those. We should not make the trouble
> with external typedefs worse than it already is. (Part of me wants to
> just declare that Cython is like Go with no implicit conversions to
> avoid inheriting the ugly coercion rules of C anyway...)
>
>> option is to let the hash be md5(sig) + hashN(sizeof(extern_arg1),
>> sizeof(extern_argN)) where hashN is a macro.
>
> Good idea. Would the following destroy all the nice properties of md5? I
> guess I wouldn't use it for crypto any longer...:
>
> hash("mymethod:iiZd") =
> md5("mymethod") ^ md5("i\x1") ^ md5("i\x2") ^ md5("Z\x3") ^ md5("d\x4")
>
> Dag
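
The composition proposed above can be sketched in Python as follows. This is
an illustration under stated assumptions, not an agreed design: the helper
names md5_int and signature_hash are hypothetical, each argument is assumed
to be a single-character type code, and positions are assumed to fit in one
byte.

```python
import hashlib
from functools import reduce


def md5_int(data: bytes) -> int:
    """MD5 digest of data as a 128-bit integer (hypothetical helper)."""
    return int.from_bytes(hashlib.md5(data).digest(), "big")


def signature_hash(name: str, arg_codes: str) -> int:
    """XOR md5(name) with md5(code + position byte) for each argument."""
    terms = [md5_int(name.encode("ascii"))]
    for position, code in enumerate(arg_codes, start=1):
        # Tagging each code with its 1-based position keeps the hash
        # order-sensitive and stops repeated codes from cancelling out.
        terms.append(md5_int(code.encode("ascii") + bytes([position])))
    return reduce(lambda a, b: a ^ b, terms)


h = signature_hash("mymethod", "iiZd")
```

Because each argument contributes an independent term, the term for an
external typedef could be chosen later without rehashing the rest of the
signature. The price is that XOR is linear, so the result should not be
expected to keep md5's collision resistance.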


