[Cython] Hash-based vtables

mark florisson markflorisson88 at gmail.com
Tue Jun 5 20:02:04 CEST 2012


On 5 June 2012 18:09, Dag Sverre Seljebotn <d.s.seljebotn at astro.uio.no> wrote:
> On 06/05/2012 07:01 PM, Dag Sverre Seljebotn wrote:
>>
>> On 06/05/2012 09:25 AM, Stefan Behnel wrote:
>>>
>>> Dag Sverre Seljebotn, 04.06.2012 21:44:
>>>>
>>>> This can cause crashes/stack smashes
>>>> etc. if there's lower-64bit-of-md5 collisions, but a) the
>>>> probability is incredibly small, b) it would only matter in
>>>> situations that should cause an AttributeError anyway, c) if we
>>>> really care, we can always use an interning-like mechanism to
>>>> validate on module loading that its hashes don't collide with
>>>> other hashes (and raise an exception "Congratulations, you've
>>>> discovered a phenomenal md5 collision, get in touch with cython
>>>> devs and we'll work around it right away").
>>>
>>>
>>> I'm not a big fan of such an attitude. If this happens at runtime, it can
>>> induce any cost from cheap-at-test-time to hugely-expensive-in-production.
>>> Thinking with my evil hat on, this can potentially be data triggered from
>>> the outside (e.g. if a JIT compiler is involved at one end), thus possibly
>>> even leading to a security hole.
>>>
>>> We should try to produce software that others can build a business on.
>>
>>
>> Well, I'd build a business on something that fails with a 5e-7
>> probability any day :-) (given that you trust my estimates in the other
>> post; I think they were rather conservative myself)
>
>
> This was put the wrong way. The chance was 5e-7 that it would fail for
> anybody over the course of human history (and that was a rather pessimistic
> estimate).
>
> So a more "individual tack":
>
> Assume that the process contains 200 MB of method definitions alone, with
> each method definition being an 8-character string. (That should mean the
> executable is several gigabytes :-))
>
> That puts the probability of that process containing a 64-bit hash
> collision at 10^-34.
>
>
> Dag

The concern is not so much running into this problem accidentally as
someone triggering it maliciously. If input from untrusted users can
somehow determine the function signatures that are generated and called
by a JIT, then a malicious user can search for collisions offline and
use them to cause a fault in an otherwise valid user program.
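
To make the "offline" part concrete, here is a minimal sketch of a birthday search for two signature strings whose truncated md5 values collide. Several things in it are my assumptions and not anything decided in this thread: the signature string format is made up, "lower bits" is taken to mean the low bits of the digest read as a big-endian integer, and the hash is cut to 24 bits so the search finishes in a moment. For a 64-bit truncation the same search needs on the order of 2^32 hash evaluations, which is feasible offline if the attacker controls both signatures.

```python
import hashlib
from itertools import count


def truncated_md5(signature: str, bits: int) -> int:
    """Return the lowest `bits` bits of the md5 digest of `signature`.

    Assumption: the digest is read as a big-endian integer before masking.
    """
    digest = hashlib.md5(signature.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") & ((1 << bits) - 1)


def find_collision(bits: int):
    """Birthday search over generated signatures.

    Returns (sig_a, sig_b, hash) where sig_a != sig_b but both map to the
    same truncated hash. Terminates by the pigeonhole principle.
    """
    seen = {}  # truncated hash -> first signature that produced it
    for i in count():
        sig = f"method_{i}:int(int)"  # hypothetical signature format
        h = truncated_md5(sig, bits)
        if h in seen:
            return seen[h], sig, h
        seen[h] = sig


# 24 bits only so that the demonstration is quick; the real proposal uses 64.
sig_a, sig_b, collided = find_collision(24)
print(sig_a, sig_b, hex(collided))
```

The two strings printed differ, yet a table keyed only on the truncated hash could not tell them apart, which is the situation an interning-style check at module load time would have to catch.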

