[Cython] Hash-based vtables
Dag Sverre Seljebotn
d.s.seljebotn at astro.uio.no
Tue Jun 5 21:33:16 CEST 2012
On 06/05/2012 08:02 PM, mark florisson wrote:
> On 5 June 2012 18:09, Dag Sverre Seljebotn <d.s.seljebotn at astro.uio.no> wrote:
>> On 06/05/2012 07:01 PM, Dag Sverre Seljebotn wrote:
>>>
>>> On 06/05/2012 09:25 AM, Stefan Behnel wrote:
>>>>
>>>> Dag Sverre Seljebotn, 04.06.2012 21:44:
>>>>>
>>>>> This can cause crashes/stack smashes
>>>>> etc. if there's lower-64bit-of-md5 collisions, but a) the
>>>>> probability is incredibly small, b) it would only matter in
>>>>> situations that should cause an AttributeError anyway, c) if we
>>>>> really care, we can always use an interning-like mechanism to
>>>>> validate on module loading that its hashes don't collide with
>>>>> other hashes (and raise an exception "Congratulations, you've
>>>>> discovered a phenomenal md5 collision, get in touch with cython
>>>>> devs and we'll work around it right away").
>>>>
>>>>
>>>> I'm not a big fan of such an attitude. If this happens at runtime, it can
>>>> induce any cost from cheap-at-test-time to
>>>> hugely-expensive-in-production.
>>>> Thinking with my evil hat on, this can potentially be data triggered from
>>>> the outside (e.g. if a JIT compiler is involved at one end), thus possibly
>>>> even leading to a security hole.
>>>>
>>>> We should try to produce software that others can build a business on.
>>>
>>>
>>> Well, I'd build a business on something that fails with a 5e-7
>>> probability any day :-) (given that you trust my estimates in the other
>>> post; I think they were rather conservative myself)
>>
>>
>> This was put the wrong way. The chance was 5e-7 that it would fail for
>> anybody over the course of human history (and that was a rather pessimistic
>> estimate).
>>
>> So a more "individual tack":
>>
>> Assume that the process contains 200 MB of method definitions alone, with
>> each method definition being an 8-character string. (That should mean the
>> executable is several gigabytes :-))
>>
>> That puts the probability of collision at 10^-34 for that process containing
>> a 64-bit hash collision.
>>
>>
>> Dag
>
> The point is not so much running into this problem accidentally, but
> maliciously. If user input from untrusted users can somehow determine
> the function signatures that are generated and called by a JIT, then a
> malicious user can find collisions offline and cause some fault in a
> valid user program.

This took me a while to understand. So the idea is that you're in a
completely managed environment (like Java), and you want to run
untrusted code and have it not segfault or smash the stack. Eve then
cleverly assembles a caller/callee pair with mismatching signatures but
the same hash.

Yes, in that situation 64 bits is perhaps not enough.
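
To make the Eve scenario concrete, here is a minimal sketch of an offline
collision search. It is not Cython's actual scheme: the names `sig_hash`
and `find_collision`, the signature string format, and the choice of which
digest bits count as the "lower" ones are all assumptions made for
illustration. The hash is truncated to 16 bits so the search finishes
instantly; the same loop applies to 64 bits, just with far more work.

```python
import hashlib
from itertools import count


def sig_hash(signature: str, bits: int = 64) -> int:
    """Low `bits` bits of the MD5 digest of a signature string.

    Reading the digest as a big-endian integer is an assumption; the
    thread only says "lower 64 bits of md5".
    """
    digest = hashlib.md5(signature.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") & ((1 << bits) - 1)


def find_collision(bits: int):
    """Birthday search: return two distinct signatures with equal hashes."""
    seen = {}  # truncated hash -> first signature that produced it
    for i in count():
        sig = f"method_{i}(int)->int"  # hypothetical signature format
        h = sig_hash(sig, bits)
        if h in seen:
            return seen[h], sig
        seen[h] = sig


# Scaled-down demonstration with a 16-bit hash.
a, b = find_collision(16)
print(a, b, hex(sig_hash(a, 16)))
```

A generic birthday search on an n-bit hash needs on the order of 2^(n/2)
evaluations, so roughly 2^32 for 64 bits. That is well within reach
offline, which is why the accidental-collision estimate does not cover
the malicious case.
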

But is this relevant to what we're trying to do here? We're discussing
APIs to talk between Python C extension modules that already have
unlimited powers. I'd think a "managed Cython" would be such a large
change that one could easily change the hash size at that point?

But I agree it's not as easily written off as I thought.

Dag