[Cython] Hash-based vtables

mark florisson markflorisson88 at gmail.com
Tue Jun 5 22:33:12 CEST 2012


On 5 June 2012 20:33, Dag Sverre Seljebotn <d.s.seljebotn at astro.uio.no> wrote:
> On 06/05/2012 08:02 PM, mark florisson wrote:
>>
>> On 5 June 2012 18:09, Dag Sverre Seljebotn <d.s.seljebotn at astro.uio.no> wrote:
>>>
>>> On 06/05/2012 07:01 PM, Dag Sverre Seljebotn wrote:
>>>>
>>>>
>>>> On 06/05/2012 09:25 AM, Stefan Behnel wrote:
>>>>>
>>>>>
>>>>> Dag Sverre Seljebotn, 04.06.2012 21:44:
>>>>>>
>>>>>>
>>>>>> This can cause crashes/stack smashes
>>>>>> etc. if there's lower-64bit-of-md5 collisions, but a) the
>>>>>> probability is incredibly small, b) it would only matter in
>>>>>> situations that should cause an AttributeError anyway, c) if we
>>>>>> really care, we can always use an interning-like mechanism to
>>>>>> validate on module loading that its hashes don't collide with
>>>>>> other hashes (and raise an exception "Congratulations, you've
>>>>>> discovered a phenomenal md5 collision, get in touch with cython
>>>>>> devs and we'll work around it right away").
>>>>>
>>>>>
>>>>>
>>>>> I'm not a big fan of such an attitude. If this happens at runtime, it can
>>>>> induce any cost from cheap-at-test-time to hugely-expensive-in-production.
>>>>> Thinking with my evil hat on, this can potentially be data triggered from
>>>>> the outside (e.g. if a JIT compiler is involved at one end), thus possibly
>>>>> even leading to a security hole.
>>>>>
>>>>> We should try to produce software that others can build a business on.
>>>>
>>>>
>>>>
>>>> Well, I'd build a business on something that fails with a 5e-7
>>>> probability any day :-) (given that you trust my estimates in the other
>>>> post; I think they were rather conservative myself)
>>>
>>>
>>>
>>> This was put the wrong way. The chance was 5e-7 that it would fail for
>>> anybody over the course of human history (and that was a rather
>>> pessimistic estimate).
>>>
>>> So a more "individual tack":
>>>
>>> Assume that the process contains 200 MB of method definitions alone, with
>>> each method definition being an 8-character string. (That would mean the
>>> executable is several gigabytes :-))
>>>
>>> That puts the probability of that process containing a 64-bit hash
>>> collision at 10^-34.
>>>
>>>
>>> Dag
>>> _______________________________________________
>>> cython-devel mailing list
>>> cython-devel at python.org
>>> http://mail.python.org/mailman/listinfo/cython-devel
>>
>>
>> The point is not so much running into this problem accidentally, but
>> maliciously. If user input from untrusted users can somehow determine
>> the function signatures that are generated and called by a JIT, then a
>> malicious user can find collisions offline and cause some fault in a
>> valid user program.
>
>
> This took me a while to understand. So the idea is that you're in a
> completely managed environment (like Java), and you want to run untrusted
> code and have it not segfault or smash the stack. Eve then cleverly
> assembles a caller/callee pair with mismatching signatures but the same
> hash.
>
> Yes, in that situation 64 bits is perhaps not enough.
>
> But is this relevant to what we're trying to do here? We're discussing APIs
> to talk between Python C extension modules that already have unlimited
> powers. I'd think a "managed Cython" would be such a large change that one
> could easily change the hash size at that point?
>
> But I agree it's not as easily written off as I thought.
>
>
> Dag

It doesn't even necessarily have to be about running user code; a user
could craft data input which causes such a situation. For instance,
let's say we have a just-in-time specializer which specializes a
function for the runtime input types, and the types depend on the user
input. Say there is a web application that we can post arrays to,
described by a custom dtype, and which draws pictures from them in some
weird way. We can get it to specialize on pretty much any array type,
which gives us a good opportunity to find collisions.
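
To make the "find collisions offline" part concrete, here is a minimal
sketch in Python. It is not what Cython does or would do; the signature
strings, the helper names, and the choice of which digest bytes count as
the "lower" bits are all assumptions made for illustration. The hash is
cut down to 24 bits so the search finishes in a moment. A birthday search
over b bits needs on the order of 2^(b/2) hashes, so for 64 bits that is
roughly 2^32, which is well within reach of an offline attacker.

```python
import hashlib
from itertools import count


def truncated_md5(signature: str, bits: int = 64) -> int:
    """Return the low `bits` bits of the MD5 digest of `signature`.

    Assumption: the digest is read as a little-endian integer; the
    thread does not say which bytes are meant by "lower 64 bits".
    """
    digest = hashlib.md5(signature.encode("utf-8")).digest()
    return int.from_bytes(digest, "little") & ((1 << bits) - 1)


def find_collision(bits: int):
    """Birthday search: return two distinct signatures with equal hashes."""
    seen = {}  # truncated hash -> first signature that produced it
    for i in count():
        # Hypothetical signature strings; an attacker would instead
        # enumerate whatever types the specializer accepts.
        sig = f"method_{i}(int, float)"
        h = truncated_md5(sig, bits)
        if h in seen:
            return seen[h], sig, h
        seen[h] = sig


if __name__ == "__main__":
    a, b, h = find_collision(24)
    print(f"{a!r} and {b!r} both hash to {h:#08x}")
```

Every candidate string is distinct, so any repeated hash value is a real
collision between two different signatures.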

