[Cython] CEP1000: Native dispatch through callables

Sun Apr 15 11:08:43 CEST 2012

Nathaniel Smith <njs at pobox.com> wrote:

>On Sun, Apr 15, 2012 at 9:15 AM, Dag Sverre Seljebotn
><d.s.seljebotn at astro.uio.no> wrote:
>> Do you really think it complicates the spec? SHA-1 is pretty standard, and
>> Python ships with hashlib (the hashing part isn't performance critical).
>>
>> I prefer hashing to string-interning as it can still be done compile-time
>> etc. 160 bits isn't worse than the second-to-best strcmp case of a 256-bit
>> function entry.
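To show how little the compile-time side involves, here is a minimal sketch using Python's `hashlib`. The signature string format ("dd->d") is borrowed from the examples in this thread, and the helper name `signature_hash` is hypothetical, not part of any CEP.

```python
import hashlib

def signature_hash(sig: str, bits: int = 160) -> int:
    """Return the SHA-1 digest of a signature string as an integer,
    truncated to its leading `bits` bits (160 = no truncation)."""
    digest = hashlib.sha1(sig.encode("ascii")).digest()  # 20 bytes = 160 bits
    full = int.from_bytes(digest, "big")
    return full >> (160 - bits)  # keep the most significant `bits` bits

full = signature_hash("dd->d")
truncated = signature_hash("dd->d", bits=120)
print(f"{full:040x}")
print(f"{truncated:030x}")
```

Since this runs once per signature when the module is compiled, its cost never shows up at call time.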
>
>If you're *so* set on compile-time calculation, one could also
>accommodate these within the intern framework pretty easily. Any
>PyString/PyBytes * will be aligned, which means the low bit will not
>be set, which means there are at least 2**31 bit-patterns that will
>never be used by a run-time interned string. So we could write down a
>lookup table in the spec that assigns arbitrary, well-known numbers to
>every common signature. "dd->d" is 1, "ii->i" is 2, etc. If you have
>15 standard types, then you can assign such an id to every 0, 1, 2, 3,
>4, 5, and 6 argument function with space left over.
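A quick check of the counting argument, assuming a signature is one return type plus n argument types, each drawn from the same 15 standard types (whether "void" counts as one of the 15 is left open here):

```python
# Assumption: a signature = 1 return type + n argument types, each from 15 types.
NUM_TYPES = 15
ODD_PATTERNS_32BIT = 2**31  # odd 32-bit values; an aligned pointer is never odd

def count_signatures(max_args: int) -> int:
    """Number of distinct signatures with 0..max_args arguments."""
    return sum(NUM_TYPES ** (n + 1) for n in range(max_args + 1))

print(count_signatures(6))                       # 183063615
print(count_signatures(6) < ODD_PATTERNS_32BIT)  # True
print(count_signatures(7) < ODD_PATTERNS_32BIT)  # False
```

So up to 6 arguments uses roughly a twelfth of the odd patterns, while 7 arguments would no longer fit on a 32-bit platform.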
>
>And this could all be abstracted away inside the intern() function.
>The only thing is that if you wanted to look at the characters in the
>interned string, you'd have to call a disintern() function instead of
>just following the pointer.
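A rough model of how the well-known ids could hide behind intern()/disintern(). Heap addresses are simulated by 8-byte-aligned integers; the `WELL_KNOWN` table and the `Interner` class are hypothetical, and the k-th well-known signature (counting from 1, as in the text) is assumed to be stored as the odd pattern 2k-1.

```python
# Hypothetical well-known table; a real spec would fix the list and its order.
WELL_KNOWN = ["dd->d", "ii->i"]
_WELL_KNOWN_IDS = {sig: 2 * i + 1 for i, sig in enumerate(WELL_KNOWN)}  # odd ids

class Interner:
    """Toy interner: well-known signatures get odd ids, everything else gets
    an even, 8-byte-aligned stand-in for a heap pointer."""

    def __init__(self):
        self._by_sig = {}
        self._by_id = {}
        self._next_addr = 0x1000  # simulated heap address, kept 8-byte aligned

    def intern(self, sig: str) -> int:
        if sig in _WELL_KNOWN_IDS:
            return _WELL_KNOWN_IDS[sig]
        if sig not in self._by_sig:
            addr = self._next_addr
            self._next_addr += 8
            self._by_sig[sig] = addr
            self._by_id[addr] = sig
        return self._by_sig[sig]

    def disintern(self, ident: int) -> str:
        if ident & 1:  # low bit set: a well-known id, never a real pointer
            return WELL_KNOWN[ident >> 1]
        return self._by_id[ident]  # stands in for following the pointer

interner = Interner()
print(interner.intern("dd->d"))           # 1
ptr = interner.intern("ff->f")
print(hex(ptr), interner.disintern(ptr))  # 0x1000 ff->f
```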
>
>I still think all this stuff would be complexity for its own sake,
>though.
>
>> Shortening the hash to 120 bits (truncation) we could have a spec like this:
>>
>>  - Short signature: [64 bit encoded signature, 64 bit funcptr]
>>  - Long signature: [64 bit hash, 64 bit pointer to full signature,
>>                    8 bit guard byte, 56 bits remaining hash,
>>                    64 bit funcptr]
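One way to build both entry kinds as lists of 64-bit words, as a sketch only: the guard value 0xFF, the big-endian packing of the ASCII signature, and the helper names `encode_short`/`make_entry` are all assumptions, not part of the spec.

```python
import hashlib

MASK64 = (1 << 64) - 1
GUARD = 0xFF  # hypothetical guard-byte value; the spec would fix the real one

def encode_short(sig: str):
    """Pack a signature plus its 0-terminator into one 64-bit word
    (big-endian, zero padded), or return None if it does not fit."""
    raw = sig.encode("ascii") + b"\0"
    if len(raw) > 8:
        return None
    return int.from_bytes(raw.ljust(8, b"\0"), "big")

def make_entry(sig: str, funcptr: int, sig_ptr: int):
    """Return 2 words (128 bits) for a short signature, 4 words (256 bits)
    for a long one: [hash hi 64, sig pointer, guard|hash lo 56, funcptr]."""
    short = encode_short(sig)
    if short is not None:
        return [short, funcptr & MASK64]
    h120 = int.from_bytes(hashlib.sha1(sig.encode("ascii")).digest(), "big") >> 40
    hash_hi = h120 >> 56               # first 64 of the 120 hash bits
    hash_lo = h120 & ((1 << 56) - 1)   # remaining 56 bits
    return [hash_hi, sig_ptr & MASK64, (GUARD << 56) | hash_lo, funcptr & MASK64]

short_entry = make_entry("dd->d", funcptr=0x7F0000001000, sig_ptr=0)
long_entry = make_entry("dddddd->d", funcptr=0x7F0000002000, sig_ptr=0x7F0000003000)
print(len(short_entry) * 64, len(long_entry) * 64)  # 128 256
```

With this packing a short key starts with an ASCII character, so its top byte can never equal the 0xFF guard; that is what lets a scanner reject the third word of a long entry.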
>
>This is a fixed length encoding, so why does it need a guard byte?

No, there are two cases: one 128-bit and one 256-bit.

>
>BTW, the guard byte design in the last version of the CEP looks buggy
>to me -- there's no guarantee that a valid pointer might not contain
>the guard byte by accident. A solution would be to move the

In the CEP text from a few posts ago? I am fairly sure I made certain that pointers are never looked at: you are supposed to scan in 128-bit jumps, so you never land on the beginning of a pointer. Read it again and see if you can construct a counterexample...

That is the reason the above works, and why I split the hash into two segments.
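A sketch of that scan over a flat table of 64-bit words, under the same assumptions as before (hypothetical 0xFF guard, big-endian ASCII packing, made-up helper names). Only even-indexed words are ever compared, so the pointer words at odd indices are never read as keys.

```python
import hashlib

GUARD = 0xFF  # hypothetical guard-byte value

def short_key(sig):
    raw = sig.encode("ascii") + b"\0"
    return int.from_bytes(raw.ljust(8, b"\0"), "big") if len(raw) <= 8 else None

def long_key(sig):
    """(hash hi 64 bits, guard byte + hash lo 56 bits) of the 120-bit hash."""
    h120 = int.from_bytes(hashlib.sha1(sig.encode("ascii")).digest(), "big") >> 40
    return h120 >> 56, (GUARD << 56) | (h120 & ((1 << 56) - 1))

def lookup(words, sig):
    """Scan in 128-bit (2-word) jumps and return the funcptr, or None."""
    key = short_key(sig)
    if key is not None:
        for i in range(0, len(words), 2):
            # Landing on word 3 of a long entry is harmless: its top byte is
            # the guard, which no ASCII-packed short key can have.
            if words[i] == key:
                return words[i + 1]
        return None
    hi, guarded_lo = long_key(sig)
    for i in range(0, len(words) - 2, 2):
        if words[i] == hi and words[i + 2] == guarded_lo:
            return words[i + 3]  # funcptr is the 4th word of a long entry
    return None

hi, glo = long_key("dddddd->d")
table = [short_key("dd->d"), 0x1000,          # short entry
         hi, 0xDEAD0000, glo, 0x2000,         # long entry (0xDEAD0000: sig pointer)
         short_key("ii->i"), 0x3000]          # short entry
print(hex(lookup(table, "ii->i")))      # 0x3000
print(hex(lookup(table, "dddddd->d")))  # 0x2000
print(lookup(table, "ff->f"))           # None
```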

>to-be-continued byte (or bit) to the first word. This would also mean
>that if you're looking for a one-word signature via switch(), you
>won't hit signatures which have your signature as a prefix. In the

The 0-terminator needs to be part of the signature (and if the 0 spills over into the next word, the signature spills over too).

I should have said that; good catch.
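To illustrate why counting the terminator fixes the prefix problem, a small sketch assuming 8-byte words packed big-endian from ASCII signatures; `one_word_key` and `first_word` are hypothetical helpers.

```python
def one_word_key(sig: str, count_terminator: bool = True):
    """Encode sig into one 64-bit word, or return None if it does not fit.
    With count_terminator=True the trailing 0 byte must fit as well."""
    raw = sig.encode("ascii") + (b"\0" if count_terminator else b"")
    if len(raw) > 8:
        return None  # spills over into a longer entry
    return int.from_bytes(raw.ljust(8, b"\0"), "big")

def first_word(sig: str) -> int:
    """First 8 bytes of the 0-terminated encoding of a signature."""
    raw = (sig.encode("ascii") + b"\0")[:8]
    return int.from_bytes(raw.ljust(8, b"\0"), "big")

short, longer = "dddd->dd", "dddd->ddd"  # short is an 8-char prefix of longer

# Without the terminator, `short` fits in a word identical to the first
# word of `longer`, so a one-word switch() would hit both.
assert one_word_key(short, count_terminator=False) == first_word(longer)

# Counting the terminator, `short` no longer fits and spills over as well.
assert one_word_key(short) is None

# Every one-word key now contains a 0 byte; the first word of a spilled
# ASCII signature never does, so the two can never compare equal.
assert b"\0" in one_word_key("dd->d").to_bytes(8, "big")
assert b"\0" not in first_word(longer).to_bytes(8, "big")
```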

Dag

>variable-length encoding with the lookup rule you suggested you'd also
>want a second bit to mark the actual beginning of each structure, so
>you don't get hits on the middle of structures.
>
>> Anyway: Looks like it's about time to do some benchmarks. I'll try to get
>> around to it next week.
>
> Agreed :-).
>
>- N