[Cython] CEP1000: Native dispatch through callables

Nathaniel Smith njs at pobox.com
Sun Apr 15 10:48:37 CEST 2012


On Sun, Apr 15, 2012 at 9:15 AM, Dag Sverre Seljebotn
<d.s.seljebotn at astro.uio.no> wrote:
> Do you really think it complicates the spec? SHA-1 is pretty standard, and
> Python ships with hashlib (the hashing part isn't performance critical).
>
> I prefer hashing to string-interning as it can still be done compile-time
> etc. 160 bits isn't worse than the second-to-best strcmp case of a 256-bit
> function entry.

If you're *so* set on compile-time calculation, you could also
accommodate it within the intern framework pretty easily. Any
PyString/PyBytes * will be aligned, which means the low bit will not
be set, which means there are at least 2**31 bit-patterns that will
never be used by a run-time interned string. So we could write down a
lookup table in the spec that assigns arbitrary, well-known numbers to
every common signature. "dd->d" is 1, "ii->i" is 2, etc. If you have
15 standard types, then you can assign such an id to every 0, 1, 2, 3,
4, 5, and 6 argument function with space left over.
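
To check the arithmetic, here's a quick sketch. It assumes a signature is
N argument types plus one return type, all drawn from the same 15 types,
and that only the 2**31 odd values of a 32-bit word are available:

```python
# Assumption: each signature is `nargs` argument types plus one return
# type, each chosen from the same set of 15 standard types.
NUM_TYPES = 15
MAX_ARGS = 6


def signatures_with(nargs, num_types=NUM_TYPES):
    """Number of distinct signatures with exactly `nargs` arguments."""
    return num_types ** (nargs + 1)  # +1 accounts for the return type


# Every 0..6-argument signature needs its own id.
total = sum(signatures_with(n) for n in range(MAX_ARGS + 1))

# Odd bit-patterns in a 32-bit word: never a valid aligned pointer.
available = 2 ** 31

print(total, available, total < available)
```

Under those assumptions the ids fit, but adding 7-argument functions
would overflow the 32-bit budget, so 6 is the limit there.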

And this could all be abstracted away inside the intern() function.
The only thing is that if you wanted to look at the characters in the
interned string, you'd have to call a disintern() function instead of
just following the pointer.
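
Roughly what I have in mind, as a toy model in Python. The names
intern_sig, disintern and WELL_KNOWN are made up for illustration, and
the even integers stand in for real aligned PyBytes* addresses:

```python
# Hypothetical spec table: signature -> small well-known id.
WELL_KNOWN = {"dd->d": 1, "ii->i": 2}
WELL_KNOWN_BY_ID = {v: k for k, v in WELL_KNOWN.items()}

# Stand-in for the run-time intern table. A real implementation would
# hand out PyBytes* addresses; here we fake aligned "addresses".
_runtime = []        # slot index -> signature string
_runtime_index = {}  # signature string -> slot index


def intern_sig(sig):
    """Return a word-sized key that identifies `sig`."""
    if sig in WELL_KNOWN:
        # Low bit set: cannot collide with an aligned pointer.
        return (WELL_KNOWN[sig] << 1) | 1
    if sig not in _runtime_index:
        _runtime_index[sig] = len(_runtime)
        _runtime.append(sig)
    # Multiples of 8 model an aligned pointer (low bit clear).
    return (_runtime_index[sig] + 1) * 8


def disintern(key):
    """Recover the signature string from a key made by intern_sig()."""
    if key & 1:
        return WELL_KNOWN_BY_ID[key >> 1]
    return _runtime[key // 8 - 1]
```

Callers only ever compare keys for equality, so they never need to know
which of the two kinds they are holding; only disintern() has to look at
the low bit.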

I still think all this stuff would be complexity for its own sake, though.

> Shortening the hash to 120 bits (truncation) we could have a spec like this:
>
>  - Short signature: [64 bit encoded signature. 64 bit funcptr]
>  - Long signature: [64 bit hash, 64 bit pointer to full signature,
>                    8 bit guard byte, 56 bits remaining hash,
>                    64 bit funcptr]

This is a fixed length encoding, so why does it need a guard byte?

BTW, the guard byte design in the last version of the CEP looks buggy
to me -- there's no guarantee that a valid pointer won't contain
the guard byte by accident. A solution would be to move the
to-be-continued byte (or bit) to the first word. This would also mean
that if you're looking for a one-word signature via switch(), you
won't hit signatures which have your signature as a prefix. In the
variable-length encoding with the lookup rule you suggested you'd also
want a second bit to mark the actual beginning of each structure, so
you don't get hits on the middle of structures.

> Anyway: Looks like it's about time to do some benchmarks. I'll try to get
> around to it next week.

Agreed :-).

- N

