[Cython] CEP1000: Native dispatch through callables
Dag Sverre Seljebotn
d.s.seljebotn at astro.uio.no
Sat Apr 14 00:22:15 CEST 2012
Robert Bradshaw <robertwb at gmail.com> wrote:
>On Fri, Apr 13, 2012 at 2:24 PM, Nathaniel Smith <njs at pobox.com> wrote:
>> On Fri, Apr 13, 2012 at 9:27 PM, Dag Sverre Seljebotn
>> <d.s.seljebotn at astro.uio.no> wrote:
>>> Ah, I didn't think about 6-bit or huffman. Certainly helps.
>>> I'm almost +1 on your proposal now, but a couple of more ideas:
>>> 1) Let the key (the size_t) spill over to the next specialization
>>> it is too large; and prepend that key with a continuation code (two
>>> could together say "iii)-d\0\0" on 32 bit systems with 8bit
>>> - as continuation). The key-based caller will expect a continuation
>>> knows about the specialization, and the prepended char will prevent
>>> matches against the overspilled slot.
>>> We could even use the pointers for part of the continuation...
>> I am really lost here. Why is any of this complicated encoding stuff
>> better than interning? Interning takes one line of code, is
>> cheap (one dict lookup per call site and function definition), and it
>> lets you check any possible signature (even complicated ones
>> memoryviews) by doing a single-word comparison. And best of all, you
>> don't have to think hard to make sure you got the encoding right. ;-)
>> On a 32-bit system, pointers are smaller than a size_t, but more
>> expressive! You can still do binary search if you want, etc. Is the
>> problem just that interning requires a runtime calculation? Because I
>> feel like C users (like numpy) will want to compute these compressed
>> codes at module-init anyway, and those of us with a fancy compiler
>> capable of computing them ahead of time (like Cython) can instruct
>> that fancy compiler to compute them at module-init time just as
>The primary disadvantage of interning that I see is memory locality. I
>suppose if all the C-level caches of interned values were co-located,
>this may not be as big of an issue. Not being able to compare against
>compile-time constants may thwart some optimization opportunities, but
>that's less clear.
>It also requires coordination through a common repository, but I suppose one
>would just stick a set in some standard module (or leverage Python's
1) It doesn't work well with multiple interpreter states. OK, nothing works with that at the moment, but it is on the roadmap for Python and we should not make it worse.
You basically *need* a thread-safe store separate from any Python interpreter, though pythread.h does not rely on the interpreter state, which helps.
2) You end up with the known comparison values in read-write memory segments rather than read-only segments, which is probably worse on multicore systems?
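To make the two points above concrete, here is a rough sketch of what an interpreter-independent intern store might look like. Everything in it is an assumption made for illustration: the names `intern_signature` and `intern_table`, the fixed capacity, the linear search and the C11 spinlock are hypothetical, and the signature strings are only example spellings. It shows that the store needs its own lock (no interpreter state involved) and that the canonical pointers callers compare against necessarily live in writable memory.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical process-wide intern table; not taken from any proposal. */
#define INTERN_CAPACITY 64

static const char *intern_table[INTERN_CAPACITY]; /* writable data segment */
static size_t intern_count = 0;
/* Plain C11 spinlock: works without any Python interpreter state. */
static atomic_flag intern_lock = ATOMIC_FLAG_INIT;

/* Return the canonical pointer for sig: equal strings always yield the
   same pointer. Returns NULL if the table is full or allocation fails. */
static const char *intern_signature(const char *sig)
{
    const char *result = NULL;

    assert(sig != NULL);
    while (atomic_flag_test_and_set_explicit(&intern_lock,
                                             memory_order_acquire)) {
        /* spin until the lock is free */
    }
    for (size_t i = 0; i < intern_count; i++) {
        if (strcmp(intern_table[i], sig) == 0) {
            result = intern_table[i];
            break;
        }
    }
    if (result == NULL && intern_count < INTERN_CAPACITY) {
        size_t n = strlen(sig) + 1;
        char *copy = malloc(n);
        if (copy != NULL) {
            memcpy(copy, sig, n);
            intern_table[intern_count++] = copy;
            result = copy;
        }
    }
    atomic_flag_clear_explicit(&intern_lock, memory_order_release);
    return result;
}
```

A call site would run `intern_signature` once at module init and store the result in a global; every later signature check is then a single pointer comparison, but against a value loaded from read-write memory rather than a constant in the instruction stream.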
I really think that anything we can do to make this near-C-speed should be done; none of the proposals are *that* complicated.
Using keys, NumPy can choose in its C code to be slower but more readable, but using interned strings forces Cython to be slower, and Cython gets no way of choosing to go faster. (To the degree that it has an effect; none of these claims were checked.)
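As a sketch of the key-based alternative, assuming a plain 8-bit-per-character packing and signatures short enough to fit one size_t: the struct layout, `pack_key`, `find_specialization` and the `KEY_DD_D` macro are all hypothetical names for illustration, not the encoding the CEP settles on. The point is only that a C caller may compute the key at runtime from a readable string, while a compiler can emit the same key as a compile-time constant.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*generic_fn)(void);

/* Hypothetical table entry: one packed signature key per specialization. */
typedef struct {
    size_t key;
    generic_fn funcptr;
} specialization;

/* Readable but slower route: pack the signature string at runtime,
   8 bits per character, first character in the lowest byte. */
static size_t pack_key(const char *sig)
{
    size_t key = 0;
    size_t n = strlen(sig);

    /* Longer signatures would need a spill-over scheme, not shown here. */
    assert(n <= sizeof(size_t));
    for (size_t i = 0; i < n; i++)
        key |= (size_t)(unsigned char)sig[i] << (8 * i);
    return key;
}

/* Faster route: the same key for "dd)d" as a compile-time constant. */
#define KEY_DD_D ((size_t)'d' | ((size_t)'d' << 8) | \
                  ((size_t)')' << 16) | ((size_t)'d' << 24))

/* Linear scan comparing whole keys; returns NULL when nothing matches. */
static generic_fn find_specialization(const specialization *table,
                                      size_t n, size_t key)
{
    for (size_t i = 0; i < n; i++)
        if (table[i].key == key)
            return table[i].funcptr;
    return NULL;
}

/* Example callee with the (double, double) -> double signature. */
static double add_dd(double a, double b)
{
    return a + b;
}
```

A caller that knows it wants `(double, double) -> double` can pass `KEY_DD_D` directly, so the comparison is against an immediate value; a caller that prefers readability can pass `pack_key("dd)d")` and get the same lookup result.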
Sent from my Android phone with K-9 Mail. Please excuse my brevity.