<html><head></head><body>Ah, I didn't think about 6-bit or huffman. Certainly helps.<br>
<br>
I'm almost +1 on your proposal now, but a couple of more ideas:<br>
<br>
1) Let the key (the size_t) spill over to the next specialization entry if it is too large; and prepend that key with a continuation code (two size-ts could together say "iii)-d\0\0" on 32 bit systems with 8bit encoding, using - as continuation). The key-based caller will expect a continuation if it knows about the specialization, and the prepended char will prevent spurios matches against the overspilled slot.<br>
<br>
We could even use the pointers for part of the continuation...<br>
<br>
2) Separate the char* format strings from the keys, ie this memory layout:<br>
<br>
Version,nslots,nspecs,funcptr,key,funcptr,key,...,sigcharptr,sigcharptr...<br>
<br>
Where nslots is larger than nspecs if there are continuations.<br>
<br>
OK, this is getting close to my original proposal, but the difference is the contiunation char, so that if you expect a short signature, you can safely scan every slot and branching and no null-checking necesarry.<br>
<br>
Dag<br>
-- <br>
Sent from my Android phone with K-9 Mail. Please excuse my brevity.<br><br><div class="gmail_quote">Robert Bradshaw <robertwb@gmail.com> wrote:<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
<pre style="white-space: pre-wrap; word-wrap:break-word; font-family: sans-serif">On Fri, Apr 13, 2012 at 4:59 AM, Dag Sverre Seljebotn<br /><d.s.seljebotn@astro.uio.no> wrote:<br />> On 04/13/2012 01:38 PM, Stefan Behnel wrote:<br />>><br />>> Robert Bradshaw, 13.04.2012 12:17:<br />>>><br />>>> On Fri, Apr 13, 2012 at 1:52 AM, Dag Sverre Seljebotn wrote:<br />>>>><br />>>>> On 04/13/2012 01:38 AM, Robert Bradshaw wrote:<br />>>>>><br />>>>>> Have you given any thought as to what happens if __call__ is<br />>>>>> re-assigned for an object (or subclass of an object) supporting this<br />>>>>> interface? Or is this out of scope?<br />>>>><br />>>>><br />>>>> Out-of-scope, I'd say. Though you can always write an object that<br />>>>> detects if<br />>>>> you assign to __call__...<br />>><br
/>>><br />>> +1 for out of scope. This is a pure C level feature.<br />>><br />>><br />>>>>> Minor nit: I don't think should_dereference is worth branching on, if<br />>>>>> one wants to save the allocation one can still use a variable-sized<br />>>>>> type and point to oneself. Yes, that's an extra dereference, but the<br />>>>>> memory is already likely close and it greatly simplifies the logic.<br />>>>>> But I could be wrong here.<br />>>>><br />>>>><br />>>>><br />>>>> Those minor nits are exactly what I seek; since Travis will have the<br />>>>> first<br />>>>> implementation in numba<->SciPy, I just want to make sure that what he<br />>>>> does<br />>>>> will work efficiently work Cython.<br />>>><br />>>><br />>>> +1<br />>>><br
/>>>> I have to admit building/invoking these var-arg-sized __nativecall__<br />>>> records seems painful. Here's another suggestion:<br />>>><br />>>> struct {<br />>>> void* pointer;<br />>>> size_t signature; // compressed binary representation, 95% coverage<br />><br />> Once you start passing around functions that take memory view slices as<br />> arguments, that 95% estimate will be off I think.<br /><br />We have (on the high-performance systems we care about) 64-bits here.<br />If we limit ourselves to a 6-bit alphabet, that gives a trivial<br />encoding for up to 10 chars. We could be more clever here (Huffman<br />coding) but that might be overkill. More importantly though, the<br />"complicated" signatures are likely to be so cheap that the strcmp<br />overhead matters.<br /><br />>>> char* long_signature; // used if signature is not representable in<br />>>> a size_t, as
indicated by signature = 0<br />>>> } record;<br />>>><br />>>> These char* could optionally be allocated at the end of the record*<br />>>> for optimal locality. We could even dispense with the binary<br />>>> signature, but having that option allows us to avoid strcmp for stuff<br />>>> like d)d and ffi)f.<br />>><br />>><br />>> Assuming we use literals and a const char* for the signature, the C<br />>> compiler would cut down the number of signature strings automatically for<br />>> us. And a pointer comparison is the same as a size_t comparison.<br />><br />><br />> I'll go one further: Intern Python bytes objects. It's just a PyObject*, but<br />> it's *required* (or just strongly encouraged) to have gone through<br />><br />> sig = sys.modules['_nativecall']['interned_db'].setdefault(sig, sig)<br />><br />> Obviously in a PEP you'd have a C-API function for such
interning<br />> (completely standalone utility). Performance of interning operation itself<br />> doesn't matter...<br />><br />> Unless CPython has interning features itself, like in Java? Was that present<br />> back in the day and then ripped out?<br />><br />> Requiring interning is somewhat less elegant in one way, but it makes a lot<br />> of other stuff much simpler.<br />><br />> That gives us<br />><br />> struct {<br />> void *pointer;<br />> PyBytesObject *signature;<br />> } record;<br />><br />> and then you allocate a NULL-terminated arrays of these for all the<br />> overloads.<br /><br />Global interning is a nice idea. The one drawback I see is that it<br />becomes much more expensive for dynamically calculated signatures.<br /><br />>><br />>> That would only apply at a per-module level, though, so it would require<br />>> an<br />>> indirection for the signature IDs. But it
would avoid a global registry.<br />>><br />>> Another idea would be to set the signature ID field to 0 at the beginning<br />>> and call a C-API function to let the current runtime assign an ID> 0,<br />>> unique for the currently running application. Then every user would only<br />>> have to parse the signature once to adapt to the respective ID and could<br />>> otherwise branch based on it directly.<br />>><br />>> For Cython, we could generate a static ID variable for each typed call<br />>> that<br />>> we found in the sources. When encountering a C signature on a callable,<br />>> either a) the ID variable is still empty (initial case), then we parse the<br />>> signature to see if it matches the expected signature. If it does, we<br />>> assign the corresponding ID to the static ID variable and issue a direct<br />>> call. If b) the ID field is already set (normal case), we compare
the<br />>> signature IDs directly and issue a C call it they match. If the IDs do not<br />>> match, we issue a normal Python call.<br /><br />If I understand correctly, you're proposing<br /><br />struct {<br /> char* sig;<br /> long id;<br />} sig_t;<br /><br />Where comparison would (sometimes?) compute id from sig by augmenting<br />a global counter and dict? Might be expensive to bootstrap, but<br />eventually all relevant ids would be filled in and it would be quick.<br />Interesting. I wonder what the performance penalty would be over<br />assuming id is statically computed lots of the time, and using that to<br />compare against fixed values. And there's memory locality issues as<br />well.<br /><br />>>>> Right... if we do some work to synchronize the types for Cython modules<br />>>>> generated by the same version of Cython, we're left with 3-4 types for<br />>>>> Cython, right? Then a couple for numba and one for f2py; so
on the order<br />>>>> of<br />>>>> 10?<br />>>><br />>>><br />>>> No, I think each closure is its own type.<br />>><br />>><br />>> And that even applies to fused functions, right? They'd have one closure<br />>> for each type combination.<br />>><br />>><br />>>>> An alternative is do something funny in the type object to get across<br />>>>> the<br />>>>> offset-in-object information (abusing the docstring, or introduce our<br />>>>> own<br />>>>> flag which means that the type object has an additional non-standard<br />>>>> field<br />>>>> at the end).<br />>>><br />>>><br />>>> It's a hack, but the flag + non-standard field idea might just work...<br />>><br />>><br />>> Plus, it wouldn't have to stay a non-standard field. If it's accepted into<br />>> CPython
3.4, we could safely use it in all existing versions of CPython.<br />><br />><br />> Sounds good. Perhaps just find a single "extended", then add a new flag<br />> field in our payload, in case we need to extend the types object yet again<br />> later and run out of unused flag bits (TBD: figure out how many unused flag<br />> bits there are).<br />><br />> Dag<br />><br />><hr /><br />> cython-devel mailing list<br />> cython-devel@python.org<br />> <a href="http://mail.python.org/mailman/listinfo/cython-devel">http://mail.python.org/mailman/listinfo/cython-devel</a><br /><hr /><br />cython-devel mailing list<br />cython-devel@python.org<br /><a href="http://mail.python.org/mailman/listinfo/cython-devel">http://mail.python.org/mailman/listinfo/cython-devel</a><br /></pre></blockquote></div></body></html>