[Cython] CEP1000: Native dispatch through callables

Dag Sverre Seljebotn d.s.seljebotn at astro.uio.no
Sat Apr 14 00:06:47 CEST 2012



Robert Bradshaw <robertwb at gmail.com> wrote:

>On Fri, Apr 13, 2012 at 1:27 PM, Dag Sverre Seljebotn
><d.s.seljebotn at astro.uio.no> wrote:
>> Ah, I didn't think about 6-bit or huffman. Certainly helps.
>
>Yeah, we don't want to complicate the ABI too much, but I think
>something like 8 4-bit common chars and 32 6-bit other chars (or 128
>8-bit other chars) wouldn't be outrageous. The fact that we only have
>to encode into a single word makes the algorithm very simple (though
>the majority of the time we'd spit out pre-encoded literals). We have
>a version number to play with this as well.
>
>> I'm almost +1 on your proposal now, but a couple of more ideas:
>>
>> 1) Let the key (the size_t) spill over to the next specialization
>entry if
>> it is too large; and prepend that key with a continuation code (two
>size-ts
>> could together say "iii)-d\0\0" on 32 bit systems with 8bit encoding,
>using
>> - as continuation). The key-based caller will expect a continuation
>if it
>> knows about the specialization, and the prepended char will prevent
>spurios
>> matches against the overspilled slot.
>>
>> We could even use the pointers for part of the continuation...
>>
>> 2) Separate the char* format strings from the keys, ie this memory
>layout:
>>
>>
>Version,nslots,nspecs,funcptr,key,funcptr,key,...,sigcharptr,sigcharptr...
>>
>> Where nslots is larger than nspecs if there are continuations.
>>
>> OK, this is getting close to my original proposal, but the difference
>is the
>> contiunation char, so that if you expect a short signature, you can
>safely
>> scan every slot and branching and no null-checking necesarry.
>
>I don't think we need nslots (though it might be interesting). My
>thought is that once you start futzing with variable-length keys, you
>might as well just compare char*s.

This is where we disagree. If you are the caller you know at compile-time how much you want to match; I think comparing 2 or 3 size-t with no looping is a lot better (a fully-unrolled, 64-bit per instruction strcmp with one of the operands known to the compiler...).


>
>If one is concerned about memory, one could force the sigcharptr to be
>aligned, and then the "keys" could be either sigcharptr or key
>depending on whether the least significant bit was set. One could
>easily scan for/switch on a key and scanning for a char* would be
>almost as easy (just don't dereference if the lsb is set).
>
>I don't see us being memory constrained, so
>
>(version,nspecs,futureuse),(key,sigcharptr,funcptr)*,optionalsigchardata*
>
>seems fine to me even if only one of key/sigchrptr is ever used per
>spec. Null-terminating the specs would work fine as well (one less
>thing to keep track of during iteration).

Well, can't one always use more L1 cache, or is that not a concern? If you have 5-6 different routines calling each other using this mechanism, each with multiple specializations, those unused slots translate to many cache lines wasted.

I don't think it is that important, I just think that how pretty the C struct declaration ends up looking should not be a concern at all, when the whole point of this is speed anyway. You can always just use a throwaway struct declaration and a cast to get whatever layout you need. If the 'padding' leads to less branching then fine, but I don't see that it helps in any way.

To refine my proposal a bit, we have a list of variable size entries,

(keydata, keydata, ..., funcptr)

where each keydata and the ptr is 64 bits on all platforms (see below); each entry must have a total length multiple of 128 bits (so that one can safely scan for a signature in 128 bit increments in the data *without* parsing or branching, you'll never hit a pointer), and each key but the first starts with a 'dash'.

Signature strings are either kept separate, or even parsed/decoded from the keys. We really only care about speed when you have compiled or JITed code for the case, decoding should be fine otherwise. 

BTW, won't the Cython-generated C code be a horrible mess if we use size-t rather than insist on int64t? (ok, those need some ifdefs for various compilers, but still seem cleaner than operating with 32bit and 64bit keys, and stdint.h is winning ground).

Dag



>
>- Robert
>_______________________________________________
>cython-devel mailing list
>cython-devel at python.org
>http://mail.python.org/mailman/listinfo/cython-devel

-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.


More information about the cython-devel mailing list