From d.s.seljebotn at astro.uio.no Fri Jun 1 15:49:21 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 01 Jun 2012 15:49:21 +0200 Subject: [Cython] SEP 201 draft: Native callable objects In-Reply-To: References: <4FC77A5C.50009@astro.uio.no> <4FC7C6A4.3060404@astro.uio.no> Message-ID: <4FC8C861.5040509@astro.uio.no> On 05/31/2012 10:13 PM, Robert Bradshaw wrote: > On Thu, May 31, 2012 at 12:29 PM, Dag Sverre Seljebotn > wrote: >> On 05/31/2012 08:50 PM, Robert Bradshaw wrote: >>> >>> On Thu, May 31, 2012 at 7:04 AM, Dag Sverre Seljebotn >>> wrote: >>>> >>>> [Discussion on numfocus at googlegroups.com please] >>>> >>>> I've uploaded a draft-state SEP 201 (previously CEP 1000): >>>> >>>> https://github.com/numfocus/sep/blob/master/sep201.rst >>>> >>>> """ >>>> Many callable objects are simply wrappers around native code. This holds >>>> for >>>> any Cython function, f2py functions, manually written CPython extensions, >>>> Numba, etc. >>>> >>>> Obviously, when native code calls other native code, it would be nice to >>>> skip the significant cost of boxing and unboxing all the arguments. >>>> """ >>>> >>>> >>>> The thread about this on the Cython list is almost endless: >>>> >>>> http://thread.gmane.org/gmane.comp.python.cython.devel/13416/focus=13443 >>>> >>>> There was a long discussion on the key-comparison vs. interned-string >>>> approach. I've written both up in SEP 201 since it was the major point of >>>> contention. There was some benchmarks starting here: >>>> >>>> http://thread.gmane.org/gmane.comp.python.cython.devel/13416/focus=13443 >>>> >>>> And why provide a table and not a get_function_pointer starting here: >>>> >>>> http://thread.gmane.org/gmane.comp.python.cython.devel/13416/focus=13443 >>>> >>>> For those who followed that and don't want to read the entire spec, the >>>> aspect of flags is new. 
How do we avoid duplicating
>>>> entries/checking against two signatures for cases like a GIL-holding
>>>> caller wanting to call a nogil function? My take: For key-comparison
>>>> you can compare under a mask, for interned-string we should have an
>>>> additional flags field.
>>>>
>>>> The situation is a bit awkward: The Cython list consensus (well, me and
>>>> Robert Bradshaw) decided on what is "Approach 1" (key-comparison) in SEP
>>>> 201. I pushed for that.
>>>>
>>>> Still, now that a month has passed, I just think key-comparison is too
>>>> ugly, and that the interning mechanism shouldn't be *that* hard to code
>>>> up, probably 500 lines of C code if one just requires the GIL in a first
>>>> iteration, and that keeping the spec simpler is more important.
>>>>
>>>> So I'm tentatively proposing Approach 2.
>>>
>>> I'm still not convinced that a hybrid approach, where signatures below
>>> some cutoff are compiled down to keys, is not a worthwhile approach.
>>> This gets around variable-length keys (both the complexity and
>>> possible runtime costs for long keys) and allows simple libraries to
>>> produce and consume fast callables without participating in the
>>> interning mechanism.
>>
>> I still think this gives us the "worst of both worlds", all the
>> disadvantages and none of the advantages.
>
> It avoids one of the primary disadvantages of keys, namely the
> variable-length complexity.
>
>> How many simple libraries are there really? Cython on one end, the
>> magnificently complicated NumPy ufuncs on the other? Thinking big, perhaps
>> PyPy and Julia? Cython, PyPy, Julia would all have to deal with long
>> signatures anyway. And NumPy ufuncs are already complicated so even more
>> low-level stuff wouldn't hurt.
>
> I was thinking of, for example, a differential equation solver written
> in C, C++, or Fortran that could take a PyNativeCallableTable*
> directly, primarily avoiding welding this spec to Python.
I'm not sure how real-world that is in the end. But the size of
Cython-generated code would be kept down for most modules, as it wouldn't
need to bundle an interner.

AND, a problem with interning is spreading the signature strings all over
memory (in the event you actually need to look at the contents). With a
smart interner I guess this can be eliminated to some extent, but it's
much better if one doesn't have to worry at all -- and if all short
signatures are keys, you don't.

Playing along:

a) It'd be very nice to avoid explicit decoding. I think one should be
able to cast the key to char[]; this a) avoids having to allocate a
buffer on the stack to pass to a Decode function, b) lets you inspect
the table in a debugger easily.

b) Flags are needed in addition to interning; GIL status and exception
return values do not require exact matches. I think more than 3 bits are
needed for flags => our minimal padded table entry size is actually 24
bytes! (And this is OK; my benchmarks weren't affected by 8-byte vs.
16-byte comparisons, branching is so dominant.)

Now, 16 bits seems about right for flags, so this means we can actually
use 14-char keys for free (12 for signature data, one for \0, one for a
guard byte).

That pushes the number of non-interning signatures high enough to make it
really fit 95% of the use-cases. I feel 6 chars is a little low; remember
that a "pointer to a double complex" is "&Zd" by itself unless we play
with the encoding.

BUT, it then gets rather complicated to have things work on little-endian
vs. big-endian, as the guard byte must be in different positions.
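To make the hybrid idea concrete, here is a toy Python model (not the SEP 201 spec; all names and the interner stub are invented for illustration): signatures that fit in the fixed-width key field are stored inline and compared by value, while longer ones fall back to an interned object and compare by identity. KEY_BYTES mirrors the 14-char field above (12 bytes of signature data, one '\0', one guard byte).

```python
# Toy model of the hybrid inline-key/interned-key scheme.

KEY_BYTES = 12  # usable signature bytes in the 14-char field (\0 + guard excluded)

_intern_table = {}  # stand-in for a real cross-module interner


def make_key(signature: bytes):
    """Short signatures become inline keys; long ones get interned."""
    if len(signature) <= KEY_BYTES:
        return ("inline", signature)           # compare by value (memcmp-style)
    return ("interned", _intern_table.setdefault(signature, signature))


def keys_match(a, b):
    """Inline keys compare by value; interned keys by identity (pointer)."""
    kind_a, val_a = a
    kind_b, val_b = b
    if kind_a != kind_b:
        return False
    return val_a == val_b if kind_a == "inline" else val_a is val_b


# "&Zd" (pointer to double complex) easily fits inline:
short_key = make_key(b"&Zd")
long_key = make_key(b"d" * 40)
```

The point of the split is that callers with short signatures never touch the interner at all, while long signatures still get constant-time comparison via pointer identity.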
If you want to align the pointer you get this:

typedef struct {
    void *funcptr;
    union {
        union {
            struct {
                uint16_t interned_flags;
                uint16_t padding1;
                uint32_t padding2;
                uint64_t interned_sig;
            };
            struct {
                uint16_t flags;
                char sig[14];
            };
        } big_endian;
        union {
            struct {
                uint64_t interned_sig;
                uint32_t padding1;
                uint16_t padding2;
                uint16_t interned_flags;
            };
            struct {
                char sig[14];
                uint16_t flags;
            };
        } little_endian;
    };
};

(interned_flags and flags are really the same; I just didn't want to mess
with the struct alignment.)

So I think this is *almost* there, but it certainly gets complicated
because of the endianness issues.

Of course, an alternative is to not have interned_sig be 64-bit aligned.
Or, play with adapting the string/guard bytes in the middle, but that
sort of breaks a) above.

Thinking about this is psychologically difficult because it's very likely
bikeshedding, but OTOH once the spec is out in the wild it will never be
worth it to change, so some care is called for... oh well, at least I'm
having fun!
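To see the endianness concern concretely, here is a small illustrative sketch using Python's `struct` module (the field widths come from the struct above; the flag value is made up). The same 16 bytes must read back either as an inline 14-char signature plus 16-bit flags, or as a 64-bit-aligned interned key plus flags, which forces mirrored field orders on little- vs. big-endian machines:

```python
import struct

sig = b"&Zd".ljust(14, b"\0")  # inline signature, '\0'-padded to 14 bytes
flags = 0x0003                  # made-up flag bits, for illustration only

# Little-endian layout: sig in the low bytes, flags in the top two bytes.
little = struct.pack("<14sH", sig, flags)
# Big-endian layout: flags first, sig in the remaining bytes.
big = struct.pack(">H14s", flags, sig)

# Both layouts are 16 bytes and recover the same logical fields,
# but the byte positions of the guard/flag region are mirrored.
l_sig, l_flags = struct.unpack("<14sH", little)
b_flags, b_sig = struct.unpack(">H14s", big)
```

This is only a demonstration of why the guard byte lands in different positions; the actual entry layout would be fixed by the spec, not negotiated at runtime.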
Dag From d.s.seljebotn at astro.uio.no Fri Jun 1 16:25:18 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 01 Jun 2012 16:25:18 +0200 Subject: [Cython] SEP 201 draft: Native callable objects In-Reply-To: <4FC8C861.5040509@astro.uio.no> References: <4FC77A5C.50009@astro.uio.no> <4FC7C6A4.3060404@astro.uio.no> <4FC8C861.5040509@astro.uio.no> Message-ID: <4FC8D0CE.60903@astro.uio.no> On 06/01/2012 03:49 PM, Dag Sverre Seljebotn wrote: > On 05/31/2012 10:13 PM, Robert Bradshaw wrote: >> On Thu, May 31, 2012 at 12:29 PM, Dag Sverre Seljebotn >> wrote: >>> On 05/31/2012 08:50 PM, Robert Bradshaw wrote: >>>> >>>> On Thu, May 31, 2012 at 7:04 AM, Dag Sverre Seljebotn >>>> wrote: >>>>> >>>>> [Discussion on numfocus at googlegroups.com please] >>>>> >>>>> I've uploaded a draft-state SEP 201 (previously CEP 1000): >>>>> >>>>> https://github.com/numfocus/sep/blob/master/sep201.rst >>>>> >>>>> """ >>>>> Many callable objects are simply wrappers around native code. This >>>>> holds >>>>> for >>>>> any Cython function, f2py functions, manually written CPython >>>>> extensions, >>>>> Numba, etc. >>>>> >>>>> Obviously, when native code calls other native code, it would be >>>>> nice to >>>>> skip the significant cost of boxing and unboxing all the arguments. >>>>> """ >>>>> >>>>> >>>>> The thread about this on the Cython list is almost endless: >>>>> >>>>> http://thread.gmane.org/gmane.comp.python.cython.devel/13416/focus=13443 >>>>> >>>>> >>>>> There was a long discussion on the key-comparison vs. interned-string >>>>> approach. I've written both up in SEP 201 since it was the major >>>>> point of >>>>> contention. 
There was some benchmarks starting here: >>>>> >>>>> http://thread.gmane.org/gmane.comp.python.cython.devel/13416/focus=13443 >>>>> >>>>> >>>>> And why provide a table and not a get_function_pointer starting here: >>>>> >>>>> http://thread.gmane.org/gmane.comp.python.cython.devel/13416/focus=13443 >>>>> >>>>> >>>>> For those who followed that and don't want to read the entire spec, >>>>> the >>>>> aspect of flags is new. How do we avoid to duplicate entries/check >>>>> against >>>>> two signatures for cases like a GIL-holding caller wanting to call a >>>>> nogil >>>>> function? My take: For key-comparison you can compare under a mask, >>>>> for >>>>> interned-string we should have additional flags field. >>>>> >>>>> The situation is a bit awkward: The Cython list consensus (well, me >>>>> and >>>>> Robert Bradshaw) decided on what is "Approach 1" (key-comparison) >>>>> in SEP >>>>> 201. I pushed for that. >>>>> >>>>> Still, now that a month has passed, I just think key-comparison is too >>>>> ugly, >>>>> and that the interning mechanism shouldn't be *that* hard to code up, >>>>> probably 500 lines of C code if one just requires the GIL in a first >>>>> iteration, and that keeping the spec simpler is more important. >>>>> >>>>> So I'm tentatively proposing Approach 2. >>>> >>>> >>>> I'm still not convinced that a hybrid approach, where signatures below >>>> some cutoff are compiled down to keys, is not a worthwhile approach. >>>> This gets around variable-length keys (both the complexity and >>>> possible runtime costs for long keys) and allows simple libraries to >>>> produce and consume fast callables without participating in the >>>> interning mechanism. >>> >>> I still think this gives us the "worst of both worlds", all the >>> disadvantages and none of the advantages. >> >> It avoids the one of the primary disadvantage of keys, namely the >> variable length complexity. >> >>> How many simple libraries are there really? 
Cython on one end, the >>> magnificently complicated NumPy ufuncs on the other? Thinking big, >>> perhaps >>> PyPy and Julia? Cython, PyPy, Julia would all have to deal with long >>> signatures anyway. And NumPy ufuncs are already complicated so even more >>> low-level stuff wouldn't hurt. >> >> I was thinking of, for example, a differential equation solver written >> in C, C++, or Fortran that could take a PyNativeCallableTable* >> directly, primarily avoiding welding this spec to Python. > > I'm not sure how real-world that is in the end. But, the size of Cython > generated code would be kept down for most modules as it wouldn't need > to bundle an interner. > > AND, a problem with interning is spreading the signature strings all > over memory (in the event you actually need to look at the contents). > With a smart interner I guess this can be eliminated to some extent, but > much better if one doesn't have to worry, and if all short signatures > are keys you don't. > > Playing along: > > a) It'd be very nice to avoid explicit decoding. I think one should be > able to cast the key to char[]; this a) avoids having to allocate a > buffer on the stack to pass to a Decode function, b) let's you inspect > the table in a debugger easily. > > b) Flags are needed in addition to interning; GIL status and exception > return values do not require exact matches. I think more than 3 bits are > needed for flags => our minimal padded table entry size is actually 24 > bytes! (And this is OK, my benchmarks weren't affected by 8-byte vs > 16-byte comparisons, branching is so dominating.) > > Now, 16 bits seems about right for flags, so this means we can actually > for free use 14-char keys (12 for signature data, one for \0, one for > guard) > > That pushes the number of non-interning signatures high enough to make > it really fit 95% of the use-cases. I feel 6 chars is a little low, > remember that a "pointer to a double complex" is "&Zd" by itself unless > we play with encoding. 
>
> BUT, it then gets rather complicated to have things work on
> little-endian vs. big-endian though as the guard byte must be in
> different positions. If you want to align the pointer you get this:
>
> typedef struct {
>     void *funcptr;
>     union {
>         union {
>             struct {
>                 uint16_t interned_flags;
>                 uint16_t padding1;
>                 uint32_t padding2;
>                 uint64_t interned_sig;
>             };
>             struct {
>                 uint16_t flags;
>                 char sig[14];
>             };
>         } big_endian;
>
>         union {
>             struct {
>                 uint64_t interned_sig;
>                 uint32_t padding1;
>                 uint16_t padding2;
>                 uint16_t interned_flags;
>             };
>             struct {
>                 char sig[14];
>                 uint16_t flags;
>             };
>         } little_endian;
>     };
> };
>
> (interned_flags and flags is really the same, I just didn't want to mess
> with the struct alignment)
>
> So I think this is *almost* there, but it certainly gets complicated
> because of endianness issues.
>
> Of course, an alternative is to not have the interned_sig be 64-bit
> aligned. Or, play with adapting the string/guard bytes in the middle,
> but that sort of breaks a) above.

OK, now I feel silly. If we need flags (as I believe we do), and the
flag-containing quadword is being compared anyway, there's no reason at
all to play tricks with aligned pointers and guard bytes.

The simplest approach with a 128-bit compare (which, as I said, doesn't
hurt one bit, and may be needed anyway to filter on GIL-ness) is then

struct {
    union {
        char *interned_sig;
        char signature[8];
    };
    uint64_t flags;  /* first 8 bits always 0, for terminating \0 */
    void *funcptr;
};

One could also complicate this again to eat a few more flag bits for
signature chars...

Dag

From d.s.seljebotn at astro.uio.no Mon Jun 4 21:44:11 2012
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Mon, 04 Jun 2012 21:44:11 +0200
Subject: [Cython] Hash-based vtables
Message-ID: <4FCD100B.7000008@astro.uio.no>

Me and Robert had a long discussion on the NumFOCUS list about this
already, but I figured it was better to continue it and provide more
in-depth benchmark results here.
It's basically a new idea of how to provide a vtable based on perfect
hashing, which should be a lot simpler to implement than what I first
imagined.

I'll write down some context first; if you're familiar with this,
skip ahead a bit...

This means that you can do fast dispatches *without* the messy business
of binding vtable slots at compile time. To be concrete, this might e.g.
take the form

def f(obj):
    obj.method(3.4)  # try to find a vtable with "void method(double)" in it

or, a more typed approach,

# File A
cdef class MyImpl:
    cdef double method(double x): return x * x

# File B
# Here we never know about MyImpl, hence "duck-typed"
@cython.interface
class MyIntf:
    cdef double method(double x): pass

def f(MyIntf obj):
    # obj *can* be a MyImpl instance, or whatever else that supports
    # that interface
    obj.method(3.4)

Now, the idea to implement this is:

a) Both caller and callee pre-hash the name/argument string
"mymethod:iidd" to 64 bits of hash data (probably the lower 64 bits of
md5).

b) Callee (MyImpl) generates a vtable of its methods by *perfect*
hashing. What you do is define a final hash fh as a function of the
pre-hash ph, for instance

fh = ((ph >> vtable.r1) ^ (ph >> vtable.r2) ^ (ph >> vtable.r3)) & vtable.m

(Me and Robert are benchmarking different functions to use here.) By
playing with r1, r2, r3, you have 64**3 choices of hash function, and
will be able to pick a combination which gives *no* (or very few)
collisions.

c) Caller then combines the pre-hash generated at compile time with
r1, r2, r3, m stored in the vtable header, in order to find the final
location in the hash table.

The exciting thing is that in the benchmarks, the performance penalty
over a C++-style vtable is actually very slight. (Of course you can
cache a proper vtable, but the fact that you get so close without caring
about caching means that this can be done much faster.)
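Steps a)-c) above can be sketched as a short runnable Python program (the signature strings, table size, and search order are illustrative, not what Cython would actually ship):

```python
import hashlib
from itertools import product


def prehash(sig: str) -> int:
    """Step a): lower 64 bits of md5 of the name/argument string."""
    return int.from_bytes(hashlib.md5(sig.encode()).digest()[:8], "little")


def find_perfect_hash(prehashes, table_size):
    """Step b): search shift parameters (r1, r2, r3) so that
    fh = ((ph >> r1) ^ (ph >> r2) ^ (ph >> r3)) & m
    maps every pre-hash to a distinct slot. table_size must be a
    power of two; m = table_size - 1 is the mask."""
    m = table_size - 1
    for r1, r2, r3 in product(range(64), repeat=3):
        slots = {((ph >> r1) ^ (ph >> r2) ^ (ph >> r3)) & m for ph in prehashes}
        if len(slots) == len(prehashes):
            return r1, r2, r3
    return None


def lookup_slot(ph, r1, r2, r3, m):
    """Step c): the caller recomputes the same fh from the compile-time
    pre-hash and the parameters stored in the vtable header."""
    return ((ph >> r1) ^ (ph >> r2) ^ (ph >> r3)) & m


sigs = ["method:dd", "other:ii", "area:d", "norm:ddd"]
phs = [prehash(s) for s in sigs]
params = find_perfect_hash(phs, 8)
```

Note the search is done once, at compile time (or module load), by the callee; the caller's hot path is just two shifts, two xors, and a mask.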
Back to my and Robert's discussion on benchmarks: I've uploaded
benchmarks here:

https://github.com/dagss/hashvtable/tree/master/dispatchbench

I've changed how the benchmark timings are taken to give more robust
numbers (at least for me); you want to look at the 'min' column.

I changed the benchmark a bit so that it benchmarks a *callsite*. So we
don't pass 'h' on the stack, but either a) look it up in a global
variable (the default), or b) make it a compile-time constant (an
immediate in assembly; compile with -DIMHASH).

Similarly, the ID is either an "interned" global variable, or an
immediate (-DIMID).

The results are very different on my machine depending on this aspect.
My conclusions:

- Three shifts with masking, two shifts with a "fallback slot" (allowing
  for a single collision), and two shifts with two masks all allow
  constructing good vtables. In the case of only two shifts, one
  colliding method gets the twoshift+fback performance and the rest get
  the twoshift performance.

- Performance is really more affected by whether hashes are immediates
  or global variables than by the hash function. This is in contrast to
  the interning vs. key benchmarks -- so I think that if we looked up the
  vtable through PyTypeObject, rather than getting the vtable directly,
  the loads of the global variables could potentially be masked by that.

- My conclusion: Just use the lower bits of md5 *both* for the hashing
  and the ID-ing (don't bother with any interning), and compile the
  thing as a 64-bit immediate. This can cause crashes/stack smashes etc.
if there's a lower-64-bits-of-md5 collision, but a) the probability is
incredibly small, b) it would only matter in situations that should cause
an AttributeError anyway, c) if we really care, we can always use an
interning-like mechanism to validate on module loading that its hashes
don't collide with other hashes (and raise an exception:
"Congratulations, you've discovered a phenomenal md5 collision, get in
touch with the Cython devs and we'll work around it right away").

The RTTI (i.e. the char*) is also put in there, but is not used for
comparison and is not interned.

At least, that's what I think we should do for duck-style vtables.

Do we then go to all the pain of defining key-encoding, interning etc.
just for SEP 201? Isn't it easier to just mandate an md5 dependency and
be done with it? (After all, md5 usually comes with Python in the md5
and hashlib modules.)

direct: Early binding
index: Call slot 0 (C++-style vtable/function pointer)
noshift: h & m1
oneshift: (h >> r1) & m1
twoshift: ((h >> r1) ^ (h >> r2)) & m1
twoshift+fback: hash doesn't
threeshift: ((h >> r1) ^ (h >> r2) ^ (h >> r3)) & m1
doublemask: ((h >> r1) & m1) ^ ((h >> r2) & m2)
doublemask2: ((h >> r1) & m1) ^ ((h & m2) >> r2)

Default distutils build (-O2):
------------------------------

Hash globalvar, ids globalvar

direct:         min=5.38e-09 mean=5.45e-09 std=3.79e-11 val=1600000000.000000
index:          min=5.38e-09 mean=5.44e-09 std=3.09e-11 val=1200000000.000000
noshift:        min=5.99e-09 mean=6.14e-09 std=6.63e-11 val=1200000000.000000
oneshift:       min=6.47e-09 mean=6.53e-09 std=3.21e-11 val=1200000000.000000
twoshift:       min=7.00e-09 mean=7.08e-09 std=3.73e-11 val=1200000000.000000
twoshift+fback: min=7.54e-09 mean=7.61e-09 std=4.46e-11 val=1200000000.000000
threeshift:     min=7.54e-09 mean=7.64e-09 std=3.79e-11 val=1200000000.000000
doublemask:     min=7.56e-09 mean=7.64e-09 std=3.02e-11 val=1200000000.000000
doublemask2:    min=7.55e-09 mean=7.62e-09 std=3.24e-11 val=1200000000.000000

hash immediate, ids globalvar

direct:         min=5.38e-09 mean=5.45e-09 std=3.87e-11 val=1600000000.000000
index:          min=5.40e-09 mean=5.45e-09 std=2.92e-11 val=1200000000.000000
noshift:        min=5.38e-09 mean=5.44e-09 std=3.48e-11 val=1200000000.000000
oneshift:       min=5.90e-09 mean=5.99e-09 std=4.05e-11 val=1200000000.000000
twoshift:       min=6.09e-09 mean=6.17e-09 std=3.52e-11 val=1200000000.000000
twoshift+fback: min=7.00e-09 mean=7.08e-09 std=3.64e-11 val=1200000000.000000
threeshift:     min=6.47e-09 mean=6.55e-09 std=6.04e-11 val=1200000000.000000
doublemask:     min=6.46e-09 mean=6.50e-09 std=3.37e-11 val=1200000000.000000
doublemask2:    min=6.46e-09 mean=6.51e-09 std=3.04e-11 val=1200000000.000000

all immediate:

direct:         min=5.39e-09 mean=5.50e-09 std=5.22e-11 val=1600000000.000000
index:          min=5.38e-09 mean=5.51e-09 std=6.25e-11 val=1200000000.000000
noshift:        min=5.38e-09 mean=5.51e-09 std=6.90e-11 val=1200000000.000000
oneshift:       min=5.40e-09 mean=5.51e-09 std=5.35e-11 val=1200000000.000000
twoshift:       min=5.94e-09 mean=6.06e-09 std=5.91e-11 val=1200000000.000000
twoshift+fback: min=7.06e-09 mean=7.19e-09 std=5.39e-11 val=1200000000.000000
threeshift:     min=5.96e-09 mean=6.07e-09 std=5.54e-11 val=1200000000.000000
doublemask:     min=5.88e-09 mean=6.01e-09 std=6.06e-11 val=1200000000.000000
doublemask2:    min=5.94e-09 mean=6.05e-09 std=6.16e-11 val=1200000000.000000

-O3 build
---------

all globalvars:

direct:         min=1.61e-09 mean=1.63e-09 std=1.40e-11 val=1600000000.000000
index:          min=5.38e-09 mean=5.43e-09 std=2.82e-11 val=1200000000.000000
noshift:        min=6.04e-09 mean=6.13e-09 std=4.76e-11 val=1200000000.000000
oneshift:       min=6.46e-09 mean=6.54e-09 std=3.82e-11 val=1200000000.000000
twoshift:       min=7.01e-09 mean=7.06e-09 std=3.41e-11 val=1200000000.000000
twoshift+fback: min=7.57e-09 mean=7.64e-09 std=3.47e-11 val=1200000000.000000
threeshift:     min=7.54e-09 mean=7.63e-09 std=4.17e-11 val=1200000000.000000
doublemask:     min=7.54e-09 mean=7.61e-09 std=3.64e-11 val=1200000000.000000
doublemask2:    min=7.55e-09 mean=7.63e-09 std=3.35e-11 val=1200000000.000000

hash immediate, ids globalvar:

direct:         min=1.61e-09 mean=1.66e-09 std=3.30e-11 val=1600000000.000000
index:          min=5.40e-09 mean=5.50e-09 std=4.94e-11 val=1200000000.000000
noshift:        min=5.38e-09 mean=5.49e-09 std=6.02e-11 val=1200000000.000000
oneshift:       min=5.95e-09 mean=6.06e-09 std=6.64e-11 val=1200000000.000000
twoshift:       min=5.96e-09 mean=6.13e-09 std=7.22e-11 val=1200000000.000000
twoshift+fback: min=7.02e-09 mean=7.18e-09 std=7.04e-11 val=1200000000.000000
threeshift:     min=6.52e-09 mean=6.65e-09 std=6.43e-11 val=1200000000.000000
doublemask:     min=6.50e-09 mean=6.62e-09 std=5.28e-11 val=1200000000.000000
doublemask2:    min=6.52e-09 mean=6.63e-09 std=5.23e-11 val=1200000000.000000

all immediate:

direct:         min=1.61e-09 mean=1.62e-09 std=9.77e-12 val=1600000000.000000
index:          min=5.38e-09 mean=5.39e-09 std=1.71e-11 val=1200000000.000000
noshift:        min=5.38e-09 mean=5.40e-09 std=2.41e-11 val=1200000000.000000
oneshift:       min=5.38e-09 mean=5.40e-09 std=1.81e-11 val=1200000000.000000
twoshift:       min=5.92e-09 mean=5.93e-09 std=1.43e-11 val=1200000000.000000
twoshift+fback: min=7.00e-09 mean=7.01e-09 std=2.20e-11 val=1200000000.000000
threeshift:     min=5.92e-09 mean=5.94e-09 std=1.99e-11 val=1200000000.000000
doublemask:     min=5.79e-09 mean=5.82e-09 std=2.32e-11 val=1200000000.000000
doublemask2:    min=5.92e-09 mean=5.94e-09 std=2.25e-11 val=1200000000.000000

From d.s.seljebotn at astro.uio.no Mon Jun 4 22:55:56 2012
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Mon, 04 Jun 2012 22:55:56 +0200
Subject: [Cython] Hash-based vtables
In-Reply-To: <4FCD100B.7000008@astro.uio.no>
References: <4FCD100B.7000008@astro.uio.no>
Message-ID: <4FCD20DC.6090906@astro.uio.no>

On 06/04/2012 09:44 PM, Dag Sverre Seljebotn wrote:
> Me and Robert had a long discussion on the NumFOCUS list about this
> already, but I figured it was better to continue it and provide more
> in-depth benchmark results here.
> > It's basically a new idea of how to provide a vtable based on perfect > hashing, which should be a lot simpler to implement than what I first > imagined. > > I'll write down some context first, if you're familiar with this > skip ahead a bit.. > > This means that you can do fast dispatches *without* the messy > business of binding vtable slots at compile time. To be concrete, this > might e.g. take the form > > def f(obj): > obj.method(3.4) # try to find a vtable with "void method(double)" in it > > or, a more typed approach, > > # File A > cdef class MyImpl: > def double method(double x): return x * x > > # File B > # Here we never know about MyImpl, hence "duck-typed" > @cython.interface > class MyIntf: > def double method(double x): pass > > def f(MyIntf obj): > # obj *can* be MyImpl instance, or whatever else that supports > # that interface > obj.method(3.4) > > > Now, the idea to implement this is: > > a) Both caller and callee pre-hash name/argument string > "mymethod:iidd" to 64 bits of hash data (probably lower 64 bits of > md5) > > b) Callee (MyImpl) generates a vtable of its methods by *perfect* > hashing. What you do is define a final hash fh as a function > of the pre-hash ph, for instance > > fh = ((ph >> vtable.r1) ^ (ph >> vtable.r2) ^ (ph >> vtable.r3)) & vtable.m > > (Me and Robert are benchmarking different functions to use here.) By > playing with r1, r2, r3, you have 64**3 choices of hash function, and > will be able to pick a combination which gives *no* (or very few) > collisions. > > c) Caller then combines the pre-hash generated at compile-time, with > r1, r2, r3, m stored in the vtable header, in order to find the > final location in the hash-table. > > The exciting thing is that in benchmark, the performance penalty is > actually very slight over a C++-style v-table. (Of course you can > cache a proper vtable, but the fact that you get so close without > caring about caching means that this can be done much faster.) 
> > Back to my and Robert's discussion on benchmarks: > > I've uploaded benchmarks here: > > https://github.com/dagss/hashvtable/tree/master/dispatchbench > > I've changed the benchmark taking to give more robust numbers (at > least for me), you want to look at the 'min' column. > > I changed the benchmark a bit so that it benchmarks a *callsite*. > So we don't pass 'h' on the stack, but either a) looks it up in a global > variable (default), or b) it's a compile-time constant (immediate in > assembly) (compile with -DIMHASH). > > Similarly, the ID is either an "interned" global variable, or an > immediate (-DIMID). > > The results are very different on my machine depending on this aspect. > My conclusions: > > - Both three shifts with masking, two shifts with a "fallback slot" > (allowing for a single collision), three shifts, two shifts with > two masks allows for constructing good vtables. In the case of only > two shifts, one colliding method gets the twoshift+fback > performance and the rest gets the twoshift performance. > > - Performance is really more affected by whether hashes are > immediates or global variables than the hash function. This is in > contrast to the interning vs. key benchmarks -- so I think that if > we looked up the vtable through PyTypeObject, rather than getting > the vtable directly, the loads of the global variables could > potentially be masked by that. > > - My conclusion: Just use lower bits of md5 *both* for the hashing > and the ID-ing (don't bother with any interning), and compile the > thing as a 64-bit immediate. This can cause crashes/stack smashes > etc. 
if there's a lower-64-bits-of-md5 collision, but a) the
> probability is incredibly small, b) it would only matter in
> situations that should cause an AttributeError anyway, c) if we
> really care, we can always use an interning-like mechanism to
> validate on module loading that its hashes don't collide with
> other hashes (and raise an exception: "Congratulations, you've
> discovered a phenomenal md5 collision, get in touch with the Cython
> devs and we'll work around it right away").

What I forgot to mention:

- I really want to avoid linear probing, just because of the code bloat
  in call sites. With two shifts, when there was a failure to find a
  perfect hash it was always possible to find one with a single
  collision.

- Probing for the hash with two shifts is lightning fast; it can take a
  while with three shifts (though you can always spend more memory on a
  bigger table to make it fast again). However, it makes me uneasy to
  penalize the performance of calling one of the random methods, so I'm
  really in favour of three-shifts or double-mask (to be decided when
  investigating the performance of probing for parameters in more
  detail).

- I tried using SSE to do the shifts in parallel and failed (miserable
  performance). The problem is quickly moving things between general
  purpose registers and SSE registers, and the lack of SSE
  immediates/constants in the instruction stream. At least, what my gcc
  4.6 generates appeared to use the stack to communicate between SSE
  registers and general purpose registers (but I can't have been doing
  the right thing...).

>
> The RTTI (i.e. the char*) is also put in there, but is not used for
> comparison and is not interned.
>
> At least, that's what I think we should do for duck-style vtables.
>
> Do we then go to all the pain of defining key-encoding, interning
> etc. just for SEP 201? Isn't it easier to just mandate an md5 dependency
> and be done with it?
(After all, md5 usually comes with Python in the > md5 and hashlib modules) > > direct: Early-binding > index: Call slot 0 (C++-style vtable/function pointer) > noshift: h & m1 > oneshift: (h >> r1) & m1 > twoshift: ((h >> r1) ^ (h >> r2)) & m1 > twoshift+fback: hash doesn't I meant: Hash collision and then, after a branch miss, look up the one fallback slot in the vtable header. Dag > threeshift: ((h >> r1) ^ (h >> r2) ^ (h >> r3)) & m1 > doublemask: ((h >> r1) & m1) ^ ((h >> r2) & m2) > doublemask2: ((h >> r1) & m1) ^ ((h & m2) >> r2) > > Default distutils build (-O2): > ------------------------------ > > Hash globalvar, ids globalvar > > direct: min=5.38e-09 mean=5.45e-09 std=3.79e-11 val=1600000000.000000 > index: min=5.38e-09 mean=5.44e-09 std=3.09e-11 val=1200000000.000000 > noshift: min=5.99e-09 mean=6.14e-09 std=6.63e-11 val=1200000000.000000 > oneshift: min=6.47e-09 mean=6.53e-09 std=3.21e-11 val=1200000000.000000 > twoshift: min=7.00e-09 mean=7.08e-09 std=3.73e-11 val=1200000000.000000 > twoshift+fback: min=7.54e-09 mean=7.61e-09 std=4.46e-11 > val=1200000000.000000 > threeshift: min=7.54e-09 mean=7.64e-09 std=3.79e-11 val=1200000000.000000 > doublemask: min=7.56e-09 mean=7.64e-09 std=3.02e-11 val=1200000000.000000 > doublemask2: min=7.55e-09 mean=7.62e-09 std=3.24e-11 val=1200000000.000000 > > hash immediate, ids globalvar > > direct: min=5.38e-09 mean=5.45e-09 std=3.87e-11 val=1600000000.000000 > index: min=5.40e-09 mean=5.45e-09 std=2.92e-11 val=1200000000.000000 > noshift: min=5.38e-09 mean=5.44e-09 std=3.48e-11 val=1200000000.000000 > oneshift: min=5.90e-09 mean=5.99e-09 std=4.05e-11 val=1200000000.000000 > twoshift: min=6.09e-09 mean=6.17e-09 std=3.52e-11 val=1200000000.000000 > twoshift+fback: min=7.00e-09 mean=7.08e-09 std=3.64e-11 > val=1200000000.000000 > threeshift: min=6.47e-09 mean=6.55e-09 std=6.04e-11 val=1200000000.000000 > doublemask: min=6.46e-09 mean=6.50e-09 std=3.37e-11 val=1200000000.000000 > doublemask2: min=6.46e-09 mean=6.51e-09 
std=3.04e-11 val=1200000000.000000 > > all immediate: > > direct: min=5.39e-09 mean=5.50e-09 std=5.22e-11 val=1600000000.000000 > index: min=5.38e-09 mean=5.51e-09 std=6.25e-11 val=1200000000.000000 > noshift: min=5.38e-09 mean=5.51e-09 std=6.90e-11 val=1200000000.000000 > oneshift: min=5.40e-09 mean=5.51e-09 std=5.35e-11 val=1200000000.000000 > twoshift: min=5.94e-09 mean=6.06e-09 std=5.91e-11 val=1200000000.000000 > twoshift+fback: min=7.06e-09 mean=7.19e-09 std=5.39e-11 > val=1200000000.000000 > threeshift: min=5.96e-09 mean=6.07e-09 std=5.54e-11 val=1200000000.000000 > doublemask: min=5.88e-09 mean=6.01e-09 std=6.06e-11 val=1200000000.000000 > doublemask2: min=5.94e-09 mean=6.05e-09 std=6.16e-11 val=1200000000.000000 > > -O3 build > --------- > > all globalvars: > > direct: min=1.61e-09 mean=1.63e-09 std=1.40e-11 val=1600000000.000000 > index: min=5.38e-09 mean=5.43e-09 std=2.82e-11 val=1200000000.000000 > noshift: min=6.04e-09 mean=6.13e-09 std=4.76e-11 val=1200000000.000000 > oneshift: min=6.46e-09 mean=6.54e-09 std=3.82e-11 val=1200000000.000000 > twoshift: min=7.01e-09 mean=7.06e-09 std=3.41e-11 val=1200000000.000000 > twoshift+fback: min=7.57e-09 mean=7.64e-09 std=3.47e-11 > val=1200000000.000000 > threeshift: min=7.54e-09 mean=7.63e-09 std=4.17e-11 val=1200000000.000000 > doublemask: min=7.54e-09 mean=7.61e-09 std=3.64e-11 val=1200000000.000000 > doublemask2: min=7.55e-09 mean=7.63e-09 std=3.35e-11 val=1200000000.000000 > > hash immediate, ids globalvar: > > direct: min=1.61e-09 mean=1.66e-09 std=3.30e-11 val=1600000000.000000 > index: min=5.40e-09 mean=5.50e-09 std=4.94e-11 val=1200000000.000000 > noshift: min=5.38e-09 mean=5.49e-09 std=6.02e-11 val=1200000000.000000 > oneshift: min=5.95e-09 mean=6.06e-09 std=6.64e-11 val=1200000000.000000 > twoshift: min=5.96e-09 mean=6.13e-09 std=7.22e-11 val=1200000000.000000 > twoshift+fback: min=7.02e-09 mean=7.18e-09 std=7.04e-11 > val=1200000000.000000 > threeshift: min=6.52e-09 mean=6.65e-09 std=6.43e-11 
val=1200000000.000000 > doublemask: min=6.50e-09 mean=6.62e-09 std=5.28e-11 val=1200000000.000000 > doublemask2: min=6.52e-09 mean=6.63e-09 std=5.23e-11 val=1200000000.000000 > > all immediate: > > direct: min=1.61e-09 mean=1.62e-09 std=9.77e-12 val=1600000000.000000 > index: min=5.38e-09 mean=5.39e-09 std=1.71e-11 val=1200000000.000000 > noshift: min=5.38e-09 mean=5.40e-09 std=2.41e-11 val=1200000000.000000 > oneshift: min=5.38e-09 mean=5.40e-09 std=1.81e-11 val=1200000000.000000 > twoshift: min=5.92e-09 mean=5.93e-09 std=1.43e-11 val=1200000000.000000 > twoshift+fback: min=7.00e-09 mean=7.01e-09 std=2.20e-11 > val=1200000000.000000 > threeshift: min=5.92e-09 mean=5.94e-09 std=1.99e-11 val=1200000000.000000 > doublemask: min=5.79e-09 mean=5.82e-09 std=2.32e-11 val=1200000000.000000 > doublemask2: min=5.92e-09 mean=5.94e-09 std=2.25e-11 val=1200000000.000000 > > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From robertwb at gmail.com Mon Jun 4 23:43:07 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Mon, 4 Jun 2012 14:43:07 -0700 Subject: [Cython] Hash-based vtables In-Reply-To: <4FCD20DC.6090906@astro.uio.no> References: <4FCD100B.7000008@astro.uio.no> <4FCD20DC.6090906@astro.uio.no> Message-ID: On Mon, Jun 4, 2012 at 1:55 PM, Dag Sverre Seljebotn wrote: > On 06/04/2012 09:44 PM, Dag Sverre Seljebotn wrote: >> >> Me and Robert had a long discussion on the NumFOCUS list about this >> already, but I figured it was better to continue it and provide more >> in-depth benchmark results here. >> >> It's basically a new idea of how to provide a vtable based on perfect >> hashing, which should be a lot simpler to implement than what I first >> imagined. >> >> I'll write down some context first, if you're familiar with this >> skip ahead a bit.. 
>> >> This means that you can do fast dispatches *without* the messy >> business of binding vtable slots at compile time. To be concrete, this >> might e.g. take the form >> >> def f(obj): >> obj.method(3.4) # try to find a vtable with "void method(double)" in it >> >> or, a more typed approach, >> >> # File A >> cdef class MyImpl: >> def double method(double x): return x * x >> >> # File B >> # Here we never know about MyImpl, hence "duck-typed" >> @cython.interface >> class MyIntf: >> def double method(double x): pass >> >> def f(MyIntf obj): >> # obj *can* be MyImpl instance, or whatever else that supports >> # that interface >> obj.method(3.4) >> >> >> Now, the idea to implement this is: >> >> a) Both caller and callee pre-hash name/argument string >> "mymethod:iidd" to 64 bits of hash data (probably lower 64 bits of >> md5) >> >> b) Callee (MyImpl) generates a vtable of its methods by *perfect* >> hashing. What you do is define a final hash fh as a function >> of the pre-hash ph, for instance >> >> fh = ((ph >> vtable.r1) ^ (ph >> vtable.r2) ^ (ph >> vtable.r3)) & >> vtable.m >> >> (Me and Robert are benchmarking different functions to use here.) By >> playing with r1, r2, r3, you have 64**3 choices of hash function, and >> will be able to pick a combination which gives *no* (or very few) >> collisions. >> >> c) Caller then combines the pre-hash generated at compile-time, with >> r1, r2, r3, m stored in the vtable header, in order to find the >> final location in the hash-table. >> >> The exciting thing is that in benchmark, the performance penalty is >> actually very slight over a C++-style v-table. (Of course you can >> cache a proper vtable, but the fact that you get so close without >> caring about caching means that this can be done much faster.) One advantage about caching a vtable is that one can possibly put in adapters for non-exact matches. It also opens up the possibility of putting in stubs to call def methods if they exist. 
This needs to be fleshed out more, (another CEP :) but could provide for a backwards-compatible easy first implementation. >> Back to my and Robert's discussion on benchmarks: >> >> I've uploaded benchmarks here: >> >> https://github.com/dagss/hashvtable/tree/master/dispatchbench >> >> I've changed the benchmark taking to give more robust numbers (at >> least for me), you want to look at the 'min' column. >> >> I changed the benchmark a bit so that it benchmarks a *callsite*. >> So we don't pass 'h' on the stack, but either a) looks it up in a global >> variable (default), or b) it's a compile-time constant (immediate in >> assembly) (compile with -DIMHASH). >> >> Similarly, the ID is either an "interned" global variable, or an >> immediate (-DIMID). >> >> The results are very different on my machine depending on this aspect. >> My conclusions: >> >> - Both three shifts with masking, two shifts with a "fallback slot" >> (allowing for a single collision), three shifts, two shifts with >> two masks allows for constructing good vtables. In the case of only >> two shifts, one colliding method gets the twoshift+fback >> performance and the rest gets the twoshift performance. >> >> - Performance is really more affected by whether hashes are >> immediates or global variables than the hash function. This is in >> contrast to the interning vs. key benchmarks -- so I think that if >> we looked up the vtable through PyTypeObject, rather than getting >> the vtable directly, the loads of the global variables could >> potentially be masked by that. >> >> - My conclusion: Just use lower bits of md5 *both* for the hashing >> and the ID-ing (don't bother with any interning), and compile the >> thing as a 64-bit immediate. This can cause crashes/stack smashes >> etc. 
if there's lower-64bit-of-md5 collisions, but a) the >> probability is incredibly small, b) it would only matter in >> situations that should cause an AttributeError anyway, c) if we >> really care, we can always use an interning-like mechanism to >> validate on module loading that its hashes doesn't collide with >> other hashes (and raise an exception "Congratulations, you've >> discovered a phenomenal md5 collision, get in touch with cython >> devs and we'll work around it right away"). Due to the birthday paradox, this seems a bit risky. Maybe it's because I regularly work with collections much bigger than 2^32, and I suppose we're talking about unique method names and signatures here, but still... I wonder what the penalty would be for checking the full 128 bit hash. (Storing it could allow for greater entropy in the optimal hash table search as well). > What I forgot to mention: > > ?- I really want to avoid linear probing just because of the code bloat in > call sites. That's a good point. What about flags--are we throwing out the idea of masking? > With two shifts, when there was a failure to find a perfect hash > it was always possible to find one with a single collision. > > ?- Probing for the hash with two shifts is lightning fast, it can take a > while with three shifts (though you can always spend more memory on a bigger > table to make it fast again). However, it makes me uneasy to penalize the > performance of calling one of the random methods, so I'm really in favour of > three-shifts or double-mask (to be decided when investigating the > performance of probing for parameters in more detail). > > ?- I tried using SSE to do shifts in parallel and failed (miserable > performance). The problem is quickly moving things between general purpose > registers and SSE registers, and the lack of SSE immediates/constants in the > instruction stream. 
At least, what my gcc 4.6 generates appeared to use the > stack to communicate between SSE registers and general purpose registers > (but I can't have been doing the right thing..). > > > >> >> The RTTI (i.e. the char*) is also put in there, but is not used for >> comparison and is not interned. >> >> At least, that's what I think we should do for duck-style vtables. >> >> Do we then go to all the pain of defining key-encoding, interning >> etc. just for SEP 201? Isn't it easier to just mandate a md5 dependency >> and be done with it? (After all, md5 usually comes with Python in the >> md5 and hashlib modules) >> >> direct: Early-binding >> index: Call slot 0 (C++-style vtable/function pointer) >> noshift: h & m1 >> oneshift: (h >> r1) & m1 >> twoshift: ((h >> r1) ^ (h >> r2)) & m1 >> twoshift+fback: hash doesn't >> >> >> I meant: Hash collision and then, after a branch miss, look up the one >> fallback slot in the vtable header. We could also do a fallback table. Usually it'd be empty; occasionally it'd have one element in it. It'd always be possible to make this big enough to avoid collisions in a worst-case scenario. BTW, this is a general static char* -> void* dictionary, I bet it could possibly have other uses. (It may also be a well-studied problem, though a bit hard to search for...) I suppose we could reduce it to read-optimized int -> int mappings.
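[The pre-hash/perfect-hash scheme in steps (a)-(c) quoted above can be sketched in a few lines of Python. This is a rough illustration only, not the eventual C implementation: the example signatures, the 4-bit table size, and the little-endian byte order are all arbitrary choices for the sketch.]

```python
import hashlib
import itertools

def prehash(signature):
    # Step (a): lower 64 bits of md5 of the "name:argcodes" string.
    digest = hashlib.md5(signature.encode('ascii')).digest()
    return int.from_bytes(digest[:8], 'little')

def find_perfect_hash(prehashes, table_bits=4):
    # Step (b): probe shift parameters r1, r2, r3 until the "threeshift"
    # function ((h >> r1) ^ (h >> r2) ^ (h >> r3)) & m maps every
    # pre-hash to a distinct slot -- a perfect hash for this method set.
    m = (1 << table_bits) - 1
    for r1, r2, r3 in itertools.product(range(64), repeat=3):
        slots = {((h >> r1) ^ (h >> r2) ^ (h >> r3)) & m for h in prehashes}
        if len(slots) == len(prehashes):  # no collisions
            return r1, r2, r3, m
    raise ValueError("no perfect hash found; grow the table")

# Step (c): a call site recomputes the same function, combining its
# compile-time pre-hash with r1, r2, r3, m read from the vtable header.
sigs = ['mymethod:iidd', 'othermethod:d', 'third:ii']
hashes = [prehash(s) for s in sigs]
r1, r2, r3, m = find_perfect_hash(hashes)
slot = ((hashes[0] >> r1) ^ (hashes[0] >> r2) ^ (hashes[0] >> r3)) & m
```

In C the call-site half of this is just two shifts, an xor, a mask and an indexed load, which is why the benchmarked penalty over a fixed-slot vtable is so small.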
- Robert From d.s.seljebotn at astro.uio.no Tue Jun 5 00:07:30 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 05 Jun 2012 00:07:30 +0200 Subject: [Cython] Hash-based vtables In-Reply-To: References: <4FCD100B.7000008@astro.uio.no> <4FCD20DC.6090906@astro.uio.no> Message-ID: <048eeb04-aa8b-4e12-9a9b-5d552d39984b@email.android.com> Robert Bradshaw wrote: >On Mon, Jun 4, 2012 at 1:55 PM, Dag Sverre Seljebotn > wrote: >> On 06/04/2012 09:44 PM, Dag Sverre Seljebotn wrote: >>> >>> Me and Robert had a long discussion on the NumFOCUS list about this >>> already, but I figured it was better to continue it and provide more >>> in-depth benchmark results here. >>> >>> It's basically a new idea of how to provide a vtable based on >perfect >>> hashing, which should be a lot simpler to implement than what I >first >>> imagined. >>> >>> I'll write down some context first, if you're familiar with this >>> skip ahead a bit.. >>> >>> This means that you can do fast dispatches *without* the messy >>> business of binding vtable slots at compile time. To be concrete, >this >>> might e.g. take the form >>> >>> def f(obj): >>> obj.method(3.4) # try to find a vtable with "void method(double)" in >it >>> >>> or, a more typed approach, >>> >>> # File A >>> cdef class MyImpl: >>> def double method(double x): return x * x >>> >>> # File B >>> # Here we never know about MyImpl, hence "duck-typed" >>> @cython.interface >>> class MyIntf: >>> def double method(double x): pass >>> >>> def f(MyIntf obj): >>> # obj *can* be MyImpl instance, or whatever else that supports >>> # that interface >>> obj.method(3.4) >>> >>> >>> Now, the idea to implement this is: >>> >>> a) Both caller and callee pre-hash name/argument string >>> "mymethod:iidd" to 64 bits of hash data (probably lower 64 bits of >>> md5) >>> >>> b) Callee (MyImpl) generates a vtable of its methods by *perfect* >>> hashing. 
What you do is define a final hash fh as a function >>> of the pre-hash ph, for instance >>> >>> fh = ((ph >> vtable.r1) ^ (ph >> vtable.r2) ^ (ph >> vtable.r3)) & >>> vtable.m >>> >>> (Me and Robert are benchmarking different functions to use here.) By >>> playing with r1, r2, r3, you have 64**3 choices of hash function, >and >>> will be able to pick a combination which gives *no* (or very few) >>> collisions. >>> >>> c) Caller then combines the pre-hash generated at compile-time, with >>> r1, r2, r3, m stored in the vtable header, in order to find the >>> final location in the hash-table. >>> >>> The exciting thing is that in benchmark, the performance penalty is >>> actually very slight over a C++-style v-table. (Of course you can >>> cache a proper vtable, but the fact that you get so close without >>> caring about caching means that this can be done much faster.) > >One advantage about caching a vtable is that one can possibly put in >adapters for non-exact matches. It also opens up the possibility of >putting in stubs to call def methods if they exist. This needs to be >fleshed out more, (another CEP :) but could provide for a >backwards-compatible easy first implementation. > >>> Back to my and Robert's discussion on benchmarks: >>> >>> I've uploaded benchmarks here: >>> >>> https://github.com/dagss/hashvtable/tree/master/dispatchbench >>> >>> I've changed the benchmark taking to give more robust numbers (at >>> least for me), you want to look at the 'min' column. >>> >>> I changed the benchmark a bit so that it benchmarks a *callsite*. >>> So we don't pass 'h' on the stack, but either a) looks it up in a >global >>> variable (default), or b) it's a compile-time constant (immediate in >>> assembly) (compile with -DIMHASH). >>> >>> Similarly, the ID is either an "interned" global variable, or an >>> immediate (-DIMID). >>> >>> The results are very different on my machine depending on this >aspect. 
>>> My conclusions: >>> >>> - Both three shifts with masking, two shifts with a "fallback slot" >>> (allowing for a single collision), three shifts, two shifts with >>> two masks allows for constructing good vtables. In the case of only >>> two shifts, one colliding method gets the twoshift+fback >>> performance and the rest gets the twoshift performance. >>> >>> - Performance is really more affected by whether hashes are >>> immediates or global variables than the hash function. This is in >>> contrast to the interning vs. key benchmarks -- so I think that if >>> we looked up the vtable through PyTypeObject, rather than getting >>> the vtable directly, the loads of the global variables could >>> potentially be masked by that. >>> >>> - My conclusion: Just use lower bits of md5 *both* for the hashing >>> and the ID-ing (don't bother with any interning), and compile the >>> thing as a 64-bit immediate. This can cause crashes/stack smashes >>> etc. if there's lower-64bit-of-md5 collisions, but a) the >>> probability is incredibly small, b) it would only matter in >>> situations that should cause an AttributeError anyway, c) if we >>> really care, we can always use an interning-like mechanism to >>> validate on module loading that its hashes doesn't collide with >>> other hashes (and raise an exception "Congratulations, you've >>> discovered a phenomenal md5 collision, get in touch with cython >>> devs and we'll work around it right away"). > >Due to the birthday paradox, this seems a bit risky. Maybe it's >because I regularly work with collections much bigger than 2^32, and I >suppose we're talking about unique method names and signatures here, >but still... I wonder what the penalty would be for checking the full >128 bit hash. (Storing it could allow for greater entropy in the >optimal hash table search as well). > >> What I forgot to mention: >> >> ?- I really want to avoid linear probing just because of the code >bloat in >> call sites. > >That's a good point. 
What about flags--are we throwing out the idea of >masking? > >> With two shifts, when there was a failure to find a perfect hash >> it was always possible to find one with a single collision. >> >> ?- Probing for the hash with two shifts is lightning fast, it can >take a >> while with three shifts (though you can always spend more memory on a >bigger >> table to make it fast again). However, it makes me uneasy to penalize >the >> performance of calling one of the random methods, so I'm really in >favour of >> three-shifts or double-mask (to be decided when investigating the >> performance of probing for parameters in more detail). >> >> ?- I tried using SSE to do shifts in parallel and failed (miserable >> performance). The problem is quickly moving things between general >purpose >> registers and SSE registers, and the lack of SSE immediates/constants >in the >> instruction stream. At least, what my gcc 4.6 generates appeared to >use the >> stack to communicate between SSE registers and general purpose >registers >> (but I can't have been doing the right thing..). >> >> >> >>> >>> The RTTI (i.e. the char*) is also put in there, but is not used for >>> comparison and is not interned. >>> >>> At least, that's what I think we should do for duck-style vtables. >>> >>> Do we then go to all the pain of defining key-encoding, interning >>> etc. just for SEP 201? Isn't it easier to just mandate a md5 >dependency >>> and be done with it? (After all, md5 usually comes with Python in >the >>> md5 and hashlib modules) >>> >>> direct: Early-binding >>> index: Call slot 0 (C++-style vtable/function pointer) >>> noshift: h & m1 >>> oneshift: (h >> r1) & m1 >>> twoshift: ((h >> r1) ^ (h >> r2)) & m1 >>> twoshift+fback: hash doesn't >> >> >> I meant: Hash collision and then, after a branch miss, look up the >one >> fallback slot in the vtable header. > >We could also do a fallback table. Usually it'd be empty, Occasionally >it'd have one element in it. 
It'd always be possible to make this big >enough to avoid collisions in a worst-case scenario. > >BTW, this is a general static char* -> void* dictionary, I bet it >could possibly have other uses. (It may also be a well-studied >problem, though a bit hard to search for...) I suppose we could reduce >it to read-optimized int -> int mappings. The C FAQ says 'if you know the contents of your hash table up front you can devise a perfect hash', but no details, probably just hand-waving. 128 bits gives more entropy for perfect hashing: some but not much since each shift r is hardwired to one 64 bit subset. From the interning/key benchmarks, checking the full 128 bits would probably not be noticeable in microbenchmarks, it's more about using an extra register and bloating the instruction cache and data cache a bit etc, stuff that can only be measured in production. The alternative is having a collision detection registry. If it complains, you're told where to edit Cython (perhaps a datafile) so that the pre-hash function changes:

    if signature equals 'foo:ddffi':  # known collision with 'bar:ii'
        use high 64 bits of md5
    else:
        use low 64 bits of md5

Each such collision is documented in the cep/sep. But 128 bits and then relying on luck is perhaps simpler... If we need flags, let's say that 92 bits suffice for hash and use 16 for flags... But I was thinking that you'd have separate tables for nogil callers and gil-holding callers so that you didn't need to scan for matching flags. We really want this to be branch-miss-free. Still, flags are good for error return codes etc. Do you agree on forgetting about the encoded keys/interning even for SEP 201? There's only so much effort to go around and I'd much rather use md5 and these hash tables everywhere. Dag > >- Robert >_______________________________________________ >cython-devel mailing list >cython-devel at python.org >http://mail.python.org/mailman/listinfo/cython-devel -- Sent from my Android phone with K-9 Mail.
Please excuse my brevity. From robertwb at gmail.com Tue Jun 5 00:30:38 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Mon, 4 Jun 2012 15:30:38 -0700 Subject: [Cython] Hash-based vtables In-Reply-To: <048eeb04-aa8b-4e12-9a9b-5d552d39984b@email.android.com> References: <4FCD100B.7000008@astro.uio.no> <4FCD20DC.6090906@astro.uio.no> <048eeb04-aa8b-4e12-9a9b-5d552d39984b@email.android.com> Message-ID: On Mon, Jun 4, 2012 at 3:07 PM, Dag Sverre Seljebotn wrote: > > > Robert Bradshaw wrote: > >>On Mon, Jun 4, 2012 at 1:55 PM, Dag Sverre Seljebotn >> wrote: >>> On 06/04/2012 09:44 PM, Dag Sverre Seljebotn wrote: >>>> >>>> Me and Robert had a long discussion on the NumFOCUS list about this >>>> already, but I figured it was better to continue it and provide more >>>> in-depth benchmark results here. >>>> >>>> It's basically a new idea of how to provide a vtable based on >>perfect >>>> hashing, which should be a lot simpler to implement than what I >>first >>>> imagined. >>>> >>>> I'll write down some context first, if you're familiar with this >>>> skip ahead a bit.. >>>> >>>> This means that you can do fast dispatches *without* the messy >>>> business of binding vtable slots at compile time. To be concrete, >>this >>>> might e.g. 
take the form >>>> >>>> def f(obj): >>>> obj.method(3.4) # try to find a vtable with "void method(double)" in >>it >>>> >>>> or, a more typed approach, >>>> >>>> # File A >>>> cdef class MyImpl: >>>> def double method(double x): return x * x >>>> >>>> # File B >>>> # Here we never know about MyImpl, hence "duck-typed" >>>> @cython.interface >>>> class MyIntf: >>>> def double method(double x): pass >>>> >>>> def f(MyIntf obj): >>>> # obj *can* be MyImpl instance, or whatever else that supports >>>> # that interface >>>> obj.method(3.4) >>>> >>>> >>>> Now, the idea to implement this is: >>>> >>>> a) Both caller and callee pre-hash name/argument string >>>> "mymethod:iidd" to 64 bits of hash data (probably lower 64 bits of >>>> md5) >>>> >>>> b) Callee (MyImpl) generates a vtable of its methods by *perfect* >>>> hashing. What you do is define a final hash fh as a function >>>> of the pre-hash ph, for instance >>>> >>>> fh = ((ph >> vtable.r1) ^ (ph >> vtable.r2) ^ (ph >> vtable.r3)) & >>>> vtable.m >>>> >>>> (Me and Robert are benchmarking different functions to use here.) By >>>> playing with r1, r2, r3, you have 64**3 choices of hash function, >>and >>>> will be able to pick a combination which gives *no* (or very few) >>>> collisions. >>>> >>>> c) Caller then combines the pre-hash generated at compile-time, with >>>> r1, r2, r3, m stored in the vtable header, in order to find the >>>> final location in the hash-table. >>>> >>>> The exciting thing is that in benchmark, the performance penalty is >>>> actually very slight over a C++-style v-table. (Of course you can >>>> cache a proper vtable, but the fact that you get so close without >>>> caring about caching means that this can be done much faster.) >> >>One advantage about caching a vtable is that one can possibly put in >>adapters for non-exact matches. It also opens up the possibility of >>putting in stubs to call def methods if they exist. 
This needs to be >>fleshed out more, (another CEP :) but could provide for a >>backwards-compatible easy first implementation. >> >>>> Back to my and Robert's discussion on benchmarks: >>>> >>>> I've uploaded benchmarks here: >>>> >>>> https://github.com/dagss/hashvtable/tree/master/dispatchbench >>>> >>>> I've changed the benchmark taking to give more robust numbers (at >>>> least for me), you want to look at the 'min' column. >>>> >>>> I changed the benchmark a bit so that it benchmarks a *callsite*. >>>> So we don't pass 'h' on the stack, but either a) looks it up in a >>global >>>> variable (default), or b) it's a compile-time constant (immediate in >>>> assembly) (compile with -DIMHASH). >>>> >>>> Similarly, the ID is either an "interned" global variable, or an >>>> immediate (-DIMID). >>>> >>>> The results are very different on my machine depending on this >>aspect. >>>> My conclusions: >>>> >>>> - Both three shifts with masking, two shifts with a "fallback slot" >>>> (allowing for a single collision), three shifts, two shifts with >>>> two masks allows for constructing good vtables. In the case of only >>>> two shifts, one colliding method gets the twoshift+fback >>>> performance and the rest gets the twoshift performance. >>>> >>>> - Performance is really more affected by whether hashes are >>>> immediates or global variables than the hash function. This is in >>>> contrast to the interning vs. key benchmarks -- so I think that if >>>> we looked up the vtable through PyTypeObject, rather than getting >>>> the vtable directly, the loads of the global variables could >>>> potentially be masked by that. >>>> >>>> - My conclusion: Just use lower bits of md5 *both* for the hashing >>>> and the ID-ing (don't bother with any interning), and compile the >>>> thing as a 64-bit immediate. This can cause crashes/stack smashes >>>> etc. 
if there's lower-64bit-of-md5 collisions, but a) the >>>> probability is incredibly small, b) it would only matter in >>>> situations that should cause an AttributeError anyway, c) if we >>>> really care, we can always use an interning-like mechanism to >>>> validate on module loading that its hashes doesn't collide with >>>> other hashes (and raise an exception "Congratulations, you've >>>> discovered a phenomenal md5 collision, get in touch with cython >>>> devs and we'll work around it right away"). >> >>Due to the birthday paradox, this seems a bit risky. Maybe it's >>because I regularly work with collections much bigger than 2^32, and I >>suppose we're talking about unique method names and signatures here, >>but still... I wonder what the penalty would be for checking the full >>128 bit hash. (Storing it could allow for greater entropy in the >>optimal hash table search as well). >> >>> What I forgot to mention: >>> >>> ?- I really want to avoid linear probing just because of the code >>bloat in >>> call sites. >> >>That's a good point. What about flags--are we throwing out the idea of >>masking? >> >>> With two shifts, when there was a failure to find a perfect hash >>> it was always possible to find one with a single collision. >>> >>> ?- Probing for the hash with two shifts is lightning fast, it can >>take a >>> while with three shifts (though you can always spend more memory on a >>bigger >>> table to make it fast again). However, it makes me uneasy to penalize >>the >>> performance of calling one of the random methods, so I'm really in >>favour of >>> three-shifts or double-mask (to be decided when investigating the >>> performance of probing for parameters in more detail). >>> >>> ?- I tried using SSE to do shifts in parallel and failed (miserable >>> performance). The problem is quickly moving things between general >>purpose >>> registers and SSE registers, and the lack of SSE immediates/constants >>in the >>> instruction stream. 
At least, what my gcc 4.6 generates appeared to >>use the >>> stack to communicate between SSE registers and general purpose >>registers >>> (but I can't have been doing the right thing..). >>> >>> >>> >>>> >>>> The RTTI (i.e. the char*) is also put in there, but is not used for >>>> comparison and is not interned. >>>> >>>> At least, that's what I think we should do for duck-style vtables. >>>> >>>> Do we then go to all the pain of defining key-encoding, interning >>>> etc. just for SEP 201? Isn't it easier to just mandate a md5 >>dependency >>>> and be done with it? (After all, md5 usually comes with Python in >>the >>>> md5 and hashlib modules) >>>> >>>> direct: Early-binding >>>> index: Call slot 0 (C++-style vtable/function pointer) >>>> noshift: h & m1 >>>> oneshift: (h >> r1) & m1 >>>> twoshift: ((h >> r1) ^ (h >> r2)) & m1 >>>> twoshift+fback: hash doesn't >>> >>> >>> I meant: Hash collision and then, after a branch miss, look up the >>one >>> fallback slot in the vtable header. >> >>We could also do a fallback table. Usually it'd be empty, Occasionally >>it'd have one element in it. It'd always be possible to make this big >>enough to avoid collisions in a worst-case scenario. >> >>BTW, this is a general static char* -> void* dictionary, I bet it >>could possibly have other uses. (It may also be a well-studied >>problem, though a bit hard to search for...) I suppose we could reduce >>it to read-optimized int -> int mappings. > > > The C FAQ says 'if you know the contents of your hash table up front you can devise a perfect hash', but no details, probably just hand-waving. I just found http://cmph.sourceforge.net/ which looks quite interesting. Though the resulting hash functions are supposedly cheap, I have the feeling that branching is considered cheap in this context. > 128 bits gives more entropy for perfect hashing: some but not much since each shift r is hardwired to one 64 bit subset. True. 
I don't have a good way to quantify the correlation between different shifts of the same value (vs. truly random values) but it didn't seem to be very significant in the experiments. > From the interning/key benchmarks, checking the full 128 bits would probably not be noticeable in microbenchmarks, it's more about using an extra register and bloating the instruction cache and data cache a bit etc, stuff that can only be measured in production. One could make the check optionally omitted at compile time. It would still bloat the table, but not by much (or at all if we share with flag bits as suggested below). > The alternative is having a collision detection registry. If it complains, you're told where to edit Cython (perhaps a datafile) so that the pre-hash function changes: > > if signature equals 'foo:ddffi' > # known collision with 'bar:ii' > Use high 64 bits of md5 > Else: > Use low 64 bits of md5 > > Each such collision is documented in the cep/sep. > > But 128 bit and then relying on luck is perhaps simpler... Much. > If we need flags, lets say that 92 bits suffice. for hash and use 16 for flags... > > But i was thinking that you'd have separate tables for nogil callers and gil-holding callers so that you didn't need to scan for matching flags. We really want this to be branch-miss-free. Still, flags are good for error return codes etc. Duplicate tables work as long as there aren't too many orthogonal considerations. Is the GIL the only one? What about "I can propagate errors?" Now we're up to 4 tables... > Do you agree on forgetting about the encoded keys/interning even for SEP 201? There's only so much effort to go around and I'd much rather use md5 and these hash tables everywhere. Yes, for sure!
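[The birthday-paradox worry raised above can be made concrete with a back-of-the-envelope calculation; the signature counts below are arbitrary examples, not figures from the thread.]

```python
import math

def collision_probability(n, bits):
    # Standard birthday approximation: P ~ 1 - exp(-n(n-1) / 2^(bits+1)).
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(-n * (n - 1) / 2.0 ** (bits + 1))

# A million distinct method signatures under a 64-bit hash: negligible.
p_64 = collision_probability(10**6, 64)
# 2^32 items -- the collection sizes mentioned above -- under 64 bits:
# close to a coin flip, which is the risk being pointed out.
p_big = collision_probability(2**32, 64)
# The same 2^32 items under the full 128-bit md5: astronomically safe.
p_128 = collision_probability(2**32, 128)
```

So 64 bits is fine for the plausible number of distinct method signatures in one process, but the margin evaporates at the scales discussed, which is the argument for comparing the full 128 bits.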
- Robert From stefan_ml at behnel.de Tue Jun 5 09:25:44 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 05 Jun 2012 09:25:44 +0200 Subject: [Cython] Hash-based vtables In-Reply-To: <4FCD100B.7000008@astro.uio.no> References: <4FCD100B.7000008@astro.uio.no> Message-ID: <4FCDB478.3070000@behnel.de> Dag Sverre Seljebotn, 04.06.2012 21:44: > This can cause crashes/stack smashes > etc. if there's lower-64bit-of-md5 collisions, but a) the > probability is incredibly small, b) it would only matter in > situations that should cause an AttributeError anyway, c) if we > really care, we can always use an interning-like mechanism to > validate on module loading that its hashes doesn't collide with > other hashes (and raise an exception "Congratulations, you've > discovered a phenomenal md5 collision, get in touch with cython > devs and we'll work around it right away"). I'm not a big fan of such an attitude. If this happens at runtime, it can induce any cost from cheap-at-test-time to hugely-expensive-in-production. Thinking with my evil hat on, this can potentially be data triggered from the outside (e.g. if a JIT compiler is involved at one end), thus possibly even leading to a security hole. We should try to produce software that others can build a business on. Stefan From stefan_ml at behnel.de Tue Jun 5 10:07:10 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 05 Jun 2012 10:07:10 +0200 Subject: [Cython] Hash-based vtables In-Reply-To: <048eeb04-aa8b-4e12-9a9b-5d552d39984b@email.android.com> References: <4FCD100B.7000008@astro.uio.no> <4FCD20DC.6090906@astro.uio.no> <048eeb04-aa8b-4e12-9a9b-5d552d39984b@email.android.com> Message-ID: <4FCDBE2E.2070205@behnel.de> Dag Sverre Seljebotn, 05.06.2012 00:07: > The C FAQ says 'if you know the contents of your hash table up front you can devise a perfect hash', but no details, probably just hand-waving. 
> > 128 bits gives more entropy for perfect hashing: some but not much since each shift r is hardwired to one 64 bit subset. Perfect hashing can be done with any fixed size data set (although it's not guaranteed to always be the most efficient solution). It doesn't matter if you use 64 bits or 128 bits. If 4 bits is enough, go with that. The advantage of perfect hashing of a fixed size data set is that the hash table has no free space and a match is guaranteed to be exact. However, the problem in this specific case is that the caller and the callee do not agree on the same set of entries, so there may be collisions during the lookup (of a potentially very large set of signatures) that were not anticipated in the perfect hash table layout (of the much smaller set of provided signatures). Perfect hashing works here as well, but it loses one of its main advantages over other hashing schemes. You then have to compare the entries exactly after the lookup in order to make sure that you didn't run into a collision, thus losing time again that you just won with the hashing. But at least you only have to do exactly one such comparison, so that's an advantage over a hashing scheme that allows collisions also in the layout. Maybe you can even handle mismatches more quickly by adding a dedicated "empty" entry for them that most (all?) anticipated mismatches would hash to.
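[The verify-after-lookup step described above -- a caller may probe with a signature the table never anticipated, so each slot must store its id for one exact comparison -- might look like this toy sketch. Python stands in for the C tables; the names and the 8-slot table are invented for illustration.]

```python
def build_vtable(entries, r1, r2, r3, m):
    # Each slot stores (pre-hash id, function) so that a lookup can
    # verify an exact match. (r1, r2, r3, m) must be a perfect hash
    # for this entry set, i.e. chosen so the assert never fires.
    table = [(0, None)] * (m + 1)
    for h, func in entries:
        slot = ((h >> r1) ^ (h >> r2) ^ (h >> r3)) & m
        assert table[slot][1] is None, "not a perfect hash for these entries"
        table[slot] = (h, func)
    return table

def lookup(table, h, r1, r2, r3, m):
    slot = ((h >> r1) ^ (h >> r2) ^ (h >> r3)) & m
    stored_h, func = table[slot]
    # The single exact comparison: an unanticipated signature can land
    # in an occupied slot, so the stored id must match before the
    # function pointer is trusted.
    return func if stored_h == h else None

# A probe with an unknown signature id returns None; real code would
# fall back to boxed Python-level dispatch there.
vtable = build_vtable([(0x1234, "impl")], 0, 0, 0, 7)
assert lookup(vtable, 0x1234, 0, 0, 0, 7) == "impl"
assert lookup(vtable, 0x9999, 0, 0, 0, 7) is None
```

The success path is branch-predictable (one compare that almost always matches), which is the property the thread is after.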
Stefan From robertwb at gmail.com Tue Jun 5 11:16:34 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Tue, 5 Jun 2012 02:16:34 -0700 Subject: [Cython] Hash-based vtables In-Reply-To: <4FCDBE2E.2070205@behnel.de> References: <4FCD100B.7000008@astro.uio.no> <4FCD20DC.6090906@astro.uio.no> <048eeb04-aa8b-4e12-9a9b-5d552d39984b@email.android.com> <4FCDBE2E.2070205@behnel.de> Message-ID: On Tue, Jun 5, 2012 at 1:07 AM, Stefan Behnel wrote: > Dag Sverre Seljebotn, 05.06.2012 00:07: >> The C FAQ says 'if you know the contents of your hash table up front you can devise a perfect hash', but no details, probably just hand-waving. >> >> 128 bits gives more entropy for perfect hashing: some but not much since each shift r is hardwired to one 64 bit subset. > > Perfect hashing can be done with any fixed size data set (although it's not > guaranteed to always be the most efficient solution). It doesn't matter if > you use 64 bits or 128 bits. If 4 bits is enough, go with that. The > advantage of perfect hashing of a fixed size data set is that the hash > table has no free space and a match is guaranteed to be exact. The hash function is f(h(sig)) where f is parameterized but must be *extremely* cheap and h is fixed without regard to the entry set. This is why having 128 bits for the output of h may be an advantage. > However, the problem in this specific case is that the caller and the > callee do not agree on the same set of entries, so there may be collisions > during the lookup (of a potentially very large set of signatures) that were > not anticipated in the perfect hash table layout (of the much smaller set > of provided signatures). Perfect hashing works here as well, but it looses > one of its main advantage over other hashing schemes. You then have to > compare the entries exactly after the lookup in order to make sure that you > didn't run into a collision, thus loosing time again that you just won with > the hashing. 
> > But at least you only have to do exactly one such comparison, so that's an > advantage over a hashing scheme that allows collisions also in the layout. > Maybe you can even handle mismatches more quickly by adding a dedicated > "empty" entry for them that most (all?) anticipated mismatches would hash to. The idea is that the comparison would be cheap, a single 128-bit compare. The whole point is to avoid branching in the success case. I agree with you about 64-bit collisions being too high a risk. One could re-introduce the encoding/interning if desired, but I think we're safe in assuming no accidental md5 collisions (but hadn't thought much about the malicious case; if you're allowed to dictate function pointers you'd better have another line of defense. Perhaps this needs to be considered more.) We could even use sha1, though I thought the previous benchmarks indicated that comparing 160 bits was non-negligibly more expensive than comparing just 64. - Robert From d.s.seljebotn at astro.uio.no Tue Jun 5 18:56:46 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 05 Jun 2012 18:56:46 +0200 Subject: [Cython] Hash-based vtables In-Reply-To: References: <4FCD100B.7000008@astro.uio.no> <4FCD20DC.6090906@astro.uio.no> <048eeb04-aa8b-4e12-9a9b-5d552d39984b@email.android.com> <4FCDBE2E.2070205@behnel.de> Message-ID: <4FCE3A4E.7070202@astro.uio.no> On 06/05/2012 11:16 AM, Robert Bradshaw wrote: > On Tue, Jun 5, 2012 at 1:07 AM, Stefan Behnel wrote: >> Dag Sverre Seljebotn, 05.06.2012 00:07: >>> The C FAQ says 'if you know the contents of your hash table up front you can devise a perfect hash', but no details, probably just hand-waving. >>> >>> 128 bits gives more entropy for perfect hashing: some but not much since each shift r is hardwired to one 64 bit subset. >> >> Perfect hashing can be done with any fixed size data set (although it's not >> guaranteed to always be the most efficient solution). 
It doesn't matter if >> you use 64 bits or 128 bits. If 4 bits is enough, go with that. The >> advantage of perfect hashing of a fixed size data set is that the hash >> table has no free space and a match is guaranteed to be exact. > > The hash function is f(h(sig)) where f is parameterized but must be > *extremely* cheap and h is fixed without regard to the entry set. This > is why having 128 bits for the output of h may be an advantage. > >> However, the problem in this specific case is that the caller and the >> callee do not agree on the same set of entries, so there may be collisions >> during the lookup (of a potentially very large set of signatures) that were >> not anticipated in the perfect hash table layout (of the much smaller set >> of provided signatures). Perfect hashing works here as well, but it looses >> one of its main advantage over other hashing schemes. You then have to >> compare the entries exactly after the lookup in order to make sure that you >> didn't run into a collision, thus loosing time again that you just won with >> the hashing. Robert and I spent some time on those benchmarks; please read them before making statements like this. There are benchmarks both with a 64-bit comparison of an interned ID and with a comparison against the compile-time 64-bit hash (faster). All my benchmarks included some comparison after the lookup. Comparison is very cheap *if* it is the likely() path. Branch misses are what count. Perfect hashing, even with comparison, wins big-time in branch prediction. >> But at least you only have to do exactly one such comparison, so that's an >> advantage over a hashing scheme that allows collisions also in the layout. >> Maybe you can even handle mismatches more quickly by adding a dedicated >> "empty" entry for them that most (all?) anticipated mismatches would hash to. You mean, like what I did in the twoshift+fback benchmark? Getting a single branch miss makes it the slowest one.
But all other methods (the ones that don't collide) run slightly faster than with three shifts. > The idea is that the comparison would be cheap, a single 128-bit > compare. The whole point is to avoid branching in the success case. > > I agree with you about 64-bit collisions being too high a risk. One > could re-introduce the encoding/interning if desired, but I think > we're safe in assuming no accidental md5 collisions (but hadn't > thought much about the malicious case; if you're allowed to dictate > function pointers you'd better have another line of defense. Perhaps > this needs to be considered more.) We could even use sha1, though I I fail to understand the comment on security at all. Why not just use the *correct* signature to feed a function that intentionally segfaults (or does whatever else)? > thought the previous benchmarks indicated that comparing 160 bits was > non-negligibly more expensive than comparing just 64. Loading an interned ID from a global variable was certainly non-negligible too. Let's do some numbers on how many bits we need here: Ballpark estimate: Assume that 50 billion lines of Cython code will be written over the course of human history (that's like SAGE times 200,000). Now assume that for every 100 lines of code people write, there's an entirely new method declaration that has never, ever in all of human history been written in Cython before => 2**22 signatures will occur.
The total probability that a *single* collision (or more) will *ever* happen over the course of human history is:

 64 bit ID: 5e-7
 96 bit ID: 1e-16
128 bit ID: 3e-26
160 bit ID: 6e-36

Computed with, e.g.,:

sage: R=RealField(1000)
sage: n=R(2)**22
sage: 1 - exp(-n * (n-1) / 2 / R(2)**160)

http://en.wikipedia.org/wiki/Birthday_problem Dag From d.s.seljebotn at astro.uio.no Tue Jun 5 19:01:19 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 05 Jun 2012 19:01:19 +0200 Subject: [Cython] Hash-based vtables In-Reply-To: <4FCDB478.3070000@behnel.de> References: <4FCD100B.7000008@astro.uio.no> <4FCDB478.3070000@behnel.de> Message-ID: <4FCE3B5F.9080603@astro.uio.no> On 06/05/2012 09:25 AM, Stefan Behnel wrote: > Dag Sverre Seljebotn, 04.06.2012 21:44: >> This can cause crashes/stack smashes >> etc. if there's lower-64bit-of-md5 collisions, but a) the >> probability is incredibly small, b) it would only matter in >> situations that should cause an AttributeError anyway, c) if we >> really care, we can always use an interning-like mechanism to >> validate on module loading that its hashes doesn't collide with >> other hashes (and raise an exception "Congratulations, you've >> discovered a phenomenal md5 collision, get in touch with cython >> devs and we'll work around it right away"). > > I'm not a big fan of such an attitude. If this happens at runtime, it can > induce any cost from cheap-at-test-time to hugely-expensive-in-production. > Thinking with my evil hat on, this can potentially be data triggered from > the outside (e.g. if a JIT compiler is involved at one end), thus possibly > even leading to a security hole. > > We should try to produce software that others can build a business on.
Well, I'd build a business on something that fails with a 5e-7 probability any day :-) (given that you trust my estimates in the other post; I think they were rather conservative myself) But I'll do benchmarks for 96-bit and 128 bit hash comparisons as soon as I can get around to it. Dag From d.s.seljebotn at astro.uio.no Tue Jun 5 19:09:37 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 05 Jun 2012 19:09:37 +0200 Subject: [Cython] Hash-based vtables In-Reply-To: <4FCE3B5F.9080603@astro.uio.no> References: <4FCD100B.7000008@astro.uio.no> <4FCDB478.3070000@behnel.de> <4FCE3B5F.9080603@astro.uio.no> Message-ID: <4FCE3D51.20009@astro.uio.no> On 06/05/2012 07:01 PM, Dag Sverre Seljebotn wrote: > On 06/05/2012 09:25 AM, Stefan Behnel wrote: >> Dag Sverre Seljebotn, 04.06.2012 21:44: >>> This can cause crashes/stack smashes >>> etc. if there's lower-64bit-of-md5 collisions, but a) the >>> probability is incredibly small, b) it would only matter in >>> situations that should cause an AttributeError anyway, c) if we >>> really care, we can always use an interning-like mechanism to >>> validate on module loading that its hashes doesn't collide with >>> other hashes (and raise an exception "Congratulations, you've >>> discovered a phenomenal md5 collision, get in touch with cython >>> devs and we'll work around it right away"). >> >> I'm not a big fan of such an attitude. If this happens at runtime, it can >> induce any cost from cheap-at-test-time to >> hugely-expensive-in-production. >> Thinking with my evil hat on, this can potentially be data triggered from >> the outside (e.g. if a JIT compiler is involved at one end), thus >> possibly >> even leading to a security hole. >> >> We should try to produce software that others can build a business on. 
> > Well, I'd build a business on something that fails with a 5e-7 > probability any day :-) (given that you trust my estimates in the other > post; I think they were rather conservative myself) This was put the wrong way. The chance was 5e-7 that it would fail for anybody over the course of human history (and that was a rather pessimistic estimate). So a more "individual tack": Assume that the process contains 200 MB of method definitions alone, with each method definition being a 8 character string. (That should mean the executable should be several gigabytes :-)) That puts the probability of collision at 10^-34 for that process containing a 64-bit hash collision. Dag From markflorisson88 at gmail.com Tue Jun 5 20:02:04 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Tue, 5 Jun 2012 19:02:04 +0100 Subject: [Cython] Hash-based vtables In-Reply-To: <4FCE3D51.20009@astro.uio.no> References: <4FCD100B.7000008@astro.uio.no> <4FCDB478.3070000@behnel.de> <4FCE3B5F.9080603@astro.uio.no> <4FCE3D51.20009@astro.uio.no> Message-ID: On 5 June 2012 18:09, Dag Sverre Seljebotn wrote: > On 06/05/2012 07:01 PM, Dag Sverre Seljebotn wrote: >> >> On 06/05/2012 09:25 AM, Stefan Behnel wrote: >>> >>> Dag Sverre Seljebotn, 04.06.2012 21:44: >>>> >>>> This can cause crashes/stack smashes >>>> etc. if there's lower-64bit-of-md5 collisions, but a) the >>>> probability is incredibly small, b) it would only matter in >>>> situations that should cause an AttributeError anyway, c) if we >>>> really care, we can always use an interning-like mechanism to >>>> validate on module loading that its hashes doesn't collide with >>>> other hashes (and raise an exception "Congratulations, you've >>>> discovered a phenomenal md5 collision, get in touch with cython >>>> devs and we'll work around it right away"). >>> >>> >>> I'm not a big fan of such an attitude. If this happens at runtime, it can >>> induce any cost from cheap-at-test-time to >>> hugely-expensive-in-production. 
>>> Thinking with my evil hat on, this can potentially be data triggered from >>> the outside (e.g. if a JIT compiler is involved at one end), thus >>> possibly >>> even leading to a security hole. >>> >>> We should try to produce software that others can build a business on. >> >> >> Well, I'd build a business on something that fails with a 5e-7 >> probability any day :-) (given that you trust my estimates in the other >> post; I think they were rather conservative myself) > > > This was put the wrong way. The chance was 5e-7 that it would fail for > anybody over the course of human history (and that was a rather pessimistic > estimate). > > So a more "individual tack": > > Assume that the process contains 200 MB of method definitions alone, with > each method definition being a 8 character string. (That should mean the > executable should be several gigabytes :-)) > > That puts the probability of collision at 10^-34 for that process containing > a 64-bit hash collision. > > > Dag > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel The point is not so much running into this problem accidentally, but maliciously. If user input from untrusted users can somehow determine the function signatures that are generated and called by a JIT, then a malicious user can find collisions offline and cause some fault in a valid user program. 
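[Editorial note: the attack Mark describes is cheap because of the birthday bound: finding some colliding pair among attacker-chosen signatures takes roughly 2**(bits/2) hash evaluations, i.e. about 2**32 for a 64-bit ID, which is feasible offline. A sketch of the search, scaled down to a 16-bit truncation of md5 so it finishes instantly; the signature strings are invented.]

```python
import hashlib

def truncated_hash(sig, nbytes=2):
    # First bytes of md5; 2 bytes (16 bits) stands in for the 64-bit
    # ID so the toy search terminates in milliseconds.
    return hashlib.md5(sig.encode()).digest()[:nbytes]

seen = {}
collision = None
for i in range(1 << 17):        # pigeonhole guarantees a hit within 2**16 + 1 distinct values
    sig = "method%d:dd" % i     # attacker-chosen signature strings
    h = truncated_hash(sig)
    if h in seen:
        collision = (seen[h], sig)
        break
    seen[h] = sig

print("colliding signatures:", collision)
```

At 64 bits the same search is on the order of 2**32 hash evaluations, well within reach offline; widening the ID to 128 bits pushes the search to roughly 2**64, which is the trade-off the thread goes on to weigh.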
From d.s.seljebotn at astro.uio.no Tue Jun 5 21:33:16 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 05 Jun 2012 21:33:16 +0200 Subject: [Cython] Hash-based vtables In-Reply-To: References: <4FCD100B.7000008@astro.uio.no> <4FCDB478.3070000@behnel.de> <4FCE3B5F.9080603@astro.uio.no> <4FCE3D51.20009@astro.uio.no> Message-ID: <4FCE5EFC.30407@astro.uio.no> On 06/05/2012 08:02 PM, mark florisson wrote: > On 5 June 2012 18:09, Dag Sverre Seljebotn wrote: >> On 06/05/2012 07:01 PM, Dag Sverre Seljebotn wrote: >>> >>> On 06/05/2012 09:25 AM, Stefan Behnel wrote: >>>> >>>> Dag Sverre Seljebotn, 04.06.2012 21:44: >>>>> >>>>> This can cause crashes/stack smashes >>>>> etc. if there's lower-64bit-of-md5 collisions, but a) the >>>>> probability is incredibly small, b) it would only matter in >>>>> situations that should cause an AttributeError anyway, c) if we >>>>> really care, we can always use an interning-like mechanism to >>>>> validate on module loading that its hashes doesn't collide with >>>>> other hashes (and raise an exception "Congratulations, you've >>>>> discovered a phenomenal md5 collision, get in touch with cython >>>>> devs and we'll work around it right away"). >>>> >>>> >>>> I'm not a big fan of such an attitude. If this happens at runtime, it can >>>> induce any cost from cheap-at-test-time to >>>> hugely-expensive-in-production. >>>> Thinking with my evil hat on, this can potentially be data triggered from >>>> the outside (e.g. if a JIT compiler is involved at one end), thus >>>> possibly >>>> even leading to a security hole. >>>> >>>> We should try to produce software that others can build a business on. >>> >>> >>> Well, I'd build a business on something that fails with a 5e-7 >>> probability any day :-) (given that you trust my estimates in the other >>> post; I think they were rather conservative myself) >> >> >> This was put the wrong way. 
The chance was 5e-7 that it would fail for >> anybody over the course of human history (and that was a rather pessimistic >> estimate). >> >> So a more "individual tack": >> >> Assume that the process contains 200 MB of method definitions alone, with >> each method definition being a 8 character string. (That should mean the >> executable should be several gigabytes :-)) >> >> That puts the probability of collision at 10^-34 for that process containing >> a 64-bit hash collision. >> >> >> Dag >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel > > The point is not so much running into this problem accidentally, but > maliciously. If user input from untrusted users can somehow determine > the function signatures that are generated and called by a JIT, then a > malicious user can find collisions offline and cause some fault in a > valid user program. This took me a while to understand. So the idea is that you're in a completely managed environment (like Java), and you want to run untrusted code and have it not segfault or smash the stack. Eve then cleverly assembles a caller/callee pair with mismatching signatures but the same hash. Yes, in that situation 64 bits is perhaps not enough. But is this relevant to what we're trying to do here? We're discussing APIs to talk between Python C extension modules that already have unlimited powers. I'd think a "managed Cython" would be such a large change that one could easily change the hash size at that point? But I agree it's not as easily written off as I thought. 
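[Editorial note: the accidental-collision figures quoted in this thread follow from the standard birthday approximation, p ~= 1 - exp(-n(n-1)/2 / 2**bits). A quick Python check with n = 2**22 signatures reproduces them; note the 96-bit case comes out near 1e-16.]

```python
import math

n = 2.0 ** 22                   # ~4 million distinct signatures, as in the estimate above
probs = {}
for bits in (64, 96, 128, 160):
    # Birthday approximation for the chance of any collision among n IDs.
    probs[bits] = -math.expm1(-n * (n - 1) / 2.0 / 2.0 ** bits)
    print("%3d-bit ID: %.0e" % (bits, probs[bits]))
```

Using `expm1` keeps the tiny probabilities from underflowing to zero in double precision, which is why no arbitrary-precision field is needed here.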
Dag From d.s.seljebotn at astro.uio.no Tue Jun 5 22:10:04 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 05 Jun 2012 22:10:04 +0200 Subject: [Cython] Hash-based vtables In-Reply-To: References: <4FCD100B.7000008@astro.uio.no> <4FCD20DC.6090906@astro.uio.no> Message-ID: <4FCE679C.7000002@astro.uio.no> On 06/04/2012 11:43 PM, Robert Bradshaw wrote: > On Mon, Jun 4, 2012 at 1:55 PM, Dag Sverre Seljebotn > wrote: >> On 06/04/2012 09:44 PM, Dag Sverre Seljebotn wrote: >>> >>> Me and Robert had a long discussion on the NumFOCUS list about this >>> already, but I figured it was better to continue it and provide more >>> in-depth benchmark results here. >>> >>> It's basically a new idea of how to provide a vtable based on perfect >>> hashing, which should be a lot simpler to implement than what I first >>> imagined. >>> >>> I'll write down some context first, if you're familiar with this >>> skip ahead a bit.. >>> >>> This means that you can do fast dispatches *without* the messy >>> business of binding vtable slots at compile time. To be concrete, this >>> might e.g. take the form >>> >>> def f(obj): >>> obj.method(3.4) # try to find a vtable with "void method(double)" in it >>> >>> or, a more typed approach, >>> >>> # File A >>> cdef class MyImpl: >>> def double method(double x): return x * x >>> >>> # File B >>> # Here we never know about MyImpl, hence "duck-typed" >>> @cython.interface >>> class MyIntf: >>> def double method(double x): pass >>> >>> def f(MyIntf obj): >>> # obj *can* be MyImpl instance, or whatever else that supports >>> # that interface >>> obj.method(3.4) >>> >>> >>> Now, the idea to implement this is: >>> >>> a) Both caller and callee pre-hash name/argument string >>> "mymethod:iidd" to 64 bits of hash data (probably lower 64 bits of >>> md5) >>> >>> b) Callee (MyImpl) generates a vtable of its methods by *perfect* >>> hashing. 
What you do is define a final hash fh as a function >>> of the pre-hash ph, for instance >>> >>> fh = ((ph>> vtable.r1) ^ (ph>> vtable.r2) ^ (ph>> vtable.r3))& >>> vtable.m >>> >>> (Me and Robert are benchmarking different functions to use here.) By >>> playing with r1, r2, r3, you have 64**3 choices of hash function, and >>> will be able to pick a combination which gives *no* (or very few) >>> collisions. >>> >>> c) Caller then combines the pre-hash generated at compile-time, with >>> r1, r2, r3, m stored in the vtable header, in order to find the >>> final location in the hash-table. >>> >>> The exciting thing is that in benchmark, the performance penalty is >>> actually very slight over a C++-style v-table. (Of course you can >>> cache a proper vtable, but the fact that you get so close without >>> caring about caching means that this can be done much faster.) > > One advantage about caching a vtable is that one can possibly put in > adapters for non-exact matches. It also opens up the possibility of > putting in stubs to call def methods if they exist. This needs to be > fleshed out more, (another CEP :) but could provide for a > backwards-compatible easy first implementation. > >>> Back to my and Robert's discussion on benchmarks: >>> >>> I've uploaded benchmarks here: >>> >>> https://github.com/dagss/hashvtable/tree/master/dispatchbench >>> >>> I've changed the benchmark taking to give more robust numbers (at >>> least for me), you want to look at the 'min' column. >>> >>> I changed the benchmark a bit so that it benchmarks a *callsite*. >>> So we don't pass 'h' on the stack, but either a) looks it up in a global >>> variable (default), or b) it's a compile-time constant (immediate in >>> assembly) (compile with -DIMHASH). >>> >>> Similarly, the ID is either an "interned" global variable, or an >>> immediate (-DIMID). >>> >>> The results are very different on my machine depending on this aspect. 
>>> My conclusions: >>> >>> - Both three shifts with masking, two shifts with a "fallback slot" >>> (allowing for a single collision), three shifts, two shifts with >>> two masks allows for constructing good vtables. In the case of only >>> two shifts, one colliding method gets the twoshift+fback >>> performance and the rest gets the twoshift performance. >>> >>> - Performance is really more affected by whether hashes are >>> immediates or global variables than the hash function. This is in >>> contrast to the interning vs. key benchmarks -- so I think that if >>> we looked up the vtable through PyTypeObject, rather than getting >>> the vtable directly, the loads of the global variables could >>> potentially be masked by that. >>> >>> - My conclusion: Just use lower bits of md5 *both* for the hashing >>> and the ID-ing (don't bother with any interning), and compile the >>> thing as a 64-bit immediate. This can cause crashes/stack smashes >>> etc. if there's lower-64bit-of-md5 collisions, but a) the >>> probability is incredibly small, b) it would only matter in >>> situations that should cause an AttributeError anyway, c) if we >>> really care, we can always use an interning-like mechanism to >>> validate on module loading that its hashes doesn't collide with >>> other hashes (and raise an exception "Congratulations, you've >>> discovered a phenomenal md5 collision, get in touch with cython >>> devs and we'll work around it right away"). > > Due to the birthday paradox, this seems a bit risky. Maybe it's > because I regularly work with collections much bigger than 2^32, and I > suppose we're talking about unique method names and signatures here, > but still... I wonder what the penalty would be for checking the full > 128 bit hash. (Storing it could allow for greater entropy in the > optimal hash table search as well). Wonder no more. 
Here's the penalty for different bit-lengths, all compile-time constants:

    threeshift: min=6.08e-09  mean=6.11e-09  std=2.81e-11  val=1200000000.000000
  threeshift96: min=7.53e-09  mean=7.55e-09  std=1.96e-11  val=1200000000.000000
 threeshift128: min=6.95e-09  mean=6.97e-09  std=2.57e-11  val=1200000000.000000
 threeshift160: min=8.17e-09  mean=8.23e-09  std=4.06e-11  val=1200000000.000000

And for comparison, when loading the comparison IDs from global variable:

    threeshift: min=6.46e-09  mean=6.52e-09  std=4.95e-11  val=1200000000.000000
  threeshift96: min=8.07e-09  mean=8.16e-09  std=4.55e-11  val=1200000000.000000
 threeshift128: min=8.06e-09  mean=8.18e-09  std=6.71e-11  val=1200000000.000000
 threeshift160: min=9.71e-09  mean=9.83e-09  std=5.12e-11  val=1200000000.000000

So indeed, 64-bit hash < interning < 128 bit hash (at least on my Intel Nehalem Core i7 1.87 GHz). And the load of the global variable may in real life be hidden by other things going on in the function. And, you save vtable memory by having an interned char* and not saving the hash in the vtable. The benchmarks should be made more easily runnable so that we could run them on various systems, but it makes sense to first read up on and figure out which hash functions are really viable, to keep the number of numbers down. I just realized that I never pushed the changes I did to introduce -DIMHASH/-DIMID etc., but the benchmarks are pushed now. > We could also do a fallback table. Usually it'd be empty, Occasionally > it'd have one element in it. It'd always be possible to make this big > enough to avoid collisions in a worst-case scenario. If you do a fallback table it's as much code in the call site as linear probing... But when I played with the generation side, a failure to create a table at a given size would *always* be due to a single collision. This is what I did in the twoshift+fback benchmark. > Duplicate tables works as long as there aren't too many orthogonal > considerations. Is the GIL the only one?
What about "I can propagate > errors?" Now we're up to 4 tables... Would your decision of whether or not to dispatch to a function depend on whether or not it propagates errors? I'm thinking of the "with gil" function case, i.e. callee has: a) Function to call if you have the GIL b) GIL-acquiring wrapper and you want GIL-holding code to call a) and nogil code to call b). But one could just make the caller acquire the GIL if needed (which in that case is so expensive anyway that it can be made the unlikely() path). I can't think of other situations where you would pick which function to call based on flags. Dag From markflorisson88 at gmail.com Tue Jun 5 22:33:12 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Tue, 5 Jun 2012 21:33:12 +0100 Subject: [Cython] Hash-based vtables In-Reply-To: <4FCE5EFC.30407@astro.uio.no> References: <4FCD100B.7000008@astro.uio.no> <4FCDB478.3070000@behnel.de> <4FCE3B5F.9080603@astro.uio.no> <4FCE3D51.20009@astro.uio.no> <4FCE5EFC.30407@astro.uio.no> Message-ID: On 5 June 2012 20:33, Dag Sverre Seljebotn wrote: > On 06/05/2012 08:02 PM, mark florisson wrote: >> >> On 5 June 2012 18:09, Dag Sverre Seljebotn >> ?wrote: >>> >>> On 06/05/2012 07:01 PM, Dag Sverre Seljebotn wrote: >>>> >>>> >>>> On 06/05/2012 09:25 AM, Stefan Behnel wrote: >>>>> >>>>> >>>>> Dag Sverre Seljebotn, 04.06.2012 21:44: >>>>>> >>>>>> >>>>>> This can cause crashes/stack smashes >>>>>> etc. if there's lower-64bit-of-md5 collisions, but a) the >>>>>> probability is incredibly small, b) it would only matter in >>>>>> situations that should cause an AttributeError anyway, c) if we >>>>>> really care, we can always use an interning-like mechanism to >>>>>> validate on module loading that its hashes doesn't collide with >>>>>> other hashes (and raise an exception "Congratulations, you've >>>>>> discovered a phenomenal md5 collision, get in touch with cython >>>>>> devs and we'll work around it right away"). 
>>>>> >>>>> >>>>> >>>>> I'm not a big fan of such an attitude. If this happens at runtime, it >>>>> can >>>>> induce any cost from cheap-at-test-time to >>>>> hugely-expensive-in-production. >>>>> Thinking with my evil hat on, this can potentially be data triggered >>>>> from >>>>> the outside (e.g. if a JIT compiler is involved at one end), thus >>>>> possibly >>>>> even leading to a security hole. >>>>> >>>>> We should try to produce software that others can build a business on. >>>> >>>> >>>> >>>> Well, I'd build a business on something that fails with a 5e-7 >>>> probability any day :-) (given that you trust my estimates in the other >>>> post; I think they were rather conservative myself) >>> >>> >>> >>> This was put the wrong way. The chance was 5e-7 that it would fail for >>> anybody over the course of human history (and that was a rather >>> pessimistic >>> estimate). >>> >>> So a more "individual tack": >>> >>> Assume that the process contains 200 MB of method definitions alone, with >>> each method definition being a 8 character string. (That should mean the >>> executable should be several gigabytes :-)) >>> >>> That puts the probability of collision at 10^-34 for that process >>> containing >>> a 64-bit hash collision. >>> >>> >>> Dag >>> _______________________________________________ >>> cython-devel mailing list >>> cython-devel at python.org >>> http://mail.python.org/mailman/listinfo/cython-devel >> >> >> The point is not so much running into this problem accidentally, but >> maliciously. If user input from untrusted users can somehow determine >> the function signatures that are generated and called by a JIT, then a >> malicious user can find collisions offline and cause some fault in a >> valid user program. > > > This took me a while to understand. So the idea is that you're in a > completely managed environment (like Java), and you want to run untrusted > code and have it not segfault or smash the stack. 
Eve then cleverly > assembles a caller/callee pair with mismatching signatures but the same > hash. > > Yes, in that situation 64 bits is perhaps not enough. > > But is this relevant to what we're trying to do here? We're discussing APIs > to talk between Python C extension modules that already have unlimited > powers. I'd think a "managed Cython" would be such a large change that one > could easily change the hash size at that point? > > But I agree it's not as easily written off as I thought. > > > Dag > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel It doesn't even necessarily have to be about running user code; a user could craft data input which causes such a situation. For instance, let's say we have a just-in-time specializer which specializes a function for the runtime input types, and the types depend on the user input. For instance, if we write a web application, we can post arrays described by a custom dtype, which draws pictures in some weird way for us. We can get it to specialize pretty much any array type, so that gives us a good opportunity to find collisions. From robertwb at gmail.com Tue Jun 5 22:50:23 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Tue, 5 Jun 2012 13:50:23 -0700 Subject: [Cython] Hash-based vtables In-Reply-To: <4FCE679C.7000002@astro.uio.no> References: <4FCD100B.7000008@astro.uio.no> <4FCD20DC.6090906@astro.uio.no> <4FCE679C.7000002@astro.uio.no> Message-ID: On Tue, Jun 5, 2012 at 1:10 PM, Dag Sverre Seljebotn wrote: > On 06/04/2012 11:43 PM, Robert Bradshaw wrote: >> >> On Mon, Jun 4, 2012 at 1:55 PM, Dag Sverre Seljebotn >> wrote: >>> >>> On 06/04/2012 09:44 PM, Dag Sverre Seljebotn wrote: >>>> >>>> >>>> Me and Robert had a long discussion on the NumFOCUS list about this >>>> already, but I figured it was better to continue it and provide more >>>> in-depth benchmark results here.
>>>> >>>> It's basically a new idea of how to provide a vtable based on perfect >>>> hashing, which should be a lot simpler to implement than what I first >>>> imagined. >>>> >>>> I'll write down some context first, if you're familiar with this >>>> skip ahead a bit.. >>>> >>>> This means that you can do fast dispatches *without* the messy >>>> business of binding vtable slots at compile time. To be concrete, this >>>> might e.g. take the form >>>> >>>> def f(obj): >>>> obj.method(3.4) # try to find a vtable with "void method(double)" in it >>>> >>>> or, a more typed approach, >>>> >>>> # File A >>>> cdef class MyImpl: >>>> def double method(double x): return x * x >>>> >>>> # File B >>>> # Here we never know about MyImpl, hence "duck-typed" >>>> @cython.interface >>>> class MyIntf: >>>> def double method(double x): pass >>>> >>>> def f(MyIntf obj): >>>> # obj *can* be MyImpl instance, or whatever else that supports >>>> # that interface >>>> obj.method(3.4) >>>> >>>> >>>> Now, the idea to implement this is: >>>> >>>> a) Both caller and callee pre-hash name/argument string >>>> "mymethod:iidd" to 64 bits of hash data (probably lower 64 bits of >>>> md5) >>>> >>>> b) Callee (MyImpl) generates a vtable of its methods by *perfect* >>>> hashing. What you do is define a final hash fh as a function >>>> of the pre-hash ph, for instance >>>> >>>> fh = ((ph>> vtable.r1) ^ (ph>> vtable.r2) ^ (ph>> vtable.r3))& >>>> vtable.m >>>> >>>> (Me and Robert are benchmarking different functions to use here.) By >>>> playing with r1, r2, r3, you have 64**3 choices of hash function, and >>>> will be able to pick a combination which gives *no* (or very few) >>>> collisions. >>>> >>>> c) Caller then combines the pre-hash generated at compile-time, with >>>> r1, r2, r3, m stored in the vtable header, in order to find the >>>> final location in the hash-table.
>>>> >>>> The exciting thing is that in benchmark, the performance penalty is >>>> actually very slight over a C++-style v-table. (Of course you can >>>> cache a proper vtable, but the fact that you get so close without >>>> caring about caching means that this can be done much faster.) >> >> >> One advantage about caching a vtable is that one can possibly put in >> adapters for non-exact matches. It also opens up the possibility of >> putting in stubs to call def methods if they exist. This needs to be >> fleshed out more, (another CEP :) but could provide for a >> backwards-compatible easy first implementation. >> >>>> Back to my and Robert's discussion on benchmarks: >>>> >>>> I've uploaded benchmarks here: >>>> >>>> https://github.com/dagss/hashvtable/tree/master/dispatchbench >>>> >>>> I've changed the benchmark taking to give more robust numbers (at >>>> least for me), you want to look at the 'min' column. >>>> >>>> I changed the benchmark a bit so that it benchmarks a *callsite*. >>>> So we don't pass 'h' on the stack, but either a) looks it up in a global >>>> variable (default), or b) it's a compile-time constant (immediate in >>>> assembly) (compile with -DIMHASH). >>>> >>>> Similarly, the ID is either an "interned" global variable, or an >>>> immediate (-DIMID). >>>> >>>> The results are very different on my machine depending on this aspect. >>>> My conclusions: >>>> >>>> - Both three shifts with masking, two shifts with a "fallback slot" >>>> (allowing for a single collision), three shifts, two shifts with >>>> two masks allows for constructing good vtables. In the case of only >>>> two shifts, one colliding method gets the twoshift+fback >>>> performance and the rest gets the twoshift performance. >>>> >>>> - Performance is really more affected by whether hashes are >>>> immediates or global variables than the hash function. This is in >>>> contrast to the interning vs. 
key benchmarks -- so I think that if >>>> we looked up the vtable through PyTypeObject, rather than getting >>>> the vtable directly, the loads of the global variables could >>>> potentially be masked by that. >>>> >>>> - My conclusion: Just use lower bits of md5 *both* for the hashing >>>> and the ID-ing (don't bother with any interning), and compile the >>>> thing as a 64-bit immediate. This can cause crashes/stack smashes >>>> etc. if there's lower-64bit-of-md5 collisions, but a) the >>>> probability is incredibly small, b) it would only matter in >>>> situations that should cause an AttributeError anyway, c) if we >>>> really care, we can always use an interning-like mechanism to >>>> validate on module loading that its hashes don't collide with >>>> other hashes (and raise an exception "Congratulations, you've >>>> discovered a phenomenal md5 collision, get in touch with cython >>>> devs and we'll work around it right away"). >> >> >> Due to the birthday paradox, this seems a bit risky. Maybe it's >> because I regularly work with collections much bigger than 2^32, and I >> suppose we're talking about unique method names and signatures here, >> but still... I wonder what the penalty would be for checking the full >> 128 bit hash. (Storing it could allow for greater entropy in the >> optimal hash table search as well). > > > Wonder no more. Here's the penalty for different bit-lengths, all > compile-time constants: > > threeshift: min=6.08e-09 mean=6.11e-09 std=2.81e-11 > val=1200000000.000000 > threeshift96: min=7.53e-09 mean=7.55e-09 std=1.96e-11 > val=1200000000.000000 > threeshift128: min=6.95e-09 mean=6.97e-09 std=2.57e-11 > val=1200000000.000000 > threeshift160: min=8.17e-09 mean=8.23e-09 std=4.06e-11 > val=1200000000.000000 > > And for comparison, when loading the comparison IDs from global variable: > >
threeshift96: min=8.07e-09 mean=8.16e-09 std=4.55e-11 > val=1200000000.000000 > threeshift128: min=8.06e-09 mean=8.18e-09 std=6.71e-11 > val=1200000000.000000 > threeshift160: min=9.71e-09 mean=9.83e-09 std=5.12e-11 > val=1200000000.000000 > > So indeed, > > 64-bit hash < interning < 128 bit hash > > (At least on my Intel Nehalem Core i7 1.87GHz) > > And the load of the global variable may in real life be hidden by other > things going on in the function. > > And, you save vtable memory by having an interned char* and not saving the > hash in the vtable. I'm OK with using the 64-bit hash with a macro to enable further checking. If it becomes an issue, we can partition the vtable into two separate structures (hash64/pointer/flags? + hash160/char*/metadata). That's probably overkill. With an eye to security, perhaps the spec should be sha1 (or sha2?, not sure if that ships with Python). > They should be made more easily runnable so that we could run them on > various systems, but it makes sense to first read up on and figure out which > hash functions are really viable, to keep the number of numbers down. > > I just realized that I never pushed the changes I did to introduce > -DIMHASH/-DIMID etc., but the benchmarks are pushed now. > > > >> We could also do a fallback table. Usually it'd be empty, occasionally >> it'd have one element in it. It'd always be possible to make this big >> enough to avoid collisions in a worst-case scenario. > > > If you do a fallback table it's as much code in the call site as linear > probing... Is linear probing that bad? It's an extra increment and compare in the miss case. > But when I played with the generation side, a failure to create a table at a > given size would *always* be due to a single collision. This is what I did > in the twoshift+fback benchmark. But it won't always be. One can always increase the size of the main table however, if two collisions are rare enough.
>> Duplicate tables works as long as there aren't too many orthogonal >> considerations. Is the GIL the only one? What about "I can propagate >> errors?" Now we're up to 4 tables... > > Would your decision of whether or not to dispatch to a function depend on > whether or not it propagates errors? > > I'm thinking of the "with gil" function case, i.e. callee has: > > ?a) Function to call if you have the GIL > ?b) GIL-acquiring wrapper > > and you want GIL-holding code to call a) and nogil code to call b). > > But one could just make the caller acquire the GIL if needed (which in that > case is so expensive anyway that it can be made the unlikely() path). Are you saying you'd add code to the call site to determine if it needs (and conditionally acquire) the GIL? > I can't think of other situations where you would pick which function to > call based on flags. If the caller doesn't propagate errors, it may want to have different codepaths depending on whether the callee propagates them. - Robert From d.s.seljebotn at astro.uio.no Tue Jun 5 23:41:15 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 05 Jun 2012 23:41:15 +0200 Subject: [Cython] Hash-based vtables In-Reply-To: References: <4FCD100B.7000008@astro.uio.no> <4FCD20DC.6090906@astro.uio.no> <4FCE679C.7000002@astro.uio.no> Message-ID: <4FCE7CFB.7000205@astro.uio.no> On 06/05/2012 10:50 PM, Robert Bradshaw wrote: > On Tue, Jun 5, 2012 at 1:10 PM, Dag Sverre Seljebotn > wrote: >> On 06/04/2012 11:43 PM, Robert Bradshaw wrote: >>> >>> On Mon, Jun 4, 2012 at 1:55 PM, Dag Sverre Seljebotn >>> wrote: >>>> >>>> On 06/04/2012 09:44 PM, Dag Sverre Seljebotn wrote: >>>>> >>>>> >>>>> Me and Robert had a long discussion on the NumFOCUS list about this >>>>> already, but I figured it was better to continue it and provide more >>>>> in-depth benchmark results here. 
>>>>> >>>>> It's basically a new idea of how to provide a vtable based on perfect >>>>> hashing, which should be a lot simpler to implement than what I first >>>>> imagined. >>>>> >>>>> I'll write down some context first, if you're familiar with this >>>>> skip ahead a bit.. >>>>> >>>>> This means that you can do fast dispatches *without* the messy >>>>> business of binding vtable slots at compile time. To be concrete, this >>>>> might e.g. take the form >>>>> >>>>> def f(obj): >>>>> obj.method(3.4) # try to find a vtable with "void method(double)" in it >>>>> >>>>> or, a more typed approach, >>>>> >>>>> # File A >>>>> cdef class MyImpl: >>>>> def double method(double x): return x * x >>>>> >>>>> # File B >>>>> # Here we never know about MyImpl, hence "duck-typed" >>>>> @cython.interface >>>>> class MyIntf: >>>>> def double method(double x): pass >>>>> >>>>> def f(MyIntf obj): >>>>> # obj *can* be MyImpl instance, or whatever else that supports >>>>> # that interface >>>>> obj.method(3.4) >>>>> >>>>> >>>>> Now, the idea to implement this is: >>>>> >>>>> a) Both caller and callee pre-hash name/argument string >>>>> "mymethod:iidd" to 64 bits of hash data (probably lower 64 bits of >>>>> md5) >>>>> >>>>> b) Callee (MyImpl) generates a vtable of its methods by *perfect* >>>>> hashing. What you do is define a final hash fh as a function >>>>> of the pre-hash ph, for instance >>>>> >>>>> fh = ((ph>> vtable.r1) ^ (ph>> vtable.r2) ^ (ph>> vtable.r3))& >>>>> vtable.m >>>>> >>>>> (Me and Robert are benchmarking different functions to use here.) By >>>>> playing with r1, r2, r3, you have 64**3 choices of hash function, and >>>>> will be able to pick a combination which gives *no* (or very few) >>>>> collisions. >>>>> >>>>> c) Caller then combines the pre-hash generated at compile-time, with >>>>> r1, r2, r3, m stored in the vtable header, in order to find the >>>>> final location in the hash-table. 
>>>>> >>>>> The exciting thing is that in benchmark, the performance penalty is >>>>> actually very slight over a C++-style v-table. (Of course you can >>>>> cache a proper vtable, but the fact that you get so close without >>>>> caring about caching means that this can be done much faster.) >>> >>> >>> One advantage about caching a vtable is that one can possibly put in >>> adapters for non-exact matches. It also opens up the possibility of >>> putting in stubs to call def methods if they exist. This needs to be >>> fleshed out more, (another CEP :) but could provide for a >>> backwards-compatible easy first implementation. >>> >>>>> Back to my and Robert's discussion on benchmarks: >>>>> >>>>> I've uploaded benchmarks here: >>>>> >>>>> https://github.com/dagss/hashvtable/tree/master/dispatchbench >>>>> >>>>> I've changed the benchmark taking to give more robust numbers (at >>>>> least for me), you want to look at the 'min' column. >>>>> >>>>> I changed the benchmark a bit so that it benchmarks a *callsite*. >>>>> So we don't pass 'h' on the stack, but either a) looks it up in a global >>>>> variable (default), or b) it's a compile-time constant (immediate in >>>>> assembly) (compile with -DIMHASH). >>>>> >>>>> Similarly, the ID is either an "interned" global variable, or an >>>>> immediate (-DIMID). >>>>> >>>>> The results are very different on my machine depending on this aspect. >>>>> My conclusions: >>>>> >>>>> - Both three shifts with masking, two shifts with a "fallback slot" >>>>> (allowing for a single collision), three shifts, two shifts with >>>>> two masks allows for constructing good vtables. In the case of only >>>>> two shifts, one colliding method gets the twoshift+fback >>>>> performance and the rest gets the twoshift performance. >>>>> >>>>> - Performance is really more affected by whether hashes are >>>>> immediates or global variables than the hash function. This is in >>>>> contrast to the interning vs. 
key benchmarks -- so I think that if >>>>> we looked up the vtable through PyTypeObject, rather than getting >>>>> the vtable directly, the loads of the global variables could >>>>> potentially be masked by that. >>>>> >>>>> - My conclusion: Just use lower bits of md5 *both* for the hashing >>>>> and the ID-ing (don't bother with any interning), and compile the >>>>> thing as a 64-bit immediate. This can cause crashes/stack smashes >>>>> etc. if there's lower-64bit-of-md5 collisions, but a) the >>>>> probability is incredibly small, b) it would only matter in >>>>> situations that should cause an AttributeError anyway, c) if we >>>>> really care, we can always use an interning-like mechanism to >>>>> validate on module loading that its hashes doesn't collide with >>>>> other hashes (and raise an exception "Congratulations, you've >>>>> discovered a phenomenal md5 collision, get in touch with cython >>>>> devs and we'll work around it right away"). >>> >>> >>> Due to the birthday paradox, this seems a bit risky. Maybe it's >>> because I regularly work with collections much bigger than 2^32, and I >>> suppose we're talking about unique method names and signatures here, >>> but still... I wonder what the penalty would be for checking the full >>> 128 bit hash. (Storing it could allow for greater entropy in the >>> optimal hash table search as well). >> >> >> Wonder no more. 
Here's the penalty for different bit-lengths, all >> compile-time constants: >> >> threeshift: min=6.08e-09 mean=6.11e-09 std=2.81e-11 >> val=1200000000.000000 >> threeshift96: min=7.53e-09 mean=7.55e-09 std=1.96e-11 >> val=1200000000.000000 >> threeshift128: min=6.95e-09 mean=6.97e-09 std=2.57e-11 >> val=1200000000.000000 >> threeshift160: min=8.17e-09 mean=8.23e-09 std=4.06e-11 >> val=1200000000.000000 >> >> And for comparison, when loading the comparison IDs from global variable: >> >> threeshift: min=6.46e-09 mean=6.52e-09 std=4.95e-11 >> val=1200000000.000000 >> threeshift96: min=8.07e-09 mean=8.16e-09 std=4.55e-11 >> val=1200000000.000000 >> threeshift128: min=8.06e-09 mean=8.18e-09 std=6.71e-11 >> val=1200000000.000000 >> threeshift160: min=9.71e-09 mean=9.83e-09 std=5.12e-11 >> val=1200000000.000000 >> >> So indeed, >> >> 64-bit hash< interning< 128 bit hash >> >> (At least on my Intel Nehalem Core i7 1.87GhZ) >> >> And the load of the global variable may in real life be hidden by other >> things going on in the function. >> >> And, you save vtable memory by having an interned char* and not saving the >> hash in the vtable. > > I'm OK with using the 64-bit hash with a macro to enable further > checking. If it becomes an issue, we can partition the vtable into two > separate structures (hash64/pointer/flags? + hash160/char*/metadata). > That's probably overkill. With an eye to security, perhaps the spec > should be sha1 (or sha2?, not sure if that ships with Python). No, I like splitting up the table, I was assuming we'd stick the char* in a different table anyway. Cache is precious, and the second table would be completely cold in most situations. Is the goal then to avoid having to have an interning registry? Something that hasn't come up so far is that Cython doesn't know the exact types of external typedefs, so it can't generate the hash at Cythonize-time. 
I guess some support for build systems to probe for type sizes and compute the signature hashes in a separate header file would solve this -- with a fallback to computing them at runtime during module loading, if you're not using a supported build system. (But suddenly an interning registry doesn't look so horrible..) Really, I think a micro-benchmark is rather pessimistic about the performance of loading a global variable -- if more stuff happens around the call site then the load will likely be moved ahead and the latency hidden. Perhaps this might even be the case just for going the route through extensibletypeobject. >> They should be made more easily runnable so that we could run them on >> various systems, but it makes sense to first read up on and figure out which >> hash functions are really viable, to keep the number of numbers down. >> >> I just realized that I never pushed the changes I did to introduce >> -DIMHASH/-DIMID etc., but the benchmarks are pushed now. >> >> >> >>> We could also do a fallback table. Usually it'd be empty, occasionally >>> it'd have one element in it. It'd always be possible to make this big >>> enough to avoid collisions in a worst-case scenario. >> >> >> If you do a fallback table it's as much code in the call site as linear >> probing... > > Is linear probing that bad? It's an extra increment and compare in the > miss case. > >> But when I played with the generation side, a failure to create a table at a >> given size would *always* be due to a single collision. This is what I did >> in the twoshift+fback benchmark. > > But it won't always be. One can always increase the size of the main > table however, if two collisions are rare enough. Yes of course, I didn't test 100% fill of a 64-entry table. I was more concerned with making the table 128 or 256 rather than having to go to 512 :-) >>> Duplicate tables works as long as there aren't too many orthogonal >>> considerations. Is the GIL the only one?
What about "I can propagate >>> errors?" Now we're up to 4 tables... >> >> Would your decision of whether or not to dispatch to a function depend on >> whether or not it propagates errors? >> >> I'm thinking of the "with gil" function case, i.e. callee has: >> >> a) Function to call if you have the GIL >> b) GIL-acquiring wrapper >> >> and you want GIL-holding code to call a) and nogil code to call b). >> >> But one could just make the caller acquire the GIL if needed (which in that >> case is so expensive anyway that it can be made the unlikely() path). > > Are you saying you'd add code to the call site to determine if it > needs (and conditionally acquire) the GIL? Well, I'm saying it's an alternative, I'm not sure if it has merit. Basically shift the "with gil" responsibility to the caller in this case. > >> I can't think of other situations where you would pick which function to >> call based on flags. > > If the caller doesn't propagate errors, it may want to have different > codepaths depending on whether the callee propagates them. Not sure if I understand. Would you call a *different* incarnation of the callee depending on this, and need different function pointers for different callers? Otherwise you just check flags after the call and take the appropriate action, with a likely() around the likely one. You need flags, but not a different table. Dag From ian.h.bell at gmail.com Wed Jun 6 10:04:07 2012 From: ian.h.bell at gmail.com (Ian Bell) Date: Wed, 6 Jun 2012 01:04:07 -0700 Subject: [Cython] Resurrecting __dict__ for extension types Message-ID: As per a couple of discussions online ( http://mail.python.org/pipermail/cython-devel/2011-February/000122.html), it looks like at one point it was pretty close to being able to programmatically and automatically generate a __dict__ for extension types like for CPython classes. I have to manually code a function that does exactly what __dict__ should do, and it is a pain. 
I have some classes with tens of attributes, and that is already a big enough pain. This is especially useful to more easily enable deepcopy and pickling for classes. While on the pickling theme, it seems it really ought to be pretty straightforward to automatically pickle extension types. Don't you already have all the necessary information at compile time? This was on the wish list at one point if I am not mistaken and would be very useful to me and lots of other people. I'm finally loving coding in Cython and am finally making sense of how best to use extension types. Regards, Ian -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan_ml at behnel.de Wed Jun 6 10:58:37 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 06 Jun 2012 10:58:37 +0200 Subject: [Cython] Hash-based vtables In-Reply-To: References: <4FCD100B.7000008@astro.uio.no> <4FCDB478.3070000@behnel.de> <4FCE3B5F.9080603@astro.uio.no> <4FCE3D51.20009@astro.uio.no> <4FCE5EFC.30407@astro.uio.no> Message-ID: <4FCF1BBD.9070709@behnel.de> mark florisson, 05.06.2012 22:33: > It doesn't even necessarily have to be about running user code, a user > could craft data input which causes such a situation. For instance, > let's say we have a just-in-time specializer which specializes a > function for the runtime input types, and the types depend on the user > input. For instance, if we write a web application we can post arrays > to described by a custom dtype, which draws pictures in some weird way > for us. We can get it to specialize pretty much any array type, so > that gives us a good opportunity to find collisions. Yes, and the bad thing is that a very high probability of having no collisions even in combination with the need for a huge amount of brute force work to find one is not enough. An attacker (or otherwise interested user) may just be lucky, and given how low in the application stack this will be used, such a bit of luck may have massive consequences. 
Stefan From d.s.seljebotn at astro.uio.no Wed Jun 6 11:11:15 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Wed, 06 Jun 2012 11:11:15 +0200 Subject: [Cython] Hash-based vtables In-Reply-To: <4FCF1BBD.9070709@behnel.de> References: <4FCD100B.7000008@astro.uio.no> <4FCDB478.3070000@behnel.de> <4FCE3B5F.9080603@astro.uio.no> <4FCE3D51.20009@astro.uio.no> <4FCE5EFC.30407@astro.uio.no> <4FCF1BBD.9070709@behnel.de> Message-ID: <80b0aaa9-a1eb-4fba-b8fb-973766b20ed2@email.android.com> Stefan Behnel wrote: >mark florisson, 05.06.2012 22:33: >> It doesn't even necessarily have to be about running user code, a >user >> could craft data input which causes such a situation. For instance, >> let's say we have a just-in-time specializer which specializes a >> function for the runtime input types, and the types depend on the >user >> input. For instance, if we write a web application we can post arrays >> to described by a custom dtype, which draws pictures in some weird >way >> for us. We can get it to specialize pretty much any array type, so >> that gives us a good opportunity to find collisions. > >Yes, and the bad thing is that a very high probability of having no >collisions even in combination with the need for a huge amount of brute >force work to find one is not enough. An attacker (or otherwise >interested >user) may just be lucky, and given how low in the application stack >this >will be used, such a bit of luck may have massive consequences. Following that line of argument, I guess you keep your money in a mattress then? Our modern world is built around the assumption that people don't get *that* lucky. (I agree though that 64 bits is not enough for the security usecase! I'm just saying that 160 or 256 bits would be.) Dag > >Stefan >_______________________________________________ >cython-devel mailing list >cython-devel at python.org >http://mail.python.org/mailman/listinfo/cython-devel -- Sent from my Android phone with K-9 Mail. 
Please excuse my brevity. From markflorisson88 at gmail.com Wed Jun 6 11:16:00 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Wed, 6 Jun 2012 10:16:00 +0100 Subject: [Cython] Hash-based vtables In-Reply-To: <80b0aaa9-a1eb-4fba-b8fb-973766b20ed2@email.android.com> References: <4FCD100B.7000008@astro.uio.no> <4FCDB478.3070000@behnel.de> <4FCE3B5F.9080603@astro.uio.no> <4FCE3D51.20009@astro.uio.no> <4FCE5EFC.30407@astro.uio.no> <4FCF1BBD.9070709@behnel.de> <80b0aaa9-a1eb-4fba-b8fb-973766b20ed2@email.android.com> Message-ID: On 6 June 2012 10:11, Dag Sverre Seljebotn wrote: > > > Stefan Behnel wrote: > >>mark florisson, 05.06.2012 22:33: >>> It doesn't even necessarily have to be about running user code, a >>user >>> could craft data input which causes such a situation. For instance, >>> let's say we have a just-in-time specializer which specializes a >>> function for the runtime input types, and the types depend on the >>user >>> input. For instance, if we write a web application we can post arrays >>> to described by a custom dtype, which draws pictures in some weird >>way >>> for us. We can get it to specialize pretty much any array type, so >>> that gives us a good opportunity to find collisions. >> >>Yes, and the bad thing is that a very high probability of having no >>collisions even in combination with the need for a huge amount of brute >>force work to find one is not enough. An attacker (or otherwise >>interested >>user) may just be lucky, and given how low in the application stack >>this >>will be used, such a bit of luck may have massive consequences. > > Following that line of argument, I guess you keep your money in a mattress then? Our modern world is built around the assumption that people don't get *that* lucky. > > (I agree though that 64 bits is not enough for the security usecase! I'm just saying that 160 or 256 bits would be.) > > Dag > I think we're arguing different things. 
You agree to the security problem, but Stefan was still emphasizing his old point. >> >>Stefan >>_______________________________________________ >>cython-devel mailing list >>cython-devel at python.org >>http://mail.python.org/mailman/listinfo/cython-devel > > -- > Sent from my Android phone with K-9 Mail. Please excuse my brevity. > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From d.s.seljebotn at astro.uio.no Wed Jun 6 11:16:38 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Wed, 06 Jun 2012 11:16:38 +0200 Subject: [Cython] Hash-based vtables In-Reply-To: <80b0aaa9-a1eb-4fba-b8fb-973766b20ed2@email.android.com> References: <4FCD100B.7000008@astro.uio.no> <4FCDB478.3070000@behnel.de> <4FCE3B5F.9080603@astro.uio.no> <4FCE3D51.20009@astro.uio.no> <4FCE5EFC.30407@astro.uio.no> <4FCF1BBD.9070709@behnel.de> <80b0aaa9-a1eb-4fba-b8fb-973766b20ed2@email.android.com> Message-ID: <4FCF1FF6.8070807@astro.uio.no> On 06/06/2012 11:11 AM, Dag Sverre Seljebotn wrote: > > > Stefan Behnel wrote: > >> mark florisson, 05.06.2012 22:33: >>> It doesn't even necessarily have to be about running user code, a >> user >>> could craft data input which causes such a situation. For instance, >>> let's say we have a just-in-time specializer which specializes a >>> function for the runtime input types, and the types depend on the >> user >>> input. For instance, if we write a web application we can post arrays >>> to described by a custom dtype, which draws pictures in some weird >> way >>> for us. We can get it to specialize pretty much any array type, so >>> that gives us a good opportunity to find collisions. >> >> Yes, and the bad thing is that a very high probability of having no >> collisions even in combination with the need for a huge amount of brute >> force work to find one is not enough. 
An attacker (or otherwise >> interested >> user) may just be lucky, and given how low in the application stack >> this >> will be used, such a bit of luck may have massive consequences. > > Following that line of argument, I guess you keep your money in a mattress then? Our modern world is built around the assumption that people don't get *that* lucky. > > (I agree though that 64 bits is not enough for the security usecase! I'm just saying that 160 or 256 bits would be.) (And just to be clear, my current stance is in favour of using interning for the ID comparison, in the other head of this thread. I just couldn't resist Stefan's bait.) Dag From d.s.seljebotn at astro.uio.no Wed Jun 6 22:41:44 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Wed, 06 Jun 2012 22:41:44 +0200 Subject: [Cython] Hash-based vtables In-Reply-To: References: <4FCD100B.7000008@astro.uio.no> <4FCD20DC.6090906@astro.uio.no> <048eeb04-aa8b-4e12-9a9b-5d552d39984b@email.android.com> Message-ID: <4FCFC088.3000709@astro.uio.no> On 06/05/2012 12:30 AM, Robert Bradshaw wrote: > I just found http://cmph.sourceforge.net/ which looks quite > interesting. Though the resulting hash functions are supposedly cheap, > I have the feeling that branching is considered cheap in this context. Actually, this lead was *very* promising. I believe the very first reference I actually read through and didn't eliminate after the abstract totally swept away our home-grown solutions! 
"Hash & Displace" by Pagh (1999) is actually very simple, easy to understand, and fast both for generation and (the branch-free) lookup: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.69.3753&rep=rep1&type=pdf The idea is: - Find a hash `g(x)` to partition the keys into `b` groups (the paper requires b > 2n, though I think in practice you can often get away with less) - Find a hash `f(x)` such that f is 1:1 within each group (which is easily achieved since groups only has a few elements) - For each group, from largest to smallest: Find a displacement `d[group]` so that `f(x) ^ d` doesn't cause collisions. It requires extra storage for the displacement table. However, I think 8 bits per element might suffice even for vtables of 512 or 1024 in size. Even with 16 bits it's rather negligible compared to the minimum-128-bit entries of the table. I benchmarked these hash functions: displace1: ((h >> r1) ^ d[h & 63]) & m1 displace2: ((h >> r1) ^ d[h & m2]) & m1 displace3: ((h >> r1) ^ d[(h >> r2) & m2]) & m1 Only the third one is truly in the spirit of the algorithm, but I think the first two should work well too (and when h is known compile-time, looking up d[h & 63] isn't harder than looking up r1 or m1). My computer is acting up and all my numbers today are slower than the earlier ones (yes, I've disabled turbo-mode in the BIOS for a year ago, and yes, I've pinned the CPU speed). 
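[Editor's note: a toy Python version of the hash-and-displace construction sketched above. It assumes g(x) takes the low bits and f(x) the high 32 bits of the 64-bit pre-hash; the fixed shift of 32 and all the names are illustrative, and this is not exactly any of the benchmarked displace1/2/3 variants.]

```python
import hashlib

def prehash(signature):
    # Lower 64 bits of md5, as in the scheme discussed earlier
    # (first 8 digest bytes, an arbitrary choice here).
    return int.from_bytes(hashlib.md5(signature.encode()).digest()[:8], "little")

def hash_and_displace(hashes, b, m):
    # b = number of displacement-table entries, m = vtable size;
    # both assumed to be powers of two here.
    # g(x) partitions by the low bits; f(x) uses the high 32 bits.
    groups = {}
    for h in hashes:
        groups.setdefault(h & (b - 1), []).append(h)
    d = [0] * b
    table = [None] * m
    # Place groups from largest to smallest, as the paper prescribes.
    for g in sorted(groups, key=lambda k: -len(groups[k])):
        for disp in range(m):
            slots = [((h >> 32) ^ disp) & (m - 1) for h in groups[g]]
            if len(set(slots)) == len(slots) and all(table[s] is None for s in slots):
                d[g] = disp
                for h, s in zip(groups[g], slots):
                    table[s] = h
                break
        else:
            return None  # failed; caller retries with larger b and/or m
    return d, table

def lookup(h, d, table, b, m):
    # The branch-free lookup: ((h >> r) ^ d[h & (b-1)]) & (m-1), with r = 32.
    return table[((h >> 32) ^ d[h & (b - 1)]) & (m - 1)]
```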
But here's today's numbers, compiled with -DIMHASH: direct: min=5.37e-09 mean=5.39e-09 std=1.96e-11 val=2400000000.000000 index: min=6.45e-09 mean=6.46e-09 std=1.15e-11 val=1800000000.000000 twoshift: min=6.99e-09 mean=7.00e-09 std=1.35e-11 val=1800000000.000000 threeshift: min=7.53e-09 mean=7.54e-09 std=1.63e-11 val=1800000000.000000 displace1: min=6.99e-09 mean=7.00e-09 std=1.66e-11 val=1800000000.000000 displace2: min=6.99e-09 mean=7.02e-09 std=2.77e-11 val=1800000000.000000 displace3: min=7.52e-09 mean=7.54e-09 std=1.19e-11 val=1800000000.000000 I did a dirty prototype of the table-finder as well and it works: https://github.com/dagss/hashvtable/blob/master/pagh99.py Dag From d.s.seljebotn at astro.uio.no Wed Jun 6 22:57:37 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Wed, 06 Jun 2012 22:57:37 +0200 Subject: [Cython] Hash-based vtables In-Reply-To: <4FCFC088.3000709@astro.uio.no> References: <4FCD100B.7000008@astro.uio.no> <4FCD20DC.6090906@astro.uio.no> <048eeb04-aa8b-4e12-9a9b-5d552d39984b@email.android.com> <4FCFC088.3000709@astro.uio.no> Message-ID: <4FCFC441.40703@astro.uio.no> On 06/06/2012 10:41 PM, Dag Sverre Seljebotn wrote: > On 06/05/2012 12:30 AM, Robert Bradshaw wrote: >> I just found http://cmph.sourceforge.net/ which looks quite >> interesting. Though the resulting hash functions are supposedly cheap, >> I have the feeling that branching is considered cheap in this context. > > Actually, this lead was *very* promising. I believe the very first > reference I actually read through and didn't eliminate after the > abstract totally swept away our home-grown solutions! 
> > "Hash & Displace" by Pagh (1999) is actually very simple, easy to > understand, and fast both for generation and (the branch-free) lookup: > > http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.69.3753&rep=rep1&type=pdf > > > The idea is: > > - Find a hash `g(x)` to partition the keys into `b` groups (the paper > requires b > 2n, though I think in practice you can often get away with > less) > > - Find a hash `f(x)` such that f is 1:1 within each group (which is > easily achieved since groups only has a few elements) > > - For each group, from largest to smallest: Find a displacement > `d[group]` so that `f(x) ^ d` doesn't cause collisions. > > It requires extra storage for the displacement table. However, I think 8 > bits per element might suffice even for vtables of 512 or 1024 in size. > Even with 16 bits it's rather negligible compared to the minimum-128-bit > entries of the table. > > I benchmarked these hash functions: > > displace1: ((h >> r1) ^ d[h & 63]) & m1 > displace2: ((h >> r1) ^ d[h & m2]) & m1 > displace3: ((h >> r1) ^ d[(h >> r2) & m2]) & m1 > > Only the third one is truly in the spirit of the algorithm, but I think > the first two should work well too (and when h is known compile-time, > looking up d[h & 63] isn't harder than looking up r1 or m1). > > My computer is acting up and all my numbers today are slower than the > earlier ones (yes, I've disabled turbo-mode in the BIOS for a year ago, > and yes, I've pinned the CPU speed). 
But here's today's numbers, > compiled with -DIMHASH: > > direct: min=5.37e-09 mean=5.39e-09 std=1.96e-11 val=2400000000.000000 > index: min=6.45e-09 mean=6.46e-09 std=1.15e-11 val=1800000000.000000 > twoshift: min=6.99e-09 mean=7.00e-09 std=1.35e-11 val=1800000000.000000 > threeshift: min=7.53e-09 mean=7.54e-09 std=1.63e-11 val=1800000000.000000 > displace1: min=6.99e-09 mean=7.00e-09 std=1.66e-11 val=1800000000.000000 > displace2: min=6.99e-09 mean=7.02e-09 std=2.77e-11 val=1800000000.000000 > displace3: min=7.52e-09 mean=7.54e-09 std=1.19e-11 val=1800000000.000000 > > > I did a dirty prototype of the table-finder as well and it works: > > https://github.com/dagss/hashvtable/blob/master/pagh99.py The paper obviously puts more effort into minimizing table size than into fast lookup. My hunch is that our choice should be ((h >> table.r) ^ table.d[h & m2]) & m1 and use an 8-bit d (because even if you have 1024 methods, you'd rather double the number of bins than have those 2 extra bits available for displacement options). Then keep incrementing the size of d and the number of table slots (in such an order that the total vtable size is minimized) until success. In practice this should almost always just increase the size of d, and keep the table size at the lowest 2**k that fits the slots (even for 64 methods or 128 methods :-)) Essentially we avoid the shift in the argument to d[] by making d larger.
Dag From robertwb at gmail.com Wed Jun 6 23:00:58 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Wed, 6 Jun 2012 14:00:58 -0700 Subject: [Cython] Hash-based vtables In-Reply-To: <4FCE7CFB.7000205@astro.uio.no> References: <4FCD100B.7000008@astro.uio.no> <4FCD20DC.6090906@astro.uio.no> <4FCE679C.7000002@astro.uio.no> <4FCE7CFB.7000205@astro.uio.no> Message-ID: On Tue, Jun 5, 2012 at 2:41 PM, Dag Sverre Seljebotn wrote: > On 06/05/2012 10:50 PM, Robert Bradshaw wrote: >> >> On Tue, Jun 5, 2012 at 1:10 PM, Dag Sverre Seljebotn >> ?wrote: >>> >>> On 06/04/2012 11:43 PM, Robert Bradshaw wrote: >>>> >>>> >>>> On Mon, Jun 4, 2012 at 1:55 PM, Dag Sverre Seljebotn >>>> ? ?wrote: >>>>> >>>>> >>>>> On 06/04/2012 09:44 PM, Dag Sverre Seljebotn wrote: >>>>>> >>>>>> >>>>>> >>>>>> Me and Robert had a long discussion on the NumFOCUS list about this >>>>>> already, but I figured it was better to continue it and provide more >>>>>> in-depth benchmark results here. >>>>>> >>>>>> It's basically a new idea of how to provide a vtable based on perfect >>>>>> hashing, which should be a lot simpler to implement than what I first >>>>>> imagined. >>>>>> >>>>>> I'll write down some context first, if you're familiar with this >>>>>> skip ahead a bit.. >>>>>> >>>>>> This means that you can do fast dispatches *without* the messy >>>>>> business of binding vtable slots at compile time. To be concrete, this >>>>>> might e.g. 
take the form >>>>>> >>>>>> def f(obj): >>>>>> obj.method(3.4) # try to find a vtable with "void method(double)" in >>>>>> it >>>>>> >>>>>> or, a more typed approach, >>>>>> >>>>>> # File A >>>>>> cdef class MyImpl: >>>>>> def double method(double x): return x * x >>>>>> >>>>>> # File B >>>>>> # Here we never know about MyImpl, hence "duck-typed" >>>>>> @cython.interface >>>>>> class MyIntf: >>>>>> def double method(double x): pass >>>>>> >>>>>> def f(MyIntf obj): >>>>>> # obj *can* be MyImpl instance, or whatever else that supports >>>>>> # that interface >>>>>> obj.method(3.4) >>>>>> >>>>>> >>>>>> Now, the idea to implement this is: >>>>>> >>>>>> a) Both caller and callee pre-hash name/argument string >>>>>> "mymethod:iidd" to 64 bits of hash data (probably lower 64 bits of >>>>>> md5) >>>>>> >>>>>> b) Callee (MyImpl) generates a vtable of its methods by *perfect* >>>>>> hashing. What you do is define a final hash fh as a function >>>>>> of the pre-hash ph, for instance >>>>>> >>>>>> fh = ((ph >> vtable.r1) ^ (ph >> vtable.r2) ^ (ph >> vtable.r3)) & vtable.m >>>>>> >>>>>> (Me and Robert are benchmarking different functions to use here.) By >>>>>> playing with r1, r2, r3, you have 64**3 choices of hash function, and >>>>>> will be able to pick a combination which gives *no* (or very few) >>>>>> collisions. >>>>>> >>>>>> c) Caller then combines the pre-hash generated at compile-time, with >>>>>> r1, r2, r3, m stored in the vtable header, in order to find the >>>>>> final location in the hash-table. >>>>>> >>>>>> The exciting thing is that in benchmarks, the performance penalty is >>>>>> actually very slight over a C++-style v-table. (Of course you can >>>>>> cache a proper vtable, but the fact that you get so close without >>>>>> caring about caching means that this can be done much faster.)
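Step (a) above is easy to pin down in a few lines of Python (a sketch only: the `name:argcodes` encoding and the choice of which 64 md5 bits count as "lower" are illustrative assumptions, not spec):

```python
import hashlib

def prehash(name, argcodes):
    """64-bit pre-hash of a signature string like 'mymethod:iidd'
    (lower 64 bits of its md5 digest, taken as the last 8 bytes)."""
    sig = ("%s:%s" % (name, argcodes)).encode("ascii")
    return int.from_bytes(hashlib.md5(sig).digest()[8:], "little")

def final_hash(ph, r1, r2, r3, m):
    """The 'threeshift' final hash computed at the call site from
    r1, r2, r3, m stored in the vtable header."""
    return ((ph >> r1) ^ (ph >> r2) ^ (ph >> r3)) & m
```

The caller computes `prehash` at compile time; only `final_hash` (three shifts, two xors, a mask) runs at dispatch time.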
It also opens up the possibility of >>>> putting in stubs to call def methods if they exist. This needs to be >>>> fleshed out more, (another CEP :) but could provide for a >>>> backwards-compatible easy first implementation. >>>> >>>>>> Back to my and Robert's discussion on benchmarks: >>>>>> >>>>>> I've uploaded benchmarks here: >>>>>> >>>>>> https://github.com/dagss/hashvtable/tree/master/dispatchbench >>>>>> >>>>>> I've changed the benchmark taking to give more robust numbers (at >>>>>> least for me), you want to look at the 'min' column. >>>>>> >>>>>> I changed the benchmark a bit so that it benchmarks a *callsite*. >>>>>> So we don't pass 'h' on the stack, but either a) looks it up in a >>>>>> global >>>>>> variable (default), or b) it's a compile-time constant (immediate in >>>>>> assembly) (compile with -DIMHASH). >>>>>> >>>>>> Similarly, the ID is either an "interned" global variable, or an >>>>>> immediate (-DIMID). >>>>>> >>>>>> The results are very different on my machine depending on this aspect. >>>>>> My conclusions: >>>>>> >>>>>> - Both three shifts with masking, two shifts with a "fallback slot" >>>>>> (allowing for a single collision), three shifts, two shifts with >>>>>> two masks allows for constructing good vtables. In the case of only >>>>>> two shifts, one colliding method gets the twoshift+fback >>>>>> performance and the rest gets the twoshift performance. >>>>>> >>>>>> - Performance is really more affected by whether hashes are >>>>>> immediates or global variables than the hash function. This is in >>>>>> contrast to the interning vs. key benchmarks -- so I think that if >>>>>> we looked up the vtable through PyTypeObject, rather than getting >>>>>> the vtable directly, the loads of the global variables could >>>>>> potentially be masked by that. >>>>>> >>>>>> - My conclusion: Just use lower bits of md5 *both* for the hashing >>>>>> and the ID-ing (don't bother with any interning), and compile the >>>>>> thing as a 64-bit immediate. 
This can cause crashes/stack smashes >>>>>> etc. if there's lower-64bit-of-md5 collisions, but a) the >>>>>> probability is incredibly small, b) it would only matter in >>>>>> situations that should cause an AttributeError anyway, c) if we >>>>>> really care, we can always use an interning-like mechanism to >>>>>> validate on module loading that its hashes don't collide with >>>>>> other hashes (and raise an exception "Congratulations, you've >>>>>> discovered a phenomenal md5 collision, get in touch with cython >>>>>> devs and we'll work around it right away"). >>>> >>>> >>>> >>>> Due to the birthday paradox, this seems a bit risky. Maybe it's >>>> because I regularly work with collections much bigger than 2^32, and I >>>> suppose we're talking about unique method names and signatures here, >>>> but still... I wonder what the penalty would be for checking the full >>>> 128 bit hash. (Storing it could allow for greater entropy in the >>>> optimal hash table search as well.) >>> >>> >>> >>> Wonder no more. Here's the penalty for different bit-lengths, all >>> compile-time constants: >>> >>> threeshift: min=6.08e-09 mean=6.11e-09 std=2.81e-11 >>> val=1200000000.000000 >>> threeshift96: min=7.53e-09 mean=7.55e-09 std=1.96e-11 >>> val=1200000000.000000 >>> threeshift128: min=6.95e-09 mean=6.97e-09 std=2.57e-11 >>> val=1200000000.000000 >>> threeshift160: min=8.17e-09 mean=8.23e-09 std=4.06e-11 >>> val=1200000000.000000 >>> >>> And for comparison, when loading the comparison IDs from a global variable: >>> >>> threeshift: min=6.46e-09 mean=6.52e-09 std=4.95e-11 >>> val=1200000000.000000 >>> threeshift96: min=8.07e-09 mean=8.16e-09 std=4.55e-11 >>> val=1200000000.000000 >>> threeshift128: min=8.06e-09 mean=8.18e-09 std=6.71e-11 >>> val=1200000000.000000 >>> threeshift160: min=9.71e-09 mean=9.83e-09 std=5.12e-11 >>> val=1200000000.000000 >>> >>> So indeed, >>> >>> 64-bit hash < interning < 128-bit hash >>> >>> (At least on my Intel Nehalem Core i7 1.87GHz.) >>> >>> And the load of the global variable may in real life be hidden by other >>> things going on in the function. >>> >>> And, you save vtable memory by having an interned char* and not saving >>> the >>> hash in the vtable. >> >> >> I'm OK with using the 64-bit hash with a macro to enable further >> checking. If it becomes an issue, we can partition the vtable into two >> separate structures (hash64/pointer/flags? + hash160/char*/metadata). >> That's probably overkill. With an eye to security, perhaps the spec >> should be sha1 (or sha2? not sure if that ships with Python). > > > No, I like splitting up the table, I was assuming we'd stick the char* in a > different table anyway. Cache is precious, and the second table would be > completely cold in most situations. > > Is the goal then to avoid having to have an interning registry? Yes, and to avoid invoking an expensive hash function at runtime in order to achieve good distribution. > Something that hasn't come up so far is that Cython doesn't know the exact > types of external typedefs, so it can't generate the hash at Cythonize-time. > I guess some support for build systems to probe for type sizes and compute > the signature hashes in a separate header file would solve this -- with a > fallback to computing them at runtime at module loading, if you're not using a > supported build system. (But suddenly an interning registry doesn't look so > horrible..) It all depends on how strict you want to be. It may be acceptable to let f(int) and f(long) not hash to the same value even if sizeof(int) == sizeof(long).
We could also promote all int types to long or long long, including extern types (assuming, with a C compile-time check, that external types declared up to "long" are <= sizeof(long)). Another option is to let the hash be md5(sig) + hashN(sizeof(extern_arg1), ..., sizeof(extern_argN)) where hashN is a macro. > Really, I think a micro-benchmark is rather pessimistic about the > performance of loading a global variable -- if more stuff happens around the > call site then the load will likely be moved ahead and the latency hidden. > Perhaps this might even be the case just for going the route through > extensibletypeobject. > > >>> They should be made more easily runnable so that we could run them on >>> various systems, but it makes sense to first read up on and figure out >>> which >>> hash functions are really viable, to keep the number of numbers down. >>> >>> I just realized that I never pushed the changes I did to introduce >>> -DIMHASH/-DIMID etc., but the benchmarks are pushed now. >>> >>> >>> >>>> We could also do a fallback table. Usually it'd be empty; occasionally >>>> it'd have one element in it. It'd always be possible to make this big >>>> enough to avoid collisions in a worst-case scenario. >>> >>> >>> >>> If you do a fallback table it's as much code in the call site as linear >>> probing... >> >> >> Is linear probing that bad? It's an extra increment and compare in the >> miss case. >> >>> But when I played with the generation side, a failure to create a table >>> at a >>> given size would *always* be due to a single collision. This is what I >>> did >>> in the twoshift+fback benchmark. >> >> >> But it won't always be. One can always increase the size of the main >> table however, if two collisions are rare enough. > > > Yes of course, I didn't test 100% fill of a 64-entry table.
I was more > concerned with making the table 128 or 256 rather than having to go to 512 > :-) > > >>>> Duplicate tables works as long as there aren't too many orthogonal >>>> considerations. Is the GIL the only one? What about "I can propagate >>>> errors?" Now we're up to 4 tables... >>> >>> >>> Would your decision of whether or not to dispatch to a function depend on >>> whether or not it propagates errors? >>> >>> I'm thinking of the "with gil" function case, i.e. callee has: >>> >>> ?a) Function to call if you have the GIL >>> ?b) GIL-acquiring wrapper >>> >>> and you want GIL-holding code to call a) and nogil code to call b). >>> >>> But one could just make the caller acquire the GIL if needed (which in >>> that >>> case is so expensive anyway that it can be made the unlikely() path). >> >> >> Are you saying you'd add code to the call site to determine if it >> needs (and conditionally acquire) the GIL? > > > Well, I'm saying it's an alternative, I'm not sure if it has merit. > Basically shift the "with gil" responsibility to the caller in this case. > > >> >>> I can't think of other situations where you would pick which function to >>> call based on flags. >> >> >> If the caller doesn't propagate errors, it may want to have different >> codepaths depending on whether the callee propagates them. > > > Not sure if I understand. Would you call a *different* incarnation of the > callee depending on this, and need different function pointers for different > callers? > > Otherwise you just check flags after the call and take the appropriate > action, with a likely() around the likely one. You need flags, but not a > different table. Fair enough. 
- Robert From robertwb at gmail.com Wed Jun 6 23:16:56 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Wed, 6 Jun 2012 14:16:56 -0700 Subject: [Cython] Hash-based vtables In-Reply-To: <4FCFC441.40703@astro.uio.no> References: <4FCD100B.7000008@astro.uio.no> <4FCD20DC.6090906@astro.uio.no> <048eeb04-aa8b-4e12-9a9b-5d552d39984b@email.android.com> <4FCFC088.3000709@astro.uio.no> <4FCFC441.40703@astro.uio.no> Message-ID: On Wed, Jun 6, 2012 at 1:57 PM, Dag Sverre Seljebotn wrote: > On 06/06/2012 10:41 PM, Dag Sverre Seljebotn wrote: >> >> On 06/05/2012 12:30 AM, Robert Bradshaw wrote: >>> >>> I just found http://cmph.sourceforge.net/ which looks quite >>> interesting. Though the resulting hash functions are supposedly cheap, >>> I have the feeling that branching is considered cheap in this context. >> >> >> Actually, this lead was *very* promising. I believe the very first >> reference I actually read through and didn't eliminate after the >> abstract totally swept away our home-grown solutions! >> >> "Hash & Displace" by Pagh (1999) is actually very simple, easy to >> understand, and fast both for generation and (the branch-free) lookup: >> >> >> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.69.3753&rep=rep1&type=pdf >> >> >> The idea is: >> >> - Find a hash `g(x)` to partition the keys into `b` groups (the paper >> requires b > 2n, though I think in practice you can often get away with >> less) >> >> - Find a hash `f(x)` such that f is 1:1 within each group (which is >> easily achieved since groups only has a few elements) >> >> - For each group, from largest to smallest: Find a displacement >> `d[group]` so that `f(x) ^ d` doesn't cause collisions. >> >> It requires extra storage for the displacement table. However, I think 8 >> bits per element might suffice even for vtables of 512 or 1024 in size. >> Even with 16 bits it's rather negligible compared to the minimum-128-bit >> entries of the table. 
>> >> I benchmarked these hash functions: >> >> displace1: ((h >> r1) ^ d[h & 63]) & m1 >> displace2: ((h >> r1) ^ d[h & m2]) & m1 >> displace3: ((h >> r1) ^ d[(h >> r2) & m2]) & m1 >> >> Only the third one is truly in the spirit of the algorithm, but I think >> the first two should work well too (and when h is known compile-time, >> looking up d[h & 63] isn't harder than looking up r1 or m1). >> >> My computer is acting up and all my numbers today are slower than the >> earlier ones (yes, I've disabled turbo-mode in the BIOS for a year ago, >> and yes, I've pinned the CPU speed). But here's today's numbers, >> compiled with -DIMHASH: >> >> direct: min=5.37e-09 mean=5.39e-09 std=1.96e-11 val=2400000000.000000 >> index: min=6.45e-09 mean=6.46e-09 std=1.15e-11 val=1800000000.000000 >> twoshift: min=6.99e-09 mean=7.00e-09 std=1.35e-11 val=1800000000.000000 >> threeshift: min=7.53e-09 mean=7.54e-09 std=1.63e-11 val=1800000000.000000 >> displace1: min=6.99e-09 mean=7.00e-09 std=1.66e-11 val=1800000000.000000 >> displace2: min=6.99e-09 mean=7.02e-09 std=2.77e-11 val=1800000000.000000 >> displace3: min=7.52e-09 mean=7.54e-09 std=1.19e-11 val=1800000000.000000 >> >> >> I did a dirty prototype of the table-finder as well and it works: >> >> https://github.com/dagss/hashvtable/blob/master/pagh99.py > > > The paper obviously puts more effort on minimizing table size and not a fast > lookup. My hunch is that our choice should be > > ((h >> table.r) ^ table.d[h & m2]) & m1 > > and use 8-bits d (because even if you have 1024 methods, you'd rather double > the number of bins than those 2 extra bits available for displacement > options). > > Then keep incrementing the size of d and the number of table slots (in such > an order that the total vtable size is minimized) until success. 
In practice > this should almost always just increase the size of d, and keep the table > size at the lowest 2**k that fits the slots (even for 64 methods or 128 > methods :-)) > > Essentially we avoid the shift in the argument to d[] by making d larger. Nice. I'm surprised that the indirection on d doesn't cost us much; hopefully its size wouldn't be a big issue either. What kinds of densities were you achieving? Going back to the idea of linear probing on a cache miss, this has the advantage that one can write a brain-dead provider that sets m=0 and simply lists the methods instead of requiring a table optimizer. (Most tools, of course, would do the table optimization.) It also lets you get away with a "kind-of good" hash rather than requiring you search until you find a (larger?) perfect one. - Robert From d.s.seljebotn at astro.uio.no Wed Jun 6 23:36:09 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Wed, 06 Jun 2012 23:36:09 +0200 Subject: [Cython] Hash-based vtables In-Reply-To: References: <4FCD100B.7000008@astro.uio.no> <4FCD20DC.6090906@astro.uio.no> <048eeb04-aa8b-4e12-9a9b-5d552d39984b@email.android.com> <4FCFC088.3000709@astro.uio.no> <4FCFC441.40703@astro.uio.no> Message-ID: <4FCFCD49.9030802@astro.uio.no> On 06/06/2012 11:16 PM, Robert Bradshaw wrote: > On Wed, Jun 6, 2012 at 1:57 PM, Dag Sverre Seljebotn > wrote: >> On 06/06/2012 10:41 PM, Dag Sverre Seljebotn wrote: >>> >>> On 06/05/2012 12:30 AM, Robert Bradshaw wrote: >>>> >>>> I just found http://cmph.sourceforge.net/ which looks quite >>>> interesting. Though the resulting hash functions are supposedly cheap, >>>> I have the feeling that branching is considered cheap in this context. >>> >>> >>> Actually, this lead was *very* promising. I believe the very first >>> reference I actually read through and didn't eliminate after the >>> abstract totally swept away our home-grown solutions! 
>>> >>> "Hash& Displace" by Pagh (1999) is actually very simple, easy to >>> understand, and fast both for generation and (the branch-free) lookup: >>> >>> >>> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.69.3753&rep=rep1&type=pdf >>> >>> >>> The idea is: >>> >>> - Find a hash `g(x)` to partition the keys into `b` groups (the paper >>> requires b> 2n, though I think in practice you can often get away with >>> less) >>> >>> - Find a hash `f(x)` such that f is 1:1 within each group (which is >>> easily achieved since groups only has a few elements) >>> >>> - For each group, from largest to smallest: Find a displacement >>> `d[group]` so that `f(x) ^ d` doesn't cause collisions. >>> >>> It requires extra storage for the displacement table. However, I think 8 >>> bits per element might suffice even for vtables of 512 or 1024 in size. >>> Even with 16 bits it's rather negligible compared to the minimum-128-bit >>> entries of the table. >>> >>> I benchmarked these hash functions: >>> >>> displace1: ((h>> r1) ^ d[h& 63])& m1 >>> displace2: ((h>> r1) ^ d[h& m2])& m1 >>> displace3: ((h>> r1) ^ d[(h>> r2)& m2])& m1 >>> >>> Only the third one is truly in the spirit of the algorithm, but I think >>> the first two should work well too (and when h is known compile-time, >>> looking up d[h& 63] isn't harder than looking up r1 or m1). >>> >>> My computer is acting up and all my numbers today are slower than the >>> earlier ones (yes, I've disabled turbo-mode in the BIOS for a year ago, >>> and yes, I've pinned the CPU speed). 
But here's today's numbers, >>> compiled with -DIMHASH: >>> >>> direct: min=5.37e-09 mean=5.39e-09 std=1.96e-11 val=2400000000.000000 >>> index: min=6.45e-09 mean=6.46e-09 std=1.15e-11 val=1800000000.000000 >>> twoshift: min=6.99e-09 mean=7.00e-09 std=1.35e-11 val=1800000000.000000 >>> threeshift: min=7.53e-09 mean=7.54e-09 std=1.63e-11 val=1800000000.000000 >>> displace1: min=6.99e-09 mean=7.00e-09 std=1.66e-11 val=1800000000.000000 >>> displace2: min=6.99e-09 mean=7.02e-09 std=2.77e-11 val=1800000000.000000 >>> displace3: min=7.52e-09 mean=7.54e-09 std=1.19e-11 val=1800000000.000000 >>> >>> >>> I did a dirty prototype of the table-finder as well and it works: >>> >>> https://github.com/dagss/hashvtable/blob/master/pagh99.py >> >> >> The paper obviously puts more effort on minimizing table size and not a fast >> lookup. My hunch is that our choice should be >> >> ((h>> table.r) ^ table.d[h& m2])& m1 >> >> and use 8-bits d (because even if you have 1024 methods, you'd rather double >> the number of bins than those 2 extra bits available for displacement >> options). >> >> Then keep incrementing the size of d and the number of table slots (in such >> an order that the total vtable size is minimized) until success. In practice >> this should almost always just increase the size of d, and keep the table >> size at the lowest 2**k that fits the slots (even for 64 methods or 128 >> methods :-)) >> >> Essentially we avoid the shift in the argument to d[] by making d larger. > > Nice. I'm surprised that the indirection on d doesn't cost us much; Well, table->d[const & const] compiles down to the same kind of code as table->m1. I guess I'm surprised too that displace2 doesn't penalize. > hopefully its size wouldn't be a big issue either. What kinds of > densities were you achieving? The algorithm is designed for 100% density in the table itself. (We can lift that to compensate for a small space of possible hash functions I guess.) 
I haven't done proper simulations yet, but I just tried |vtable|=128, |d|=128 from the command line and I had 15 successes or so before the first failure. That's with a 100% density in the vtable itself! (And when it fails, you increase |d| to get your success). The caveat is the space spent on d (it's small in comparison, but that's why this isn't too good to be true). A disadvantage might be that we may no longer have the opportunity to not make the table size a power of two (i.e. replace the mask with "if (likely(slot < n))"). I think for that to work one would need to replace the xor group with addition on Z_d. > Going back to the idea of linear probing on a cache miss, this has the > advantage that one can write a brain-dead provider that sets m=0 and > simply lists the methods instead of requiring a table optimizer. (Most > tools, of course, would do the table optimization.) It also lets you > get away with a "kind-of good" hash rather than requiring you search > until you find a (larger?) perfect one. Well, given that we can have 100% density, and generating the table is lightning fast, and the C code to generate the table is likely a 300 line utility... I'm not convinced. We should however make sure that *callers* can do a linear scan and use strcmp if they don't care about performance. 
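The slow-path caller mentioned at the end, a linear scan with string comparison and no hash machinery at all, is trivial; a Python stand-in (the entry layout is a guess for illustration; a C caller would walk the entries with strcmp):

```python
def lookup_slow(entries, signature):
    """Linear scan over vtable entries, comparing signature strings.
    entries: sequence of (signature, funcptr) pairs, with None for
    empty slots; returns the function pointer, or None on a miss."""
    for entry in entries:
        if entry is not None and entry[0] == signature:
            return entry[1]
    return None
```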
Dag From d.s.seljebotn at astro.uio.no Thu Jun 7 00:03:57 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 07 Jun 2012 00:03:57 +0200 Subject: [Cython] Hash-based vtables In-Reply-To: <4FCFCD49.9030802@astro.uio.no> References: <4FCD100B.7000008@astro.uio.no> <4FCD20DC.6090906@astro.uio.no> <048eeb04-aa8b-4e12-9a9b-5d552d39984b@email.android.com> <4FCFC088.3000709@astro.uio.no> <4FCFC441.40703@astro.uio.no> <4FCFCD49.9030802@astro.uio.no> Message-ID: Dag Sverre Seljebotn wrote: >On 06/06/2012 11:16 PM, Robert Bradshaw wrote: >> On Wed, Jun 6, 2012 at 1:57 PM, Dag Sverre Seljebotn >> wrote: >>> On 06/06/2012 10:41 PM, Dag Sverre Seljebotn wrote: >>>> >>>> On 06/05/2012 12:30 AM, Robert Bradshaw wrote: >>>>> >>>>> I just found http://cmph.sourceforge.net/ which looks quite >>>>> interesting. Though the resulting hash functions are supposedly >cheap, >>>>> I have the feeling that branching is considered cheap in this >context. >>>> >>>> >>>> Actually, this lead was *very* promising. I believe the very first >>>> reference I actually read through and didn't eliminate after the >>>> abstract totally swept away our home-grown solutions! >>>> >>>> "Hash& Displace" by Pagh (1999) is actually very simple, easy to >>>> understand, and fast both for generation and (the branch-free) >lookup: >>>> >>>> >>>> >http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.69.3753&rep=rep1&type=pdf >>>> >>>> >>>> The idea is: >>>> >>>> - Find a hash `g(x)` to partition the keys into `b` groups (the >paper >>>> requires b> 2n, though I think in practice you can often get away >with >>>> less) >>>> >>>> - Find a hash `f(x)` such that f is 1:1 within each group (which is >>>> easily achieved since groups only has a few elements) >>>> >>>> - For each group, from largest to smallest: Find a displacement >>>> `d[group]` so that `f(x) ^ d` doesn't cause collisions. >>>> >>>> It requires extra storage for the displacement table. 
However, I >think 8 >>>> bits per element might suffice even for vtables of 512 or 1024 in >size. >>>> Even with 16 bits it's rather negligible compared to the >minimum-128-bit >>>> entries of the table. >>>> >>>> I benchmarked these hash functions: >>>> >>>> displace1: ((h>> r1) ^ d[h& 63])& m1 >>>> displace2: ((h>> r1) ^ d[h& m2])& m1 >>>> displace3: ((h>> r1) ^ d[(h>> r2)& m2])& m1 >>>> >>>> Only the third one is truly in the spirit of the algorithm, but I >think >>>> the first two should work well too (and when h is known >compile-time, >>>> looking up d[h& 63] isn't harder than looking up r1 or m1). >>>> >>>> My computer is acting up and all my numbers today are slower than >the >>>> earlier ones (yes, I've disabled turbo-mode in the BIOS for a year >ago, >>>> and yes, I've pinned the CPU speed). But here's today's numbers, >>>> compiled with -DIMHASH: >>>> >>>> direct: min=5.37e-09 mean=5.39e-09 std=1.96e-11 >val=2400000000.000000 >>>> index: min=6.45e-09 mean=6.46e-09 std=1.15e-11 >val=1800000000.000000 >>>> twoshift: min=6.99e-09 mean=7.00e-09 std=1.35e-11 >val=1800000000.000000 >>>> threeshift: min=7.53e-09 mean=7.54e-09 std=1.63e-11 >val=1800000000.000000 >>>> displace1: min=6.99e-09 mean=7.00e-09 std=1.66e-11 >val=1800000000.000000 >>>> displace2: min=6.99e-09 mean=7.02e-09 std=2.77e-11 >val=1800000000.000000 >>>> displace3: min=7.52e-09 mean=7.54e-09 std=1.19e-11 >val=1800000000.000000 >>>> >>>> >>>> I did a dirty prototype of the table-finder as well and it works: >>>> >>>> https://github.com/dagss/hashvtable/blob/master/pagh99.py >>> >>> >>> The paper obviously puts more effort on minimizing table size and >not a fast >>> lookup. My hunch is that our choice should be >>> >>> ((h>> table.r) ^ table.d[h& m2])& m1 >>> >>> and use 8-bits d (because even if you have 1024 methods, you'd >rather double >>> the number of bins than those 2 extra bits available for >displacement >>> options). 
>>> >>> Then keep incrementing the size of d and the number of table slots >(in such >>> an order that the total vtable size is minimized) until success. In >practice >>> this should almost always just increase the size of d, and keep the >table >>> size at the lowest 2**k that fits the slots (even for 64 methods or >128 >>> methods :-)) >>> >>> Essentially we avoid the shift in the argument to d[] by making d >larger. >> >> Nice. I'm surprised that the indirection on d doesn't cost us much; > >Well, table->d[const & const] compiles down to the same kind of code as > >table->m1. I guess I'm surprised too that displace2 doesn't penalize. > >> hopefully its size wouldn't be a big issue either. What kinds of >> densities were you achieving? > >The algorithm is designed for 100% density in the table itself. (We can > >lift that to compensate for a small space of possible hash functions I >guess.) > >I haven't done proper simulations yet, but I just tried |vtable|=128, >|d|=128 from the command line and I had 15 successes or so before the >first failure. That's with a 100% density in the vtable itself! (And >when it fails, you increase |d| to get your success). > >The caveat is the space spent on d (it's small in comparison, but >that's >why this isn't too good to be true). > >A disadvantage might be that we may no longer have the opportunity to >not make the table size a power of two (i.e. replace the mask with "if >(likely(slot < n))"). I think for that to work one would need to >replace >the xor group with addition on Z_d. Strike this paragraph; don't know what I was thinking... Dag > >> Going back to the idea of linear probing on a cache miss, this has >the >> advantage that one can write a brain-dead provider that sets m=0 and >> simply lists the methods instead of requiring a table optimizer. >(Most >> tools, of course, would do the table optimization.) 
It also lets you >> get away with a "kind-of good" hash rather than requiring you search >> until you find a (larger?) perfect one. > >Well, given that we can have 100% density, and generating the table is >lightning fast, and the C code to generate the table is likely a 300 >line utility... I'm not convinced. > >We should however make sure that *callers* can do a linear scan and use > >strcmp if they don't care about performance. > >Dag >_______________________________________________ >cython-devel mailing list >cython-devel at python.org >http://mail.python.org/mailman/listinfo/cython-devel -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. From robertwb at gmail.com Thu Jun 7 00:26:42 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Wed, 6 Jun 2012 15:26:42 -0700 Subject: [Cython] Hash-based vtables In-Reply-To: <4FCFCD49.9030802@astro.uio.no> References: <4FCD100B.7000008@astro.uio.no> <4FCD20DC.6090906@astro.uio.no> <048eeb04-aa8b-4e12-9a9b-5d552d39984b@email.android.com> <4FCFC088.3000709@astro.uio.no> <4FCFC441.40703@astro.uio.no> <4FCFCD49.9030802@astro.uio.no> Message-ID: On Wed, Jun 6, 2012 at 2:36 PM, Dag Sverre Seljebotn wrote: > On 06/06/2012 11:16 PM, Robert Bradshaw wrote: >> >> On Wed, Jun 6, 2012 at 1:57 PM, Dag Sverre Seljebotn >> ?wrote: >>> >>> On 06/06/2012 10:41 PM, Dag Sverre Seljebotn wrote: >>>> >>>> >>>> On 06/05/2012 12:30 AM, Robert Bradshaw wrote: >>>>> >>>>> >>>>> I just found http://cmph.sourceforge.net/ which looks quite >>>>> interesting. Though the resulting hash functions are supposedly cheap, >>>>> I have the feeling that branching is considered cheap in this context. >>>> >>>> >>>> >>>> Actually, this lead was *very* promising. I believe the very first >>>> reference I actually read through and didn't eliminate after the >>>> abstract totally swept away our home-grown solutions! 
>>>> "Hash & Displace" by Pagh (1999) is actually very simple, easy to
>>>> understand, and fast both for generation and (the branch-free) lookup:
>>>>
>>>> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.69.3753&rep=rep1&type=pdf
>>>>
>>>> The idea is:
>>>>
>>>> - Find a hash `g(x)` to partition the keys into `b` groups (the paper
>>>> requires b > 2n, though I think in practice you can often get away with
>>>> less)
>>>>
>>>> - Find a hash `f(x)` such that f is 1:1 within each group (which is
>>>> easily achieved since groups only have a few elements)
>>>>
>>>> - For each group, from largest to smallest: Find a displacement
>>>> `d[group]` so that `f(x) ^ d` doesn't cause collisions.
>>>>
>>>> It requires extra storage for the displacement table. However, I think 8
>>>> bits per element might suffice even for vtables of 512 or 1024 in size.
>>>> Even with 16 bits it's rather negligible compared to the minimum-128-bit
>>>> entries of the table.
>>>>
>>>> I benchmarked these hash functions:
>>>>
>>>> displace1: ((h >> r1) ^ d[h & 63]) & m1
>>>> displace2: ((h >> r1) ^ d[h & m2]) & m1
>>>> displace3: ((h >> r1) ^ d[(h >> r2) & m2]) & m1
>>>>
>>>> Only the third one is truly in the spirit of the algorithm, but I think
>>>> the first two should work well too (and when h is known compile-time,
>>>> looking up d[h & 63] isn't harder than looking up r1 or m1).
>>>>
>>>> My computer is acting up and all my numbers today are slower than the
>>>> earlier ones (yes, I disabled turbo-mode in the BIOS a year ago,
>>>> and yes, I've pinned the CPU speed).
>>>> But here's today's numbers,
>>>> compiled with -DIMHASH:
>>>>
>>>> direct: min=5.37e-09 mean=5.39e-09 std=1.96e-11 val=2400000000.000000
>>>> index: min=6.45e-09 mean=6.46e-09 std=1.15e-11 val=1800000000.000000
>>>> twoshift: min=6.99e-09 mean=7.00e-09 std=1.35e-11 val=1800000000.000000
>>>> threeshift: min=7.53e-09 mean=7.54e-09 std=1.63e-11 val=1800000000.000000
>>>> displace1: min=6.99e-09 mean=7.00e-09 std=1.66e-11 val=1800000000.000000
>>>> displace2: min=6.99e-09 mean=7.02e-09 std=2.77e-11 val=1800000000.000000
>>>> displace3: min=7.52e-09 mean=7.54e-09 std=1.19e-11 val=1800000000.000000
>>>>
>>>> I did a dirty prototype of the table-finder as well and it works:
>>>>
>>>> https://github.com/dagss/hashvtable/blob/master/pagh99.py
>>>
>>> The paper obviously puts more effort into minimizing table size than
>>> into fast lookup. My hunch is that our choice should be
>>>
>>> ((h >> table.r) ^ table.d[h & m2]) & m1
>>>
>>> and use 8-bit d (because even if you have 1024 methods, you'd rather
>>> double the number of bins than spend those 2 extra bits on displacement
>>> options).
>>>
>>> Then keep incrementing the size of d and the number of table slots (in
>>> such an order that the total vtable size is minimized) until success. In
>>> practice this should almost always just increase the size of d, and keep
>>> the table size at the lowest 2**k that fits the slots (even for 64
>>> methods or 128 methods :-))
>>>
>>> Essentially we avoid the shift in the argument to d[] by making d larger.
>>
>> Nice. I'm surprised that the indirection on d doesn't cost us much;
>
> Well, table->d[const & const] compiles down to the same kind of code as
> table->m1. I guess I'm surprised too that displace2 doesn't penalize.
>
>> hopefully its size wouldn't be a big issue either. What kinds of
>> densities were you achieving?
>
> The algorithm is designed for 100% density in the table itself.
(We can lift
> that to compensate for a small space of possible hash functions I guess.)
>
> I haven't done proper simulations yet, but I just tried |vtable|=128,
> |d|=128 from the command line and I had 15 successes or so before the first
> failure. That's with a 100% density in the vtable itself! (And when it
> fails, you increase |d| to get your success).
>
> The caveat is the space spent on d (it's small in comparison, but that's why
> this isn't too good to be true).
>
> A disadvantage might be that we may no longer have the opportunity to not
> make the table size a power of two (i.e. replace the mask with "if
> (likely(slot < n))"). I think for that to work one would need to replace the
> xor group with addition on Z_d.
>
>> Going back to the idea of linear probing on a cache miss, this has the
>> advantage that one can write a brain-dead provider that sets m=0 and
>> simply lists the methods instead of requiring a table optimizer. (Most
>> tools, of course, would do the table optimization.) It also lets you
>> get away with a "kind-of good" hash rather than requiring you search
>> until you find a (larger?) perfect one.
>
> Well, given that we can have 100% density, and generating the table is
> lightning fast, and the C code to generate the table is likely a 300 line
> utility... I'm not convinced.

It goes from an extraordinarily simple spec (the table is, at minimum, a
func[2^k] with a couple of extra zero fields, whose struct can be
statically defined in the source by hand) to a, well, not complicated in
the absolute sense, but much more so than the definition above. It is
also variable-size, which makes allocating it globally/on a stack a pain
(I suppose one can choose an upper bound for |d| and |vtable|).

I am playing devil's advocate a bit here; it's probably just a (minor)
con, but worth noting at least.

> We should however make sure that *callers* can do a linear scan and use
> strcmp if they don't care about performance.

Yeah.
That's easier to ensure ;).

- Robert

From dieter at handshake.de Thu Jun 7 10:44:09 2012
From: dieter at handshake.de (Dieter Maurer)
Date: Thu, 7 Jun 2012 10:44:09 +0200
Subject: [Cython] Bug: bad C code generated for (some) "... and ... or ..."
 expressions
Message-ID: <20432.27097.470830.218794@localhost.localdomain>

"cython 0.13" generates bad C code for the attached "pyx" file. "cython"
itself recognizes that it did something wrong and emits ";" to the
generated file:

...
static __pyx_t_12cybug_and_or_pointer __pyx_f_12cybug_and_or_bug(PyObject *__pyx_v_o) {
  __pyx_t_12cybug_and_or_pointer __pyx_r;
  int __pyx_t_1;
  __pyx_t_12cybug_and_or_pointer __pyx_t_2;
  ;
  ;
...

-------------- next part --------------
A non-text attachment was scrubbed...
Name: cybug_and_or.pyx
Type: text/x-cython
Size: 164 bytes
Desc: "cython" source file triggering bad C code generation
URL: 
-------------- next part --------------
The error probably happens because it is difficult for "cython" to
determine the type of "and" and "or" expressions (if the operand types
differ). In a "cond and t or f" expression, however, the result type is
"type(t)" if "type(t) == type(f)", independent of "type(cond)". It might
be worthwhile to special-case this type of expression. It would,
however, be friendlier to output an instructive error message instead of
generating bad C code.
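For reference, the idiom in question can be illustrated at the Python level (the scrubbed cybug_and_or.pyx presumably contained a typed variant of something like this; the function names here are made up):

```python
# The pre-2.5 "cond and t or f" idiom: equivalent to "t if cond else f"
# *provided* t is always truthy -- which holds for non-NULL C pointers,
# the case the bug report is about.
def pick(cond, t, f):
    return cond and t or f

# The well-known pitfall: if t is falsy, f is returned regardless of cond.
# A conditional expression avoids it, and also gives the compiler a single
# expression whose result type is type(t) when type(t) == type(f).
def pick_safe(cond, t, f):
    return t if cond else f
```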
-- Dieter From d.s.seljebotn at astro.uio.no Thu Jun 7 12:20:59 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 07 Jun 2012 12:20:59 +0200 Subject: [Cython] Hash-based vtables In-Reply-To: References: <4FCD100B.7000008@astro.uio.no> <4FCD20DC.6090906@astro.uio.no> <048eeb04-aa8b-4e12-9a9b-5d552d39984b@email.android.com> <4FCFC088.3000709@astro.uio.no> <4FCFC441.40703@astro.uio.no> <4FCFCD49.9030802@astro.uio.no> Message-ID: <4FD0808B.5080300@astro.uio.no> On 06/07/2012 12:26 AM, Robert Bradshaw wrote: > On Wed, Jun 6, 2012 at 2:36 PM, Dag Sverre Seljebotn > wrote: >> On 06/06/2012 11:16 PM, Robert Bradshaw wrote: >>> >>> On Wed, Jun 6, 2012 at 1:57 PM, Dag Sverre Seljebotn >>> wrote: >>>> >>>> On 06/06/2012 10:41 PM, Dag Sverre Seljebotn wrote: >>>>> >>>>> >>>>> On 06/05/2012 12:30 AM, Robert Bradshaw wrote: >>>>>> >>>>>> >>>>>> I just found http://cmph.sourceforge.net/ which looks quite >>>>>> interesting. Though the resulting hash functions are supposedly cheap, >>>>>> I have the feeling that branching is considered cheap in this context. >>>>> >>>>> >>>>> >>>>> Actually, this lead was *very* promising. I believe the very first >>>>> reference I actually read through and didn't eliminate after the >>>>> abstract totally swept away our home-grown solutions! 
>>>>> >>>>> "Hash& Displace" by Pagh (1999) is actually very simple, easy to >>>>> >>>>> understand, and fast both for generation and (the branch-free) lookup: >>>>> >>>>> >>>>> >>>>> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.69.3753&rep=rep1&type=pdf >>>>> >>>>> >>>>> The idea is: >>>>> >>>>> - Find a hash `g(x)` to partition the keys into `b` groups (the paper >>>>> requires b> 2n, though I think in practice you can often get away with >>>>> less) >>>>> >>>>> - Find a hash `f(x)` such that f is 1:1 within each group (which is >>>>> easily achieved since groups only has a few elements) >>>>> >>>>> - For each group, from largest to smallest: Find a displacement >>>>> `d[group]` so that `f(x) ^ d` doesn't cause collisions. >>>>> >>>>> It requires extra storage for the displacement table. However, I think 8 >>>>> bits per element might suffice even for vtables of 512 or 1024 in size. >>>>> Even with 16 bits it's rather negligible compared to the minimum-128-bit >>>>> entries of the table. >>>>> >>>>> I benchmarked these hash functions: >>>>> >>>>> displace1: ((h>> r1) ^ d[h& 63])& m1 >>>>> displace2: ((h>> r1) ^ d[h& m2])& m1 >>>>> displace3: ((h>> r1) ^ d[(h>> r2)& m2])& m1 >>>>> >>>>> >>>>> Only the third one is truly in the spirit of the algorithm, but I think >>>>> the first two should work well too (and when h is known compile-time, >>>>> looking up d[h& 63] isn't harder than looking up r1 or m1). >>>>> >>>>> >>>>> My computer is acting up and all my numbers today are slower than the >>>>> earlier ones (yes, I've disabled turbo-mode in the BIOS for a year ago, >>>>> and yes, I've pinned the CPU speed). 
But here's today's numbers, >>>>> compiled with -DIMHASH: >>>>> >>>>> direct: min=5.37e-09 mean=5.39e-09 std=1.96e-11 val=2400000000.000000 >>>>> index: min=6.45e-09 mean=6.46e-09 std=1.15e-11 val=1800000000.000000 >>>>> twoshift: min=6.99e-09 mean=7.00e-09 std=1.35e-11 val=1800000000.000000 >>>>> threeshift: min=7.53e-09 mean=7.54e-09 std=1.63e-11 >>>>> val=1800000000.000000 >>>>> displace1: min=6.99e-09 mean=7.00e-09 std=1.66e-11 val=1800000000.000000 >>>>> displace2: min=6.99e-09 mean=7.02e-09 std=2.77e-11 val=1800000000.000000 >>>>> displace3: min=7.52e-09 mean=7.54e-09 std=1.19e-11 val=1800000000.000000 >>>>> >>>>> >>>>> I did a dirty prototype of the table-finder as well and it works: >>>>> >>>>> https://github.com/dagss/hashvtable/blob/master/pagh99.py >>>> >>>> >>>> >>>> The paper obviously puts more effort on minimizing table size and not a >>>> fast >>>> lookup. My hunch is that our choice should be >>>> >>>> ((h>> table.r) ^ table.d[h& m2])& m1 >>>> >>>> >>>> and use 8-bits d (because even if you have 1024 methods, you'd rather >>>> double >>>> the number of bins than those 2 extra bits available for displacement >>>> options). >>>> >>>> Then keep incrementing the size of d and the number of table slots (in >>>> such >>>> an order that the total vtable size is minimized) until success. In >>>> practice >>>> this should almost always just increase the size of d, and keep the table >>>> size at the lowest 2**k that fits the slots (even for 64 methods or 128 >>>> methods :-)) >>>> >>>> Essentially we avoid the shift in the argument to d[] by making d larger. >>> >>> >>> Nice. I'm surprised that the indirection on d doesn't cost us much; >> >> >> Well, table->d[const& const] compiles down to the same kind of code as >> table->m1. I guess I'm surprised too that displace2 doesn't penalize. >> >> >>> hopefully its size wouldn't be a big issue either. What kinds of >>> densities were you achieving? 
OK, simulation results just in (for the displace2 hash), and they
exceeded my expectations.

I always fill the table with n=2^k keys, and fix b = n (b means |d|).
Then the failure rates are (top two are 100,000 simulations, the rest
are 1000 simulations):

n=   8 b=   8 failure-rate=0.0019 try-mean=4.40 try-max=65
n=  16 b=  16 failure-rate=0.0008 try-mean=5.02 try-max=65
n=  32 b=  32 failure-rate=0.0000 try-mean=5.67 try-max=25
n=  64 b=  64 failure-rate=0.0000 try-mean=6.60 try-max=29
n= 128 b= 128 failure-rate=0.0000 try-mean=7.64 try-max=22
n= 256 b= 256 failure-rate=0.0000 try-mean=8.66 try-max=37
n= 512 b= 512 failure-rate=0.0000 try-mean=9.57 try-max=26
n=1024 b=1024 failure-rate=0.0000 try-mean=10.66 try-max=34

Try-mean and try-max are how many r's needed to be tried before success,
so they give an indication of how much margin is left before failure.

For the ~1/1000 chance of failure for n=8 and n=16, we would proceed to
let b=2*n (100,000 simulations):

n=   8 b=  16 failure-rate=0.0001 try-mean=2.43 try-max=65
n=  16 b=  32 failure-rate=0.0000 try-mean=3.40 try-max=65

NOTE: The 512...2048 results were with 16-bit displacements; with 8-bit
displacements they mostly failed. So we either need to make each element
of d 16 bits, or, e.g., store 512 entries in a 1024-slot table (which
succeeded most of the time with 8-bit displacements). I'm +1 on 16-bit
displacements.

The algorithm is rather fast and concise:

https://github.com/dagss/hashvtable/blob/master/pagh99.py

>> The algorithm is designed for 100% density in the table itself. (We can lift
>> that to compensate for a small space of possible hash functions I guess.)
>>
>> I haven't done proper simulations yet, but I just tried |vtable|=128,
>> |d|=128 from the command line and I had 15 successes or so before the first
>> failure. That's with a 100% density in the vtable itself! (And when it
>> fails, you increase |d| to get your success).
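For readers who don't want to open pagh99.py, the construction being simulated amounts to roughly the following (a simplified sketch, not the actual prototype; it searches shift values r and per-bin 16-bit displacements for the displace2 hash, and the 64-bit md5 prehash is an assumption of the sketch):

```python
import hashlib

def build_table(keys):
    """Search for (r, d) such that

        slot = ((h >> r) ^ d[h & m2]) & m1

    is collision-free for the prehashes of `keys`, with
    |table| = |d| = n = the next power of two >= len(keys)."""
    n = 1
    while n < len(keys):
        n *= 2
    m1 = m2 = n - 1
    # 64-bit prehashes derived from md5, as in the thread.
    hashes = [int.from_bytes(hashlib.md5(k).digest()[:8], "little")
              for k in keys]
    # Partition into bins by the low bits; place the largest bins first.
    bins = {}
    for h in hashes:
        bins.setdefault(h & m2, []).append(h)
    groups = sorted(bins.items(), key=lambda kv: len(kv[1]), reverse=True)
    for r in range(64):  # try shifts until a perfect layout is found
        d = [0] * n
        used = set()
        for g, hs in groups:
            # Find a 16-bit displacement mapping this bin onto free,
            # distinct slots.
            for disp in range(1 << 16):
                slots = {((h >> r) ^ disp) & m1 for h in hs}
                if len(slots) == len(hs) and not (slots & used):
                    d[g] = disp
                    used |= slots
                    break
            else:
                break  # no displacement worked; try the next r
        else:
            return r, d
    raise ValueError("no table found; double |d| and retry")

def lookup_slot(key, r, d):
    # Valid here because |table| == |d|, so one mask serves as both m1 and m2.
    h = int.from_bytes(hashlib.md5(key).digest()[:8], "little")
    return ((h >> r) ^ d[h & (len(d) - 1)]) & (len(d) - 1)
```

Growing |d| on failure (the "increase |d| to get your success" step) would just mean rerunning the search with a larger bin mask m2.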
>> >> The caveat is the space spent on d (it's small in comparison, but that's why >> this isn't too good to be true). >> >> A disadvantage might be that we may no longer have the opportunity to not >> make the table size a power of two (i.e. replace the mask with "if >> (likely(slot< n))"). I think for that to work one would need to replace the >> xor group with addition on Z_d. >> >> >>> Going back to the idea of linear probing on a cache miss, this has the >>> advantage that one can write a brain-dead provider that sets m=0 and >>> simply lists the methods instead of requiring a table optimizer. (Most >>> tools, of course, would do the table optimization.) It also lets you >>> get away with a "kind-of good" hash rather than requiring you search >>> until you find a (larger?) perfect one. >> >> >> Well, given that we can have 100% density, and generating the table is >> lightning fast, and the C code to generate the table is likely a 300 line >> utility... I'm not convinced. > > It goes from an extraordinary simple spec (table is, at minimum, a > func[2^k] with a couple of extra zero fields, whose struct can be > statically defined in the source by hand) to a, well, not complicated > in the absolute sense, but much more so than the definition above. It > also is variable-size which makes allocating it globally/on a stack a > pain (I suppose one can choose an upper bound for |d| and |vtable|). > > I am a bit playing devil's advocate here, it's probably just a (minor) > con, but worth noting at least. If you were willing to go the interning route, so that you didn't need to fill the table with md5 hashes anyway, I'd say you'd have a stronger point :-) Given the results above, static allocation can at least be solved in a way that is probably user-friendly enough: PyHashVTable_16_16 mytable; ...init () { mytable.functions = { ... 
}; if (PyHashVTable_Ready((PyHashVTable*)mytable, 16, 16) == -1) return -1; } Now, with chance ~1/1000, you're going to get an exception saying "Please try PyHashVTable_16_32". (And since that's deterministic given the function definitions you always catch it at once.) Dag From d.s.seljebotn at astro.uio.no Thu Jun 7 12:35:37 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 07 Jun 2012 12:35:37 +0200 Subject: [Cython] Hash-based vtables In-Reply-To: <4FD0808B.5080300@astro.uio.no> References: <4FCD100B.7000008@astro.uio.no> <4FCD20DC.6090906@astro.uio.no> <048eeb04-aa8b-4e12-9a9b-5d552d39984b@email.android.com> <4FCFC088.3000709@astro.uio.no> <4FCFC441.40703@astro.uio.no> <4FCFCD49.9030802@astro.uio.no> <4FD0808B.5080300@astro.uio.no> Message-ID: <4FD083F9.2030006@astro.uio.no> On 06/07/2012 12:20 PM, Dag Sverre Seljebotn wrote: > On 06/07/2012 12:26 AM, Robert Bradshaw wrote: >> On Wed, Jun 6, 2012 at 2:36 PM, Dag Sverre Seljebotn >> wrote: >>> On 06/06/2012 11:16 PM, Robert Bradshaw wrote: >>>> >>>> On Wed, Jun 6, 2012 at 1:57 PM, Dag Sverre Seljebotn >>>> wrote: >>>>> >>>>> On 06/06/2012 10:41 PM, Dag Sverre Seljebotn wrote: >>>>>> >>>>>> >>>>>> On 06/05/2012 12:30 AM, Robert Bradshaw wrote: >>>>>>> >>>>>>> >>>>>>> I just found http://cmph.sourceforge.net/ which looks quite >>>>>>> interesting. Though the resulting hash functions are supposedly >>>>>>> cheap, >>>>>>> I have the feeling that branching is considered cheap in this >>>>>>> context. >>>>>> >>>>>> >>>>>> >>>>>> Actually, this lead was *very* promising. I believe the very first >>>>>> reference I actually read through and didn't eliminate after the >>>>>> abstract totally swept away our home-grown solutions! 
>>>>>> >>>>>> "Hash& Displace" by Pagh (1999) is actually very simple, easy to >>>>>> >>>>>> understand, and fast both for generation and (the branch-free) >>>>>> lookup: >>>>>> >>>>>> >>>>>> >>>>>> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.69.3753&rep=rep1&type=pdf >>>>>> >>>>>> >>>>>> >>>>>> The idea is: >>>>>> >>>>>> - Find a hash `g(x)` to partition the keys into `b` groups (the paper >>>>>> requires b> 2n, though I think in practice you can often get away >>>>>> with >>>>>> less) >>>>>> >>>>>> - Find a hash `f(x)` such that f is 1:1 within each group (which is >>>>>> easily achieved since groups only has a few elements) >>>>>> >>>>>> - For each group, from largest to smallest: Find a displacement >>>>>> `d[group]` so that `f(x) ^ d` doesn't cause collisions. >>>>>> >>>>>> It requires extra storage for the displacement table. However, I >>>>>> think 8 >>>>>> bits per element might suffice even for vtables of 512 or 1024 in >>>>>> size. >>>>>> Even with 16 bits it's rather negligible compared to the >>>>>> minimum-128-bit >>>>>> entries of the table. >>>>>> >>>>>> I benchmarked these hash functions: >>>>>> >>>>>> displace1: ((h>> r1) ^ d[h& 63])& m1 >>>>>> displace2: ((h>> r1) ^ d[h& m2])& m1 >>>>>> displace3: ((h>> r1) ^ d[(h>> r2)& m2])& m1 >>>>>> >>>>>> >>>>>> Only the third one is truly in the spirit of the algorithm, but I >>>>>> think >>>>>> the first two should work well too (and when h is known compile-time, >>>>>> looking up d[h& 63] isn't harder than looking up r1 or m1). >>>>>> >>>>>> >>>>>> My computer is acting up and all my numbers today are slower than the >>>>>> earlier ones (yes, I've disabled turbo-mode in the BIOS for a year >>>>>> ago, >>>>>> and yes, I've pinned the CPU speed). 
But here's today's numbers, >>>>>> compiled with -DIMHASH: >>>>>> >>>>>> direct: min=5.37e-09 mean=5.39e-09 std=1.96e-11 val=2400000000.000000 >>>>>> index: min=6.45e-09 mean=6.46e-09 std=1.15e-11 val=1800000000.000000 >>>>>> twoshift: min=6.99e-09 mean=7.00e-09 std=1.35e-11 >>>>>> val=1800000000.000000 >>>>>> threeshift: min=7.53e-09 mean=7.54e-09 std=1.63e-11 >>>>>> val=1800000000.000000 >>>>>> displace1: min=6.99e-09 mean=7.00e-09 std=1.66e-11 >>>>>> val=1800000000.000000 >>>>>> displace2: min=6.99e-09 mean=7.02e-09 std=2.77e-11 >>>>>> val=1800000000.000000 >>>>>> displace3: min=7.52e-09 mean=7.54e-09 std=1.19e-11 >>>>>> val=1800000000.000000 >>>>>> >>>>>> >>>>>> I did a dirty prototype of the table-finder as well and it works: >>>>>> >>>>>> https://github.com/dagss/hashvtable/blob/master/pagh99.py >>>>> >>>>> >>>>> >>>>> The paper obviously puts more effort on minimizing table size and >>>>> not a >>>>> fast >>>>> lookup. My hunch is that our choice should be >>>>> >>>>> ((h>> table.r) ^ table.d[h& m2])& m1 >>>>> >>>>> >>>>> and use 8-bits d (because even if you have 1024 methods, you'd rather >>>>> double >>>>> the number of bins than those 2 extra bits available for displacement >>>>> options). >>>>> >>>>> Then keep incrementing the size of d and the number of table slots (in >>>>> such >>>>> an order that the total vtable size is minimized) until success. In >>>>> practice >>>>> this should almost always just increase the size of d, and keep the >>>>> table >>>>> size at the lowest 2**k that fits the slots (even for 64 methods or >>>>> 128 >>>>> methods :-)) >>>>> >>>>> Essentially we avoid the shift in the argument to d[] by making d >>>>> larger. >>>> >>>> >>>> Nice. I'm surprised that the indirection on d doesn't cost us much; >>> >>> >>> Well, table->d[const& const] compiles down to the same kind of code as >>> table->m1. I guess I'm surprised too that displace2 doesn't penalize. >>> >>> >>>> hopefully its size wouldn't be a big issue either. 
What kinds of >>>> densities were you achieving? > > OK, simulation results just in (for the displace2 hash), and they > exceeded my expectations. > > I always fill the table with n=2^k keys, and fix b = n (b means |d|). > Then the failure rates are (top two are 100,000 simulations, the rest > are 1000 simulations): > > n= 8 b= 8 failure-rate=0.0019 try-mean=4.40 try-max=65 > n= 16 b= 16 failure-rate=0.0008 try-mean=5.02 try-max=65 > n= 32 b= 32 failure-rate=0.0000 try-mean=5.67 try-max=25 > n= 64 b= 64 failure-rate=0.0000 try-mean=6.60 try-max=29 > n= 128 b= 128 failure-rate=0.0000 try-mean=7.64 try-max=22 > n= 256 b= 256 failure-rate=0.0000 try-mean=8.66 try-max=37 > n= 512 b= 512 failure-rate=0.0000 try-mean=9.57 try-max=26 > n=1024 b= 1024 failure-rate=0.0000 try-mean=10.66 try-max=34 > > Try-mean and try-max is how many r's needed to be tried before success, > so it gives an indication how much is left before failure. > > For the ~1/1000 chance of failure for n=8 and n=16, we would proceed to > let b=2*n (100,000 simulations): > > n= 8 b= 16 failure-rate=0.0001 try-mean=2.43 try-max=65 > n= 16 b= 32 failure-rate=0.0000 try-mean=3.40 try-max=65 > > NOTE: The 512...2048 results were with 16 bits displacements, with 8 bit > displacements they mostly failed. So we either need to make each element > of d 16 bits, or, e.g., store 512 entries in a 1024-slot table (which > succeeded most of the time with 8 bit displacements). I'm +1 on 16 bits > displacements. > > The algorithm is rather fast and concise: > > https://github.com/dagss/hashvtable/blob/master/pagh99.py > >>> The algorithm is designed for 100% density in the table itself. (We >>> can lift >>> that to compensate for a small space of possible hash functions I >>> guess.) >>> >>> I haven't done proper simulations yet, but I just tried |vtable|=128, >>> |d|=128 from the command line and I had 15 successes or so before the >>> first >>> failure. That's with a 100% density in the vtable itself! 
(And when it >>> fails, you increase |d| to get your success). >>> >>> The caveat is the space spent on d (it's small in comparison, but >>> that's why >>> this isn't too good to be true). >>> >>> A disadvantage might be that we may no longer have the opportunity to >>> not >>> make the table size a power of two (i.e. replace the mask with "if >>> (likely(slot< n))"). I think for that to work one would need to >>> replace the >>> xor group with addition on Z_d. >>> >>> >>>> Going back to the idea of linear probing on a cache miss, this has the >>>> advantage that one can write a brain-dead provider that sets m=0 and >>>> simply lists the methods instead of requiring a table optimizer. (Most >>>> tools, of course, would do the table optimization.) It also lets you >>>> get away with a "kind-of good" hash rather than requiring you search >>>> until you find a (larger?) perfect one. >>> >>> >>> Well, given that we can have 100% density, and generating the table is >>> lightning fast, and the C code to generate the table is likely a 300 >>> line >>> utility... I'm not convinced. >> >> It goes from an extraordinary simple spec (table is, at minimum, a >> func[2^k] with a couple of extra zero fields, whose struct can be >> statically defined in the source by hand) to a, well, not complicated >> in the absolute sense, but much more so than the definition above. It >> also is variable-size which makes allocating it globally/on a stack a >> pain (I suppose one can choose an upper bound for |d| and |vtable|). >> >> I am a bit playing devil's advocate here, it's probably just a (minor) >> con, but worth noting at least. > > If you were willing to go the interning route, so that you didn't need > to fill the table with md5 hashes anyway, I'd say you'd have a stronger > point :-) > > Given the results above, static allocation can at least be solved in a > way that is probably user-friendly enough: > > PyHashVTable_16_16 mytable; > > ...init () { > mytable.functions = { ... 
};
> if (PyHashVTable_Ready((PyHashVTable*)mytable, 16, 16) == -1) return -1;
> }
>
> Now, with chance ~1/1000, you're going to get an exception saying
> "Please try PyHashVTable_16_32". (And since that's deterministic given
> the function definitions you always catch it at once.)

PS. PyHashVTable_Ready would do the md5's and reorder the functions etc.
as well.

Dag

From d.s.seljebotn at astro.uio.no Thu Jun 7 12:45:52 2012
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Thu, 07 Jun 2012 12:45:52 +0200
Subject: [Cython] Hash-based vtables
In-Reply-To: References: <4FCD100B.7000008@astro.uio.no>
 <4FCD20DC.6090906@astro.uio.no> <4FCE679C.7000002@astro.uio.no>
 <4FCE7CFB.7000205@astro.uio.no>
Message-ID: <4FD08660.9080104@astro.uio.no>

On 06/06/2012 11:00 PM, Robert Bradshaw wrote:
> On Tue, Jun 5, 2012 at 2:41 PM, Dag Sverre Seljebotn
> wrote:
>> Is the goal then to avoid having to have an interning registry?
>
> Yes, and to avoid invoking an expensive hash function at runtime in
> order to achieve good distribution.

I don't understand. Compilation of call-sites would always generate a
hash. You also need them while initializing/composing the hash table.

But the storage and comparison of the hash rather than an interned
string seems orthogonal to that.

If it weren't for the security concern I agree with you. But I think
Mark and Stefan make a good point. Since you could hand a JIT-ed vtable
(potentially the result of "trusted and verified user input") to a
Cython function, *all* call-sites should use the full 160 bits.

Interning solves this in a better way, and preserves vtable memory to boot.

A collision registry would work against a security breach but still
allow a DoS attack.

Our dependencies are already:

- md5
- Pagh99 algorithm

Why not throw in an interning registry as well ;-)

But then the end-result is pretty cool.
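For concreteness, the statically allocated PyHashVTable_16_16 idea from earlier in the thread, combined with the displace2 lookup, might end up looking roughly like this (a sketch only: field names and layout are illustrative, not a proposed spec):

```c
#include <stdint.h>

/* Hypothetical fixed-size variant: 16 table slots, 16 displacement bins,
 * 16-bit displacements (per the simulation results above). */
typedef struct {
    uint64_t r;              /* shift chosen by the table optimizer */
    uint64_t m1, m2;         /* slot mask and displacement-bin mask */
    uint16_t d[16];          /* displacement table */
    void    *functions[16];  /* function pointers, reordered by "Ready" */
} PyHashVTable_16_16;

/* The branch-free displace2 lookup: slot = ((h >> r) ^ d[h & m2]) & m1. */
static void *hashvtable_lookup(const PyHashVTable_16_16 *t, uint64_t h) {
    return t->functions[((h >> t->r) ^ t->d[h & t->m2]) & t->m1];
}
```

Because every field is fixed-size, such a struct can be declared globally or on the stack and filled in by a `_Ready`-style initializer at module load, which is the property being argued for here.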
>> Something that hasn't come up so far is that Cython doesn't know the exact
>> types of external typedefs, so it can't generate the hash at Cythonize-time.
>> I guess some support for build systems to probe for type sizes and compute
>> the signature hashes in a separate header file would solve this -- with a
>> fallback to computing them at runtime at module loading, if you're not
>> using a supported build system. (But suddenly an interning registry doesn't
>> look so horrible..)
>
> It all depends on how strict you want to be. It may be acceptable to
> let f(int) and f(long) not hash to the same value even if sizeof(int)
> == sizeof(long). We could also promote all int types to long or long
> long, including extern types (assuming, with a C-compile-time check,
> external types declared up to "long" are <= sizeof(long)). Another

Please no, I don't like any of those. We should not make the trouble
with external typedefs worse than it already is. (Part of me wants to
just declare that Cython is like Go with no implicit conversions to
avoid inheriting the ugly coercion rules of C anyway...)

> option is to let the hash be md5(sig) + hashN(sizeof(extern_arg1),
> sizeof(extern_argN)) where hashN is a macro.

Good idea. Would the following destroy all the nice properties of md5?
I guess I wouldn't use it for crypto any longer...:

hash("mymethod:iiZd") =
md5("mymethod") ^ md5("i\x1") ^ md5("i\x2") ^ md5("Z\x3") ^ md5("d\x4")

Dag

From d.s.seljebotn at astro.uio.no Thu Jun 7 12:47:39 2012
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Thu, 07 Jun 2012 12:47:39 +0200
Subject: [Cython] Hash-based vtables
In-Reply-To: <4FD08660.9080104@astro.uio.no>
References: <4FCD100B.7000008@astro.uio.no> <4FCD20DC.6090906@astro.uio.no>
 <4FCE679C.7000002@astro.uio.no> <4FCE7CFB.7000205@astro.uio.no>
 <4FD08660.9080104@astro.uio.no>
Message-ID: <4FD086CB.5090201@astro.uio.no>

On 06/07/2012 12:45 PM, Dag Sverre Seljebotn wrote:
> On 06/06/2012 11:00 PM, Robert Bradshaw wrote:
>> On Tue, Jun 5, 2012 at 2:41 PM, Dag Sverre Seljebotn
>> wrote:
>>> Is the goal then to avoid having to have an interning registry?
>>
>> Yes, and to avoid invoking an expensive hash function at runtime in
>> order to achieve good distribution.
>
> I don't understand. Compilation of call-sites would always generate a
> hash. You also need them while initializing/composing the hash table.
>
> But the storage and comparison of the hash rather than an interned
> string seems orthogonal to that.
>
> If it weren't for the security concern I agree with you. But I think
> Mark and Stefan make a good point. Since you could hand a JIT-ed vtable
> (potentially the result of "trusted and verified user input") to a
> Cython function, *all* call-sites should use the full 160 bits.
>
> Interning solves this in a better way, and preserves vtable memory to boot.

No, it's not necessarily *better* -- I meant, it's going to be faster
than the 160-bit compare. And I think throwing in a user option that
anybody actually needs to care about would be a failure here.

Dag

> A collision registry would work against a security breach but still
> allow a DoS attack.
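The xor-combined hash quoted above can be sketched as follows (the per-position byte tag is a detail worth noting: it is what keeps two identical argument codes from cancelling each other out under xor, and it makes argument order matter; the exact encoding here is an assumption, not the proposal):

```python
import hashlib

def sig_hash(name, argcodes):
    # hash("mymethod:iiZd") = md5("mymethod") ^ md5("i\x01") ^ md5("i\x02")
    #                           ^ md5("Z\x03") ^ md5("d\x04")
    h = int.from_bytes(hashlib.md5(name.encode()).digest(), "little")
    for pos, code in enumerate(argcodes, start=1):
        term = hashlib.md5(code.encode() + bytes([pos])).digest()
        h ^= int.from_bytes(term, "little")
    return h
```

A build system that only learns the size of an external typedef at C compile time could then xor in that one argument's term without recomputing the whole digest, which seems to be the point of the construction.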
> > Our dependencies are already: > > - md5 > - Pagh99 algorithm > > Why not throw in an interning registry as well ;-) > > But then the end-result is pretty cool. > >>> Something that hasn't come up so far is that Cython doesn't know the >>> exact >>> types of external typedefs, so it can't generate the hash at >>> Cythonize-time. >>> I guess some support for build systems to probe for type sizes and >>> compute >>> the signature hashes in a sepearate header file would solve this -- >>> with a >>> fallback to computing them runtime at module loading, if you're not >>> using a >>> supported build system. (But suddenly an interning registry doesn't >>> look so >>> horrible..) >> >> It all depends on how strict you want to be. It may be acceptable to >> let f(int) and f(long) not hash to the same value even if sizeof(int) >> == sizeof(long). We could also promote all int types to long or long >> long, including extern times (assuming, with a c-compile-time check, >> external types declared up to "long" are<= sizeof(long)). Another > > Please no, I don't like any of those. We should not make the trouble > with external typedefs worse than it already is. (Part of me wants to > just declare that Cython is like Go with no implicit conversions to > aovid inheriting the ugly coercion rules of C anyway...) > >> option is to let the hash be md5(sig) + hashN(sizeof(extern_arg1), >> sizeof(extern_argN)) where hashN is a macro. > > Good idea. Would the following destroy all the nice properties of md5? I > guess I wouldn't use it for crypto any longer...: > > hash("mymethod:iiZd") = > md5("mymethod") ^ md5("i\x1") ^ md5("i\x2") ^ md5("Z\x3") ^ md5("d\x4") > > Dag From dieter at handshake.de Thu Jun 7 13:32:43 2012 From: dieter at handshake.de (Dieter Maurer) Date: Thu, 7 Jun 2012 13:32:43 +0200 Subject: [Cython] Why does "__cinit__" insists on converting its arguments to Python objects? 
Message-ID: <20432.37211.743043.567461@localhost.localdomain>

The following cython source leads to a "Cannot convert 'pointer' to
Python object".

ctypedef void * pointer

cdef extern from "nonexistant.h":
  cdef pointer to_pointer(object)

cdef class C:
  cdef pointer p
  def __cinit__(self, pointer p):
    self.p = p

c = C(to_pointer(None))

Why does the constructor call try an implicit conversion to a Python
object even though it gets precisely the type indicated by its signature?

I am working on a binding for "libxmlsec". The behaviour above leads to
an unnatural mapping. Two examples:

1. "libxmlsec" has the concept of a key (used for digital signatures or
encryption), naturally mapped onto a "cdef class Key" encapsulating the
xmlsec key pointer. "libxmlsec" provides many functions to create keys -
naturally mapped onto class methods used as alternative constructors.
If "Cython" allowed C-level parameters for "__cinit__", they could look
like:

  cdef xmlSecKeyPtr xkey = ... some "libxmlsec" key generating function ...
  return Key(xkey)

With the restriction, this must look like:

  cdef Key key
  key.xkey = ... some "libxmlsec" key generating function ...
  return key

Not yet too bad, unless the constructor requires C-level arguments.

2. "libxmlsec" provides a whole bunch of transforms, handled in C code
via a set of so-called "TransformId"s. Each "TransformId" is generated
by a function. The natural way would look like:

  cdef class TransformId:
    cdef xmlSecTransformId tid
    def __cinit__(self, xmlSecTransformId tid):
      self.tid = tid

  TransformInclC14N = TransformId(xmlSecTransformInclC14NGetKlass())
  ... for all standard transforms ...

The restriction forces the introduction of a helper function:

  cdef class TransformId:
    cdef xmlSecTransformId tid

  cdef _mkti(xmlSecTransformId tid):
    cdef TransformId t = TransformId()
    t.tid = tid
    return t

  TransformInclC14N = _mkti(xmlSecTransformInclC14NGetKlass())
  ... for all standard transforms ...
-- Dieter From d.s.seljebotn at astro.uio.no Thu Jun 7 14:24:32 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 07 Jun 2012 14:24:32 +0200 Subject: [Cython] Hash-based vtables In-Reply-To: <4FD0808B.5080300@astro.uio.no> References: <4FCD100B.7000008@astro.uio.no> <4FCD20DC.6090906@astro.uio.no> <048eeb04-aa8b-4e12-9a9b-5d552d39984b@email.android.com> <4FCFC088.3000709@astro.uio.no> <4FCFC441.40703@astro.uio.no> <4FCFCD49.9030802@astro.uio.no> <4FD0808B.5080300@astro.uio.no> Message-ID: <4FD09D80.7020601@astro.uio.no> On 06/07/2012 12:20 PM, Dag Sverre Seljebotn wrote: > On 06/07/2012 12:26 AM, Robert Bradshaw wrote: >> On Wed, Jun 6, 2012 at 2:36 PM, Dag Sverre Seljebotn >> wrote: >>> On 06/06/2012 11:16 PM, Robert Bradshaw wrote: >>>> >>>> On Wed, Jun 6, 2012 at 1:57 PM, Dag Sverre Seljebotn >>>> wrote: >>>>> >>>>> On 06/06/2012 10:41 PM, Dag Sverre Seljebotn wrote: >>>>>> >>>>>> >>>>>> On 06/05/2012 12:30 AM, Robert Bradshaw wrote: >>>>>>> >>>>>>> >>>>>>> I just found http://cmph.sourceforge.net/ which looks quite >>>>>>> interesting. Though the resulting hash functions are supposedly >>>>>>> cheap, >>>>>>> I have the feeling that branching is considered cheap in this >>>>>>> context. >>>>>> >>>>>> >>>>>> >>>>>> Actually, this lead was *very* promising. I believe the very first >>>>>> reference I actually read through and didn't eliminate after the >>>>>> abstract totally swept away our home-grown solutions! 
>>>>>> >>>>>> "Hash& Displace" by Pagh (1999) is actually very simple, easy to >>>>>> >>>>>> understand, and fast both for generation and (the branch-free) >>>>>> lookup: >>>>>> >>>>>> >>>>>> >>>>>> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.69.3753&rep=rep1&type=pdf >>>>>> >>>>>> >>>>>> >>>>>> The idea is: >>>>>> >>>>>> - Find a hash `g(x)` to partition the keys into `b` groups (the paper >>>>>> requires b> 2n, though I think in practice you can often get away >>>>>> with >>>>>> less) >>>>>> >>>>>> - Find a hash `f(x)` such that f is 1:1 within each group (which is >>>>>> easily achieved since groups only has a few elements) >>>>>> >>>>>> - For each group, from largest to smallest: Find a displacement >>>>>> `d[group]` so that `f(x) ^ d` doesn't cause collisions. >>>>>> >>>>>> It requires extra storage for the displacement table. However, I >>>>>> think 8 >>>>>> bits per element might suffice even for vtables of 512 or 1024 in >>>>>> size. >>>>>> Even with 16 bits it's rather negligible compared to the >>>>>> minimum-128-bit >>>>>> entries of the table. >>>>>> >>>>>> I benchmarked these hash functions: >>>>>> >>>>>> displace1: ((h>> r1) ^ d[h& 63])& m1 >>>>>> displace2: ((h>> r1) ^ d[h& m2])& m1 >>>>>> displace3: ((h>> r1) ^ d[(h>> r2)& m2])& m1 >>>>>> >>>>>> >>>>>> Only the third one is truly in the spirit of the algorithm, but I >>>>>> think >>>>>> the first two should work well too (and when h is known compile-time, >>>>>> looking up d[h& 63] isn't harder than looking up r1 or m1). >>>>>> >>>>>> >>>>>> My computer is acting up and all my numbers today are slower than the >>>>>> earlier ones (yes, I've disabled turbo-mode in the BIOS for a year >>>>>> ago, >>>>>> and yes, I've pinned the CPU speed). 
But here's today's numbers, >>>>>> compiled with -DIMHASH: >>>>>> >>>>>> direct: min=5.37e-09 mean=5.39e-09 std=1.96e-11 val=2400000000.000000 >>>>>> index: min=6.45e-09 mean=6.46e-09 std=1.15e-11 val=1800000000.000000 >>>>>> twoshift: min=6.99e-09 mean=7.00e-09 std=1.35e-11 >>>>>> val=1800000000.000000 >>>>>> threeshift: min=7.53e-09 mean=7.54e-09 std=1.63e-11 >>>>>> val=1800000000.000000 >>>>>> displace1: min=6.99e-09 mean=7.00e-09 std=1.66e-11 >>>>>> val=1800000000.000000 >>>>>> displace2: min=6.99e-09 mean=7.02e-09 std=2.77e-11 >>>>>> val=1800000000.000000 >>>>>> displace3: min=7.52e-09 mean=7.54e-09 std=1.19e-11 >>>>>> val=1800000000.000000 >>>>>> >>>>>> >>>>>> I did a dirty prototype of the table-finder as well and it works: >>>>>> >>>>>> https://github.com/dagss/hashvtable/blob/master/pagh99.py >>>>> >>>>> >>>>> >>>>> The paper obviously puts more effort on minimizing table size and >>>>> not a >>>>> fast >>>>> lookup. My hunch is that our choice should be >>>>> >>>>> ((h>> table.r) ^ table.d[h& m2])& m1 >>>>> >>>>> >>>>> and use 8-bits d (because even if you have 1024 methods, you'd rather >>>>> double >>>>> the number of bins than those 2 extra bits available for displacement >>>>> options). >>>>> >>>>> Then keep incrementing the size of d and the number of table slots (in >>>>> such >>>>> an order that the total vtable size is minimized) until success. In >>>>> practice >>>>> this should almost always just increase the size of d, and keep the >>>>> table >>>>> size at the lowest 2**k that fits the slots (even for 64 methods or >>>>> 128 >>>>> methods :-)) >>>>> >>>>> Essentially we avoid the shift in the argument to d[] by making d >>>>> larger. >>>> >>>> >>>> Nice. I'm surprised that the indirection on d doesn't cost us much; >>> >>> >>> Well, table->d[const& const] compiles down to the same kind of code as >>> table->m1. I guess I'm surprised too that displace2 doesn't penalize. >>> >>> >>>> hopefully its size wouldn't be a big issue either. 
What kinds of >>>> densities were you achieving? > > OK, simulation results just in (for the displace2 hash), and they > exceeded my expectations. > > I always fill the table with n=2^k keys, and fix b = n (b means |d|). > Then the failure rates are (top two are 100,000 simulations, the rest > are 1000 simulations): > > n= 8 b= 8 failure-rate=0.0019 try-mean=4.40 try-max=65 > n= 16 b= 16 failure-rate=0.0008 try-mean=5.02 try-max=65 > n= 32 b= 32 failure-rate=0.0000 try-mean=5.67 try-max=25 > n= 64 b= 64 failure-rate=0.0000 try-mean=6.60 try-max=29 > n= 128 b= 128 failure-rate=0.0000 try-mean=7.64 try-max=22 > n= 256 b= 256 failure-rate=0.0000 try-mean=8.66 try-max=37 > n= 512 b= 512 failure-rate=0.0000 try-mean=9.57 try-max=26 > n=1024 b= 1024 failure-rate=0.0000 try-mean=10.66 try-max=34 > > Try-mean and try-max is how many r's needed to be tried before success, > so it gives an indication how much is left before failure. > > For the ~1/1000 chance of failure for n=8 and n=16, we would proceed to > let b=2*n (100,000 simulations): > > n= 8 b= 16 failure-rate=0.0001 try-mean=2.43 try-max=65 > n= 16 b= 32 failure-rate=0.0000 try-mean=3.40 try-max=65 > > NOTE: The 512...2048 results were with 16 bits displacements, with 8 bit > displacements they mostly failed. So we either need to make each element > of d 16 bits, or, e.g., store 512 entries in a 1024-slot table (which > succeeded most of the time with 8 bit displacements). I'm +1 on 16 bits > displacements. > > The algorithm is rather fast and concise: > > https://github.com/dagss/hashvtable/blob/master/pagh99.py > >>> The algorithm is designed for 100% density in the table itself. (We >>> can lift >>> that to compensate for a small space of possible hash functions I >>> guess.) >>> >>> I haven't done proper simulations yet, but I just tried |vtable|=128, >>> |d|=128 from the command line and I had 15 successes or so before the >>> first >>> failure. That's with a 100% density in the vtable itself! 
(And when it >>> fails, you increase |d| to get your success). >>> >>> The caveat is the space spent on d (it's small in comparison, but >>> that's why >>> this isn't too good to be true). >>> >>> A disadvantage might be that we may no longer have the opportunity to >>> not >>> make the table size a power of two (i.e. replace the mask with "if >>> (likely(slot< n))"). I think for that to work one would need to >>> replace the >>> xor group with addition on Z_d. >>> >>> >>>> Going back to the idea of linear probing on a cache miss, this has the >>>> advantage that one can write a brain-dead provider that sets m=0 and >>>> simply lists the methods instead of requiring a table optimizer. (Most >>>> tools, of course, would do the table optimization.) It also lets you >>>> get away with a "kind-of good" hash rather than requiring you search >>>> until you find a (larger?) perfect one. >>> >>> >>> Well, given that we can have 100% density, and generating the table is >>> lightning fast, and the C code to generate the table is likely a 300 >>> line >>> utility... I'm not convinced. >> >> It goes from an extraordinary simple spec (table is, at minimum, a >> func[2^k] with a couple of extra zero fields, whose struct can be >> statically defined in the source by hand) to a, well, not complicated >> in the absolute sense, but much more so than the definition above. It >> also is variable-size which makes allocating it globally/on a stack a >> pain (I suppose one can choose an upper bound for |d| and |vtable|). >> >> I am a bit playing devil's advocate here, it's probably just a (minor) >> con, but worth noting at least. 
> > If you were willing to go the interning route, so that you didn't need > to fill the table with md5 hashes anyway, I'd say you'd have a stronger > point :-) Here's a good reason to demand perfect hashing in the callee: Suppose you want to first check the interface once, then keep using the vtable -- e.g, *if* we want Cython to raise TypeError on the interface coercion *and* we decide we don't want to mess with constructing C++-style vtables on the fly, then code like this: cdef f(SomeInterface obj): return obj.some_method(1.0) would simply expect that the vtable contained the method, and skip the ID comparison entirely. No comparison is faster than either 64-bit hash comparison and interned comparison. :-) I'm not saying the above decisions must be made, but the possibility seems reason enough to demand perfect hashing. Dag From robertwb at gmail.com Thu Jun 7 20:00:42 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Thu, 7 Jun 2012 11:00:42 -0700 Subject: [Cython] Why does "__cinit__" insists on converting its arguments to Python objects? In-Reply-To: <20432.37211.743043.567461@localhost.localdomain> References: <20432.37211.743043.567461@localhost.localdomain> Message-ID: Both __init__ and __cinit__ are passed the same arguments, and the former has Python calling conventions. (Also, we use Python's framework to allocate and construct the new object, so there's not a huge amount of flexibility here and working around this would be quite non-trivial). On Thu, Jun 7, 2012 at 4:32 AM, Dieter Maurer wrote: > The following cython source leads to a "Cannot convert 'pointer' to Python object". > > ctypedef void * pointer > > cdef extern from "nonexistant.h": > ?cdef pointer to_pointer(object) > > cdef class C: > ?cdef pointer p > > ?def __cinit__(self, pointer p): self.p = p > > c = C(to_pointer(None)) > > Why does the constructor call tries an implicit conversion to a > Python object even though it gets precisely the type indicated by > its signature? 
> > > I am working on a binding for "libxmlsec". The behaviour above leads > to an unnatural mapping. Two examples: > > 1. "libxmlsec" has the concept of a key (used for digital signatures or > ? encryption), naturally mapped onto a "cdef class Key" encapsulating > ? the xmlsec key pointer. > > ? "libxmlsec" provides many functions to create keys - naturally mapped > ? onto class methods used as alternative constructors. > ? Would "Cython" allow C level parameters for "__cinit__", > ? they could look like: > > ? ? cdef xmlSecKeyPtr xkey = ... some "libxmlsec" key generating function ... > ? ? return Key(xkey) > > ? With the restriction, this must look like: > > ? ? cdef Key key > ? ? key.xkey = ... some "libxmlsec" key generating function ... > ? ? return key > > ? Not yet too bad, unless the constructor requires C level arguments. > > 2. "libxmlsec" provides a whole bunch of transforms, handled in C code > ? via a set of so called "TransformId"s. Each "TransformId" is > ? generated by a function. > > ? The natural way would like: > > ? ? ?cdef class TransformId: > ? ? ? ?cdef xmlSecTransformId tid > ? ? ? ?def __cinit__(self, xmlSecTransformId tid): self.tid = tid > > ? ? ?TransformInclC14N = TransformId(xmlSecTransformInclC14NGetKlass()) > ? ? ?... for all standard transforms ... > > ? The restriction forces the introduction of a helper function: > > ? ? ?cdef class TransformId: > ? ? ? ?cdef xmlSecTransformId tid > > ? ? ?cdef _mkti(xmlSecTransformId tid): > ? ? ? ?cdef TransformId t = TransformId() > ? ? ? ?t.tid = tid > ? ? ? ?return t > > ? ? ?TransformInclC14N = _mkti(xmlSecTransformInclC14NGetKlass()) > ? ? ?... for all standard transforms ... 
> > > > > > -- > Dieter > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From stefan_ml at behnel.de Fri Jun 8 08:50:47 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 08 Jun 2012 08:50:47 +0200 Subject: [Cython] Bug: bad C code generated for (some) "... and ... or ..." expressions In-Reply-To: <20432.27097.470830.218794@localhost.localdomain> References: <20432.27097.470830.218794@localhost.localdomain> Message-ID: <4FD1A0C7.7080903@behnel.de> Hi, thanks for the report. Dieter Maurer, 07.06.2012 10:44: > "cython 0.13" generates bad C code for the attached "pyx" file. Could you try the latest release? I would at least expect an error instead of actually generating code. > "cython" itself recognizes that it did something wrong and emits ";" > to the generated file: > > ... > static __pyx_t_12cybug_and_or_pointer __pyx_f_12cybug_and_or_bug(PyObject *__pyx_v_o) { > __pyx_t_12cybug_and_or_pointer __pyx_r; > int __pyx_t_1; > __pyx_t_12cybug_and_or_pointer __pyx_t_2; > ; > ; > ... > This is generated from this Cython code: > cdef pointer bug(o): > return o is not None and to_pointer(o) or NULL The right way to implement this is: return to_pointer(o) if o is not None else NULL > The error probably happens because it is difficult for "cython" to > determine the type for "and" and "or" expressions (if the operand types > differ). In an "cond and t or f" expression, however, the result type > is "type(t)" if "type(t) == type(f)", independent of "type(cond)". Independent of the condition, yes. However, the types of the two expression results differ here, and the fact that you named your initial condition "cond" just hides the fact that it is not different from the other two parts (t and f) of the expression. The Python semantics of this kind of evaluation is more complex than you might think. 
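A pure-Python illustration of that complexity: whenever the middle operand is falsy, `cond and t or f` silently yields `f`, which is why it is not equivalent to the conditional expression (names here are hypothetical stand-ins for the pyx code):

```python
def to_pointer(o):
    return 0  # stand-in: a "pointer" that happens to be falsy (NULL-like)

o = object()

# The classic "cond and t or f" pitfall: t == 0 is falsy, so f wins.
result_and_or = o is not None and to_pointer(o) or -1

# The conditional expression evaluates the condition alone.
result_ternary = to_pointer(o) if o is not None else -1

assert result_and_or == -1   # wrong: to_pointer's 0 was discarded
assert result_ternary == 0   # right
```

So even in plain Python the two forms disagree, independent of any C typing issue; the typed Cython version merely exposes the problem at compile time.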
> It might not be worse to special case this type of expression.

-1

> It would however be more friendly to output an instructive
> error message instead of generating bad C code.

Absolutely.

Stefan

From dieter at handshake.de Fri Jun 8 13:38:22 2012
From: dieter at handshake.de (Dieter Maurer)
Date: Fri, 8 Jun 2012 13:38:22 +0200
Subject: [Cython] Bug: bad C code generated for (some) "... and ... or ..." expressions
In-Reply-To: <4FD1A0C7.7080903@behnel.de>
References: <20432.27097.470830.218794@localhost.localdomain> <4FD1A0C7.7080903@behnel.de>
Message-ID: <20433.58414.162418.381590@localhost.localdomain>

Stefan Behnel wrote at 2012-6-8 08:50 +0200:
>thanks for the report.
>
>Dieter Maurer, 07.06.2012 10:44:
>> "cython 0.13" generates bad C code for the attached "pyx" file.
>
>Could you try the latest release? I would at least expect an error instead
>of actually generating code.

The latest release on PyPI is "0.16". It behaves identically to
version "0.13": no error message; just wrongly generated C code
(C code containing ";" "statements").
-- Dieter From d.s.seljebotn at astro.uio.no Fri Jun 8 23:12:58 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 08 Jun 2012 23:12:58 +0200 Subject: [Cython] Hash-based vtables In-Reply-To: <4FD083F9.2030006@astro.uio.no> References: <4FCD100B.7000008@astro.uio.no> <4FCD20DC.6090906@astro.uio.no> <048eeb04-aa8b-4e12-9a9b-5d552d39984b@email.android.com> <4FCFC088.3000709@astro.uio.no> <4FCFC441.40703@astro.uio.no> <4FCFCD49.9030802@astro.uio.no> <4FD0808B.5080300@astro.uio.no> <4FD083F9.2030006@astro.uio.no> Message-ID: <4FD26ADA.5060401@astro.uio.no> On 06/07/2012 12:35 PM, Dag Sverre Seljebotn wrote: > On 06/07/2012 12:20 PM, Dag Sverre Seljebotn wrote: >> On 06/07/2012 12:26 AM, Robert Bradshaw wrote: >>> On Wed, Jun 6, 2012 at 2:36 PM, Dag Sverre Seljebotn >>> wrote: >>>> On 06/06/2012 11:16 PM, Robert Bradshaw wrote: >>>>> >>>>> On Wed, Jun 6, 2012 at 1:57 PM, Dag Sverre Seljebotn >>>>> wrote: >>>>>> >>>>>> On 06/06/2012 10:41 PM, Dag Sverre Seljebotn wrote: >>>>>>> >>>>>>> >>>>>>> On 06/05/2012 12:30 AM, Robert Bradshaw wrote: >>>>>>>> >>>>>>>> >>>>>>>> I just found http://cmph.sourceforge.net/ which looks quite >>>>>>>> interesting. Though the resulting hash functions are supposedly >>>>>>>> cheap, >>>>>>>> I have the feeling that branching is considered cheap in this >>>>>>>> context. >>>>>>> >>>>>>> >>>>>>> >>>>>>> Actually, this lead was *very* promising. I believe the very first >>>>>>> reference I actually read through and didn't eliminate after the >>>>>>> abstract totally swept away our home-grown solutions! 
>>>>>>> >>>>>>> "Hash& Displace" by Pagh (1999) is actually very simple, easy to >>>>>>> >>>>>>> understand, and fast both for generation and (the branch-free) >>>>>>> lookup: >>>>>>> >>>>>>> >>>>>>> >>>>>>> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.69.3753&rep=rep1&type=pdf >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> The idea is: >>>>>>> >>>>>>> - Find a hash `g(x)` to partition the keys into `b` groups (the >>>>>>> paper >>>>>>> requires b> 2n, though I think in practice you can often get away >>>>>>> with >>>>>>> less) >>>>>>> >>>>>>> - Find a hash `f(x)` such that f is 1:1 within each group (which is >>>>>>> easily achieved since groups only has a few elements) >>>>>>> >>>>>>> - For each group, from largest to smallest: Find a displacement >>>>>>> `d[group]` so that `f(x) ^ d` doesn't cause collisions. >>>>>>> >>>>>>> It requires extra storage for the displacement table. However, I >>>>>>> think 8 >>>>>>> bits per element might suffice even for vtables of 512 or 1024 in >>>>>>> size. >>>>>>> Even with 16 bits it's rather negligible compared to the >>>>>>> minimum-128-bit >>>>>>> entries of the table. >>>>>>> >>>>>>> I benchmarked these hash functions: >>>>>>> >>>>>>> displace1: ((h>> r1) ^ d[h& 63])& m1 >>>>>>> displace2: ((h>> r1) ^ d[h& m2])& m1 >>>>>>> displace3: ((h>> r1) ^ d[(h>> r2)& m2])& m1 >>>>>>> >>>>>>> >>>>>>> Only the third one is truly in the spirit of the algorithm, but I >>>>>>> think >>>>>>> the first two should work well too (and when h is known >>>>>>> compile-time, >>>>>>> looking up d[h& 63] isn't harder than looking up r1 or m1). >>>>>>> >>>>>>> >>>>>>> My computer is acting up and all my numbers today are slower than >>>>>>> the >>>>>>> earlier ones (yes, I've disabled turbo-mode in the BIOS for a year >>>>>>> ago, >>>>>>> and yes, I've pinned the CPU speed). 
But here's today's numbers, >>>>>>> compiled with -DIMHASH: >>>>>>> >>>>>>> direct: min=5.37e-09 mean=5.39e-09 std=1.96e-11 >>>>>>> val=2400000000.000000 >>>>>>> index: min=6.45e-09 mean=6.46e-09 std=1.15e-11 val=1800000000.000000 >>>>>>> twoshift: min=6.99e-09 mean=7.00e-09 std=1.35e-11 >>>>>>> val=1800000000.000000 >>>>>>> threeshift: min=7.53e-09 mean=7.54e-09 std=1.63e-11 >>>>>>> val=1800000000.000000 >>>>>>> displace1: min=6.99e-09 mean=7.00e-09 std=1.66e-11 >>>>>>> val=1800000000.000000 >>>>>>> displace2: min=6.99e-09 mean=7.02e-09 std=2.77e-11 >>>>>>> val=1800000000.000000 >>>>>>> displace3: min=7.52e-09 mean=7.54e-09 std=1.19e-11 >>>>>>> val=1800000000.000000 >>>>>>> >>>>>>> >>>>>>> I did a dirty prototype of the table-finder as well and it works: >>>>>>> >>>>>>> https://github.com/dagss/hashvtable/blob/master/pagh99.py >>>>>> >>>>>> >>>>>> >>>>>> The paper obviously puts more effort on minimizing table size and >>>>>> not a >>>>>> fast >>>>>> lookup. My hunch is that our choice should be >>>>>> >>>>>> ((h>> table.r) ^ table.d[h& m2])& m1 >>>>>> >>>>>> >>>>>> and use 8-bits d (because even if you have 1024 methods, you'd rather >>>>>> double >>>>>> the number of bins than those 2 extra bits available for displacement >>>>>> options). >>>>>> >>>>>> Then keep incrementing the size of d and the number of table slots >>>>>> (in >>>>>> such >>>>>> an order that the total vtable size is minimized) until success. In >>>>>> practice >>>>>> this should almost always just increase the size of d, and keep the >>>>>> table >>>>>> size at the lowest 2**k that fits the slots (even for 64 methods or >>>>>> 128 >>>>>> methods :-)) >>>>>> >>>>>> Essentially we avoid the shift in the argument to d[] by making d >>>>>> larger. >>>>> >>>>> >>>>> Nice. I'm surprised that the indirection on d doesn't cost us much; >>>> >>>> >>>> Well, table->d[const& const] compiles down to the same kind of code as >>>> table->m1. I guess I'm surprised too that displace2 doesn't penalize. 
>>>> >>>> >>>>> hopefully its size wouldn't be a big issue either. What kinds of >>>>> densities were you achieving? >> >> OK, simulation results just in (for the displace2 hash), and they >> exceeded my expectations. >> >> I always fill the table with n=2^k keys, and fix b = n (b means |d|). >> Then the failure rates are (top two are 100,000 simulations, the rest >> are 1000 simulations): >> >> n= 8 b= 8 failure-rate=0.0019 try-mean=4.40 try-max=65 >> n= 16 b= 16 failure-rate=0.0008 try-mean=5.02 try-max=65 >> n= 32 b= 32 failure-rate=0.0000 try-mean=5.67 try-max=25 >> n= 64 b= 64 failure-rate=0.0000 try-mean=6.60 try-max=29 >> n= 128 b= 128 failure-rate=0.0000 try-mean=7.64 try-max=22 >> n= 256 b= 256 failure-rate=0.0000 try-mean=8.66 try-max=37 >> n= 512 b= 512 failure-rate=0.0000 try-mean=9.57 try-max=26 >> n=1024 b= 1024 failure-rate=0.0000 try-mean=10.66 try-max=34 >> >> Try-mean and try-max is how many r's needed to be tried before success, >> so it gives an indication how much is left before failure. >> >> For the ~1/1000 chance of failure for n=8 and n=16, we would proceed to >> let b=2*n (100,000 simulations): >> >> n= 8 b= 16 failure-rate=0.0001 try-mean=2.43 try-max=65 >> n= 16 b= 32 failure-rate=0.0000 try-mean=3.40 try-max=65 >> >> NOTE: The 512...2048 results were with 16 bits displacements, with 8 bit >> displacements they mostly failed. So we either need to make each element >> of d 16 bits, or, e.g., store 512 entries in a 1024-slot table (which >> succeeded most of the time with 8 bit displacements). I'm +1 on 16 bits >> displacements. >> >> The algorithm is rather fast and concise: >> >> https://github.com/dagss/hashvtable/blob/master/pagh99.py >> >>>> The algorithm is designed for 100% density in the table itself. (We >>>> can lift >>>> that to compensate for a small space of possible hash functions I >>>> guess.) 
>>>> >>>> I haven't done proper simulations yet, but I just tried |vtable|=128, >>>> |d|=128 from the command line and I had 15 successes or so before the >>>> first >>>> failure. That's with a 100% density in the vtable itself! (And when it >>>> fails, you increase |d| to get your success). >>>> >>>> The caveat is the space spent on d (it's small in comparison, but >>>> that's why >>>> this isn't too good to be true). >>>> >>>> A disadvantage might be that we may no longer have the opportunity to >>>> not >>>> make the table size a power of two (i.e. replace the mask with "if >>>> (likely(slot< n))"). I think for that to work one would need to >>>> replace the >>>> xor group with addition on Z_d. >>>> >>>> >>>>> Going back to the idea of linear probing on a cache miss, this has the >>>>> advantage that one can write a brain-dead provider that sets m=0 and >>>>> simply lists the methods instead of requiring a table optimizer. (Most >>>>> tools, of course, would do the table optimization.) It also lets you >>>>> get away with a "kind-of good" hash rather than requiring you search >>>>> until you find a (larger?) perfect one. >>>> >>>> >>>> Well, given that we can have 100% density, and generating the table is >>>> lightning fast, and the C code to generate the table is likely a 300 >>>> line >>>> utility... I'm not convinced. >>> >>> It goes from an extraordinary simple spec (table is, at minimum, a >>> func[2^k] with a couple of extra zero fields, whose struct can be >>> statically defined in the source by hand) to a, well, not complicated >>> in the absolute sense, but much more so than the definition above. It >>> also is variable-size which makes allocating it globally/on a stack a >>> pain (I suppose one can choose an upper bound for |d| and |vtable|). >>> >>> I am a bit playing devil's advocate here, it's probably just a (minor) >>> con, but worth noting at least. 
>> >> If you were willing to go the interning route, so that you didn't need >> to fill the table with md5 hashes anyway, I'd say you'd have a stronger >> point :-) >> >> Given the results above, static allocation can at least be solved in a >> way that is probably user-friendly enough: >> >> PyHashVTable_16_16 mytable; >> >> ...init () { >> mytable.functions = { ... }; >> if (PyHashVTable_Ready((PyHashVTable*)mytable, 16, 16) == -1) return -1; >> } >> >> Now, with chance ~1/1000, you're going to get an exception saying >> "Please try PyHashVTable_16_32". (And since that's deterministic given >> the function definitions you always catch it at once.) > > PS. PyHashVTable_Ready would do the md5's and reorder the functions etc. > as well. There's still the indirection through SEP 200 (extensibletype slots). We can get rid of that very easily by just making that table and the hash-vtable one and the same. (It could still either have interned string keys or ID keys depending on the least significant bit.) To wrap up, I think this has grown in complexity beyond the "simple SEP spec". It's at the point where you don't really want to have several libraries implementing the same simple spec, but instead use the same implementation. But I think the advantages are simply too good to give up on. So I think a viable route forward is to forget the CEP/SEP/pre-PEP-approach for now (which only works for semi-complicated ideas with simple implementations) and instead simply work more directly on a library. It would need to have a couple of different use modes: - A Python perfect-hasher for use when generating code, with only the a string interner based on CPython dicts and extensibletype metaclass as runtime dependencies (for use in Cython). This would only add some hundred source file lines... - A C implementation of the perfect hashing exposed through a PyPerfectHashTable_Ready(), for use in libraries written in C like NumPy/SciPy). 
This would need to bundle the md5 algorithm and a C implementation of the perfect hashing. And on the distribution axis: - Small C header-style implementation of a string interner and the extensibletype metaclass, rendezvousing through sys.modules - As part of the rendezvous, one would always try to __import__ the *real* run-time library. So if it is available in sys.path it overrides anything bundled with other libraries. That would provide a way forward for GIL-less string interning, a Python-side library for working with these tables and inspecting them, etc. Time to stop talking and start coding... Dag From robertwb at gmail.com Sat Jun 9 03:21:23 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Fri, 8 Jun 2012 18:21:23 -0700 Subject: [Cython] Hash-based vtables In-Reply-To: <4FD26ADA.5060401@astro.uio.no> References: <4FCD100B.7000008@astro.uio.no> <4FCD20DC.6090906@astro.uio.no> <048eeb04-aa8b-4e12-9a9b-5d552d39984b@email.android.com> <4FCFC088.3000709@astro.uio.no> <4FCFC441.40703@astro.uio.no> <4FCFCD49.9030802@astro.uio.no> <4FD0808B.5080300@astro.uio.no> <4FD083F9.2030006@astro.uio.no> <4FD26ADA.5060401@astro.uio.no> Message-ID: On Fri, Jun 8, 2012 at 2:12 PM, Dag Sverre Seljebotn wrote: > On 06/07/2012 12:35 PM, Dag Sverre Seljebotn wrote: >> >> On 06/07/2012 12:20 PM, Dag Sverre Seljebotn wrote: >>> >>> On 06/07/2012 12:26 AM, Robert Bradshaw wrote: >>>> >>>> On Wed, Jun 6, 2012 at 2:36 PM, Dag Sverre Seljebotn >>>> wrote: >>>>> >>>>> On 06/06/2012 11:16 PM, Robert Bradshaw wrote: >>>>>> >>>>>> >>>>>> On Wed, Jun 6, 2012 at 1:57 PM, Dag Sverre Seljebotn >>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> On 06/06/2012 10:41 PM, Dag Sverre Seljebotn wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On 06/05/2012 12:30 AM, Robert Bradshaw wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> I just found http://cmph.sourceforge.net/ which looks quite >>>>>>>>> interesting. 
Though the resulting hash functions are supposedly >>>>>>>>> cheap, >>>>>>>>> I have the feeling that branching is considered cheap in this >>>>>>>>> context. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Actually, this lead was *very* promising. I believe the very first >>>>>>>> reference I actually read through and didn't eliminate after the >>>>>>>> abstract totally swept away our home-grown solutions! >>>>>>>> >>>>>>>> "Hash& Displace" by Pagh (1999) is actually very simple, easy to >>>>>>>> >>>>>>>> understand, and fast both for generation and (the branch-free) >>>>>>>> lookup: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.69.3753&rep=rep1&type=pdf >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> The idea is: >>>>>>>> >>>>>>>> - Find a hash `g(x)` to partition the keys into `b` groups (the >>>>>>>> paper >>>>>>>> requires b> 2n, though I think in practice you can often get away >>>>>>>> with >>>>>>>> less) >>>>>>>> >>>>>>>> - Find a hash `f(x)` such that f is 1:1 within each group (which is >>>>>>>> easily achieved since groups only has a few elements) >>>>>>>> >>>>>>>> - For each group, from largest to smallest: Find a displacement >>>>>>>> `d[group]` so that `f(x) ^ d` doesn't cause collisions. >>>>>>>> >>>>>>>> It requires extra storage for the displacement table. However, I >>>>>>>> think 8 >>>>>>>> bits per element might suffice even for vtables of 512 or 1024 in >>>>>>>> size. >>>>>>>> Even with 16 bits it's rather negligible compared to the >>>>>>>> minimum-128-bit >>>>>>>> entries of the table. 
>>>>>>>> >>>>>>>> I benchmarked these hash functions: >>>>>>>> >>>>>>>> displace1: ((h>> r1) ^ d[h& 63])& m1 >>>>>>>> displace2: ((h>> r1) ^ d[h& m2])& m1 >>>>>>>> displace3: ((h>> r1) ^ d[(h>> r2)& m2])& m1 >>>>>>>> >>>>>>>> >>>>>>>> Only the third one is truly in the spirit of the algorithm, but I >>>>>>>> think >>>>>>>> the first two should work well too (and when h is known >>>>>>>> compile-time, >>>>>>>> looking up d[h& 63] isn't harder than looking up r1 or m1). >>>>>>>> >>>>>>>> >>>>>>>> My computer is acting up and all my numbers today are slower than >>>>>>>> the >>>>>>>> earlier ones (yes, I've disabled turbo-mode in the BIOS for a year >>>>>>>> ago, >>>>>>>> and yes, I've pinned the CPU speed). But here's today's numbers, >>>>>>>> compiled with -DIMHASH: >>>>>>>> >>>>>>>> direct: min=5.37e-09 mean=5.39e-09 std=1.96e-11 >>>>>>>> val=2400000000.000000 >>>>>>>> index: min=6.45e-09 mean=6.46e-09 std=1.15e-11 val=1800000000.000000 >>>>>>>> twoshift: min=6.99e-09 mean=7.00e-09 std=1.35e-11 >>>>>>>> val=1800000000.000000 >>>>>>>> threeshift: min=7.53e-09 mean=7.54e-09 std=1.63e-11 >>>>>>>> val=1800000000.000000 >>>>>>>> displace1: min=6.99e-09 mean=7.00e-09 std=1.66e-11 >>>>>>>> val=1800000000.000000 >>>>>>>> displace2: min=6.99e-09 mean=7.02e-09 std=2.77e-11 >>>>>>>> val=1800000000.000000 >>>>>>>> displace3: min=7.52e-09 mean=7.54e-09 std=1.19e-11 >>>>>>>> val=1800000000.000000 >>>>>>>> >>>>>>>> >>>>>>>> I did a dirty prototype of the table-finder as well and it works: >>>>>>>> >>>>>>>> https://github.com/dagss/hashvtable/blob/master/pagh99.py >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> The paper obviously puts more effort on minimizing table size and >>>>>>> not a >>>>>>> fast >>>>>>> lookup. 
My hunch is that our choice should be >>>>>>> >>>>>>> ((h>> table.r) ^ table.d[h& m2])& m1 >>>>>>> >>>>>>> >>>>>>> and use 8-bits d (because even if you have 1024 methods, you'd rather >>>>>>> double >>>>>>> the number of bins than those 2 extra bits available for displacement >>>>>>> options). >>>>>>> >>>>>>> Then keep incrementing the size of d and the number of table slots >>>>>>> (in >>>>>>> such >>>>>>> an order that the total vtable size is minimized) until success. In >>>>>>> practice >>>>>>> this should almost always just increase the size of d, and keep the >>>>>>> table >>>>>>> size at the lowest 2**k that fits the slots (even for 64 methods or >>>>>>> 128 >>>>>>> methods :-)) >>>>>>> >>>>>>> Essentially we avoid the shift in the argument to d[] by making d >>>>>>> larger. >>>>>> >>>>>> >>>>>> >>>>>> Nice. I'm surprised that the indirection on d doesn't cost us much; >>>>> >>>>> >>>>> >>>>> Well, table->d[const& const] compiles down to the same kind of code as >>>>> table->m1. I guess I'm surprised too that displace2 doesn't penalize. >>>>> >>>>> >>>>>> hopefully its size wouldn't be a big issue either. What kinds of >>>>>> densities were you achieving? >>> >>> >>> OK, simulation results just in (for the displace2 hash), and they >>> exceeded my expectations. >>> >>> I always fill the table with n=2^k keys, and fix b = n (b means |d|). 
>>> Then the failure rates are (top two are 100,000 simulations, the rest >>> are 1000 simulations): >>> >>> n= 8 b= 8 failure-rate=0.0019 try-mean=4.40 try-max=65 >>> n= 16 b= 16 failure-rate=0.0008 try-mean=5.02 try-max=65 >>> n= 32 b= 32 failure-rate=0.0000 try-mean=5.67 try-max=25 >>> n= 64 b= 64 failure-rate=0.0000 try-mean=6.60 try-max=29 >>> n= 128 b= 128 failure-rate=0.0000 try-mean=7.64 try-max=22 >>> n= 256 b= 256 failure-rate=0.0000 try-mean=8.66 try-max=37 >>> n= 512 b= 512 failure-rate=0.0000 try-mean=9.57 try-max=26 >>> n=1024 b= 1024 failure-rate=0.0000 try-mean=10.66 try-max=34 >>> >>> Try-mean and try-max is how many r's needed to be tried before success, >>> so it gives an indication how much is left before failure. >>> >>> For the ~1/1000 chance of failure for n=8 and n=16, we would proceed to >>> let b=2*n (100,000 simulations): >>> >>> n= 8 b= 16 failure-rate=0.0001 try-mean=2.43 try-max=65 >>> n= 16 b= 32 failure-rate=0.0000 try-mean=3.40 try-max=65 >>> >>> NOTE: The 512...2048 results were with 16 bits displacements, with 8 bit >>> displacements they mostly failed. So we either need to make each element >>> of d 16 bits, or, e.g., store 512 entries in a 1024-slot table (which >>> succeeded most of the time with 8 bit displacements). I'm +1 on 16 bits >>> displacements. >>> >>> The algorithm is rather fast and concise: >>> >>> https://github.com/dagss/hashvtable/blob/master/pagh99.py >>> >>>>> The algorithm is designed for 100% density in the table itself. (We >>>>> can lift >>>>> that to compensate for a small space of possible hash functions I >>>>> guess.) >>>>> >>>>> I haven't done proper simulations yet, but I just tried |vtable|=128, >>>>> |d|=128 from the command line and I had 15 successes or so before the >>>>> first >>>>> failure. That's with a 100% density in the vtable itself! (And when it >>>>> fails, you increase |d| to get your success). 
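[Editor's sketch] The displace2 hash used in these simulations is branch-free on the lookup path: one shift, one displacement-table load, two masks. A minimal sketch of that slot computation (the parameter values r1, m1, m2 and the all-zero displacement table below are illustrative, not the benchmarked ones):

```python
# Illustrative parameters -- not the values used in the benchmarks above.
r1 = 13          # shift amount (table.r)
m1 = 0x3F        # slot mask: a 64-slot table
m2 = 0x3F        # displacement-index mask: |d| = 64
d = [0] * 64     # displacement table; all-zero here just for illustration

def displace2_slot(h):
    """slot = ((h >> r1) ^ d[h & m2]) & m1 -- no branches on the lookup path."""
    return ((h >> r1) ^ d[h & m2]) & m1

# With d all-zero this degenerates to a plain shift-and-mask:
assert displace2_slot(0xDEADBEEF) == (0xDEADBEEF >> 13) & 0x3F
```

The table builder's only job is to pick r1 and fill d so that every key of the vtable lands in a distinct slot.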
>>>>> >>>>> The caveat is the space spent on d (it's small in comparison, but >>>>> that's why >>>>> this isn't too good to be true). >>>>> >>>>> A disadvantage might be that we may no longer have the opportunity to >>>>> not >>>>> make the table size a power of two (i.e. replace the mask with "if >>>>> (likely(slot< n))"). I think for that to work one would need to >>>>> replace the >>>>> xor group with addition on Z_d. >>>>> >>>>> >>>>>> Going back to the idea of linear probing on a cache miss, this has the >>>>>> advantage that one can write a brain-dead provider that sets m=0 and >>>>>> simply lists the methods instead of requiring a table optimizer. (Most >>>>>> tools, of course, would do the table optimization.) It also lets you >>>>>> get away with a "kind-of good" hash rather than requiring you search >>>>>> until you find a (larger?) perfect one. >>>>> >>>>> >>>>> >>>>> Well, given that we can have 100% density, and generating the table is >>>>> lightning fast, and the C code to generate the table is likely a 300 >>>>> line >>>>> utility... I'm not convinced. >>>> >>>> >>>> It goes from an extraordinary simple spec (table is, at minimum, a >>>> func[2^k] with a couple of extra zero fields, whose struct can be >>>> statically defined in the source by hand) to a, well, not complicated >>>> in the absolute sense, but much more so than the definition above. It >>>> also is variable-size which makes allocating it globally/on a stack a >>>> pain (I suppose one can choose an upper bound for |d| and |vtable|). >>>> >>>> I am a bit playing devil's advocate here, it's probably just a (minor) >>>> con, but worth noting at least. 
>>> >>> >>> If you were willing to go the interning route, so that you didn't need >>> to fill the table with md5 hashes anyway, I'd say you'd have a stronger >>> point :-) >>> >>> Given the results above, static allocation can at least be solved in a >>> way that is probably user-friendly enough: >>> >>> PyHashVTable_16_16 mytable; >>> >>> ...init () { >>> mytable.functions = { ... }; >>> if (PyHashVTable_Ready((PyHashVTable*)mytable, 16, 16) == -1) return -1; >>> } >>> >>> Now, with chance ~1/1000, you're going to get an exception saying >>> "Please try PyHashVTable_16_32". (And since that's deterministic given >>> the function definitions you always catch it at once.) >> >> >> PS. PyHashVTable_Ready would do the md5's and reorder the functions etc. >> as well. > > > > There's still the indirection through SEP 200 (extensibletype slots). We can > get rid of that very easily by just making that table and the hash-vtable > one and the same. (It could still either have interned string keys or ID > keys depending on the least significant bit.) Or we can even forgo the interning for this table, and give up on partitioning the space numerically and just use the dns-style prefixing, e.g. "org.cython.X" belongs to us. There is value in the double indirection if this (or any of the other) lookup tables are meant to be modified over time. > To wrap up, I think this has grown in complexity beyond the "simple SEP > spec". It's at the point where you don't really want to have several > libraries implementing the same simple spec, but instead use the same > implementation. > > But I think the advantages are simply too good to give up on. > > So I think a viable route forward is to forget the CEP/SEP/pre-PEP-approach > for now (which only works for semi-complicated ideas with simple > implementations) and instead simply work more directly on a library. 
>> It would need to have a couple of different use modes: I prefer an enhancement proposal with a spec over a library, even if only a single library gets used in practice. I still think it's simple enough. Basically, we have the "lookup spec" and then a CEP for applying this to fast callables (agreeing on signatures, and what to do with extern types) and extensible type slots. > - A Python perfect-hasher for use when generating code, with only a > string interner based on CPython dicts and the extensibletype metaclass as > runtime dependencies (for use in Cython). This would only add some hundred > source file lines... > > - A C implementation of the perfect hashing exposed through a > PyPerfectHashTable_Ready(), for use in libraries written in C (like > NumPy/SciPy). This would need to bundle the md5 algorithm and a C > implementation of the perfect hashing. > > And on the distribution axis: > > - Small C header-style implementation of a string interner and the > extensibletype metaclass, rendezvousing through sys.modules > > - As part of the rendezvous, one would always try to __import__ the *real* > run-time library. So if it is available in sys.path it overrides anything > bundled with other libraries. That would provide a way forward for GIL-less > string interning, a Python-side library for working with these tables and > inspecting them, etc. Hmm, that's an interesting idea. I think we don't actually need interning, in which case the "full" library is only needed for introspection. > Time to stop talking and start coding... 
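[Editor's sketch] The "Python perfect-hasher" mentioned above exists as the pagh99.py prototype linked earlier in the thread; the core hash-and-displace construction it implements can be sketched as follows. The bin hash g(x) and in-bin hash f(x) here are illustrative stand-ins, and the outer retry loop plays the role of trying new r values:

```python
def build_displace_table(keys, n_slots, n_bins, max_tries=64):
    """Pagh-style hash & displace: partition keys into bins with g(x), then,
    largest bin first, pick a displacement d[bin] so that
    (f(x) ^ d[g(x)]) & (n_slots - 1) is collision-free across all keys.
    Returns (d, r) on success, or None after max_tries re-hashes."""
    mask = n_slots - 1
    for r in range(max_tries):
        f = lambda x: hash((x, r)) & 0xFFFFFFFF    # stand-in for f(x)
        g = lambda x: hash(x) & (n_bins - 1)       # stand-in for g(x)
        bins = {}
        for k in keys:
            bins.setdefault(g(k), []).append(k)
        d, taken, ok = [0] * n_bins, set(), True
        for b, members in sorted(bins.items(), key=lambda kv: -len(kv[1])):
            for disp in range(n_slots):            # try displacements for this bin
                slots = {(f(k) ^ disp) & mask for k in members}
                if len(slots) == len(members) and not (slots & taken):
                    d[b], taken = disp, taken | slots
                    break
            else:
                ok = False                         # no displacement fits: re-hash
                break
        if ok:
            return d, r
    return None

# 8 keys into an 8-slot table at 100% density:
keys = list(range(100, 108))
d, r = build_displace_table(keys, 8, 8)
slots = {((hash((k, r)) & 0xFFFFFFFF) ^ d[hash(k) & 7]) & 7 for k in keys}
assert len(slots) == 8   # perfect: every key in its own slot
```

As in the quoted simulations, failure for a given table size is handled by growing |d| or the slot count and retrying.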
> > > Dag > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From d.s.seljebotn at astro.uio.no Sat Jun 9 07:45:55 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sat, 09 Jun 2012 07:45:55 +0200 Subject: [Cython] Hash-based vtables In-Reply-To: References: <4FCD100B.7000008@astro.uio.no> <4FCD20DC.6090906@astro.uio.no> <048eeb04-aa8b-4e12-9a9b-5d552d39984b@email.android.com> <4FCFC088.3000709@astro.uio.no> <4FCFC441.40703@astro.uio.no> <4FCFCD49.9030802@astro.uio.no> <4FD0808B.5080300@astro.uio.no> <4FD083F9.2030006@astro.uio.no> <4FD26ADA.5060401@astro.uio.no> Message-ID: <4FD2E313.6040208@astro.uio.no> On 06/09/2012 03:21 AM, Robert Bradshaw wrote: > On Fri, Jun 8, 2012 at 2:12 PM, Dag Sverre Seljebotn > wrote: >> On 06/07/2012 12:35 PM, Dag Sverre Seljebotn wrote: >>> >>> On 06/07/2012 12:20 PM, Dag Sverre Seljebotn wrote: >>>> >>>> On 06/07/2012 12:26 AM, Robert Bradshaw wrote: >>>>> >>>>> On Wed, Jun 6, 2012 at 2:36 PM, Dag Sverre Seljebotn >>>>> wrote: >>>>>> >>>>>> On 06/06/2012 11:16 PM, Robert Bradshaw wrote: >>>>>>> >>>>>>> >>>>>>> On Wed, Jun 6, 2012 at 1:57 PM, Dag Sverre Seljebotn >>>>>>> wrote: >>>>>>>> >>>>>>>> >>>>>>>> On 06/06/2012 10:41 PM, Dag Sverre Seljebotn wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On 06/05/2012 12:30 AM, Robert Bradshaw wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> I just found http://cmph.sourceforge.net/ which looks quite >>>>>>>>>> interesting. Though the resulting hash functions are supposedly >>>>>>>>>> cheap, >>>>>>>>>> I have the feeling that branching is considered cheap in this >>>>>>>>>> context. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Actually, this lead was *very* promising. I believe the very first >>>>>>>>> reference I actually read through and didn't eliminate after the >>>>>>>>> abstract totally swept away our home-grown solutions! 
>>>>>>>>> >>>>>>>>> "Hash& Displace" by Pagh (1999) is actually very simple, easy to >>>>>>>>> >>>>>>>>> understand, and fast both for generation and (the branch-free) >>>>>>>>> lookup: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.69.3753&rep=rep1&type=pdf >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> The idea is: >>>>>>>>> >>>>>>>>> - Find a hash `g(x)` to partition the keys into `b` groups (the >>>>>>>>> paper >>>>>>>>> requires b> 2n, though I think in practice you can often get away >>>>>>>>> with >>>>>>>>> less) >>>>>>>>> >>>>>>>>> - Find a hash `f(x)` such that f is 1:1 within each group (which is >>>>>>>>> easily achieved since groups only has a few elements) >>>>>>>>> >>>>>>>>> - For each group, from largest to smallest: Find a displacement >>>>>>>>> `d[group]` so that `f(x) ^ d` doesn't cause collisions. >>>>>>>>> >>>>>>>>> It requires extra storage for the displacement table. However, I >>>>>>>>> think 8 >>>>>>>>> bits per element might suffice even for vtables of 512 or 1024 in >>>>>>>>> size. >>>>>>>>> Even with 16 bits it's rather negligible compared to the >>>>>>>>> minimum-128-bit >>>>>>>>> entries of the table. >>>>>>>>> >>>>>>>>> I benchmarked these hash functions: >>>>>>>>> >>>>>>>>> displace1: ((h>> r1) ^ d[h& 63])& m1 >>>>>>>>> displace2: ((h>> r1) ^ d[h& m2])& m1 >>>>>>>>> displace3: ((h>> r1) ^ d[(h>> r2)& m2])& m1 >>>>>>>>> >>>>>>>>> >>>>>>>>> Only the third one is truly in the spirit of the algorithm, but I >>>>>>>>> think >>>>>>>>> the first two should work well too (and when h is known >>>>>>>>> compile-time, >>>>>>>>> looking up d[h& 63] isn't harder than looking up r1 or m1). >>>>>>>>> >>>>>>>>> >>>>>>>>> My computer is acting up and all my numbers today are slower than >>>>>>>>> the >>>>>>>>> earlier ones (yes, I've disabled turbo-mode in the BIOS for a year >>>>>>>>> ago, >>>>>>>>> and yes, I've pinned the CPU speed). 
But here's today's numbers, >>>>>>>>> compiled with -DIMHASH: >>>>>>>>> >>>>>>>>> direct: min=5.37e-09 mean=5.39e-09 std=1.96e-11 >>>>>>>>> val=2400000000.000000 >>>>>>>>> index: min=6.45e-09 mean=6.46e-09 std=1.15e-11 val=1800000000.000000 >>>>>>>>> twoshift: min=6.99e-09 mean=7.00e-09 std=1.35e-11 >>>>>>>>> val=1800000000.000000 >>>>>>>>> threeshift: min=7.53e-09 mean=7.54e-09 std=1.63e-11 >>>>>>>>> val=1800000000.000000 >>>>>>>>> displace1: min=6.99e-09 mean=7.00e-09 std=1.66e-11 >>>>>>>>> val=1800000000.000000 >>>>>>>>> displace2: min=6.99e-09 mean=7.02e-09 std=2.77e-11 >>>>>>>>> val=1800000000.000000 >>>>>>>>> displace3: min=7.52e-09 mean=7.54e-09 std=1.19e-11 >>>>>>>>> val=1800000000.000000 >>>>>>>>> >>>>>>>>> >>>>>>>>> I did a dirty prototype of the table-finder as well and it works: >>>>>>>>> >>>>>>>>> https://github.com/dagss/hashvtable/blob/master/pagh99.py >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> The paper obviously puts more effort on minimizing table size and >>>>>>>> not a >>>>>>>> fast >>>>>>>> lookup. My hunch is that our choice should be >>>>>>>> >>>>>>>> ((h>> table.r) ^ table.d[h& m2])& m1 >>>>>>>> >>>>>>>> >>>>>>>> and use 8-bits d (because even if you have 1024 methods, you'd rather >>>>>>>> double >>>>>>>> the number of bins than those 2 extra bits available for displacement >>>>>>>> options). >>>>>>>> >>>>>>>> Then keep incrementing the size of d and the number of table slots >>>>>>>> (in >>>>>>>> such >>>>>>>> an order that the total vtable size is minimized) until success. In >>>>>>>> practice >>>>>>>> this should almost always just increase the size of d, and keep the >>>>>>>> table >>>>>>>> size at the lowest 2**k that fits the slots (even for 64 methods or >>>>>>>> 128 >>>>>>>> methods :-)) >>>>>>>> >>>>>>>> Essentially we avoid the shift in the argument to d[] by making d >>>>>>>> larger. >>>>>>> >>>>>>> >>>>>>> >>>>>>> Nice. 
I'm surprised that the indirection on d doesn't cost us much; >>>>>> >>>>>> >>>>>> >>>>>> Well, table->d[const& const] compiles down to the same kind of code as >>>>>> table->m1. I guess I'm surprised too that displace2 doesn't penalize. >>>>>> >>>>>> >>>>>>> hopefully its size wouldn't be a big issue either. What kinds of >>>>>>> densities were you achieving? >>>> >>>> >>>> OK, simulation results just in (for the displace2 hash), and they >>>> exceeded my expectations. >>>> >>>> I always fill the table with n=2^k keys, and fix b = n (b means |d|). >>>> Then the failure rates are (top two are 100,000 simulations, the rest >>>> are 1000 simulations): >>>> >>>> n= 8 b= 8 failure-rate=0.0019 try-mean=4.40 try-max=65 >>>> n= 16 b= 16 failure-rate=0.0008 try-mean=5.02 try-max=65 >>>> n= 32 b= 32 failure-rate=0.0000 try-mean=5.67 try-max=25 >>>> n= 64 b= 64 failure-rate=0.0000 try-mean=6.60 try-max=29 >>>> n= 128 b= 128 failure-rate=0.0000 try-mean=7.64 try-max=22 >>>> n= 256 b= 256 failure-rate=0.0000 try-mean=8.66 try-max=37 >>>> n= 512 b= 512 failure-rate=0.0000 try-mean=9.57 try-max=26 >>>> n=1024 b= 1024 failure-rate=0.0000 try-mean=10.66 try-max=34 >>>> >>>> Try-mean and try-max is how many r's needed to be tried before success, >>>> so it gives an indication how much is left before failure. >>>> >>>> For the ~1/1000 chance of failure for n=8 and n=16, we would proceed to >>>> let b=2*n (100,000 simulations): >>>> >>>> n= 8 b= 16 failure-rate=0.0001 try-mean=2.43 try-max=65 >>>> n= 16 b= 32 failure-rate=0.0000 try-mean=3.40 try-max=65 >>>> >>>> NOTE: The 512...2048 results were with 16 bits displacements, with 8 bit >>>> displacements they mostly failed. So we either need to make each element >>>> of d 16 bits, or, e.g., store 512 entries in a 1024-slot table (which >>>> succeeded most of the time with 8 bit displacements). I'm +1 on 16 bits >>>> displacements. 
>>>> >>>> The algorithm is rather fast and concise: >>>> >>>> https://github.com/dagss/hashvtable/blob/master/pagh99.py >>>> >>>>>> The algorithm is designed for 100% density in the table itself. (We >>>>>> can lift >>>>>> that to compensate for a small space of possible hash functions I >>>>>> guess.) >>>>>> >>>>>> I haven't done proper simulations yet, but I just tried |vtable|=128, >>>>>> |d|=128 from the command line and I had 15 successes or so before the >>>>>> first >>>>>> failure. That's with a 100% density in the vtable itself! (And when it >>>>>> fails, you increase |d| to get your success). >>>>>> >>>>>> The caveat is the space spent on d (it's small in comparison, but >>>>>> that's why >>>>>> this isn't too good to be true). >>>>>> >>>>>> A disadvantage might be that we may no longer have the opportunity to >>>>>> not >>>>>> make the table size a power of two (i.e. replace the mask with "if >>>>>> (likely(slot< n))"). I think for that to work one would need to >>>>>> replace the >>>>>> xor group with addition on Z_d. >>>>>> >>>>>> >>>>>>> Going back to the idea of linear probing on a cache miss, this has the >>>>>>> advantage that one can write a brain-dead provider that sets m=0 and >>>>>>> simply lists the methods instead of requiring a table optimizer. (Most >>>>>>> tools, of course, would do the table optimization.) It also lets you >>>>>>> get away with a "kind-of good" hash rather than requiring you search >>>>>>> until you find a (larger?) perfect one. >>>>>> >>>>>> >>>>>> >>>>>> Well, given that we can have 100% density, and generating the table is >>>>>> lightning fast, and the C code to generate the table is likely a 300 >>>>>> line >>>>>> utility... I'm not convinced. 
>>>>> >>>>> >>>>> It goes from an extraordinary simple spec (table is, at minimum, a >>>>> func[2^k] with a couple of extra zero fields, whose struct can be >>>>> statically defined in the source by hand) to a, well, not complicated >>>>> in the absolute sense, but much more so than the definition above. It >>>>> also is variable-size which makes allocating it globally/on a stack a >>>>> pain (I suppose one can choose an upper bound for |d| and |vtable|). >>>>> >>>>> I am a bit playing devil's advocate here, it's probably just a (minor) >>>>> con, but worth noting at least. >>>> >>>> >>>> If you were willing to go the interning route, so that you didn't need >>>> to fill the table with md5 hashes anyway, I'd say you'd have a stronger >>>> point :-) >>>> >>>> Given the results above, static allocation can at least be solved in a >>>> way that is probably user-friendly enough: >>>> >>>> PyHashVTable_16_16 mytable; >>>> >>>> ...init () { >>>> mytable.functions = { ... }; >>>> if (PyHashVTable_Ready((PyHashVTable*)mytable, 16, 16) == -1) return -1; >>>> } >>>> >>>> Now, with chance ~1/1000, you're going to get an exception saying >>>> "Please try PyHashVTable_16_32". (And since that's deterministic given >>>> the function definitions you always catch it at once.) >>> >>> >>> PS. PyHashVTable_Ready would do the md5's and reorder the functions etc. >>> as well. >> >> >> >> There's still the indirection through SEP 200 (extensibletype slots). We can >> get rid of that very easily by just making that table and the hash-vtable >> one and the same. (It could still either have interned string keys or ID >> keys depending on the least significant bit.) > > Or we can even forgo the interning for this table, and give up on > partitioning the space numerically and just use the dns-style > prefixing, e.g. "org.cython.X" belongs to us. Huh? Isn't that when you *need* interning? Do you plan on key-encoding those kind of strings into 64 bits? 
(I think it would usually be "method:foo:ii->d" (or my current preference is "method:foo:i4i8->f8")) Partitioning the space numerically you'd just hash the number; "SEP 260: We use id 0x70040001, which has lower-64-md5 0xfa454a...ULL". > There is value in the double indirection if this (or any of the other) > lookup tables are meant to be modified over time. This isn't impossible with a hash table either. You just need to reallocate a little more often than what would be the case with a regular hash table, but not dramatically so (you need to rehash whenever the element to insert hashes into a "large" bin, which are rather few). I want the table to have a pointer to it, so that you can atomically swap it out. >> To wrap up, I think this has grown in complexity beyond the "simple SEP >> spec". It's at the point where you don't really want to have several >> libraries implementing the same simple spec, but instead use the same >> implementation. >> >> But I think the advantages are simply too good to give up on. >> >> So I think a viable route forward is to forget the CEP/SEP/pre-PEP-approach >> for now (which only works for semi-complicated ideas with simple >> implementations) and instead simply work more directly on a library. It >> would need to have a couple of different use modes: > > I prefer an enhancement proposal with a spec over a library, even if > only a single library gets used in practice. I still think it's simple > enough. Basically, we have the "lookup spec" and then a CEP for > applying this to fast callable (agreeing on signatures, and what to do > with extern types) and extensible type slots. OK. > >> - A Python perfect-hasher for use when generating code, with only the a >> string interner based on CPython dicts and extensibletype metaclass as >> runtime dependencies (for use in Cython). This would only add some hundred >> source file lines... 
>> >> - A C implementation of the perfect hashing exposed through a >> PyPerfectHashTable_Ready(), for use in libraries written in C like >> NumPy/SciPy). This would need to bundle the md5 algorithm and a C >> implementation of the perfect hashing. >> >> And on the distribution axis: >> >> - Small C header-style implementation of a string interner and the >> extensibletype metaclass, rendezvousing through sys.modules >> >> - As part of the rendezvous, one would always try to __import__ the *real* >> run-time library. So if it is available in sys.path it overrides anything >> bundled with other libraries. That would provide a way forward for GIL-less >> string interning, a Python-side library for working with these tables and >> inspecting them, etc. > > Hmm, that's an interesting idea. I think we don't actually need > interning, in which case the "full" library is only needed for > introspection. You don't believe the security concern is real then? Or do you want to pay the cost for a 160-bit SHA1 compare everywhere? I'd love to not do interning, but I see no way around it. BTW, a GIL-less interning library isn't rocket science. I ran khash.h through a preprocessor with KHASH_MAP_INIT_STR(str_to_entry, entry_t) and the result is 180 lines of code for the hash table. Then pythread.h provides the thread lock, another 50 lines for the interning logic (intern_literal, intern_heap_allocated, release_interned). It just seems a little redundant to ship such a thing in every Cython-generated file since we hold the GIL during module loading anyway. 
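[Editor's sketch] The interning logic described above (intern_literal / intern_heap_allocated returning one canonical pointer per distinct string) is, with the locking stripped away, just a map from each string to its first-seen canonical instance. A Python sketch of the idea only; the C version in the mail sits on khash.h and pythread.h, and locking and release_interned-style bookkeeping are omitted here:

```python
# Every equal signature string maps to one canonical object, so later
# signature comparisons become pointer (identity) compares instead of
# full string compares.
_interned = {}

def intern_signature(s):
    # First caller's object becomes canonical; later equal strings get it back.
    return _interned.setdefault(s, s)

a = intern_signature("method:foo:i4i8->f8")
b = intern_signature("".join(["method:foo:", "i4i8->f8"]))
assert a == b and a is b   # equal strings -> the very same object
```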
Dag From d.s.seljebotn at astro.uio.no Sat Jun 9 08:00:50 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sat, 09 Jun 2012 08:00:50 +0200 Subject: [Cython] Hash-based vtables In-Reply-To: <4FD2E313.6040208@astro.uio.no> References: <4FCD100B.7000008@astro.uio.no> <4FCD20DC.6090906@astro.uio.no> <048eeb04-aa8b-4e12-9a9b-5d552d39984b@email.android.com> <4FCFC088.3000709@astro.uio.no> <4FCFC441.40703@astro.uio.no> <4FCFCD49.9030802@astro.uio.no> <4FD0808B.5080300@astro.uio.no> <4FD083F9.2030006@astro.uio.no> <4FD26ADA.5060401@astro.uio.no> <4FD2E313.6040208@astro.uio.no> Message-ID: <4FD2E692.4040404@astro.uio.no> On 06/09/2012 07:45 AM, Dag Sverre Seljebotn wrote: > On 06/09/2012 03:21 AM, Robert Bradshaw wrote: >> On Fri, Jun 8, 2012 at 2:12 PM, Dag Sverre Seljebotn >>> There's still the indirection through SEP 200 (extensibletype slots). >>> We can >>> get rid of that very easily by just making that table and the >>> hash-vtable >>> one and the same. (It could still either have interned string keys or ID >>> keys depending on the least significant bit.) >> >> Or we can even forgo the interning for this table, and give up on >> partitioning the space numerically and just use the dns-style >> prefixing, e.g. "org.cython.X" belongs to us. > > Huh? Isn't that when you *need* interning? Do you plan on key-encoding > those kind of strings into 64 bits? > > (I think it would usually be "method:foo:ii->d" (or my current > preference is "method:foo:i4i8->f8")) Well, I guess something like "org.cython.X" would happen often as well, in addition. Just put it all in the same table :-) > > Partitioning the space numerically you'd just hash the number; "SEP 260: > We use id 0x70040001, which has lower-64-md5 0xfa454a...ULL". The real use-case I see for this now is in having the PyArray_DATA etc. access pointers simply through compile-time constants the library can define on both ends. 
It could just do PyCustomSlots_Lookup(obj->ob_type, 0x70040001, 0xfa45323...ULL) specifically to get a function retrieving the data-pointer. PyArray_SHAPE would do PyCustomSlots_Lookup(obj->ob_type, 0x70040002, 0xbbad423...ULL) Also, I'd want PyExtensibleType_Object to have: { ... PyPerfectTable *tp_perfect_table; Py_ssize_t tp_perfect_table_obj_offset; } i.e. we allow for getting quickly to a table on the object in addition to the one on the type. Callbacks look up the one on the object first (before potentially checking for __call__ in the type); method-calling might ignore the one on the object. Dag > >> There is value in the double indirection if this (or any of the other) >> lookup tables are meant to be modified over time. > > This isn't impossible with a hash table either. You just need to > reallocate a little more often than what would be the case with a > regular hash table, but not dramatically so (you need to rehash whenever > the element to insert hashes into a "large" bin, which are rather few). > > I want the table to have a pointer to it, so that you can atomically > swap it out. > >>> To wrap up, I think this has grown in complexity beyond the "simple SEP >>> spec". It's at the point where you don't really want to have several >>> libraries implementing the same simple spec, but instead use the same >>> implementation. >>> >>> But I think the advantages are simply too good to give up on. >>> >>> So I think a viable route forward is to forget the >>> CEP/SEP/pre-PEP-approach >>> for now (which only works for semi-complicated ideas with simple >>> implementations) and instead simply work more directly on a library. It >>> would need to have a couple of different use modes: >> >> I prefer an enhancement proposal with a spec over a library, even if >> only a single library gets used in practice. I still think it's simple >> enough. 
Basically, we have the "lookup spec" and then a CEP for >> applying this to fast callable (agreeing on signatures, and what to do >> with extern types) and extensible type slots. > > OK. > >> >>> - A Python perfect-hasher for use when generating code, with only the a >>> string interner based on CPython dicts and extensibletype metaclass as >>> runtime dependencies (for use in Cython). This would only add some >>> hundred >>> source file lines... >>> >>> - A C implementation of the perfect hashing exposed through a >>> PyPerfectHashTable_Ready(), for use in libraries written in C like >>> NumPy/SciPy). This would need to bundle the md5 algorithm and a C >>> implementation of the perfect hashing. >>> >>> And on the distribution axis: >>> >>> - Small C header-style implementation of a string interner and the >>> extensibletype metaclass, rendezvousing through sys.modules >>> >>> - As part of the rendezvous, one would always try to __import__ the >>> *real* >>> run-time library. So if it is available in sys.path it overrides >>> anything >>> bundled with other libraries. That would provide a way forward for >>> GIL-less >>> string interning, a Python-side library for working with these tables >>> and >>> inspecting them, etc. >> >> Hmm, that's an interesting idea. I think we don't actually need >> interning, in which case the "full" library is only needed for >> introspection. > > You don't believe the security concern is real then? Or do you want to > pay the cost for a 160-bit SHA1 compare everywhere? > > I'd love to not do interning, but I see no way around it. > > BTW, a GIL-less interning library isn't rocket science. I ran khash.h > through a preprocessor with > > KHASH_MAP_INIT_STR(str_to_entry, entry_t) > > and the result is 180 lines of code for the hash table. Then pythread.h > provides the thread lock, another 50 lines for the interning logic > (intern_literal, intern_heap_allocated, release_interned). 
> > It just seems a little redundant to ship such a thing in every > Cython-generated file since we hold the GIL during module loading anyway. > > Dag > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From d.s.seljebotn at astro.uio.no Sat Jun 9 08:02:07 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sat, 09 Jun 2012 08:02:07 +0200 Subject: [Cython] Hash-based vtables In-Reply-To: <4FD2E692.4040404@astro.uio.no> References: <4FCD100B.7000008@astro.uio.no> <4FCD20DC.6090906@astro.uio.no> <048eeb04-aa8b-4e12-9a9b-5d552d39984b@email.android.com> <4FCFC088.3000709@astro.uio.no> <4FCFC441.40703@astro.uio.no> <4FCFCD49.9030802@astro.uio.no> <4FD0808B.5080300@astro.uio.no> <4FD083F9.2030006@astro.uio.no> <4FD26ADA.5060401@astro.uio.no> <4FD2E313.6040208@astro.uio.no> <4FD2E692.4040404@astro.uio.no> Message-ID: <4FD2E6DF.5000606@astro.uio.no> On 06/09/2012 08:00 AM, Dag Sverre Seljebotn wrote: > On 06/09/2012 07:45 AM, Dag Sverre Seljebotn wrote: >> On 06/09/2012 03:21 AM, Robert Bradshaw wrote: >>> On Fri, Jun 8, 2012 at 2:12 PM, Dag Sverre Seljebotn >>>> There's still the indirection through SEP 200 (extensibletype slots). >>>> We can >>>> get rid of that very easily by just making that table and the >>>> hash-vtable >>>> one and the same. (It could still either have interned string keys >>>> or ID >>>> keys depending on the least significant bit.) >>> >>> Or we can even forgo the interning for this table, and give up on >>> partitioning the space numerically and just use the dns-style >>> prefixing, e.g. "org.cython.X" belongs to us. >> >> Huh? Isn't that when you *need* interning? Do you plan on key-encoding >> those kind of strings into 64 bits? 
>> >> (I think it would usually be "method:foo:ii->d" (or my current >> preference is "method:foo:i4i8->f8")) > > Well, I guess something like "org.cython.X" would happen often as well, > in addition. Just put it all in the same table :-) > >> >> Partitioning the space numerically you'd just hash the number; "SEP 260: >> We use id 0x70040001, which has lower-64-md5 0xfa454a...ULL". > > The real use-case I see for this now is in having the PyArray_DATA etc. > access pointers simply through compile-time constants the library can > define on both ends. It could just do > > PyCustomSlots_Lookup(obj->ob_type, 0x70040001, 0xfa45323...ULL) > > specifically to get a function retrieving the data-pointer. > PyArray_SHAPE would do > > PyCustomSlots_Lookup(obj->ob_type, 0x70040002, 0xbbad423...ULL) Argh. I meant 0x70040002 | 1, of course ;-) DS > > Also, I'd want PyExtensibleType_Object to have: > > { > ... > PyPerfectTable *tp_perfect_table; > Py_ssize_t tp_perfect_table_obj_offset; > } > > i.e. we allow for getting quickly to a table on the object in addition > to the one on the type. > > Callbacks look up the one on the object first (before potentially > checking for __call__ in the type); method-calling might ignore the one > on the object. > > Dag > >> >>> There is value in the double indirection if this (or any of the other) >>> lookup tables are meant to be modified over time. >> >> This isn't impossible with a hash table either. You just need to >> reallocate a little more often than what would be the case with a >> regular hash table, but not dramatically so (you need to rehash whenever >> the element to insert hashes into a "large" bin, which are rather few). >> >> I want the table to have a pointer to it, so that you can atomically >> swap it out. >> >>>> To wrap up, I think this has grown in complexity beyond the "simple SEP >>>> spec". 
It's at the point where you don't really want to have several >>>> libraries implementing the same simple spec, but instead use the same >>>> implementation. >>>> >>>> But I think the advantages are simply too good to give up on. >>>> >>>> So I think a viable route forward is to forget the >>>> CEP/SEP/pre-PEP-approach >>>> for now (which only works for semi-complicated ideas with simple >>>> implementations) and instead simply work more directly on a library. It >>>> would need to have a couple of different use modes: >>> >>> I prefer an enhancement proposal with a spec over a library, even if >>> only a single library gets used in practice. I still think it's simple >>> enough. Basically, we have the "lookup spec" and then a CEP for >>> applying this to fast callable (agreeing on signatures, and what to do >>> with extern types) and extensible type slots. >> >> OK. >> >>> >>>> - A Python perfect-hasher for use when generating code, with only the a >>>> string interner based on CPython dicts and extensibletype metaclass as >>>> runtime dependencies (for use in Cython). This would only add some >>>> hundred >>>> source file lines... >>>> >>>> - A C implementation of the perfect hashing exposed through a >>>> PyPerfectHashTable_Ready(), for use in libraries written in C like >>>> NumPy/SciPy). This would need to bundle the md5 algorithm and a C >>>> implementation of the perfect hashing. >>>> >>>> And on the distribution axis: >>>> >>>> - Small C header-style implementation of a string interner and the >>>> extensibletype metaclass, rendezvousing through sys.modules >>>> >>>> - As part of the rendezvous, one would always try to __import__ the >>>> *real* >>>> run-time library. So if it is available in sys.path it overrides >>>> anything >>>> bundled with other libraries. That would provide a way forward for >>>> GIL-less >>>> string interning, a Python-side library for working with these tables >>>> and >>>> inspecting them, etc. 
>>> >>> Hmm, that's an interesting idea. I think we don't actually need >>> interning, in which case the "full" library is only needed for >>> introspection. >> >> You don't believe the security concern is real then? Or do you want to >> pay the cost for a 160-bit SHA1 compare everywhere? >> >> I'd love to not do interning, but I see no way around it. >> >> BTW, a GIL-less interning library isn't rocket science. I ran khash.h >> through a preprocessor with >> >> KHASH_MAP_INIT_STR(str_to_entry, entry_t) >> >> and the result is 180 lines of code for the hash table. Then pythread.h >> provides the thread lock, another 50 lines for the interning logic >> (intern_literal, intern_heap_allocated, release_interned). >> >> It just seems a little redundant to ship such a thing in every >> Cython-generated file since we hold the GIL during module loading anyway. >> >> Dag >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel > From robertwb at gmail.com Sun Jun 10 09:00:44 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Sun, 10 Jun 2012 00:00:44 -0700 Subject: [Cython] Hash-based vtables In-Reply-To: <4FD2E313.6040208@astro.uio.no> References: <4FCD100B.7000008@astro.uio.no> <4FCD20DC.6090906@astro.uio.no> <048eeb04-aa8b-4e12-9a9b-5d552d39984b@email.android.com> <4FCFC088.3000709@astro.uio.no> <4FCFC441.40703@astro.uio.no> <4FCFCD49.9030802@astro.uio.no> <4FD0808B.5080300@astro.uio.no> <4FD083F9.2030006@astro.uio.no> <4FD26ADA.5060401@astro.uio.no> <4FD2E313.6040208@astro.uio.no> Message-ID: On Fri, Jun 8, 2012 at 10:45 PM, Dag Sverre Seljebotn wrote: > On 06/09/2012 03:21 AM, Robert Bradshaw wrote: >> >> On Fri, Jun 8, 2012 at 2:12 PM, Dag Sverre Seljebotn >>> There's still the indirection through SEP 200 (extensibletype slots). We >>> can >>> get rid of that very easily by just making that table and the hash-vtable >>> one and the same. 
(It could still either have interned string keys or ID >>> keys depending on the least significant bit.) >> >> >> Or we can even forgo the interning for this table, and give up on >> partitioning the space numerically and just use the dns-style >> prefixing, e.g. "org.cython.X" belongs to us. > > > Huh? Isn't that when you *need* interning? Do you plan on key-encoding those > kind of strings into 64 bits? No, use 64-bits of a cryptographically-secure hash. > (I think it would usually be "method:foo:ii->d" (or my current preference is > "method:foo:i4i8->f8")) Yeah, I was assuming methods wouldn't be specific to Cython. (Putting sizes in the format makes a lot of sense for persistent storage, but I think it's safe to assume that a long in the provider == a long in the consumer, and this would mean the hashes would have to be computed after (some) C compilation). > Partitioning the space numerically you'd just hash the number; "SEP 260: We > use id 0x70040001, which has lower-64-md5 0xfa454a...ULL". But why bother with the id? >> There is value in the double indirection if this (or any of the other) >> lookup tables are meant to be modified over time. > > > This isn't impossible with a hash table either. You just need to reallocate > a little more often than what would be the case with a regular hash table, > but not dramatically so (you need to rehash whenever the element to insert > hashes into a "large" bin, which are rather few). > > I want the table to have a pointer to it, so that you can atomically swap it > out. I think that's worth a level of indirection. >>> To wrap up, I think this has grown in complexity beyond the "simple SEP >>> spec". It's at the point where you don't really want to have several >>> libraries implementing the same simple spec, but instead use the same >>> implementation. >>> >>> But I think the advantages are simply too good to give up on. 
>>> >>> So I think a viable route forward is to forget the >>> CEP/SEP/pre-PEP-approach >>> for now (which only works for semi-complicated ideas with simple >>> implementations) and instead simply work more directly on a library. It >>> would need to have a couple of different use modes: >> >> >> I prefer an enhancement proposal with a spec over a library, even if >> only a single library gets used in practice. I still think it's simple >> enough. Basically, we have the "lookup spec" and then a CEP for >> applying this to fast callable (agreeing on signatures, and what to do >> with extern types) and extensible type slots. > > > OK. > > >> >>> ?- A Python perfect-hasher for use when generating code, with only the a >>> string interner based on CPython dicts and extensibletype metaclass as >>> runtime dependencies (for use in Cython). This would only add some >>> hundred >>> source file lines... >>> >>> ?- A C implementation of the perfect hashing exposed through a >>> PyPerfectHashTable_Ready(), for use in libraries written in C like >>> NumPy/SciPy). This would need to bundle the md5 algorithm and a C >>> implementation of the perfect hashing. >>> >>> And on the distribution axis: >>> >>> ?- Small C header-style implementation of a string interner and the >>> extensibletype metaclass, rendezvousing through sys.modules >>> >>> ?- As part of the rendezvous, one would always try to __import__ the >>> *real* >>> run-time library. So if it is available in sys.path it overrides anything >>> bundled with other libraries. That would provide a way forward for >>> GIL-less >>> string interning, a Python-side library for working with these tables and >>> inspecting them, etc. >> >> >> Hmm, that's an interesting idea. I think we don't actually need >> interning, in which case the "full" library is only needed for >> introspection. > > > You don't believe the security concern is real then? Or do you want to pay > the cost for a 160-bit SHA1 compare everywhere? 
> > I'd love to not do interning, but I see no way around it. No, I want to use the lower 64 bits by default, but always have the top 96 bits around to allow using this mechanism in "secure" mode at a slight penalty. md5 is out because there are known collisions. (Yes, sha-1 may succumb sooner rather than later, theoretical weaknesses have been shown, so we could look to using something else (hopefully still shipped with Python)). > BTW, a GIL-less interning library isn't rocket science. I ran khash.h > through a preprocessor with > > KHASH_MAP_INIT_STR(str_to_entry, entry_t) > > and the result is 180 lines of code for the hash table. Then pythread.h > provides the thread lock, another 50 lines for the interning logic > (intern_literal, intern_heap_allocated, release_interned). > > It just seems a little redundant to ship such a thing in every > Cython-generated file since we hold the GIL during module loading anyway. It's the rendezvous on the global state that's more messy than the locking (though we do already require that for the metaclass approach of detecting types with extended slots). 
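The khash-plus-pythread interner Dag describes can be made concrete. Below is a much-simplified stand-in: a fixed-size open-addressing table with no locking, resizing, or deallocation. `intern_string` and `str_hash` are illustrative names rather than a proposed API, and FNV-1a merely stands in for whatever hash the real library would pick.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified sketch of a string interner: a fixed-size open-addressing
 * table mapping string contents to one canonical heap copy.  The real
 * thing would be khash.h plus a pythread.h lock; this stand-in ignores
 * locking, resizing and deallocation entirely. */

#define INTERN_SLOTS 1024            /* power of two so we can mask */

static const char *intern_table[INTERN_SLOTS];

static uint64_t str_hash(const char *s)
{
    uint64_t h = 14695981039346656037ULL;  /* FNV-1a, illustrative only */
    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= 1099511628211ULL;
    }
    return h;
}

/* Return the canonical pointer for the contents of `s`: equal contents
 * always yield the same pointer, so later checks are pointer compares. */
const char *intern_string(const char *s)
{
    uint64_t i = str_hash(s) & (INTERN_SLOTS - 1);
    while (intern_table[i]) {
        if (strcmp(intern_table[i], s) == 0)
            return intern_table[i];          /* already interned */
        i = (i + 1) & (INTERN_SLOTS - 1);    /* linear probing */
    }
    char *copy = malloc(strlen(s) + 1);      /* make the canonical copy */
    strcpy(copy, s);
    intern_table[i] = copy;
    return intern_table[i];
}
```

Because equal contents map to one canonical pointer, a signature match at a call site then reduces to a single pointer compare.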
- Robert From d.s.seljebotn at astro.uio.no Sun Jun 10 09:14:43 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sun, 10 Jun 2012 09:14:43 +0200 Subject: [Cython] Hash-based vtables In-Reply-To: References: <4FCD100B.7000008@astro.uio.no> <4FCD20DC.6090906@astro.uio.no> <048eeb04-aa8b-4e12-9a9b-5d552d39984b@email.android.com> <4FCFC088.3000709@astro.uio.no> <4FCFC441.40703@astro.uio.no> <4FCFCD49.9030802@astro.uio.no> <4FD0808B.5080300@astro.uio.no> <4FD083F9.2030006@astro.uio.no> <4FD26ADA.5060401@astro.uio.no> <4FD2E313.6040208@astro.uio.no> Message-ID: <6c423841-b888-478d-8b89-148f3e9bd60e@email.android.com> Robert Bradshaw wrote: >On Fri, Jun 8, 2012 at 10:45 PM, Dag Sverre Seljebotn > wrote: >> On 06/09/2012 03:21 AM, Robert Bradshaw wrote: >>> >>> On Fri, Jun 8, 2012 at 2:12 PM, Dag Sverre Seljebotn >>>> There's still the indirection through SEP 200 (extensibletype >slots). We >>>> can >>>> get rid of that very easily by just making that table and the >hash-vtable >>>> one and the same. (It could still either have interned string keys >or ID >>>> keys depending on the least significant bit.) >>> >>> >>> Or we can even forgo the interning for this table, and give up on >>> partitioning the space numerically and just use the dns-style >>> prefixing, e.g. "org.cython.X" belongs to us. >> >> >> Huh? Isn't that when you *need* interning? Do you plan on >key-encoding those >> kind of strings into 64 bits? > >No, use 64-bits of a a cryptographically-secure hash. > >> (I think it would usually be "method:foo:ii->d" (or my current >preference is >> "method:foo:i4i8->f8")) > >Yeah, I was assuming methods wouldn't be specific to Cython. (Putting >sizes in the format makes a lot of sense for persistent storage, but I >think it's safe to assume that a"long in the provider == a long in the >consumer, and this would mean the hashes would have to be computed >after (some) C compilation). 
> >> Partitioning the space numerically you'd just hash the number; "SEP >260: We >> use id 0x70040001, which has lower-64-md5 0xfa454a...ULL". > >But why bother with the id? > >>> There is value in the double indirection if this (or any of the >other) >>> lookup tables are meant to be modified over time. >> >> >> This isn't impossible with a hash table either. You just need to >reallocate >> a little more often than what would be the case with a regular hash >table, >> but not dramatically so (you need to rehash whenever the element to >insert >> hashes into a "large" bin, which are rather few). >> >> I want the table to have a pointer to it, so that you can atomically >swap it >> out. > >I think that's worth a level of indirection. > >>>> To wrap up, I think this has grown in complexity beyond the "simple >SEP >>>> spec". It's at the point where you don't really want to have >several >>>> libraries implementing the same simple spec, but instead use the >same >>>> implementation. >>>> >>>> But I think the advantages are simply too good to give up on. >>>> >>>> So I think a viable route forward is to forget the >>>> CEP/SEP/pre-PEP-approach >>>> for now (which only works for semi-complicated ideas with simple >>>> implementations) and instead simply work more directly on a >library. It >>>> would need to have a couple of different use modes: >>> >>> >>> I prefer an enhancement proposal with a spec over a library, even if >>> only a single library gets used in practice. I still think it's >simple >>> enough. Basically, we have the "lookup spec" and then a CEP for >>> applying this to fast callable (agreeing on signatures, and what to >do >>> with extern types) and extensible type slots. >> >> >> OK. >> >> >>> >>>> ?- A Python perfect-hasher for use when generating code, with only >the a >>>> string interner based on CPython dicts and extensibletype metaclass >as >>>> runtime dependencies (for use in Cython). 
This would only add some >>>> hundred >>>> source file lines... >>>> >>>> - A C implementation of the perfect hashing exposed through a >>>> PyPerfectHashTable_Ready(), for use in libraries written in C like >>>> NumPy/SciPy). This would need to bundle the md5 algorithm and a C >>>> implementation of the perfect hashing. >>>> >>>> And on the distribution axis: >>>> >>>> - Small C header-style implementation of a string interner and the >>>> extensibletype metaclass, rendezvousing through sys.modules >>>> >>>> - As part of the rendezvous, one would always try to __import__ >the >>>> *real* >>>> run-time library. So if it is available in sys.path it overrides >anything >>>> bundled with other libraries. That would provide a way forward for >>>> GIL-less >>>> string interning, a Python-side library for working with these >tables and >>>> inspecting them, etc. >>> >>> >>> Hmm, that's an interesting idea. I think we don't actually need >>> interning, in which case the "full" library is only needed for >>> introspection. >> >> >> You don't believe the security concern is real then? Or do you want >to pay >> the cost for a 160-bit SHA1 compare everywhere? >> >> I'd love to not do interning, but I see no way around it. > >No, I want to use the lower 64 bits by default, but always have the >top 96 bits around to allow using this mechanism in "secure" mode at a >slight penalty. md5 is out because there are known collisions. (Yes, >sha-1 may succumb sooner rather than later, theoretical weaknesses >have been shown, so we could look to using something else (hopefully >still shipped with Python)). But very few users are going to know about this. What are the odds that a user who decides to trigger JIT-compilation with function signatures that vary based on the input will know about the option and turn it on and also recompile all his/her C extension modules? In practice, such an option would always stay at its default value. 
If we leave it to secure by default and start teaching it to users from the start...but that's a big price to pay. And if you *do* want to run in secure mode, it will be a lot slower than interning. Dag > >> BTW, a GIL-less interning library isn't rocket science. I ran khash.h >> through a preprocessor with >> >> KHASH_MAP_INIT_STR(str_to_entry, entry_t) >> >> and the result is 180 lines of code for the hash table. Then >pythread.h >> provides the thread lock, another 50 lines for the interning logic >> (intern_literal, intern_heap_allocated, release_interned). >> >> It just seems a little redundant to ship such a thing in every >> Cython-generated file since we hold the GIL during module loading >anyway. > >It's the rendezvous on the global state that's more messy then the >locking (though we do already require that for the metaclass approach >of detecting types with extended slots). > >- Robert >_______________________________________________ >cython-devel mailing list >cython-devel at python.org >http://mail.python.org/mailman/listinfo/cython-devel -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. 
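The fast-vs-secure trade-off under discussion — compare only the lower 64 bits of the signature hash by default, and consult the top 96 bits only in "secure" mode — could look roughly like this. The layout and names are assumptions for illustration, not part of any spec.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Sketch: each entry keeps the full 160-bit signature hash, split into
 * the low 64 bits (used for table placement and the default compare)
 * and the top 96 bits (checked only in "secure" mode). */
typedef struct {
    uint64_t lo;      /* low 64 bits: the default compare */
    uint8_t  hi[12];  /* top 96 bits: secure mode only */
} sig_hash_t;

int sig_match(const sig_hash_t *a, const sig_hash_t *b, int secure)
{
    if (a->lo != b->lo)
        return 0;                          /* fast path: one 64-bit compare */
    if (!secure)
        return 1;                          /* default mode stops here */
    return memcmp(a->hi, b->hi, 12) == 0;  /* the "slight penalty" */
}
```

This shows why the penalty is slight: the extra 12-byte memcmp only runs when the low 64 bits already agree.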
From robertwb at gmail.com Sun Jun 10 09:34:21 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Sun, 10 Jun 2012 00:34:21 -0700 Subject: [Cython] Hash-based vtables In-Reply-To: <6c423841-b888-478d-8b89-148f3e9bd60e@email.android.com> References: <4FCD100B.7000008@astro.uio.no> <4FCD20DC.6090906@astro.uio.no> <048eeb04-aa8b-4e12-9a9b-5d552d39984b@email.android.com> <4FCFC088.3000709@astro.uio.no> <4FCFC441.40703@astro.uio.no> <4FCFCD49.9030802@astro.uio.no> <4FD0808B.5080300@astro.uio.no> <4FD083F9.2030006@astro.uio.no> <4FD26ADA.5060401@astro.uio.no> <4FD2E313.6040208@astro.uio.no> <6c423841-b888-478d-8b89-148f3e9bd60e@email.android.com> Message-ID: On Sun, Jun 10, 2012 at 12:14 AM, Dag Sverre Seljebotn wrote: > > > Robert Bradshaw wrote: > >>On Fri, Jun 8, 2012 at 10:45 PM, Dag Sverre Seljebotn >> wrote: >>> On 06/09/2012 03:21 AM, Robert Bradshaw wrote: >>>> >>>> On Fri, Jun 8, 2012 at 2:12 PM, Dag Sverre Seljebotn >>>>> There's still the indirection through SEP 200 (extensibletype >>slots). We >>>>> can >>>>> get rid of that very easily by just making that table and the >>hash-vtable >>>>> one and the same. (It could still either have interned string keys >>or ID >>>>> keys depending on the least significant bit.) >>>> >>>> >>>> Or we can even forgo the interning for this table, and give up on >>>> partitioning the space numerically and just use the dns-style >>>> prefixing, e.g. "org.cython.X" belongs to us. >>> >>> >>> Huh? Isn't that when you *need* interning? Do you plan on >>key-encoding those >>> kind of strings into 64 bits? >> >>No, use 64-bits of a a cryptographically-secure hash. >> >>> (I think it would usually be "method:foo:ii->d" (or my current >>preference is >>> "method:foo:i4i8->f8")) >> >>Yeah, I was assuming methods wouldn't be specific to Cython. 
(Putting >>sizes in the format makes a lot of sense for persistent storage, but I >>think it's safe to assume that a"long in the provider == a long in the >>consumer, and this would mean the hashes would have to be computed >>after (some) C compilation). >> >>> Partitioning the space numerically you'd just hash the number; "SEP >>260: We >>> use id 0x70040001, which has lower-64-md5 0xfa454a...ULL". >> >>But why bother with the id? >> >>>> There is value in the double indirection if this (or any of the >>other) >>>> lookup tables are meant to be modified over time. >>> >>> >>> This isn't impossible with a hash table either. You just need to >>reallocate >>> a little more often than what would be the case with a regular hash >>table, >>> but not dramatically so (you need to rehash whenever the element to >>insert >>> hashes into a "large" bin, which are rather few). >>> >>> I want the table to have a pointer to it, so that you can atomically >>swap it >>> out. >> >>I think that's worth a level of indirection. >> >>>>> To wrap up, I think this has grown in complexity beyond the "simple >>SEP >>>>> spec". It's at the point where you don't really want to have >>several >>>>> libraries implementing the same simple spec, but instead use the >>same >>>>> implementation. >>>>> >>>>> But I think the advantages are simply too good to give up on. >>>>> >>>>> So I think a viable route forward is to forget the >>>>> CEP/SEP/pre-PEP-approach >>>>> for now (which only works for semi-complicated ideas with simple >>>>> implementations) and instead simply work more directly on a >>library. It >>>>> would need to have a couple of different use modes: >>>> >>>> >>>> I prefer an enhancement proposal with a spec over a library, even if >>>> only a single library gets used in practice. I still think it's >>simple >>>> enough. 
Basically, we have the "lookup spec" and then a CEP for >>>> applying this to fast callable (agreeing on signatures, and what to >>do >>>> with extern types) and extensible type slots. >>> >>> >>> OK. >>> >>> >>>> >>>>> ?- A Python perfect-hasher for use when generating code, with only >>the a >>>>> string interner based on CPython dicts and extensibletype metaclass >>as >>>>> runtime dependencies (for use in Cython). This would only add some >>>>> hundred >>>>> source file lines... >>>>> >>>>> ?- A C implementation of the perfect hashing exposed through a >>>>> PyPerfectHashTable_Ready(), for use in libraries written in C like >>>>> NumPy/SciPy). This would need to bundle the md5 algorithm and a C >>>>> implementation of the perfect hashing. >>>>> >>>>> And on the distribution axis: >>>>> >>>>> ?- Small C header-style implementation of a string interner and the >>>>> extensibletype metaclass, rendezvousing through sys.modules >>>>> >>>>> ?- As part of the rendezvous, one would always try to __import__ >>the >>>>> *real* >>>>> run-time library. So if it is available in sys.path it overrides >>anything >>>>> bundled with other libraries. That would provide a way forward for >>>>> GIL-less >>>>> string interning, a Python-side library for working with these >>tables and >>>>> inspecting them, etc. >>>> >>>> >>>> Hmm, that's an interesting idea. I think we don't actually need >>>> interning, in which case the "full" library is only needed for >>>> introspection. >>> >>> >>> You don't believe the security concern is real then? Or do you want >>to pay >>> the cost for a 160-bit SHA1 compare everywhere? >>> >>> I'd love to not do interning, but I see no way around it. >> >>No, I want to use the lower 64 bits by default, but always have the >>top 96 bits around to allow using this mechanism in "secure" mode at a >>slight penalty. md5 is out because there are known collisions. 
(Yes, >>sha-1 may succumb sooner rather than later, theoretical weaknesses >>have been shown, so we could look to using something else (hopefully >>still shipped with Python)). > > But very few users are going to know about this. What are the odds that a user who decides to trigger JIT-compilation with function signatures that vary based on the input will know about the option and turn it on and also recompile all his/her C extension modules? > > In practice, such an option would always stay at its default value. If we leave it to secure by default and start teaching it to users from the start...but that's a big price to pay. Yes, it's not ideal from this perspective. > And if you *do* want to run in secure mode, it will be a lot slower than interning. Are you thinking that the 64-bit interned pointer would be used as the hash? In this case all hashtables would have to be constructed at runtime, which means it needs to be really, really cheap (well under a millisecond, I'm sure Sage has >1000 classes, >10000 methods it imports at startup). Also I'm not sure how the very-uneven distribution would play out for constructing perfect hashtables (perhaps it won't hurt, there's likely to be long runs of consecutive values in some cases). 
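The "very-uneven distribution" worry is easy to illustrate: heap pointers are aligned, so their low bits carry no entropy, and masking them directly into table slots piles everything into the same bucket. Running the pointer through a cheap invertible mixer restores the spread; splitmix64's finalizer is used below purely as an example of such a mixer, not as anything the thread settled on.

```c
#include <assert.h>
#include <stdint.h>

/* Aligned pointer-like values all share low bits of zero, so
 * `ptr & mask` collides; a cheap invertible bit mixer (splitmix64's
 * finalizer) spreads the entropy back across all 64 bits. */
uint64_t mix64(uint64_t z)
{
    z ^= z >> 30; z *= 0xbf58476d1ce4e5b9ULL;
    z ^= z >> 27; z *= 0x94d049bb133111ebULL;
    z ^= z >> 31;
    return z;
}
```

Since the mixer is invertible, distinct pointers are guaranteed distinct 64-bit outputs; whether the *masked* slots then distribute well enough for perfect hashing is exactly what a benchmark would have to show.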
- Robert From d.s.seljebotn at astro.uio.no Sun Jun 10 10:00:36 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sun, 10 Jun 2012 10:00:36 +0200 Subject: [Cython] Hash-based vtables In-Reply-To: References: <4FCD100B.7000008@astro.uio.no> <4FCD20DC.6090906@astro.uio.no> <048eeb04-aa8b-4e12-9a9b-5d552d39984b@email.android.com> <4FCFC088.3000709@astro.uio.no> <4FCFC441.40703@astro.uio.no> <4FCFCD49.9030802@astro.uio.no> <4FD0808B.5080300@astro.uio.no> <4FD083F9.2030006@astro.uio.no> <4FD26ADA.5060401@astro.uio.no> <4FD2E313.6040208@astro.uio.no> <6c423841-b888-478d-8b89-148f3e9bd60e@email.android.com> Message-ID: <4FD45424.9040909@astro.uio.no> On 06/10/2012 09:34 AM, Robert Bradshaw wrote: > On Sun, Jun 10, 2012 at 12:14 AM, Dag Sverre Seljebotn > wrote: >> >> >> Robert Bradshaw wrote: >> >>> On Fri, Jun 8, 2012 at 10:45 PM, Dag Sverre Seljebotn >>> wrote: >>>> I'd love to not do interning, but I see no way around it. >>> >>> No, I want to use the lower 64 bits by default, but always have the >>> top 96 bits around to allow using this mechanism in "secure" mode at a >>> slight penalty. md5 is out because there are known collisions. (Yes, >>> sha-1 may succumb sooner rather than later, theoretical weaknesses >>> have been shown, so we could look to using something else (hopefully >>> still shipped with Python). >> >> But very few users are going to know about this. What's the odds that the user who decide to trigger JIT-compilation with function signatures that varies based on the input will know about the option and turn it on and also recompile all his/her C extension modules? >> >> In practice, such an option would always stay at its default value. If we leave it to secure by default and start teaching it to users from the start...but that's a big price to pay. > > Yes, it's not ideal from this perspective. > >> And if you *do* want to run in secure mode, it will be a lot slower than interning. 
> > Are you thinking that the 64-bit interned pointer would be used as the > hash? In this case all hashtables would have to be constructed at > runtime, which means it needs to be really, really cheap (well under a > millisecond, I'm sure Sage has >1000 classes, >10000 methods it imports > at startup). Also I'm not sure how the very-uneven distribution would > play out for constructing perfect hashtables (perhaps it won't hurt, > there's likely to be long runs of consecutive values in some cases). No, I'm thinking that callsites need both the 64-bit interned char* and the 64-bit hash of the *contents*. They use the hash to figure out the position, then compare by ID. The hash is not stored in callees, it's discarded after figuring out the table layout. (There was this idea that if the char* has least significant bit set, we'd hash it directly rather than dereference it, but let's ignore that for now.) I don't think under a millisecond is unfeasible to hash smallish tables -- we could put the pointer through a cheap hash to create more entropy (for the perfect hashing, being able to select a hash function through the >>r is important, so you can't just use the pointer directly -- but there are functions cheaper than md5, e.g., in here: http://code.google.com/p/ulib/) That would save us a register and make the instructions shorter in some places I guess...I think it's really minuscule, it's not like the effect of load of a global variable. But if you like this approach I can benchmark C-written hashtable creation and see. 
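The "select a hash function through the >>r" idea can be sketched as a parameter search at table-build time: try shifts until `(h ^ (h >> r)) & mask` places every entry in a distinct slot, after which a lookup is a single probe plus one interned-pointer compare. This is a toy version — the real construction would also handle displacements and table resizing — and the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NSLOTS 8  /* power of two >= number of entries */

/* Candidate placement: xor the hash with a right-shifted copy of
 * itself, then mask down to the table size. */
unsigned slot_of(uint64_t h, unsigned r)
{
    return (unsigned)((h ^ (h >> r)) & (NSLOTS - 1));
}

/* Search for a shift r in [1, 63] that maps every hash to a distinct
 * slot; returns -1 if no shift works at this table size. */
int find_shift(const uint64_t *hashes, int n)
{
    for (unsigned r = 1; r < 64; r++) {
        unsigned char used[NSLOTS];
        memset(used, 0, sizeof used);
        int ok = 1;
        for (int i = 0; i < n; i++) {
            unsigned s = slot_of(hashes[i], r);
            if (used[s]) { ok = 0; break; }  /* collision: try next r */
            used[s] = 1;
        }
        if (ok)
            return (int)r;
    }
    return -1;
}
```

The caller then bakes the found `r` into the table header, and every probe at a call site costs one shift, one xor, and one mask.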
Dag From robertwb at gmail.com Sun Jun 10 10:23:47 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Sun, 10 Jun 2012 01:23:47 -0700 Subject: [Cython] Hash-based vtables In-Reply-To: <4FD45424.9040909@astro.uio.no> References: <4FCD100B.7000008@astro.uio.no> <4FCD20DC.6090906@astro.uio.no> <048eeb04-aa8b-4e12-9a9b-5d552d39984b@email.android.com> <4FCFC088.3000709@astro.uio.no> <4FCFC441.40703@astro.uio.no> <4FCFCD49.9030802@astro.uio.no> <4FD0808B.5080300@astro.uio.no> <4FD083F9.2030006@astro.uio.no> <4FD26ADA.5060401@astro.uio.no> <4FD2E313.6040208@astro.uio.no> <6c423841-b888-478d-8b89-148f3e9bd60e@email.android.com> <4FD45424.9040909@astro.uio.no> Message-ID: On Sun, Jun 10, 2012 at 1:00 AM, Dag Sverre Seljebotn wrote: > On 06/10/2012 09:34 AM, Robert Bradshaw wrote: >> >> On Sun, Jun 10, 2012 at 12:14 AM, Dag Sverre Seljebotn >> ?wrote: >>> >>> >>> >>> Robert Bradshaw ?wrote: >>> >>>> On Fri, Jun 8, 2012 at 10:45 PM, Dag Sverre Seljebotn >>>> ?wrote: >>>>> >>>>> I'd love to not do interning, but I see no way around it. >>>> >>>> >>>> No, I want to use the lower 64 bits by default, but always have the >>>> top 96 bits around to allow using this mechanism in "secure" mode at a >>>> slight penalty. md5 is out because there are known collisions. (Yes, >>>> sha-1 may succumb sooner rather than later, theoretical weaknesses >>>> have been shown, so we could look to using something else (hopefully >>>> still shipped with Python). >>> >>> >>> But very few users are going to know about this. What's the odds that the >>> user who decide to trigger JIT-compilation with function signatures that >>> varies based on the input will know about the option and turn it on and also >>> recompile all his/her C extension modules? >>> >>> In practice, such an option would always stay at its default value. If we >>> leave it to secure by default and start teaching it to users from the >>> start...but that's a big price to pay. >> >> >> Yes, it's not ideal from this perspective. 
>> >>> And if you *do* want to run in secure mode, it will be a lot slower than >>> interning. >> >> >> Are you thinking that the 64-bit interned pointer would be used as the >> hash? In this case all hashtables would have to be constructed at >> runtime, which means it needs to be really, really cheap (well under a >> milisecond, I'm sure Sage has>1000 classes,>10000 methods it imports >> at startup). Also I'm not sure how the very-uneven distribution would >> play out for constructing perfect hastables (perhaps it won't hurt, >> there's likely to be long runs of consecutive values in some cases. > > > No, I'm thinking that callsites need both the 64-bit interned char* and the > 64-bit hash of the *contents*. They use the hash to figure out the position, > then compare by ID. Ah, I missed that bit. OK, yes, that could work well. > The hash is not stored in callees, it's discarded after figuring out the > table layout. > > (There was this idea that if the char* has least significant bit set, we'd > hash it directly rather than dereference it, but let's ignore that for now.) (For the purpose of this discussion, it's part of the "interning" step.) > I don't think under a millisecond is unfeasible to hash smallish tables -- > we could put the pointer through a cheap hash to create more entropy (for > the perfect hashing, being able to select a hash function through the >>r is > important, so you can't just use the pointer directly -- but there are > functions cheaper than md5, e.g, in here: http://code.google.com/p/ulib/) Just a sec, we're not hashing pointers, but the full signature itself, right? For our hash function we need (1) Collision free on 64-bits (for non-malicious use). (2) Good distribution (including for short strings, which is harder to come by). (2b) Any small subset of bits should have property (2). (3) Ideally easy to reference (e.g. "md5" is better than "these 100 lines of C code"). 
Cheap runtime construction is still ideal, but much less of an issue if hashes (and perfect tables) can be constructed at compile time, which I think this scheme allows. > That would save us a register and make the instructions shorter in some > places I guess...I think it's really miniscule, it's not like the effect of > load of a global variable. But if you like this approach I can benchmark > C-written hashtable creation and see. This will have value in and of itself (both the implementation and the benchmarks). - Robert From d.s.seljebotn at astro.uio.no Sun Jun 10 10:26:55 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sun, 10 Jun 2012 10:26:55 +0200 Subject: [Cython] Hash-based vtables In-Reply-To: <4FD45424.9040909@astro.uio.no> References: <4FCD100B.7000008@astro.uio.no> <4FCD20DC.6090906@astro.uio.no> <048eeb04-aa8b-4e12-9a9b-5d552d39984b@email.android.com> <4FCFC088.3000709@astro.uio.no> <4FCFC441.40703@astro.uio.no> <4FCFCD49.9030802@astro.uio.no> <4FD0808B.5080300@astro.uio.no> <4FD083F9.2030006@astro.uio.no> <4FD26ADA.5060401@astro.uio.no> <4FD2E313.6040208@astro.uio.no> <6c423841-b888-478d-8b89-148f3e9bd60e@email.android.com> <4FD45424.9040909@astro.uio.no> Message-ID: <0c97966b-4c3a-4577-9673-726a31c49e23@email.android.com> Dag Sverre Seljebotn wrote: >On 06/10/2012 09:34 AM, Robert Bradshaw wrote: >> On Sun, Jun 10, 2012 at 12:14 AM, Dag Sverre Seljebotn >> wrote: >>> >>> >>> Robert Bradshaw wrote: >>> >>>> On Fri, Jun 8, 2012 at 10:45 PM, Dag Sverre Seljebotn >>>> wrote: >>>>> I'd love to not do interning, but I see no way around it. >>>> >>>> No, I want to use the lower 64 bits by default, but always have the >>>> top 96 bits around to allow using this mechanism in "secure" mode >at a >>>> slight penalty. md5 is out because there are known collisions. 
>(Yes, >>>> sha-1 may succumb sooner rather than later, theoretical weaknesses >>>> have been shown, so we could look to using something else >(hopefully >>>> still shipped with Python). >>> >>> But very few users are going to know about this. What's the odds >that the user who decide to trigger JIT-compilation with function >signatures that varies based on the input will know about the option >and turn it on and also recompile all his/her C extension modules? >>> >>> In practice, such an option would always stay at its default value. >If we leave it to secure by default and start teaching it to users from >the start...but that's a big price to pay. >> >> Yes, it's not ideal from this perspective. >> >>> And if you *do* want to run in secure mode, it will be a lot slower >than interning. >> >> Are you thinking that the 64-bit interned pointer would be used as >the >> hash? In this case all hashtables would have to be constructed at >> runtime, which means it needs to be really, really cheap (well under >a >> milisecond, I'm sure Sage has>1000 classes,>10000 methods it imports >> at startup). Also I'm not sure how the very-uneven distribution would >> play out for constructing perfect hastables (perhaps it won't hurt, >> there's likely to be long runs of consecutive values in some cases. > >No, I'm thinking that callsites need both the 64-bit interned char* and > >the 64-bit hash of the *contents*. They use the hash to figure out the >position, then compare by ID. > >The hash is not stored in callees, it's discarded after figuring out >the >table layout. > >(There was this idea that if the char* has least significant bit set, >we'd hash it directly rather than dereference it, but let's ignore that > >for now.) 
> >I don't think under a millisecond is unfeasible to hash smallish tables > >-- we could put the pointer through a cheap hash to create more entropy > >(for the perfect hashing, being able to select a hash function through >the >>r is important, so you can't just use the pointer directly -- but > >there are functions cheaper than md5, e.g., in here: >http://code.google.com/p/ulib/) > >That would save us a register and make the instructions shorter in some > >places I guess...I think it's really minuscule, it's not like the >effect >of load of a global variable. But if you like this approach I can >benchmark C-written hashtable creation and see. I don't know what I was thinking. The callsite can't hash every time, and the pointer doesn't contain enough entropy for perfect hashing, so hashing the pointer has only disadvantages. I really think the call site should have both a hash and a separate interned ID. And if the caller knows the entry should be there, it can skip the ID check and only needs the hash. That makes the table pretty slick for non-smart callers too, it would be (id, flags, ptr)-entries, and callers could either do strcmp or interning, with or without hashing. (I realize the information would be there in your proposal too, but this would be slimmer). Dag > >Dag >_______________________________________________ >cython-devel mailing list >cython-devel at python.org >http://mail.python.org/mailman/listinfo/cython-devel -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. 
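The slim (id, flags, ptr) entry layout Dag proposes supports both kinds of caller on the same field: a "smart" caller that interned its signature compares pointers, while a non-smart caller strcmps the contents. The sketch below uses linear scans in place of the hash probe, and all names are illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Sketch of the proposed slim entry: (id, flags, ptr). */
typedef struct {
    const char *id;    /* interned signature, e.g. "method:foo:i4i8->f8" */
    uint64_t    flags; /* e.g. a nogil bit, checked under a mask */
    void       *ptr;   /* the native-code entry point */
} table_entry_t;

/* Smart caller: it interned its own signature, so an identity compare
 * on the id field suffices. */
void *lookup_interned(const table_entry_t *t, int n, const char *interned_sig)
{
    for (int i = 0; i < n; i++)
        if (t[i].id == interned_sig)
            return t[i].ptr;
    return 0;
}

/* Non-smart caller: no interning, fall back to comparing contents. */
void *lookup_strcmp(const table_entry_t *t, int n, const char *sig)
{
    for (int i = 0; i < n; i++)
        if (strcmp(t[i].id, sig) == 0)
            return t[i].ptr;
    return 0;
}
```

Both paths read the same table, which is the point: the provider publishes one layout and each consumer picks the compare it can afford.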
From d.s.seljebotn at astro.uio.no Sun Jun 10 10:43:29 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sun, 10 Jun 2012 10:43:29 +0200 Subject: [Cython] Hash-based vtables In-Reply-To: References: <4FCD100B.7000008@astro.uio.no> <4FCD20DC.6090906@astro.uio.no> <048eeb04-aa8b-4e12-9a9b-5d552d39984b@email.android.com> <4FCFC088.3000709@astro.uio.no> <4FCFC441.40703@astro.uio.no> <4FCFCD49.9030802@astro.uio.no> <4FD0808B.5080300@astro.uio.no> <4FD083F9.2030006@astro.uio.no> <4FD26ADA.5060401@astro.uio.no> <4FD2E313.6040208@astro.uio.no> <6c423841-b888-478d-8b89-148f3e9bd60e@email.android.com> <4FD45424.9040909@astro.uio.no> Message-ID: <4FD45E31.8060506@astro.uio.no> On 06/10/2012 10:23 AM, Robert Bradshaw wrote: > On Sun, Jun 10, 2012 at 1:00 AM, Dag Sverre Seljebotn > wrote: >> On 06/10/2012 09:34 AM, Robert Bradshaw wrote: >>> >>> On Sun, Jun 10, 2012 at 12:14 AM, Dag Sverre Seljebotn >>> wrote: >>>> >>>> >>>> >>>> Robert Bradshaw wrote: >>>> >>>>> On Fri, Jun 8, 2012 at 10:45 PM, Dag Sverre Seljebotn >>>>> wrote: >>>>>> >>>>>> I'd love to not do interning, but I see no way around it. >>>>> >>>>> >>>>> No, I want to use the lower 64 bits by default, but always have the >>>>> top 96 bits around to allow using this mechanism in "secure" mode at a >>>>> slight penalty. md5 is out because there are known collisions. (Yes, >>>>> sha-1 may succumb sooner rather than later, theoretical weaknesses >>>>> have been shown, so we could look to using something else (hopefully >>>>> still shipped with Python). >>>> >>>> >>>> But very few users are going to know about this. What's the odds that the >>>> user who decide to trigger JIT-compilation with function signatures that >>>> varies based on the input will know about the option and turn it on and also >>>> recompile all his/her C extension modules? >>>> >>>> In practice, such an option would always stay at its default value. 
If we >>>> leave it to secure by default and start teaching it to users from the >>>> start...but that's a big price to pay. >>> >>> >>> Yes, it's not ideal from this perspective. >>> >>>> And if you *do* want to run in secure mode, it will be a lot slower than >>>> interning. >>> >>> >>> Are you thinking that the 64-bit interned pointer would be used as the >>> hash? In this case all hashtables would have to be constructed at >>> runtime, which means it needs to be really, really cheap (well under a >>> milisecond, I'm sure Sage has>1000 classes,>10000 methods it imports >>> at startup). Also I'm not sure how the very-uneven distribution would >>> play out for constructing perfect hastables (perhaps it won't hurt, >>> there's likely to be long runs of consecutive values in some cases. >> >> >> No, I'm thinking that callsites need both the 64-bit interned char* and the >> 64-bit hash of the *contents*. They use the hash to figure out the position, >> then compare by ID. > > Ah, I missed that bit. OK, yes, that could work well. Ah, we've been talking past one another for some time then. OK, let's settle on that. > >> The hash is not stored in callees, it's discarded after figuring out the >> table layout. >> >> (There was this idea that if the char* has least significant bit set, we'd >> hash it directly rather than dereference it, but let's ignore that for now.) > > (For the purpose of this discussion, it's part of the "interning" step.) > >> I don't think under a millisecond is unfeasible to hash smallish tables -- >> we could put the pointer through a cheap hash to create more entropy (for >> the perfect hashing, being able to select a hash function through the>>r is >> important, so you can't just use the pointer directly -- but there are >> functions cheaper than md5, e.g, in here: http://code.google.com/p/ulib/) > > Just a sec, we're not hashing pointers, but the full signature itself, > right? 
For our hash function we need > > (1) Collision free on 64-bits (for non-malicious use). > (2) Good distribution (including for short strings, which is harder to come by). > (2b) Any small subset of bits should have property (2). > (3) Ideally easy to reference (e.g. "md5" is better than "these 100 > lines of C code"). > > Cheap runtime construction is still ideal, but much less of an issue > if hashes (and perfect tables) can be constructed at compile time, > which I think this scheme allows. Yes, 64 bits of md5 then? ulib contains "100 lines of C code" for md5 anyway, if one doesn't want to go through Python hashlib (I imagine e.g. hashlib might be unavailable somewhere as it relies on openssl and there's license war going on vs. gnutls and so on. And the md5 module is deprecated.). > >> That would save us a register and make the instructions shorter in some >> places I guess...I think it's really miniscule, it's not like the effect of >> load of a global variable. But if you like this approach I can benchmark >> C-written hashtable creation and see. > > This will have value in and of itself (both the implementation and the > benchmarks). Will do (eventually, less spare time in coming week). About signatures, a problem I see with following the C typing is that the signature "ill" wouldn't hash the same as "iii" on 32-bit Windows and "iqq" on 32-bit Linux, and so on. I think that would be really bad. "l" must be banished -- but then one might as well do "i4i8i8". Designing a signature hash where you select between these at compile-time is perhaps doable but does generate a lot of code and makes everything complicated. I think we should just start off with hashing at module load time when sizes are known, and then work with heuristics and/or build system integration to improve on that afterwards. 
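The "64 bits of md5" agreed on above is easy to pin down; what the spec would still have to fix is which 64 bits and in which byte order, so the concrete choice below (the first 8 digest bytes, little-endian) is an assumption for illustration only:

```python
import hashlib

def sig_hash64(signature: str) -> int:
    """64-bit pre-hash of a signature string as truncated md5."""
    digest = hashlib.md5(signature.encode("ascii")).digest()
    # Which 8 bytes and which byte order the spec would pick is open;
    # the low 8 bytes, little-endian, are assumed here.
    return int.from_bytes(digest[:8], "little")
```

This matches Robert's criteria: md5 is trivially easy to reference, it distributes well even on short strings, any small subset of its bits inherits that distribution, and the remaining 64 bits stay available for a "secure" mode at a slight penalty.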
Dag From robertwb at gmail.com Sun Jun 10 11:53:12 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Sun, 10 Jun 2012 02:53:12 -0700 Subject: [Cython] Hash-based vtables In-Reply-To: <4FD45E31.8060506@astro.uio.no> References: <4FCD100B.7000008@astro.uio.no> <4FCD20DC.6090906@astro.uio.no> <048eeb04-aa8b-4e12-9a9b-5d552d39984b@email.android.com> <4FCFC088.3000709@astro.uio.no> <4FCFC441.40703@astro.uio.no> <4FCFCD49.9030802@astro.uio.no> <4FD0808B.5080300@astro.uio.no> <4FD083F9.2030006@astro.uio.no> <4FD26ADA.5060401@astro.uio.no> <4FD2E313.6040208@astro.uio.no> <6c423841-b888-478d-8b89-148f3e9bd60e@email.android.com> <4FD45424.9040909@astro.uio.no> <4FD45E31.8060506@astro.uio.no> Message-ID: On Sun, Jun 10, 2012 at 1:43 AM, Dag Sverre Seljebotn wrote: > On 06/10/2012 10:23 AM, Robert Bradshaw wrote: >> >> On Sun, Jun 10, 2012 at 1:00 AM, Dag Sverre Seljebotn >> ?wrote: >>> >>> On 06/10/2012 09:34 AM, Robert Bradshaw wrote: >>>> >>>> >>>> On Sun, Jun 10, 2012 at 12:14 AM, Dag Sverre Seljebotn >>>> ? ?wrote: >>>>> >>>>> >>>>> >>>>> >>>>> Robert Bradshaw ? ?wrote: >>>>> >>>>>> On Fri, Jun 8, 2012 at 10:45 PM, Dag Sverre Seljebotn >>>>>> ? ?wrote: >>>>>>> >>>>>>> >>>>>>> I'd love to not do interning, but I see no way around it. >>>>>> >>>>>> >>>>>> >>>>>> No, I want to use the lower 64 bits by default, but always have the >>>>>> top 96 bits around to allow using this mechanism in "secure" mode at a >>>>>> slight penalty. md5 is out because there are known collisions. (Yes, >>>>>> sha-1 may succumb sooner rather than later, theoretical weaknesses >>>>>> have been shown, so we could look to using something else (hopefully >>>>>> still shipped with Python). >>>>> >>>>> >>>>> >>>>> But very few users are going to know about this. 
What's the odds that >>>>> the >>>>> user who decide to trigger JIT-compilation with function signatures >>>>> that >>>>> varies based on the input will know about the option and turn it on and >>>>> also >>>>> recompile all his/her C extension modules? >>>>> >>>>> In practice, such an option would always stay at its default value. If >>>>> we >>>>> leave it to secure by default and start teaching it to users from the >>>>> start...but that's a big price to pay. >>>> >>>> >>>> >>>> Yes, it's not ideal from this perspective. >>>> >>>>> And if you *do* want to run in secure mode, it will be a lot slower >>>>> than >>>>> interning. >>>> >>>> >>>> >>>> Are you thinking that the 64-bit interned pointer would be used as the >>>> hash? In this case all hashtables would have to be constructed at >>>> runtime, which means it needs to be really, really cheap (well under a >>>> milisecond, I'm sure Sage has>1000 classes,>10000 methods it imports >>>> at startup). Also I'm not sure how the very-uneven distribution would >>>> play out for constructing perfect hastables (perhaps it won't hurt, >>>> there's likely to be long runs of consecutive values in some cases. >>> >>> >>> >>> No, I'm thinking that callsites need both the 64-bit interned char* and >>> the >>> 64-bit hash of the *contents*. They use the hash to figure out the >>> position, >>> then compare by ID. >> >> >> Ah, I missed that bit. OK, yes, that could work well. > > > Ah, we've been talking past one another for some time then. OK, let's settle > on that. > > >> >>> The hash is not stored in callees, it's discarded after figuring out the >>> table layout. >>> >>> (There was this idea that if the char* has least significant bit set, >>> we'd >>> hash it directly rather than dereference it, but let's ignore that for >>> now.) >> >> >> (For the purpose of this discussion, it's part of the "interning" step.) 
>> >>> I don't think under a millisecond is unfeasible to hash smallish tables >>> -- >>> we could put the pointer through a cheap hash to create more entropy (for >>> the perfect hashing, being able to select a hash function through the>>r >>> is >>> important, so you can't just use the pointer directly -- but there are >>> functions cheaper than md5, e.g, in here: http://code.google.com/p/ulib/) >> >> >> Just a sec, we're not hashing pointers, but the full signature itself, >> right? For our hash function we need >> >> (1) Collision free on 64-bits (for non-malicious use). >> (2) Good distribution (including for short strings, which is harder to >> come by). >> (2b) Any small subset of bits should have property (2). >> (3) Ideally easy to reference (e.g. "md5" is better than "these 100 >> lines of C code"). >> >> Cheap runtime construction is still ideal, but much less of an issue >> if hashes (and perfect tables) can be constructed at compile time, >> which I think this scheme allows. > > > Yes, 64 bits of md5 then? +1 for me. > ulib contains "100 lines of C code" for md5 > anyway, if one doesn't want to go through Python hashlib (I imagine e.g. > hashlib might be unavailable somewhere as it relies on openssl and there's > license war going on vs. gnutls and so on. And the md5 module is > deprecated.). Just the interface, right? (hashlib should be used instead...) >>> That would save us a register and make the instructions shorter in some >>> places I guess...I think it's really miniscule, it's not like the effect >>> of >>> load of a global variable. But if you like this approach I can benchmark >>> C-written hashtable creation and see. >> >> >> This will have value in and of itself (both the implementation and the >> benchmarks). > > > Will do (eventually, less spare time in coming week). 
>
> About signatures, a problem I see with following the C typing is that
> the signature "ill" wouldn't hash the same as "iii" on 32-bit Windows
> and "iqq" on 32-bit Linux, and so on. I think that would be really bad.

This is why I suggested promotion for scalars (divide ints into
<=sizeof(long) and sizeof(long) < x <= sizeof(long long)), checked at C
compile time, though I guess you consider that evil. I don't consider
not matching really bad, just kind of bad.

> "l" must be banished -- but then one might as well do "i4i8i8".
>
> Designing a signature hash where you select between these at
> compile-time is perhaps doable but does generate a lot of code and
> makes everything complicated.

It especially gets messy when you're trying to pre-compute tables.

> I think we should just start off with hashing at module load time when
> sizes are known, and then work with heuristics and/or build system
> integration to improve on that afterwards.

Finding 10,000 optimal tables at runtime had better be really cheap
then, for Sage's sake :).

- Robert

From pav at iki.fi  Mon Jun 11 21:08:33 2012
From: pav at iki.fi (Pauli Virtanen)
Date: Mon, 11 Jun 2012 21:08:33 +0200
Subject: [Cython] Failure with asarray(memoryview) on Python 2.4
Message-ID:

Hi,

This doesn't work on Python 2.4 (works on >= 2.5):

------------
cimport numpy as np
import numpy as np

def _foo():
    cdef double[:] a
    a = np.array([1.0])
    return np.asarray(a)

def foo():
    print _foo()
------------

Spotted when using Cython 0.16 in Scipy. Results in:

Python 2.4.6 (#1, Nov 20 2010, 00:52:41)
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import fail
>>> fail.foo()
Traceback (most recent call last):
  File "", line 1, in ?
  File "fail.pyx", line 10, in fail.foo (fail.c:1776)
    print _foo()
  File "fail.pyx", line 7, in fail._foo (fail.c:1715)
    return np.asarray(a)
  File "/usr/local/stow/python-easy-install//lib/python2.4/site-packages/numpy/core/numeric.py",
line 235, in asarray
    return array(a, dtype, copy=False, order=order)
  File "stringsource", line 366, in
View.MemoryView.memoryview.__getitem__ (fail.c:5975)
  File "stringsource", line 650, in View.MemoryView._unellipsify
(fail.c:9236)
TypeError: Cannot index with type ''

From pav at iki.fi  Mon Jun 11 21:12:38 2012
From: pav at iki.fi (Pauli Virtanen)
Date: Mon, 11 Jun 2012 21:12:38 +0200
Subject: [Cython] Cython 0.16 & compilation failure on MinGW?
Message-ID:

Hi,

With Scipy we ran into a compilation failure on MinGW in Cython code:

http://projects.scipy.org/scipy/ticket/1673

interpnd.c:10580: error: initializer element is not constant
interpnd.c:10580: error: (near initialization for
`__pyx_CyFunctionType_type.tp_call')

Can be fixed like this:

...
+static PyObject *__Pyx_PyCFunction_Call_wrap(PyObject *a, PyObject *b,
PyObject *c)
+{
+    return __Pyx_PyCFunction_Call(a, b, c);
+}
 static PyTypeObject __pyx_CyFunctionType_type = {
     PyVarObject_HEAD_INIT(0, 0)
     __Pyx_NAMESTR("cython_function_or_method"),
@@ -10577,7 +10581,7 @@ static PyTypeObject __pyx_CyFunctionType_type = {
     0,
     0,
     0,
-    __Pyx_PyCFunction_Call,
+    __Pyx_PyCFunction_Call_wrap,
     0,
     0,
     0,
...

It's a bit surprising to me that you cannot use the function from the
Python headers as a static initializer on that platform...

-- 
Pauli Virtanen

From markflorisson88 at gmail.com  Mon Jun 11 21:16:37 2012
From: markflorisson88 at gmail.com (mark florisson)
Date: Mon, 11 Jun 2012 20:16:37 +0100
Subject: [Cython] Failure with asarray(memoryview) on Python 2.4
In-Reply-To:
References:
Message-ID:

On 11 June 2012 20:08, Pauli Virtanen wrote:
> Hi,
>
> This doesn't work on Python 2.4 (works on >= 2.5):
>
> ------------
> cimport numpy as np
> import numpy as np
>
> def _foo():
> ?
?cdef double[:] a > ? ?a = np.array([1.0]) > ? ?return np.asarray(a) > > def foo(): > ? ?print _foo() > ------------ > > Spotted when using Cython 1.6 in Scipy. Results to: > > Python 2.4.6 (#1, Nov 20 2010, 00:52:41) > [GCC 4.4.5] on linux2 > Type "help", "copyright", "credits" or "license" for more information. >>>> import fail >>>> fail.foo() > Traceback (most recent call last): > ?File "", line 1, in ? > ?File "fail.pyx", line 10, in fail.foo (fail.c:1776) > ? ?print _foo() > ?File "fail.pyx", line 7, in fail._foo (fail.c:1715) > ? ?return np.asarray(a) > ?File > "/usr/local/stow/python-easy-install//lib/python2.4/site-packages/numpy/core/numeric.py", > line 235, in asarray > ? ?return array(a, dtype, copy=False, order=order) > ?File "stringsource", line 366, in > View.MemoryView.memoryview.__getitem__ (fail.c:5975) > ?File "stringsource", line 650, in View.MemoryView._unellipsify > (fail.c:9236) > TypeError: Cannot index with type '' > > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel Hey Pauli, Yeah, there was some weird bug with PyIndex_Check() not operating properly. Could you retry with the latest master? Mark From markflorisson88 at gmail.com Mon Jun 11 21:17:49 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Mon, 11 Jun 2012 20:17:49 +0100 Subject: [Cython] Cython 1.6 & compilation failure on MinGW? In-Reply-To: References: Message-ID: On 11 June 2012 20:12, Pauli Virtanen wrote: > Hi, > > We ran with Scipy to a compilation failure on MinGW in Cython code: > > http://projects.scipy.org/scipy/ticket/1673 > > interpnd.c:10580: error: initializer element is not constant > interpnd.c:10580: error: (near initialization for > `__pyx_CyFunctionType_type.tp_call') > > Can be fixed like this: > > ... > +static PyObject *__Pyx_PyCFunction_Call_wrap(PyObject *a, PyObject *b, > PyObject *c) > +{ > + ? 
?return __Pyx_PyCFunction_Call(a, b, c); > +} > ?static PyTypeObject __pyx_CyFunctionType_type = { > ? ? PyVarObject_HEAD_INIT(0, 0) > ? ? __Pyx_NAMESTR("cython_function_or_method"), > @@ -10577,7 +10581,7 @@ static PyTypeObject __pyx_CyFunctionType_type = { > ? ? 0, > ? ? 0, > ? ? 0, > - ? ?__Pyx_PyCFunction_Call, > + ? ?__Pyx_PyCFunction_Call_wrap, > ? ? 0, > ? ? 0, > ? ? 0, > ... > > > It's a bit surprising to me that you cannot use the function from the > Python headers as a static initializer on that platform... > > -- > Pauli Virtanen > > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel Thanks, could you provide a pull request? That makes it easier to merge and assign credit. From pav at iki.fi Mon Jun 11 21:23:53 2012 From: pav at iki.fi (Pauli Virtanen) Date: Mon, 11 Jun 2012 21:23:53 +0200 Subject: [Cython] Failure with asarray(memoryview) on Python 2.4 In-Reply-To: References: Message-ID: Hi, 11.06.2012 21:16, mark florisson kirjoitti: [clip] > Yeah, there was some weird bug with PyIndex_Check() not operating > properly. Could you retry with the latest master? Doesn't seem to work in 5a0effd0 :( Traceback (most recent call last): File "", line 1, in ? File "fail.pyx", line 10, in fail.foo (fail.c:1807) print _foo() File "fail.pyx", line 7, in fail._foo (fail.c:1747) return np.asarray(a) File "/usr/local/stow/python-easy-install//lib/python2.4/site-packages/numpy/core/numeric.py", line 235, in asarray return array(a, dtype, copy=False, order=order) File "stringsource", line 366, in View.MemoryView.memoryview.__getitem__ (fail.c:6019) File "stringsource", line 650, in View.MemoryView._unellipsify (fail.c:9199) TypeError: Cannot index with type '' Cheers, Pauli From pav at iki.fi Mon Jun 11 21:24:56 2012 From: pav at iki.fi (Pauli Virtanen) Date: Mon, 11 Jun 2012 21:24:56 +0200 Subject: [Cython] Cython 1.6 & compilation failure on MinGW? 
In-Reply-To: References: Message-ID: 11.06.2012 21:17, mark florisson kirjoitti: [clip] > Thanks, could you provide a pull request? That makes it easier to > merge and assign credit. Ok, I'll try to not only just complain :) BRB, Pauli From pav at iki.fi Mon Jun 11 21:27:18 2012 From: pav at iki.fi (Pauli Virtanen) Date: Mon, 11 Jun 2012 21:27:18 +0200 Subject: [Cython] Cython 1.6 & compilation failure on MinGW? In-Reply-To: References: Message-ID: 11.06.2012 21:17, mark florisson kirjoitti: [clip] > Thanks, could you provide a pull request? That makes it easier to > merge and assign credit. Ok, this one seemed to already have been fixed in Cython master. Pauli From pav at iki.fi Mon Jun 11 21:55:57 2012 From: pav at iki.fi (Pauli Virtanen) Date: Mon, 11 Jun 2012 21:55:57 +0200 Subject: [Cython] Failure with asarray(memoryview) on Python 2.4 In-Reply-To: References: Message-ID: 11.06.2012 21:23, Pauli Virtanen kirjoitti: > Hi, > > 11.06.2012 21:16, mark florisson kirjoitti: > [clip] >> Yeah, there was some weird bug with PyIndex_Check() not operating >> properly. Could you retry with the latest master? > > Doesn't seem to work in 5a0effd0 :( [clip] Uhh, Numpy header arrayobject.h -> npy_common.h contains this #if (PY_VERSION_HEX < 0x02050000) ... #undef PyIndex_Check #define PyIndex_Check(op) 0 ... which nicely overrides the fixed PyIndex_Check defined by Cython. Time to fix that, I guess. I don't see reasonable ways to work around this in Cython... Pauli From markflorisson88 at gmail.com Mon Jun 11 22:02:26 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Mon, 11 Jun 2012 21:02:26 +0100 Subject: [Cython] Failure with asarray(memoryview) on Python 2.4 In-Reply-To: References: Message-ID: On 11 June 2012 20:55, Pauli Virtanen wrote: > 11.06.2012 21:23, Pauli Virtanen kirjoitti: >> Hi, >> >> 11.06.2012 21:16, mark florisson kirjoitti: >> [clip] >>> Yeah, there was some weird bug with PyIndex_Check() not operating >>> properly. 
Could you retry with the latest master? >> >> Doesn't seem to work in 5a0effd0 ?:( > [clip] > > Uhh, Numpy header arrayobject.h -> npy_common.h contains this > > #if (PY_VERSION_HEX < 0x02050000) > ... > #undef PyIndex_Check > #define PyIndex_Check(op) 0 > ... > > which nicely overrides the fixed PyIndex_Check defined by Cython. > Time to fix that, I guess. > > I don't see reasonable ways to work around this in Cython... > > ? ? ? ?Pauli > > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel Ah, thanks! Stefan and I were kind of baffled by PyIndex_Check failing, I guess we should have run cpp on our source :) From d.s.seljebotn at astro.uio.no Tue Jun 12 13:01:45 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 12 Jun 2012 13:01:45 +0200 Subject: [Cython] Hash-based vtables In-Reply-To: References: <4FCD100B.7000008@astro.uio.no> <048eeb04-aa8b-4e12-9a9b-5d552d39984b@email.android.com> <4FCFC088.3000709@astro.uio.no> <4FCFC441.40703@astro.uio.no> <4FCFCD49.9030802@astro.uio.no> <4FD0808B.5080300@astro.uio.no> <4FD083F9.2030006@astro.uio.no> <4FD26ADA.5060401@astro.uio.no> <4FD2E313.6040208@astro.uio.no> <6c423841-b888-478d-8b89-148f3e9bd60e@email.android.com> <4FD45424.9040909@astro.uio.no> <4FD45E31.8060506@astro.uio.no> Message-ID: <4FD72199.7010803@astro.uio.no> On 06/10/2012 11:53 AM, Robert Bradshaw wrote: > On Sun, Jun 10, 2012 at 1:43 AM, Dag Sverre Seljebotn >> About signatures, a problem I see with following the C typing is that the >> signature "ill" wouldn't hash the same as "iii" on 32-bit Windows and "iqq" >> on 32-bit Linux, and so on. I think that would be really bad. > > This is why I suggested promotion for scalars (divide ints into > <=sizeof(long) and sizeof(long)< x<= sizeof(long long)), checked at > C compile time, though I guess you consider that evil. 
I don't > consider not matching really bad, just kind of bad. Right. At least a convention for promotion of scalars would be good anyway. Even MSVC supports stdint.h these days; basing ourselves on the random behaviour of "long" seems a bit outdated to me. "ssize_t" would be better motivated I feel. Many linear algebra libraries use 32-bit matrix indices by default, but can be swapped to 64-bit indices (this holds for many LAPACK implementations and most sparse linear algebra). So often there will at least be one typedef that is either 32 bits or 64 bits without the Cython compiler knowing. Promoting to a single type "[u]int64" is the only one that removes possible combinatorial explosion if you have multiple external typedefs that you don't know the size of (although I guess that's rather theoretical). Anyway, runtime table generation is quite fast, see below. > >> "l" must be banished -- but then one might as well do "i4i8i8". >> >> Designing a signature hash where you select between these at compile-time is >> perhaps doable but does generate a lot of code and makes everything >> complicated. > > It especially gets messy when you're trying to pre-compute tables. > >> I think we should just start off with hashing at module load >> time when sizes are known, and then work with heuristics and/or build system >> integration to improve on that afterwards. > > Finding 10,000 optimal tables at runtime better be really cheap than > for Sage's sake :). The code is highly unpolished as I write this, but it works so here's some preliminary benchmarks. Assuming the 64-bit pre-hashes are already computed, hashing a 64-slot table varies between 5 and 10 us (microseconds) depending on the set of hashes. Computing md5's with C code from ulib (not hashlib/OpenSSL) takes ~400ns per hash, so 26 us for the 64-slot table => it dominates! The crapwow64 hash takes ~10-20 ns, for ~1 us per 64-slot table. Admittedly, that's with hand-written non-portable assembly for the crapwow64. 
Assuming 10 000 64-slot tables we're looking at something like 0.3-0.4 seconds for loading Sage using md5, or 0.1 seconds using crapwow64. https://github.com/dagss/pyextensibletype/blob/master/include/perfecthash.h http://www.team5150.com/~andrew/noncryptohashzoo/CrapWow64.html Dag From d.s.seljebotn at astro.uio.no Tue Jun 12 19:21:48 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 12 Jun 2012 19:21:48 +0200 Subject: [Cython] Hash-based vtables In-Reply-To: <4FD72199.7010803@astro.uio.no> References: <4FCD100B.7000008@astro.uio.no> <4FCFC088.3000709@astro.uio.no> <4FCFC441.40703@astro.uio.no> <4FCFCD49.9030802@astro.uio.no> <4FD0808B.5080300@astro.uio.no> <4FD083F9.2030006@astro.uio.no> <4FD26ADA.5060401@astro.uio.no> <4FD2E313.6040208@astro.uio.no> <6c423841-b888-478d-8b89-148f3e9bd60e@email.android.com> <4FD45424.9040909@astro.uio.no> <4FD45E31.8060506@astro.uio.no> <4FD72199.7010803@astro.uio.no> Message-ID: <4FD77AAC.6080905@astro.uio.no> On 06/12/2012 01:01 PM, Dag Sverre Seljebotn wrote: > On 06/10/2012 11:53 AM, Robert Bradshaw wrote: >> On Sun, Jun 10, 2012 at 1:43 AM, Dag Sverre Seljebotn >>> About signatures, a problem I see with following the C typing is that >>> the >>> signature "ill" wouldn't hash the same as "iii" on 32-bit Windows and >>> "iqq" >>> on 32-bit Linux, and so on. I think that would be really bad. >> >> This is why I suggested promotion for scalars (divide ints into >> <=sizeof(long) and sizeof(long)< x<= sizeof(long long)), checked at >> C compile time, though I guess you consider that evil. I don't >> consider not matching really bad, just kind of bad. > > Right. At least a convention for promotion of scalars would be good anyway. > > Even MSVC supports stdint.h these days; basing ourselves on the random > behaviour of "long" seems a bit outdated to me. "ssize_t" would be > better motivated I feel. 
> > Many linear algebra libraries use 32-bit matrix indices by default, but > can be swapped to 64-bit indices (this holds for many LAPACK > implementations and most sparse linear algebra). So often there will at > least be one typedef that is either 32 bits or 64 bits without the > Cython compiler knowing. > > Promoting to a single type "[u]int64" is the only one that removes > possible combinatorial explosion if you have multiple external typedefs > that you don't know the size of (although I guess that's rather > theoretical). > > Anyway, runtime table generation is quite fast, see below. > >> >>> "l" must be banished -- but then one might as well do "i4i8i8". >>> >>> Designing a signature hash where you select between these at >>> compile-time is >>> perhaps doable but does generate a lot of code and makes everything >>> complicated. >> >> It especially gets messy when you're trying to pre-compute tables. >> >>> I think we should just start off with hashing at module load >>> time when sizes are known, and then work with heuristics and/or build >>> system >>> integration to improve on that afterwards. >> >> Finding 10,000 optimal tables at runtime better be really cheap than >> for Sage's sake :). > > The code is highly unpolished as I write this, but it works so here's > some preliminary benchmarks. > > Assuming the 64-bit pre-hashes are already computed, hashing a 64-slot > table varies between 5 and 10 us (microseconds) depending on the set of > hashes. > > Computing md5's with C code from ulib (not hashlib/OpenSSL) takes ~400ns > per hash, so 26 us for the 64-slot table => it dominates! > > The crapwow64 hash takes ~10-20 ns, for ~1 us per 64-slot table. > Admittedly, that's with hand-written non-portable assembly for the > crapwow64. > > Assuming 10 000 64-slot tables we're looking at something like 0.3-0.4 > seconds for loading Sage using md5, or 0.1 seconds using crapwow64. 
>
> https://github.com/dagss/pyextensibletype/blob/master/include/perfecthash.h
>
> http://www.team5150.com/~andrew/noncryptohashzoo/CrapWow64.html

Look: A big advantage of the hash-vtables is that subclasses stay
ABI-compatible with superclasses, and don't need recompilation when a
superclass adds or removes methods.

=> Finding the hash table must happen at run-time in a lot of cases
anyway, so I feel Robert's chase for compile-time table building is
moot.

I feel this would also need to automatically fall back to
heap-allocated tables if the statically allocated one can't be used.
Which is good to have in the very few cases where a perfect table can't
be found too.

One thing that makes me feel uneasy about the relatively unexplored
crapwow64 is that we really don't want collisions in the 64-bit
prehashes within a single table (which would raise an exception --
which I think is OK from a security perspective; you can always have a
MemoryError at any point too, so programmers should not expose class
creation to attackers without being able to deal with it failing).

For the record, I found another md5 implementation that's a bit faster;
the first one here is "sphlib" and the second is "ulib":

In [2]: %timeit extensibletype.extensibletype.md5bench2(10**3)
1000 loops, best of 3: 237 us per loop

In [3]: %timeit extensibletype.extensibletype.md5bench(10**3)
1000 loops, best of 3: 374 us per loop

http://www.saphir2.com/sphlib/

It's really only for extremely large projects like Sage where this can
be noticed in any way.
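Dag's proposal to hash at module load time, when sizes are known, amounts to normalizing the signature string once at import before any hashing. A sketch of that step — the one-letter codes and the "i4"/"i8" spellings follow the thread's informal notation, and `struct.calcsize` stands in for C's `sizeof`:

```python
import struct

# Native C sizes, known once the module is loaded (sizeof in C).
WIDTH = {code: struct.calcsize(code) for code in "hilq"}

def normalize_sig(sig: str) -> str:
    """Rewrite platform-dependent integer codes into fixed-width ones
    ('i4', 'i8', ...), so that e.g. 'ill' and its fixed-width
    equivalent hash identically on any one platform."""
    return "".join("i%d" % WIDTH[c] if c in WIDTH else c for c in sig)
```

On LP64 Linux `normalize_sig("ill")` yields `"i4i8i8"`, while on 32-bit Windows it yields `"i4i4i4"` — exactly the load-time resolution of `"l"` discussed in the thread, with the table then built from the normalized string.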
Dag From robertwb at gmail.com Tue Jun 12 20:12:02 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Tue, 12 Jun 2012 11:12:02 -0700 Subject: [Cython] Hash-based vtables In-Reply-To: <4FD77AAC.6080905@astro.uio.no> References: <4FCD100B.7000008@astro.uio.no> <4FCFC088.3000709@astro.uio.no> <4FCFC441.40703@astro.uio.no> <4FCFCD49.9030802@astro.uio.no> <4FD0808B.5080300@astro.uio.no> <4FD083F9.2030006@astro.uio.no> <4FD26ADA.5060401@astro.uio.no> <4FD2E313.6040208@astro.uio.no> <6c423841-b888-478d-8b89-148f3e9bd60e@email.android.com> <4FD45424.9040909@astro.uio.no> <4FD45E31.8060506@astro.uio.no> <4FD72199.7010803@astro.uio.no> <4FD77AAC.6080905@astro.uio.no> Message-ID: On Tue, Jun 12, 2012 at 10:21 AM, Dag Sverre Seljebotn wrote: > On 06/12/2012 01:01 PM, Dag Sverre Seljebotn wrote: >> >> On 06/10/2012 11:53 AM, Robert Bradshaw wrote: >>> >>> On Sun, Jun 10, 2012 at 1:43 AM, Dag Sverre Seljebotn >>>> >>>> About signatures, a problem I see with following the C typing is that >>>> the >>>> signature "ill" wouldn't hash the same as "iii" on 32-bit Windows and >>>> "iqq" >>>> on 32-bit Linux, and so on. I think that would be really bad. >>> >>> >>> This is why I suggested promotion for scalars (divide ints into >>> <=sizeof(long) and sizeof(long)< x<= sizeof(long long)), checked at >>> C compile time, though I guess you consider that evil. I don't >>> consider not matching really bad, just kind of bad. >> >> >> Right. At least a convention for promotion of scalars would be good >> anyway. >> >> Even MSVC supports stdint.h these days; basing ourselves on the random >> behaviour of "long" seems a bit outdated to me. "ssize_t" would be >> better motivated I feel. >> >> Many linear algebra libraries use 32-bit matrix indices by default, but >> can be swapped to 64-bit indices (this holds for many LAPACK >> implementations and most sparse linear algebra). 
So often there will at >> least be one typedef that is either 32 bits or 64 bits without the >> Cython compiler knowing. >> >> Promoting to a single type "[u]int64" is the only one that removes >> possible combinatorial explosion if you have multiple external typedefs >> that you don't know the size of (although I guess that's rather >> theoretical). >> >> Anyway, runtime table generation is quite fast, see below. >> >>> >>>> "l" must be banished -- but then one might as well do "i4i8i8". >>>> >>>> Designing a signature hash where you select between these at >>>> compile-time is >>>> perhaps doable but does generate a lot of code and makes everything >>>> complicated. >>> >>> >>> It especially gets messy when you're trying to pre-compute tables. >>> >>>> I think we should just start off with hashing at module load >>>> time when sizes are known, and then work with heuristics and/or build >>>> system >>>> integration to improve on that afterwards. >>> >>> >>> Finding 10,000 optimal tables at runtime better be really cheap than >>> for Sage's sake :). >> >> >> The code is highly unpolished as I write this, but it works so here's >> some preliminary benchmarks. >> >> Assuming the 64-bit pre-hashes are already computed, hashing a 64-slot >> table varies between 5 and 10 us (microseconds) depending on the set of >> hashes. >> >> Computing md5's with C code from ulib (not hashlib/OpenSSL) takes ~400ns >> per hash, so 26 us for the 64-slot table => it dominates! >> >> The crapwow64 hash takes ~10-20 ns, for ~1 us per 64-slot table. >> Admittedly, that's with hand-written non-portable assembly for the >> crapwow64. >> >> Assuming 10 000 64-slot tables we're looking at something like 0.3-0.4 >> seconds for loading Sage using md5, or 0.1 seconds using crapwow64. 
>> >> https://github.com/dagss/pyextensibletype/blob/master/include/perfecthash.h >> >> http://www.team5150.com/~andrew/noncryptohashzoo/CrapWow64.html > > Look: A big advantage of the hash-vtables is that subclasses stay > ABI-compatible with superclasses, and don't need recompilation when > superclasses adds or removes methods. > > => Finding the hash table must happen at run-time in a lot of cases anyway, > so I feel Robert's chase for a compile-time table building is moot. > > I feel this would also need to trigger automatically heap-allocated tables > if the statically allocated. Which is good to have in the very few cases > where a perfect table can't be found too. Finding the hash table at runtime should be supported, but the *vast* majority of method sets is known at compile time. 0.4 seconds is a huge overhead to just add to Sage (yes, it's an exception, but an important one), and though crapwow64 helps I'd rather rely on a known, good standard hash. I need to actually look at Sage to see what the impact would be. Also, most tables would probably have 2 entries in them (e.g. a typed one and an all-object one). long int will continue to be an important type as long as it's the default for int literals and Python's "fast" ints (whether in type or implementation), so we can't just move to stdint. I also don't like that the form of the table (and whether certain signatures match) is platform-dependent: the less variance we have from one platform to the next, the better. On an orthogonal note, sizeof(long)-sensitive tables need not be entirely at odds with compile-time table compilation, as most functions will probably have 0 or 1 parameters that are of unknown size, so we could spit out 1 or 2 statically compiled tables and generate the rest on the fly. I still would rather have fixed Cython-compile time tables though. 
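For concreteness, the fixed-width encoding floated above ("i4i8i8"-style codes instead of platform-dependent letters like "l") could look roughly like the following. This is a minimal Python sketch; the codes and helper names are hypothetical, not part of any agreed spec, and the widths would come from sizeof() checks at C compile time:

```python
def promote_int(width_in_bytes):
    """Bucket an integer parameter by its byte width (known at C compile
    time via sizeof) instead of by its C type letter, so a typedef that is
    32-bit on one platform and 64-bit on another still gets a stable code
    on each platform."""
    if width_in_bytes <= 4:
        return "i4"
    if width_in_bytes <= 8:
        return "i8"
    raise ValueError("unsupported integer width: %d" % width_in_bytes)

def encode_signature(widths):
    """Concatenate the promoted codes of all integer arguments."""
    return "".join(promote_int(w) for w in widths)

# "ill" where long is 32-bit has widths (4, 4, 4);
# the same declaration where long is 64-bit has widths (4, 8, 8), i.e. "iqq".
assert encode_signature([4, 4, 4]) == "i4i4i4"
assert encode_signature([4, 8, 8]) == "i4i8i8"
```

Under such a scheme the two platforms can still produce different strings for a varying typedef, but each string depends only on the actual widths, never on whether the type was spelled long or long long.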
> One thing that makes me feel uneasy about the relatively > unexplored crapwow64 is that we really don't want collisions in the 64-bit > prehashes within a single table (which would raise an exception -- which I > think is OK from a security perspective, you can always have a MemoryError > at any point too, so programmers should not expose class creation to > attackers without being able to deal with it failing). > > For the record, I found another md5 implementation that's a bit faster; > first one is "sphlib" and second is "ulib": > > In [2]: %timeit extensibletype.extensibletype.md5bench2(10**3) > 1000 loops, best of 3: 237 us per loop > > In [3]: %timeit extensibletype.extensibletype.md5bench(10**3) > 1000 loops, best of 3: 374 us per loop > > http://www.saphir2.com/sphlib/ > > It's really only for extremely large projects like Sage where this can be > noticed in any way. > > > Dag > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From stefan_ml at behnel.de Tue Jun 12 16:13:03 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 12 Jun 2012 16:13:03 +0200 Subject: [Cython] "__pyx_dynamic_args" undeclared in fused types code Message-ID: <4FD74E6F.1070001@behnel.de> Hi, after the merge of the "_fused_dispatch_rebased" branch, I get C compile errors in a simple fused types example:

"""
from cython cimport integral

# define a fused type for different containers
ctypedef fused container:
    list
    tuple
    object

# define a generic function using the above types
cpdef sum(container items, integral start = 0):
    cdef integral item, result
    result = start
    for item in items:
        result += item
    return result

def test():
    cdef int x = 1, y = 2

    # call [list,int] specialisation implicitly
    print( sum([1,2,3,4], x) )

    # calls [object,long] specialisation explicitly
    print( sum[object,long]([1,2,3,4], y) )
"""

The C compiler complains that "__pyx_dynamic_args" is 
undeclared - supposedly something should have been passed into the function but wasn't. Mark, could you take a look? Stefan From d.s.seljebotn at astro.uio.no Tue Jun 12 21:46:22 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 12 Jun 2012 21:46:22 +0200 Subject: [Cython] Hash-based vtables In-Reply-To: References: <4FCD100B.7000008@astro.uio.no> <4FCFC441.40703@astro.uio.no> <4FCFCD49.9030802@astro.uio.no> <4FD0808B.5080300@astro.uio.no> <4FD083F9.2030006@astro.uio.no> <4FD26ADA.5060401@astro.uio.no> <4FD2E313.6040208@astro.uio.no> <6c423841-b888-478d-8b89-148f3e9bd60e@email.android.com> <4FD45424.9040909@astro.uio.no> <4FD45E31.8060506@astro.uio.no> <4FD72199.7010803@astro.uio.no> <4FD77AAC.6080905@astro.uio.no> Message-ID: <4FD79C8E.9030009@astro.uio.no> On 06/12/2012 08:12 PM, Robert Bradshaw wrote: > On Tue, Jun 12, 2012 at 10:21 AM, Dag Sverre Seljebotn > wrote: >> On 06/12/2012 01:01 PM, Dag Sverre Seljebotn wrote: >>> >>> On 06/10/2012 11:53 AM, Robert Bradshaw wrote: >>>> >>>> On Sun, Jun 10, 2012 at 1:43 AM, Dag Sverre Seljebotn >>>>> >>>>> About signatures, a problem I see with following the C typing is that >>>>> the >>>>> signature "ill" wouldn't hash the same as "iii" on 32-bit Windows and >>>>> "iqq" >>>>> on 32-bit Linux, and so on. I think that would be really bad. >>>> >>>> >>>> This is why I suggested promotion for scalars (divide ints into >>>> <=sizeof(long) and sizeof(long)< x<= sizeof(long long)), checked at >>>> C compile time, though I guess you consider that evil. I don't >>>> consider not matching really bad, just kind of bad. >>> >>> >>> Right. At least a convention for promotion of scalars would be good >>> anyway. >>> >>> Even MSVC supports stdint.h these days; basing ourselves on the random >>> behaviour of "long" seems a bit outdated to me. "ssize_t" would be >>> better motivated I feel. 
>>> >>> Many linear algebra libraries use 32-bit matrix indices by default, but >>> can be swapped to 64-bit indices (this holds for many LAPACK >>> implementations and most sparse linear algebra). So often there will at >>> least be one typedef that is either 32 bits or 64 bits without the >>> Cython compiler knowing. >>> >>> Promoting to a single type "[u]int64" is the only one that removes >>> possible combinatorial explosion if you have multiple external typedefs >>> that you don't know the size of (although I guess that's rather >>> theoretical). >>> >>> Anyway, runtime table generation is quite fast, see below. >>> >>>> >>>>> "l" must be banished -- but then one might as well do "i4i8i8". >>>>> >>>>> Designing a signature hash where you select between these at >>>>> compile-time is >>>>> perhaps doable but does generate a lot of code and makes everything >>>>> complicated. >>>> >>>> >>>> It especially gets messy when you're trying to pre-compute tables. >>>> >>>>> I think we should just start off with hashing at module load >>>>> time when sizes are known, and then work with heuristics and/or build >>>>> system >>>>> integration to improve on that afterwards. >>>> >>>> >>>> Finding 10,000 optimal tables at runtime better be really cheap than >>>> for Sage's sake :). >>> >>> >>> The code is highly unpolished as I write this, but it works so here's >>> some preliminary benchmarks. >>> >>> Assuming the 64-bit pre-hashes are already computed, hashing a 64-slot >>> table varies between 5 and 10 us (microseconds) depending on the set of >>> hashes. >>> >>> Computing md5's with C code from ulib (not hashlib/OpenSSL) takes ~400ns >>> per hash, so 26 us for the 64-slot table => it dominates! >>> >>> The crapwow64 hash takes ~10-20 ns, for ~1 us per 64-slot table. >>> Admittedly, that's with hand-written non-portable assembly for the >>> crapwow64. 
>>> >>> Assuming 10 000 64-slot tables we're looking at something like 0.3-0.4 >>> seconds for loading Sage using md5, or 0.1 seconds using crapwow64. >>> >>> >>> https://github.com/dagss/pyextensibletype/blob/master/include/perfecthash.h >>> >>> http://www.team5150.com/~andrew/noncryptohashzoo/CrapWow64.html >> >> >> Look: A big advantage of the hash-vtables is that subclasses stay >> ABI-compatible with superclasses, and don't need recompilation when >> superclasses adds or removes methods. >> >> => Finding the hash table must happen at run-time in a lot of cases anyway, >> so I feel Robert's chase for a compile-time table building is moot. >> >> I feel this would also need to trigger automatically heap-allocated tables >> if the statically allocated. Which is good to have in the very few cases >> where a perfect table can't be found too. > > Finding the hash table at runtime should be supported, but the *vast* > majority of methods sets is known at compile time. 0.4 seconds is a > huge overhead to just add to Sage (yes, it's an exception, but an > important one), and though crapwow64 helps I'd rather rely on a known, > good standard hash. I need to actually look at Sage to see what the > impact would be. Also, most tables would probably have 2 entries in > them (e.g. a typed one and an all-object one). Hopefully 0.4 was a severe overestimate once one actually looks at this. What's loaded at startup -- is it the pyx files in sage/devel/sage? My count (just cloned from github.com/sagemath/sage): $ find -name '*.pyx' -exec grep 'cdef class' {} \; | wc -l 641 And I doubt that *all* of that is loaded at Sage startup, you need to do some manual importing for at least some of those classes? So it's probably closer to 0.01-0.02 seconds than 0.4 even with md5? About the *vast* majority of method sets being known: That may be the case for old code, but keep in mind that that situation might deteriorate. 
Once we have hash-based vtables, declaring methods of cdef classes in pxd files could become optional (and only be there to help callers, incl. subclasses, determine the signature). So any method that's only used in the superclass and is therefore not declared in the pxd file would consistently trigger a run-time build of the table of subclasses; the compile-time generated table would be useless then. (OTOH, as duck-typing becomes the norm, more cdef classes will be without superclasses...) > long int will continue to be an important type as long as it's the > default for int literals and Python's "fast" ints (whether in type or > implementation), so we can't just move to stdint. I also don't like > that the form of the table (and whether certain signatures match) > being platform-dependent: the less variance we have from one platform > to the next is better. Perhaps in Sage there's a lot of use of "long" and therefore this would make Sage code vary less between platforms. But for much NumPy-using code you'd typically use int32 or int64, and since long is 32 bits on 32-bit Windows and 64 bits on Linux/Mac, choosing long sort of maximises inter-platform variation of signatures... > On an orthogonal note, sizeof(long)-sensitive tables need not be > entirely at odds with compile-time table compilation, as most > functions will probably have 0 or 1 parameters that are of unknown > size, so we could spit out 1 or 2 statically compiled tables and do > generate the rest on the fly. I still would rather have fixed > Cython-compile time tables though. Well, I'd "rather have" that as well if it worked every time. But there's no use designing a feature which works great unless you use the fftw_complex type (can be 64 or 128 bits). Or works great unless you use 64-bit LAPACK. Or works great unless you have a superclass with a partially defined pxd file. 
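As a rough illustration of the run-time table generation being debated here, the following is a toy Python sketch, assuming 64-bit md5-based pre-hashes as in the benchmarks above. The slot formula and the parameter search are simplified stand-ins for what the perfecthash.h code linked above actually does:

```python
import hashlib

def prehash(signature):
    # Stand-in pre-hash: first 8 bytes of md5(signature) as a uint64.
    return int.from_bytes(hashlib.md5(signature.encode()).digest()[:8], "big")

def find_table(signatures):
    """Search for an odd multiplier m such that
    slot(h) = ((h * m) mod 2**64) >> 58 is collision-free over all
    pre-hashes, then place each signature in its slot of a 64-entry
    table: the 'find a perfect table at module load time' step, in toy form."""
    assert len(signatures) <= 64
    hashes = [prehash(s) for s in signatures]
    for m in range(1, 1 << 16, 2):  # odd multipliers only
        slots = [((h * m) & 0xFFFFFFFFFFFFFFFF) >> 58 for h in hashes]
        if len(set(slots)) == len(slots):  # perfect: no two share a slot
            table = [None] * 64
            for sig, slot in zip(signatures, slots):
                table[slot] = sig
            return m, table
    raise RuntimeError("no perfect multiplier found; fall back to a bigger table")
```

A caller then recomputes slot(prehash(sig)) with the stored m and compares the signature found in that slot: one multiply, one shift, one comparison.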
Since one implementation of a concept is simpler than two, then as long as run-time generation code must always be there (or at least, be there in the common cases x, y, and z), the reasons should be very good for adding a compile-time implementation. Sage taking 0.4 seconds extra would indeed be a very good reason, but I don't believe it. So when you can get around to it it'd be great to have the actual number of classes (and ideally an estimate for number of methods per class). Dag From d.s.seljebotn at astro.uio.no Tue Jun 12 22:00:35 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 12 Jun 2012 22:00:35 +0200 Subject: [Cython] Hash-based vtables In-Reply-To: <4FD79C8E.9030009@astro.uio.no> References: <4FCD100B.7000008@astro.uio.no> <4FCFCD49.9030802@astro.uio.no> <4FD0808B.5080300@astro.uio.no> <4FD083F9.2030006@astro.uio.no> <4FD26ADA.5060401@astro.uio.no> <4FD2E313.6040208@astro.uio.no> <6c423841-b888-478d-8b89-148f3e9bd60e@email.android.com> <4FD45424.9040909@astro.uio.no> <4FD45E31.8060506@astro.uio.no> <4FD72199.7010803@astro.uio.no> <4FD77AAC.6080905@astro.uio.no> <4FD79C8E.9030009@astro.uio.no> Message-ID: <4FD79FE3.4000102@astro.uio.no> On 06/12/2012 09:46 PM, Dag Sverre Seljebotn wrote: > On 06/12/2012 08:12 PM, Robert Bradshaw wrote: >> On Tue, Jun 12, 2012 at 10:21 AM, Dag Sverre Seljebotn >> wrote: >>> On 06/12/2012 01:01 PM, Dag Sverre Seljebotn wrote: >>>> >>>> On 06/10/2012 11:53 AM, Robert Bradshaw wrote: >>>>> >>>>> On Sun, Jun 10, 2012 at 1:43 AM, Dag Sverre Seljebotn >>>>>> >>>>>> About signatures, a problem I see with following the C typing is that >>>>>> the >>>>>> signature "ill" wouldn't hash the same as "iii" on 32-bit Windows and >>>>>> "iqq" >>>>>> on 32-bit Linux, and so on. I think that would be really bad. 
>>>>> >>>>> >>>>> This is why I suggested promotion for scalars (divide ints into >>>>> <=sizeof(long) and sizeof(long)< x<= sizeof(long long)), checked at >>>>> C compile time, though I guess you consider that evil. I don't >>>>> consider not matching really bad, just kind of bad. >>>> >>>> >>>> Right. At least a convention for promotion of scalars would be good >>>> anyway. >>>> >>>> Even MSVC supports stdint.h these days; basing ourselves on the random >>>> behaviour of "long" seems a bit outdated to me. "ssize_t" would be >>>> better motivated I feel. >>>> >>>> Many linear algebra libraries use 32-bit matrix indices by default, but >>>> can be swapped to 64-bit indices (this holds for many LAPACK >>>> implementations and most sparse linear algebra). So often there will at >>>> least be one typedef that is either 32 bits or 64 bits without the >>>> Cython compiler knowing. >>>> >>>> Promoting to a single type "[u]int64" is the only one that removes >>>> possible combinatorial explosion if you have multiple external typedefs >>>> that you don't know the size of (although I guess that's rather >>>> theoretical). >>>> >>>> Anyway, runtime table generation is quite fast, see below. >>>> >>>>> >>>>>> "l" must be banished -- but then one might as well do "i4i8i8". >>>>>> >>>>>> Designing a signature hash where you select between these at >>>>>> compile-time is >>>>>> perhaps doable but does generate a lot of code and makes everything >>>>>> complicated. >>>>> >>>>> >>>>> It especially gets messy when you're trying to pre-compute tables. >>>>> >>>>>> I think we should just start off with hashing at module load >>>>>> time when sizes are known, and then work with heuristics and/or build >>>>>> system >>>>>> integration to improve on that afterwards. >>>>> >>>>> >>>>> Finding 10,000 optimal tables at runtime better be really cheap than >>>>> for Sage's sake :). 
>>>> >>>> >>>> The code is highly unpolished as I write this, but it works so here's >>>> some preliminary benchmarks. >>>> >>>> Assuming the 64-bit pre-hashes are already computed, hashing a 64-slot >>>> table varies between 5 and 10 us (microseconds) depending on the set of >>>> hashes. >>>> >>>> Computing md5's with C code from ulib (not hashlib/OpenSSL) takes >>>> ~400ns >>>> per hash, so 26 us for the 64-slot table => it dominates! >>>> >>>> The crapwow64 hash takes ~10-20 ns, for ~1 us per 64-slot table. >>>> Admittedly, that's with hand-written non-portable assembly for the >>>> crapwow64. >>>> >>>> Assuming 10 000 64-slot tables we're looking at something like 0.3-0.4 >>>> seconds for loading Sage using md5, or 0.1 seconds using crapwow64. >>>> >>>> >>>> https://github.com/dagss/pyextensibletype/blob/master/include/perfecthash.h >>>> >>>> >>>> http://www.team5150.com/~andrew/noncryptohashzoo/CrapWow64.html >>> >>> >>> Look: A big advantage of the hash-vtables is that subclasses stay >>> ABI-compatible with superclasses, and don't need recompilation when >>> superclasses adds or removes methods. >>> >>> => Finding the hash table must happen at run-time in a lot of cases >>> anyway, >>> so I feel Robert's chase for a compile-time table building is moot. >>> >>> I feel this would also need to trigger automatically heap-allocated >>> tables >>> if the statically allocated. Which is good to have in the very few cases >>> where a perfect table can't be found too. >> >> Finding the hash table at runtime should be supported, but the *vast* >> majority of methods sets is known at compile time. 0.4 seconds is a >> huge overhead to just add to Sage (yes, it's an exception, but an >> important one), and though crapwow64 helps I'd rather rely on a known, >> good standard hash. I need to actually look at Sage to see what the >> impact would be. Also, most tables would probably have 2 entries in >> them (e.g. a typed one and an all-object one). 
> > Hopefully 0.4 was a severe overestimate once one actually looks at this. > > What's loaded at startup -- is it the pyx files in sage/devel/sage? My > count (just cloned from github.com/sagemath/sage): > > $ find -name '*.pyx' -exec grep 'cdef class' {} \; | wc -l > 641 > > And I doubt that *all* of that is loaded at Sage startup, you need to do > some manual importing for at least some of those classes? So it's > probably closer to 0.01-0.02 seconds than 0.4 even with md5? > > About the *vast* majority of method sets being known: That may be the > case for old code, but keep in mind that that situation might > deteriorate. Once we have hash-based vtables, declaring methods of cdef > classes in pxd files could become optional (and only be there to help > callers, incl. subclasses, determine the signature). So any method > that's only used in the superclass and is therefore not declared in the > pxd file would consistently trigger a run-time build of the table of > subclasses; the compile-time generated table would be useless then. > > (OTOH, as duck-typing becomes the norm, more cdef classes will be > without superclasses...) > >> long int will continue to be an important type as long as it's the >> default for int literals and Python's "fast" ints (whether in type or >> implementation), so we can't just move to stdint. I also don't like >> that the form of the table (and whether certain signatures match) >> being platform-dependent: the less variance we have from one platform >> to the next is better. > > Perhaps in Sage there's a lot of use of "long" and therefore this would > make Sage code vary less between platforms. > > But for much NumPy-using code you'd typically use int32 or int64, and > since long is 32 bits on 32-bit Windows and 64 bits on Linux/Mac, > choosing long sort of maximises inter-platform variation of signatures... > Also, promotion can't be used for pointers, buffers, ndarray dtypes... 
I don't mind heuristics that work in 99.9% of the cases. Heuristics that work in 80% of the cases seem more like a time drain though. But if there's indeed a problem with Sage load times, and a particular set of heuristics allows us to overcome what is otherwise a blocker for attaching these tables to cdef classes, then sure. Dag >> On an orthogonal note, sizeof(long)-sensitive tables need not be >> entirely at odds with compile-time table compilation, as most >> functions will probably have 0 or 1 parameters that are of unknown >> size, so we could spit out 1 or 2 statically compiled tables and do >> generate the rest on the fly. I still would rather have fixed >> Cython-compile time tables though. > > Well, I'd "rather have" that as well if it worked every time. > > But there's no use designing a feature which works great unless you use > the fftw_complex type (can be 64 or 128 bits). Or works great unless you > use 64-bit LAPACK. Or works great unless you have a superclass with a > partially defined pxd file. > > Since one implementation of a concept is simpler than two, then as long > as run-time generation code must always be there (or at least, be there > in the common cases x, y, and z), the reasons should be very good for > adding a compile-time implementation. > > Sage taking 0.4 seconds extra would indeed be a very good reason, but I > don't believe it. So when you can get around to it it'd be great to have > the actual number of classes (and ideally an estimate for number of > methods per class). 
> > Dag > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From markflorisson88 at gmail.com Wed Jun 13 17:26:05 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Wed, 13 Jun 2012 16:26:05 +0100 Subject: [Cython] "__pyx_dynamic_args" undeclared in fused types code In-Reply-To: <4FD74E6F.1070001@behnel.de> References: <4FD74E6F.1070001@behnel.de> Message-ID: On Jun 12, 2012 8:15 PM, "Stefan Behnel" wrote: > > Hi, > > after the merge of the "_fused_dispatch_rebased" branch, I get C compile > errors in a simple fused types example: > > """ > from cython cimport integral > > # define a fused type for different containers > ctypedef fused container: > list > tuple > object > > # define a generic function using the above types > cpdef sum(container items, integral start = 0): > cdef integral item, result > result = start > for item in items: > result += item > return result > > def test(): > cdef int x = 1, y = 2 > > # call [list,int] specialisation implicitly > print( sum([1,2,3,4], x) ) > > # calls [object,long] specialisation explicitly > print( sum[object,long]([1,2,3,4], y) ) > """ > > The C compiler complains that "__pyx_dynamic_args" is undeclared - > supposedly something should have been passed into the function but wasn't. > > Mark, could you take a look? > > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel Thanks for pointing that out Stefan, I'll get that fixed for 0.17. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From stefan_ml at behnel.de Mon Jun 18 16:12:08 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 18 Jun 2012 16:12:08 +0200 Subject: [Cython] new FFI library for Python Message-ID: <4FDF3738.9040006@behnel.de> Hi, the PyPy folks have come up with a new FFI library (called cffi) for CPython (and eventually PyPy, obviously). http://cffi.readthedocs.org/ It borrows from LuaJIT's FFI in that it parses C declarations at runtime. It then builds a C extension to access the external code, i.e. it requires a C compiler at runtime (when running in CPython). Just thought this might be interesting. Stefan From redbrain at gcc.gnu.org Mon Jun 18 17:26:09 2012 From: redbrain at gcc.gnu.org (Philip Herron) Date: Mon, 18 Jun 2012 16:26:09 +0100 Subject: [Cython] new FFI library for Python In-Reply-To: <4FDF3738.9040006@behnel.de> References: <4FDF3738.9040006@behnel.de> Message-ID: On 18 June 2012 15:12, Stefan Behnel wrote: > Hi, > > the PyPy folks have come up with a new FFI library (called cffi) for > CPython (and eventually PyPy, obviously). > > http://cffi.readthedocs.org/ > > It borrows from LuaJIT's FFI in that it parses C declarations at runtime. > It then builds a C extension to access the external code, i.e. it requires > a C compiler at runtime (when running in CPython). > > Just thought this might be interesting. 
> > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel I have been using libffi in my gccpy runtime; I wonder why they decided to make a new one and not use libffi. --Phil From stefan_ml at behnel.de Mon Jun 18 18:39:19 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 18 Jun 2012 18:39:19 +0200 Subject: [Cython] new FFI library for Python In-Reply-To: References: <4FDF3738.9040006@behnel.de> Message-ID: <4FDF59B7.4030509@behnel.de> Philip Herron, 18.06.2012 17:26: > On 18 June 2012 15:12, Stefan Behnel wrote: >> the PyPy folks have come up with a new FFI library (called cffi) for >> CPython (and eventually PyPy, obviously). >> >> http://cffi.readthedocs.org/ >> >> It borrows from LuaJIT's FFI in that it parses C declarations at runtime. >> It then builds a C extension to access the external code, i.e. it requires >> a C compiler at runtime (when running in CPython). >> >> Just thought this might be interesting. > > I have been using libffi in my gccpy runtime wonder why they decided > to make a new one and not use libffi Isn't libffi RPython? That's enough of a reason, I'd say. Stefan From stefan_ml at behnel.de Mon Jun 18 21:46:37 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 18 Jun 2012 21:46:37 +0200 Subject: [Cython] new FFI library for Python In-Reply-To: <4FDF3738.9040006@behnel.de> References: <4FDF3738.9040006@behnel.de> Message-ID: <4FDF859D.2090008@behnel.de> Stefan Behnel, 18.06.2012 16:12: > the PyPy folks have come up with a new FFI library (called cffi) for > CPython (and eventually PyPy, obviously). > > http://cffi.readthedocs.org/ > > It borrows from LuaJIT's FFI in that it parses C declarations at runtime. > It then builds a C extension to access the external code, i.e. it requires > a C compiler at runtime (when running in CPython). > > Just thought this might be interesting. 
The code is here, BTW: https://bitbucket.org/cffi/cffi/ One interesting feature is that they seem to support different backends. There's apparently one for libffi and one for ctypes so far. Another one based on Cython would be cool. Even the existing ffi backend implementation would have looked better in Cython; it's currently some 3000 lines of C code. And Cython could certainly benefit from an ffi backend itself for a couple of tasks; this topic has come up before a couple of times. Stefan From sturla at molden.no Tue Jun 19 13:25:02 2012 From: sturla at molden.no (Sturla Molden) Date: Tue, 19 Jun 2012 13:25:02 +0200 Subject: [Cython] Hash-based vtables In-Reply-To: <4FD79C8E.9030009@astro.uio.no> References: <4FCD100B.7000008@astro.uio.no> <4FCFCD49.9030802@astro.uio.no> <4FD0808B.5080300@astro.uio.no> <4FD083F9.2030006@astro.uio.no> <4FD26ADA.5060401@astro.uio.no> <4FD2E313.6040208@astro.uio.no> <6c423841-b888-478d-8b89-148f3e9bd60e@email.android.com> <4FD45424.9040909@astro.uio.no> <4FD45E31.8060506@astro.uio.no> <4FD72199.7010803@astro.uio.no> <4FD77AAC.6080905@astro.uio.no> <4FD79C8E.9030009@astro.uio.no> Message-ID: <4FE0618E.5020009@molden.no> On 12.06.2012 21:46, Dag Sverre Seljebotn wrote: > But for much NumPy-using code you'd typically use int32 or int64, and > since long is 32 bits on 32-bit Windows and 64 bits on Linux/Mac, > choosing long sort of maximises inter-platform variation of signatures... The size of a long is compiler dependent, not OS dependent. Most C compilers for Windows use 32 bit long, also on 64-bit Windows for AMD64. The reason is that the AMD64 architecture natively uses a "64-bit pointer with a 32-bit offset". So indexing with a 64-bit offset could incur some extra overhead. (I don't know how much, if any at all.) On IA64 the C compilers for Windows use 64 bit long, because the native offset size is 64 bit. The C standard specifies that a long is "at least 32 bits". 
Any code that assumes a specific sizeof(long), or that a long is 64-bits, does not follow the C standard. Sturla From sturla at molden.no Tue Jun 19 13:58:53 2012 From: sturla at molden.no (Sturla Molden) Date: Tue, 19 Jun 2012 13:58:53 +0200 Subject: [Cython] new FFI library for Python In-Reply-To: <4FDF3738.9040006@behnel.de> References: <4FDF3738.9040006@behnel.de> Message-ID: <4FE0697D.3020306@molden.no> On 18.06.2012 16:12, Stefan Behnel wrote: > the PyPy folks have come up with a new FFI library (called cffi) for > CPython (and eventually PyPy, obviously). It looks like ctypes albeit with a smaller API. (C definitions as text strings instead of Python objects.) Sometimes I think Python and an FFI would always suffice. But in practice Cython's __dealloc__ can be indispensable, as opposed to a Python __del__ method which can be unreliable. And Python's module loader mostly takes care of the common problem of DLL hell. With an FFI like ctypes or cffi, we don't have the RAII-like cleanup that __dealloc__ provides, and loading the DLLs suffers from all the nastiness of DLL hell. Sturla From robertwb at gmail.com Tue Jun 19 21:01:55 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Tue, 19 Jun 2012 12:01:55 -0700 Subject: [Cython] new FFI library for Python In-Reply-To: <4FE0697D.3020306@molden.no> References: <4FDF3738.9040006@behnel.de> <4FE0697D.3020306@molden.no> Message-ID: On Tue, Jun 19, 2012 at 4:58 AM, Sturla Molden wrote: > On 18.06.2012 16:12, Stefan Behnel wrote: > >> the PyPy folks have come up with a new FFI library (called cffi) for >> CPython (and eventually PyPy, obviously). > > > It looks like ctypes albeit with a smaller API. (C definitions as text > strings instead of Python objects.) > > Sometimes I think Python and a ffi would always suffice. But in practice > Cython's __dealloc__ can be indispensible, as opposed to a Python __del__ > method which can be unreliable. 
> And Python's module loader mostly takes care > of the common problem of DLL hell. This also assumes you're always able and willing to use/write a library written in a lower-level language like C or Fortran to actually do your heavy lifting. Cython (ideally) allows you to write your actual number-crunching code without learning an (entirely) new language. - Robert From stefan_ml at behnel.de Thu Jun 21 10:59:39 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 21 Jun 2012 10:59:39 +0200 Subject: [Cython] buffer shape incompatible with memoryview shape Message-ID: <4FE2E27B.8010102@behnel.de> Hi, I find this worth fixing for 0.17: http://trac.cython.org/cython_trac/ticket/780 Stefan From markflorisson88 at gmail.com Thu Jun 21 12:38:11 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Thu, 21 Jun 2012 11:38:11 +0100 Subject: [Cython] buffer shape incompatible with memoryview shape In-Reply-To: <4FE2E27B.8010102@behnel.de> References: <4FE2E27B.8010102@behnel.de> Message-ID: On 21 June 2012 09:59, Stefan Behnel wrote: > Hi, > > I find this worth fixing for 0.17: > > http://trac.cython.org/cython_trac/ticket/780 > > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel It seems that arrays are compared as pointer values, so it doesn't even compare sensibly anyway. You can easily work around it by writing (<object> memoryview).shape though. I think these shape/strides/suboffset arrays should have a special type and coerce to tuples when coercing to an object. Feel free to work on that, it wouldn't really require touching much or any of the memoryview code, it's not really on my priority list right now. BTW, Stefan, how do we start Jenkins on the sage server? It's been down for weeks now. 
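mark's proposed coercion -- shape/strides/suboffset arrays comparing by value once they reach object context -- can be illustrated in plain Python, with a ctypes array standing in for the C-level Py_ssize_t shape[] (toy names, not Cython's actual implementation):

```python
import ctypes

def shape_as_tuple(c_shape):
    """Coerce a C Py_ssize_t[] array to a Python tuple, which compares
    by value -- the behaviour proposed above for object context."""
    return tuple(c_shape)

ShapeArray = ctypes.c_ssize_t * 2
a = ShapeArray(3, 4)  # shape of one 3x4 buffer
b = ShapeArray(3, 4)  # shape of another 3x4 buffer

# The raw arrays are distinct objects, which is all a pointer-level
# comparison of the two C arrays can see:
assert a is not b

# Coerced to tuples, the shapes compare by value, as users expect:
assert shape_as_tuple(a) == shape_as_tuple(b) == (3, 4)
```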
From stefan_ml at behnel.de Thu Jun 21 13:00:11 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 21 Jun 2012 13:00:11 +0200 Subject: [Cython] Jenkins status In-Reply-To: References: <4FE2E27B.8010102@behnel.de> Message-ID: <4FE2FEBB.8050506@behnel.de> mark florisson, 21.06.2012 12:38: > BTW, Stefan, how do we start Jenkins on the sage server? It's been > down for weeks now. It seems like the sage.math server would be happy about a restart. I'll trigger the ML. Stefan From d.s.seljebotn at astro.uio.no Thu Jun 21 13:10:05 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 21 Jun 2012 13:10:05 +0200 Subject: [Cython] buffer shape incompatible with memoryview shape In-Reply-To: <4FE2E27B.8010102@behnel.de> References: <4FE2E27B.8010102@behnel.de> Message-ID: <4FE3010D.4060700@astro.uio.no> On 06/21/2012 10:59 AM, Stefan Behnel wrote: > Hi, > > I find this worth fixing for 0.17: > > http://trac.cython.org/cython_trac/ticket/780 > I'm not sure about the timeline here. The object<->memoryview semantics haven't even been hammered down yet; does "mview.customattr" trigger an AttributeError, SyntaxError or fall back to some underlying object (constructing it if necessary)? Until that happens, memoryviews are an experimental feature and present for development purposes mostly, so it's not like this is a big bug that would bite end-users. Thinking about those semantics is much more important... 
Dag From stefan_ml at behnel.de Thu Jun 21 13:36:01 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 21 Jun 2012 13:36:01 +0200 Subject: [Cython] buffer shape incompatible with memoryview shape In-Reply-To: <4FE3010D.4060700@astro.uio.no> References: <4FE2E27B.8010102@behnel.de> <4FE3010D.4060700@astro.uio.no> Message-ID: <4FE30721.2070902@behnel.de> Dag Sverre Seljebotn, 21.06.2012 13:10: > On 06/21/2012 10:59 AM, Stefan Behnel wrote: >> I find this worth fixing for 0.17: >> >> http://trac.cython.org/cython_trac/ticket/780 > > I'm not sure about the timeline here. > > The object<->memoryview semantics haven't even been hammered down yet; does > "mview.customattr" trigger an AttributeError, SyntaxError or fall back to > some underlying object (constructing it if necesarry). > > Until that happens, memoryviews are an experimental feature and present for > development purposes mostly, so it's not like this is a big bug that would > bite end-users. Thinking about those semantics is much more important... Absolutely. I ran into this when I gave a Cython+NumPy course and this was the first thing that the attendants tried when I asked them to validate that two input arrays have the same size before adding them. It's the one obvious way to do it, and it fails miserably. I think it should be fixed, and I think it should be fixed soon because it feels really low-level and complicated, especially to new users. 
Stefan From d.s.seljebotn at astro.uio.no Thu Jun 21 14:05:30 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 21 Jun 2012 14:05:30 +0200 Subject: [Cython] buffer shape incompatible with memoryview shape In-Reply-To: <4FE30721.2070902@behnel.de> References: <4FE2E27B.8010102@behnel.de> <4FE3010D.4060700@astro.uio.no> <4FE30721.2070902@behnel.de> Message-ID: <4FE30E0A.8020003@astro.uio.no> On 06/21/2012 01:36 PM, Stefan Behnel wrote: > Dag Sverre Seljebotn, 21.06.2012 13:10: >> On 06/21/2012 10:59 AM, Stefan Behnel wrote: >>> I find this worth fixing for 0.17: >>> >>> http://trac.cython.org/cython_trac/ticket/780 >> >> I'm not sure about the timeline here. >> >> The object<->memoryview semantics haven't even been hammered down yet; does >> "mview.customattr" trigger an AttributeError, SyntaxError or fall back to >> some underlying object (constructing it if necesarry). >> >> Until that happens, memoryviews are an experimental feature and present for >> development purposes mostly, so it's not like this is a big bug that would >> bite end-users. Thinking about those semantics is much more important... > > Absolutely. > > I ran into this when I gave a Cython+NumPy course and this was the first > thing that the attendants tried when I asked them to validate that two > input arrays have the same size before adding them. It's the one obvious > way to do it, and it fails miserably. I think it should be fixed, and I > think it should be fixed soon because it feels really low-level and > complicated, especially to new users. Can you clarify a bit -- did you give this course using np.ndarray[double, ndim=2], or double[:, :]? They're really very separate under the hood and the fix is different. Or, did you actually use object[double, ndim=2] like in the bug report? (Did me and Mark get around to propose deprecating this one on the list?) 
Dag From stefan_ml at behnel.de Thu Jun 21 14:59:19 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 21 Jun 2012 14:59:19 +0200 Subject: [Cython] buffer shape incompatible with memoryview shape In-Reply-To: <4FE30E0A.8020003@astro.uio.no> References: <4FE2E27B.8010102@behnel.de> <4FE3010D.4060700@astro.uio.no> <4FE30721.2070902@behnel.de> <4FE30E0A.8020003@astro.uio.no> Message-ID: <4FE31AA7.5040101@behnel.de> Dag Sverre Seljebotn, 21.06.2012 14:05: > On 06/21/2012 01:36 PM, Stefan Behnel wrote: >>> On 06/21/2012 10:59 AM, Stefan Behnel wrote: >>>> I find this worth fixing for 0.17: >>>> >>>> http://trac.cython.org/cython_trac/ticket/780 >>> >> I ran into this when I gave a Cython+NumPy course and this was the first >> thing that the attendants tried when I asked them to validate that two >> input arrays have the same size before adding them. It's the one obvious >> way to do it, and it fails miserably. I think it should be fixed, and I >> think it should be fixed soon because it feels really low-level and >> complicated, especially to new users. > > Can you clarify a bit -- did you give this course using np.ndarray[double, > ndim=2], or double[:, :]? They're really very separate under the hood and > the fix is different. > > Or, did you actually use object[double, ndim=2] like in the bug report? > (Did me and Mark get around to propose deprecating this one on the list?) IIRC, we used object[double, ndim=2] for both and I also tried it with a memory view as in the bug report. I thought that using "object" was the preferred way to do it? At least, it doesn't restrict the type of the buffer exporter, which I consider a good thing. 
Stefan From d.s.seljebotn at astro.uio.no Thu Jun 21 15:06:57 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 21 Jun 2012 15:06:57 +0200 Subject: [Cython] buffer shape incompatible with memoryview shape In-Reply-To: <4FE31AA7.5040101@behnel.de> References: <4FE2E27B.8010102@behnel.de> <4FE3010D.4060700@astro.uio.no> <4FE30721.2070902@behnel.de> <4FE30E0A.8020003@astro.uio.no> <4FE31AA7.5040101@behnel.de> Message-ID: <4FE31C71.8060904@astro.uio.no> On 06/21/2012 02:59 PM, Stefan Behnel wrote: > Dag Sverre Seljebotn, 21.06.2012 14:05: >> On 06/21/2012 01:36 PM, Stefan Behnel wrote: >>>> On 06/21/2012 10:59 AM, Stefan Behnel wrote: >>>>> I find this worth fixing for 0.17: >>>>> >>>>> http://trac.cython.org/cython_trac/ticket/780 >>>> >>> I ran into this when I gave a Cython+NumPy course and this was the first >>> thing that the attendants tried when I asked them to validate that two >>> input arrays have the same size before adding them. It's the one obvious >>> way to do it, and it fails miserably. I think it should be fixed, and I >>> think it should be fixed soon because it feels really low-level and >>> complicated, especially to new users. >> >> Can you clarify a bit -- did you give this course using np.ndarray[double, >> ndim=2], or double[:, :]? They're really very separate under the hood and >> the fix is different. >> >> Or, did you actually use object[double, ndim=2] like in the bug report? >> (Did me and Mark get around to propose deprecating this one on the list?) > > IIRC, we used object[double, ndim=2] for both and I also tried it with a > memory view as in the bug report. I thought that using "object" was the > preferred way to do it? At least, it doesn't restrict the type of the > buffer exporter, which I consider a good thing. That's a very theoretical argument as NumPy arrays are in practice the only exporter. I always teach np.ndarray[double...]. I've never told anyone about object[...], I don't think it's in much use. 
For starters it's going to be horribly inefficient unless you also add "mode='strided'" within the brackets. My proposal (and Mark's I think) is: Since the memoryviews will neatly cover the general exporter case, and since the [] syntax is much overloaded already (used for C++ templates too), we should deprecate object[...] no matter what else happens. Depending on what's decided for np.ndarray[...], we have: Case A): Deprecate both np.ndarray[...] and object[...] Case B): Only deprecate object[...], keep np.ndarray[...] (e.g., through a decorator used in numpy.pxd on the ndarray type). So rather than having a trailing [] mean buffers unless it means something else (like C++ templates), we instead make np.ndarray a "template", through special compiler support. Dag From stefan_ml at behnel.de Thu Jun 21 15:34:27 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 21 Jun 2012 15:34:27 +0200 Subject: [Cython] buffer shape incompatible with memoryview shape In-Reply-To: <4FE31C71.8060904@astro.uio.no> References: <4FE2E27B.8010102@behnel.de> <4FE3010D.4060700@astro.uio.no> <4FE30721.2070902@behnel.de> <4FE30E0A.8020003@astro.uio.no> <4FE31AA7.5040101@behnel.de> <4FE31C71.8060904@astro.uio.no> Message-ID: <4FE322E3.40706@behnel.de> Dag Sverre Seljebotn, 21.06.2012 15:06: > On 06/21/2012 02:59 PM, Stefan Behnel wrote: >> Dag Sverre Seljebotn, 21.06.2012 14:05: >>> On 06/21/2012 01:36 PM, Stefan Behnel wrote: >>>>> On 06/21/2012 10:59 AM, Stefan Behnel wrote: >>>>>> I find this worth fixing for 0.17: >>>>>> >>>>>> http://trac.cython.org/cython_trac/ticket/780 >>>>> >>>> I ran into this when I gave a Cython+NumPy course and this was the first >>>> thing that the attendants tried when I asked them to validate that two >>>> input arrays have the same size before adding them. It's the one obvious >>>> way to do it, and it fails miserably. 
I think it should be fixed, and I >>>> think it should be fixed soon because it feels really low-level and >>>> complicated, especially to new users. >>> >>> Can you clarify a bit -- did you give this course using np.ndarray[double, >>> ndim=2], or double[:, :]? They're really very separate under the hood and >>> the fix is different. >>> >>> Or, did you actually use object[double, ndim=2] like in the bug report? >>> (Did me and Mark get around to propose deprecating this one on the list?) >> >> IIRC, we used object[double, ndim=2] for both and I also tried it with a >> memory view as in the bug report. I thought that using "object" was the >> preferred way to do it? At least, it doesn't restrict the type of the >> buffer exporter, which I consider a good thing. > > That's a very theoretical argument as NumPy arrays are in practice the only > exporter. Except for, say, bytes objects, array.array and user implemented types, that is. lxml has buffer support for its serialised XSLT output, for example. > I always teach np.ndarray[double...]. I've never told anyone about > object[...], I don't think it's in much use. For starters it's going to be > horribly inefficient unless you also add "mode='strided'" within the brackets. Ah, good to know. > My proposal (and Mark's I think) is: > > Since the memoryviews will neatly cover the general exporter case, and > since the [] syntax is much overloaded already (used for C++ templates > too), we should deprecate object[...] no matter what else happens. > > Depending on what's decided for np.ndarray[...], we have: > > Case A): Deprecate both np.ndarray[...] and object[...] > > Case B): Only deprecate object[...], keep np.ndarray[...] (e.g., through a > decorator used in numpy.pxd on the ndarray type). So rather than having a > trailing [] mean buffers unless it means something else (like C++ > templates), we instead make np.ndarray a "template", through special > compiler support. 
What's the point in technically deprecating them if we can't remove them without breaking code? Wouldn't it be better to deprecate them only in the docs? Stefan From markflorisson88 at gmail.com Thu Jun 21 16:24:11 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Thu, 21 Jun 2012 15:24:11 +0100 Subject: [Cython] buffer shape incompatible with memoryview shape In-Reply-To: <4FE322E3.40706@behnel.de> References: <4FE2E27B.8010102@behnel.de> <4FE3010D.4060700@astro.uio.no> <4FE30721.2070902@behnel.de> <4FE30E0A.8020003@astro.uio.no> <4FE31AA7.5040101@behnel.de> <4FE31C71.8060904@astro.uio.no> <4FE322E3.40706@behnel.de> Message-ID: On 21 June 2012 14:34, Stefan Behnel wrote: > Dag Sverre Seljebotn, 21.06.2012 15:06: >> On 06/21/2012 02:59 PM, Stefan Behnel wrote: >>> Dag Sverre Seljebotn, 21.06.2012 14:05: >>>> On 06/21/2012 01:36 PM, Stefan Behnel wrote: >>>>>> On 06/21/2012 10:59 AM, Stefan Behnel wrote: >>>>>>> I find this worth fixing for 0.17: >>>>>>> >>>>>>> http://trac.cython.org/cython_trac/ticket/780 >>>>>> >>>>> I ran into this when I gave a Cython+NumPy course and this was the first >>>>> thing that the attendants tried when I asked them to validate that two >>>>> input arrays have the same size before adding them. It's the one obvious >>>>> way to do it, and it fails miserably. I think it should be fixed, and I >>>>> think it should be fixed soon because it feels really low-level and >>>>> complicated, especially to new users. >>>> >>>> Can you clarify a bit -- did you give this course using np.ndarray[double, >>>> ndim=2], or double[:, :]? They're really very separate under the hood and >>>> the fix is different. >>>> >>>> Or, did you actually use object[double, ndim=2] like in the bug report? >>>> (Did me and Mark get around to propose deprecating this one on the list?) >>> >>> IIRC, we used object[double, ndim=2] for both and I also tried it with a >>> memory view as in the bug report. 
I thought that using "object" was the >>> preferred way to do it? At least, it doesn't restrict the type of the >>> buffer exporter, which I consider a good thing. >> >> That's a very theoretical argument as NumPy arrays are in practice the only >> exporter. > > Except for, say, bytes objects, array.array and user implemented types, > that is. lxml has buffer support for its serialised XSLT output, for example. > You can already easily obtain a pointer from a bytes object, which is already 1d anyways :) Whether buffers on array.array are useful is still questionable given their variably-sized nature. >> I always teach np.ndarray[double...]. I've never told anyone about >> object[...], I don't think it's in much use. For starters it's going to be >> horribly inefficient unless you also add "mode='strided'" within the brackets. > > Ah, good to know. > > >> My proposal (and Mark's I think) is: >> >> Since the memoryviews will neatly cover the general exporter case, and >> since the [] syntax is much overloaded already (used for C++ templates >> too), we should deprecate object[...] no matter what else happens. > I agree with deprecating the object[] syntax, I think memoryviews should prove themselves a bit more for e.g. 0.17, before we start deprecating np.ndarray. >> Depending on what's decided for np.ndarray[...], we have: >> >> Case A): Deprecate both np.ndarray[...] and object[...] >> >> Case B): Only deprecate object[...], keep np.ndarray[...] (e.g., through a >> decorator used in numpy.pxd on the ndarray type). So rather than having a >> trailing [] mean buffers unless it means something else (like C++ >> templates), we instead make np.ndarray a "template", through special >> compiler support. > > What's the point in technically deprecating them if we can't remove them > without breaking code? Wouldn't it be better to deprecate them only in the > docs? 
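Mark's remark that a bytes object is "already 1d anyways" can be seen from plain Python, where the built-in memoryview accepts any buffer exporter, a bytes object included. A small illustration (not Cython code, just the same buffer protocol from the Python side):

```python
# A bytes object is itself a 1-D, read-only buffer exporter, so no special
# buffer machinery is needed to view its data.
data = b"hello"
view = memoryview(data)

assert view.ndim == 1
assert view.readonly
assert view.itemsize == 1
assert bytes(view[1:4]) == b"ell"  # slicing the view copies no data
```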
> > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From markflorisson88 at gmail.com Thu Jun 21 16:24:22 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Thu, 21 Jun 2012 15:24:22 +0100 Subject: [Cython] buffer shape incompatible with memoryview shape In-Reply-To: <4FE30E0A.8020003@astro.uio.no> References: <4FE2E27B.8010102@behnel.de> <4FE3010D.4060700@astro.uio.no> <4FE30721.2070902@behnel.de> <4FE30E0A.8020003@astro.uio.no> Message-ID: On 21 June 2012 13:05, Dag Sverre Seljebotn wrote: > On 06/21/2012 01:36 PM, Stefan Behnel wrote: >> >> Dag Sverre Seljebotn, 21.06.2012 13:10: >>> >>> On 06/21/2012 10:59 AM, Stefan Behnel wrote: >>>> >>>> I find this worth fixing for 0.17: >>>> >>>> http://trac.cython.org/cython_trac/ticket/780 >>> >>> >>> I'm not sure about the timeline here. >>> >>> The object<->memoryview semantics haven't even been hammered down yet; >>> does >>> "mview.customattr" trigger an AttributeError, SyntaxError or fall back to >>> some underlying object (constructing it if necessary). >>> >>> Until that happens, memoryviews are an experimental feature and present >>> for >>> development purposes mostly, so it's not like this is a big bug that >>> would >>> bite end-users. Thinking about those semantics is much more important... >> >> >> Absolutely. >> >> I ran into this when I gave a Cython+NumPy course and this was the first >> thing that the attendants tried when I asked them to validate that two >> input arrays have the same size before adding them. It's the one obvious >> way to do it, and it fails miserably. I think it should be fixed, and I >> think it should be fixed soon because it feels really low-level and >> complicated, especially to new users. > > > Can you clarify a bit -- did you give this course using np.ndarray[double, > ndim=2], or double[:, :]? 
They're really very separate under the hood and > the fix is different. I think we should support both, although it seems a bit of a shame to fix something just a while before deprecating it :) Anyway, both fixes are really straightforward anyway. > Or, did you actually use object[double, ndim=2] like in the bug report? (Did > me and Mark get around to propose deprecating this one on the list?) > > Dag > > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From stefan_ml at behnel.de Fri Jun 22 19:51:42 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 22 Jun 2012 19:51:42 +0200 Subject: [Cython] Jenkins status In-Reply-To: <4FE2FEBB.8050506@behnel.de> References: <4FE2E27B.8010102@behnel.de> <4FE2FEBB.8050506@behnel.de> Message-ID: <4FE4B0AE.4090509@behnel.de> Stefan Behnel, 21.06.2012 13:00: > mark florisson, 21.06.2012 12:38: >> BTW, Stefan, how do we start Jenkins on the sage server? It's been >> down for weeks now. > > It seems like the sage.math server would be happy about a restart. I'll > trigger the ML. Jenkins is back up and building. Stefan From stefan_ml at behnel.de Fri Jun 22 22:30:17 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 22 Jun 2012 22:30:17 +0200 Subject: [Cython] Test failures in Jenkins Message-ID: <4FE4D5D9.4090507@behnel.de> Hi, Jenkins found a couple of test failures. I haven't looked through them yet, but if anything looks familiar or obvious to someone, please go ahead and fix it. https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests/430/ Stefan From stefan_ml at behnel.de Sat Jun 23 10:11:30 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 23 Jun 2012 10:11:30 +0200 Subject: [Cython] new Jenkins setup Message-ID: <4FE57A32.40001@behnel.de> Hi, I moved the Jenkins installation out of the USB disk and into my home directory. 
The USB disk has proven very fragile in the past, so this will make us more independent from reboots and disk failures. To keep the builds fast, the workspaces have moved into a ramdisk, which is limited to 20 GB. This is less than the previous directory size, so I changed the job configs to delete redundant data after the builds, namely the unpacked CPython directories and the Cython installation. Those are still available in the job workspaces in form of the archives that the build jobs copy over at the beginning. So, to reproduce test failures on the Jenkins server, you can just unpack them manually. Currently, we are way below the limit (<5GB), but the developer branches haven't been built yet. It looks like no job takes more than 1GB when it runs, so the 6 active jobs that we run in parallel will not take more than 6GB in total. That leaves some 14 GB for the resident jobs. Those tend to stay around 30-100MB each (the Cython matrix jobs are more like 30-50MB per configuration), so we can keep quite a lot of jobs in the ramdisk. I'll keep an eye on them from time to time, but I think we'll be fine with that for a while. Still, please take a bit of care when you make changes to build jobs that you do not leave unnecessarily large sets of redundant data lying around at the end. Anything that we clearly no longer need after the build and that can be deleted will keep space free for running jobs and things like ccache. Stefan From stefan_ml at behnel.de Mon Jun 25 12:31:55 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 25 Jun 2012 12:31:55 +0200 Subject: [Cython] static type checking in Python Message-ID: <4FE83E1B.4050409@behnel.de> Hi, there's some work going on regarding static type analysis and checking of Python programs, here's the mailing list for it: https://groups.google.com/group/python-static-type-checking I think this is somewhat related to Cython. 
After all, they are trying to figure out static type information from source code - although apparently rather in order to find bugs than to speed things up. But the one doesn't necessarily exclude the other. Stefan From drsalists at gmail.com Mon Jun 25 20:58:50 2012 From: drsalists at gmail.com (Dan Stromberg) Date: Mon, 25 Jun 2012 18:58:50 +0000 Subject: [Cython] new FFI library for Python In-Reply-To: <4FDF3738.9040006@behnel.de> References: <4FDF3738.9040006@behnel.de> Message-ID: Is it related to Common Lisp's CFFI? If not, it might be confusing to have two things with the same name, similar purposes, but not really the same thing. http://common-lisp.net/project/cffi/ On Mon, Jun 18, 2012 at 2:12 PM, Stefan Behnel wrote: > Hi, > > the PyPy folks have come up with a new FFI library (called cffi) for > CPython (and eventually PyPy, obviously). > > http://cffi.readthedocs.org/ > > It borrows from LuaJIT's FFI in that it parses C declarations at runtime. > It then builds a C extension to access the external code, i.e. it requires > a C compiler at runtime (when running in CPython). > > Just thought this might be interesting. > > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From stefan_ml at behnel.de Mon Jun 25 21:33:02 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 25 Jun 2012 21:33:02 +0200 Subject: [Cython] new FFI library for Python In-Reply-To: References: <4FDF3738.9040006@behnel.de> Message-ID: <4FE8BCEE.1040902@behnel.de> Dan Stromberg, 25.06.2012 20:58: > Is it related to Common Lisp's CFFI? If not, it might be confusing to > have two things with the same name, similar purposes, but not really > the same thing. > http://common-lisp.net/project/cffi/ I think "cffi" for "C foreign function interface" is just the one obvious name for such a thing. 
Stefan From stefan_ml at behnel.de Tue Jun 26 22:36:51 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 26 Jun 2012 22:36:51 +0200 Subject: [Cython] planning for 0.17 Message-ID: <4FEA1D63.5090902@behnel.de> Hi, I'd like to get an idea of what's still open for 0.17. Mark mentioned some open memoryview issues on his list and I know that there are still issues with PyPy, some of which could get fixed in a reasonable time frame. Also, Jenkins isn't all that happy yet. https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests/ What's the current state of the master branch for everyone? Anything that you're working on and/or that you think should go in but isn't yet? I would like to see 0.17 released some time next month, if possible. I don't currently see any real blockers, so that might be doable. The release notes look ok so far, but the bug tracker list is really short in comparison. Please add to both as you see fit. http://wiki.cython.org/ReleaseNotes-0.17 http://trac.cython.org/cython_trac/query?status=closed&group=component&order=id&col=id&col=summary&col=milestone&col=status&col=type&col=priority&col=component&milestone=0.17&desc=1 Stefan From vitja.makarov at gmail.com Wed Jun 27 06:29:45 2012 From: vitja.makarov at gmail.com (Vitja Makarov) Date: Wed, 27 Jun 2012 08:29:45 +0400 Subject: [Cython] planning for 0.17 In-Reply-To: <4FEA1D63.5090902@behnel.de> References: <4FEA1D63.5090902@behnel.de> Message-ID: 2012/6/27 Stefan Behnel : > Hi, > > I'd like to get an idea of what's still open for 0.17. > > Mark mentioned some open memoryview issues on his list and I know that > there are still issues with PyPy, some of which could get fixed in a > reasonable time frame. Also, Jenkins isn't all that happy yet. > > https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests/ > > What's the current state of the master branch for everyone? Anything that > you're working on and/or that you think should go in but isn't yet? > I'm ok with it. 
> I would like to see 0.17 released some time next month, if possible. I > don't currently see any real blockers, so that might be doable. > > The release notes look ok so far, but the bug tracker list is really short > in comparison. Please add to both as you see fit. > > http://wiki.cython.org/ReleaseNotes-0.17 > > http://trac.cython.org/cython_trac/query?status=closed&group=component&order=id&col=id&col=summary&col=milestone&col=status&col=type&col=priority&col=component&milestone=0.17&desc=1 > I've updated T766's milestone from 0.16 to 0.17 as it didn't get into 0.16 release. -- vitja. From stefan_ml at behnel.de Wed Jun 27 09:40:12 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 27 Jun 2012 09:40:12 +0200 Subject: [Cython] planning for 0.17 In-Reply-To: References: <4FEA1D63.5090902@behnel.de> Message-ID: <4FEAB8DC.4020200@behnel.de> Vitja Makarov, 27.06.2012 06:29: > I've updated T766's milestone from 0.16 to 0.17 as it didn't get into > 0.16 release. Could you add it to the release notes then? Stefan From vitja.makarov at gmail.com Wed Jun 27 11:17:49 2012 From: vitja.makarov at gmail.com (Vitja Makarov) Date: Wed, 27 Jun 2012 13:17:49 +0400 Subject: [Cython] planning for 0.17 In-Reply-To: <4FEAB8DC.4020200@behnel.de> References: <4FEA1D63.5090902@behnel.de> <4FEAB8DC.4020200@behnel.de> Message-ID: 2012/6/27 Stefan Behnel : > Vitja Makarov, 27.06.2012 06:29: >> I've updated T766's milestone from 0.16 to 0.17 as it didn't get into >> 0.16 release. > > Could you add it to the release notes then? > I think it's too minor a change to be listed in release notes. -- vitja. From markflorisson88 at gmail.com Wed Jun 27 11:54:00 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Wed, 27 Jun 2012 10:54:00 +0100 Subject: [Cython] planning for 0.17 In-Reply-To: <4FEA1D63.5090902@behnel.de> References: <4FEA1D63.5090902@behnel.de> Message-ID: On 26 June 2012 21:36, Stefan Behnel wrote: > Hi, > > I'd like to get an idea of what's still open for 0.17. 
> > Mark mentioned some open memoryview issues on his list and I know that > there are still issues with PyPy, some of which could get fixed in a > reasonable time frame. Also, Jenkins isn't all that happy yet. > > https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests/ > > What's the current state of the master branch for everyone? Anything that > you're working on and/or that you think should go in but isn't yet? > > I would like to see 0.17 released some time next month, if possible. I > don't currently see any real blockers, so that might be doable. > > The release notes look ok so far, but the bug tracker list is really short > in comparison. Please add to both as you see fit. > > http://wiki.cython.org/ReleaseNotes-0.17 > > http://trac.cython.org/cython_trac/query?status=closed&group=component&order=id&col=id&col=summary&col=milestone&col=status&col=type&col=priority&col=component&milestone=0.17&desc=1 > > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel Hey, Sounds good, I'll have a look at the memoryview tests. One is due to numpy headers redefining PyIndex_Check (though I thought I fixed that previously). Defaults for fused def functions may also fail in some cases, I'll try to fix that as well, or issue an error otherwise for now. That said, I'm busy with a dissertation and some other stuff, so if anyone would like to pick up the release for 0.17, I'd be much obliged. I can't test it right now, but I don't understand the following in the release notes (regarding array.array): "Note that only the buffer syntax is supported for these arrays. To use memoryviews with them, use the buffer syntax to unpack the buffer first.". Why is that, it implements __getbuffer__ right? So it shouldn't matter whether you use memoryviews or buffer syntax, both use __Pyx_GetBuffer(). 
Mark From stefan_ml at behnel.de Wed Jun 27 13:59:39 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 27 Jun 2012 13:59:39 +0200 Subject: [Cython] planning for 0.17 In-Reply-To: References: <4FEA1D63.5090902@behnel.de> Message-ID: <4FEAF5AB.8040900@behnel.de> mark florisson, 27.06.2012 11:54: > if anyone would like to pick up the release for 0.17, I'd be much > obliged. I think I can handle it. :) > I can't test it right now, but I don't understand the following in the > release notes (regarding array.array): "Note that only the buffer > syntax is supported for these arrays. To use memoryviews with them, > use the buffer syntax to unpack the buffer first.". Why is that, it > implements __getbuffer__ right? So it shouldn't matter whether you use > memoryviews or buffer syntax, both use __Pyx_GetBuffer(). The problem is that arrayarray.pxd is only used when the exporter is typed. This means that you can't do this: def func(int[:] arr): pass func(array.array('i', [1,2,3])) but it will work if func() is defined like this: def func(array.array arr): cdef int[:] view = arr I admit that the wording in the release notes is wrong, I wrote it because I initially thought that you had to do this: def func(array.array[int] arr): cdef int[:] view = arr But no, you don't have to use the buffer interface, you just have to type the variable. I'll update the release notes. Works better in Py3, obviously. Stefan From markflorisson88 at gmail.com Wed Jun 27 14:17:47 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Wed, 27 Jun 2012 13:17:47 +0100 Subject: [Cython] planning for 0.17 In-Reply-To: <4FEAF5AB.8040900@behnel.de> References: <4FEA1D63.5090902@behnel.de> <4FEAF5AB.8040900@behnel.de> Message-ID: On 27 June 2012 12:59, Stefan Behnel wrote: > mark florisson, 27.06.2012 11:54: >> if anyone would like to pick up the release for 0.17, I'd be much >> obliged. > > I think I can handle it. :) > Great, thanks! 
>> I can't test it right now, but I don't understand the following in the >> release notes (regarding array.array): "Note that only the buffer >> syntax is supported for these arrays. To use memoryviews with them, >> use the buffer syntax to unpack the buffer first.". Why is that, it >> implements __getbuffer__ right? So it shouldn't matter whether you use >> memoryviews or buffer syntax, both use __Pyx_GetBuffer(). > > The problem is that arrayarray.pxd is only used when the exporter is typed. > This means that you can't do this: > >    def func(int[:] arr): pass > >    func(array.array('i', [1,2,3])) That works for me, as long as array is cimported from cpython (as 'array' or some other name). It will patch __Pyx_GetBuffer with a typecheck and a call to its __getbuffer__ method. > but it will work if func() is defined like this: > >    def func(array.array arr): >        cdef int[:] view = arr > > I admit that the wording in the release notes is wrong, I wrote it because > I initially thought that you had to do this: > >    def func(array.array[int] arr): >        cdef int[:] view = arr > > But no, you don't have to use the buffer interface, you just have to type > the variable. I'll update the release notes. > > Works better in Py3, obviously. 
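The premise of this exchange — that array.array exports its data via __getbuffer__ — can be checked from plain Python with the built-in memoryview, which goes through the same buffer protocol (a pure-Python illustration, not the Cython __Pyx_GetBuffer code path being discussed):

```python
from array import array

# array.array implements the buffer protocol, so any buffer consumer
# (here the built-in memoryview) can unpack it without copying.
arr = array('i', [1, 2, 3])
view = memoryview(arr)

assert view.format == 'i'
assert view.ndim == 1
assert view.tolist() == [1, 2, 3]

view[0] = 42                 # the view is writable and shares storage
assert arr[0] == 42
```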
> > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From stefan_ml at behnel.de Wed Jun 27 14:48:04 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 27 Jun 2012 14:48:04 +0200 Subject: [Cython] planning for 0.17 In-Reply-To: References: <4FEA1D63.5090902@behnel.de> <4FEAF5AB.8040900@behnel.de> Message-ID: <4FEB0104.6050106@behnel.de> mark florisson, 27.06.2012 14:17: > On 27 June 2012 12:59, Stefan Behnel wrote: >> mark florisson, 27.06.2012 11:54: >>> I can't test it right now, but I don't understand the following in the >>> release notes (regarding array.array): "Note that only the buffer >>> syntax is supported for these arrays. To use memoryviews with them, >>> use the buffer syntax to unpack the buffer first.". Why is that, it >>> implements __getbuffer__ right? So it shouldn't matter whether you use >>> memoryviews or buffer syntax, both use __Pyx_GetBuffer(). >> >> The problem is that arrayarray.pxd is only used when the exporter is typed. >> This means that you can't do this: >> >> def func(int[:] arr): pass >> >> func(array.array('i', [1,2,3])) > > That works for me, as long and array is cimported from cpython (as > 'array' or some other name). It will patch __Pyx_GetBuffer with a > typecheck and a call to its __getbuffer__ method. Hmm, interesting. I keep learning. I'll add tests for that. For the memoryview_type and array_type checks, wouldn't a type identity test be enough instead of a PyObject_TypeCheck() ? 
Stefan

From markflorisson88 at gmail.com Wed Jun 27 15:03:56 2012
From: markflorisson88 at gmail.com (mark florisson)
Date: Wed, 27 Jun 2012 14:03:56 +0100
Subject: [Cython] planning for 0.17
In-Reply-To: <4FEB0104.6050106@behnel.de>
References: <4FEA1D63.5090902@behnel.de> <4FEAF5AB.8040900@behnel.de> <4FEB0104.6050106@behnel.de>
Message-ID:

On 27 June 2012 13:48, Stefan Behnel wrote:
> mark florisson, 27.06.2012 14:17:
>> On 27 June 2012 12:59, Stefan Behnel wrote:
>>> mark florisson, 27.06.2012 11:54:
>>>> I can't test it right now, but I don't understand the following in the
>>>> release notes (regarding array.array): "Note that only the buffer
>>>> syntax is supported for these arrays. To use memoryviews with them,
>>>> use the buffer syntax to unpack the buffer first.". Why is that, it
>>>> implements __getbuffer__ right? So it shouldn't matter whether you use
>>>> memoryviews or buffer syntax, both use __Pyx_GetBuffer().
>>>
>>> The problem is that arrayarray.pxd is only used when the exporter is typed.
>>> This means that you can't do this:
>>>
>>>     def func(int[:] arr): pass
>>>
>>>     func(array.array('i', [1,2,3]))
>>
>> That works for me, as long as array is cimported from cpython (as
>> 'array' or some other name). It will patch __Pyx_GetBuffer with a
>> typecheck and a call to its __getbuffer__ method.
>
> Hmm, interesting. I keep learning. I'll add tests for that.
>
> For the memoryview_type and array_type checks, wouldn't a type identity
> test be enough instead of a PyObject_TypeCheck()?
>
> Stefan

Well, you want it to work for subclasses as well. I think the only
thing that doesn't work (pre-2.6), is overriding __getbuffer__ in a
subclass outside of the module or pxd.
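The distinction at issue here — an exact type-identity test versus PyObject_TypeCheck() — is the C-level analogue of `type(x) is T` versus `isinstance(x, T)`. A small plain-Python illustration of why subclasses need the latter (the subclass is hypothetical, purely for demonstration):

```python
from array import array

class MyArray(array):
    """A subclass that still exports the same buffer."""

base = array('i', [1, 2, 3])
sub = MyArray('i', [1, 2, 3])

# A type-identity test (a plain pointer comparison at the C level)
# rejects the subclass:
assert type(base) is array
assert type(sub) is not array

# PyObject_TypeCheck() walks tp_base, i.e. behaves like isinstance(),
# so subclass instances keep working:
assert isinstance(sub, array)
```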
For memoryviews, since each module has a different memoryview type, I
inject a capsule in tp_dict, which __Pyx_GetBuffer checks for (it's
called __pyx_getbuffer and __pyx_releasebuffer).

From stefan_ml at behnel.de Wed Jun 27 15:09:32 2012
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Wed, 27 Jun 2012 15:09:32 +0200
Subject: [Cython] planning for 0.17
In-Reply-To:
References: <4FEA1D63.5090902@behnel.de> <4FEAF5AB.8040900@behnel.de> <4FEB0104.6050106@behnel.de>
Message-ID: <4FEB060C.5060002@behnel.de>

mark florisson, 27.06.2012 15:03:
> On 27 June 2012 13:48, Stefan Behnel wrote:
>> mark florisson, 27.06.2012 14:17:
>>> That works for me, as long as array is cimported from cpython (as
>>> 'array' or some other name). It will patch __Pyx_GetBuffer with a
>>> typecheck and a call to its __getbuffer__ method.
>>
>> For the memoryview_type and array_type checks, wouldn't a type identity
>> test be enough instead of a PyObject_TypeCheck()?
>
> Well, you want it to work for subclasses as well.

Fine with me.

> I think the only
> thing that doesn't work (pre-2.6), is overriding __getbuffer__ in a
> subclass outside of the module or pxd. For memoryviews, since each
> module has a different memoryview type, I inject a capsule in tp_dict,
> which __Pyx_GetBuffer checks for (it's called __pyx_getbuffer and
> __pyx_releasebuffer).

I'm sure it'll be a lot of fun to rip that out when we finally drop
support for Python 2.5 ...

Stefan

From dieter at handshake.de Thu Jun 28 09:04:14 2012
From: dieter at handshake.de (Dieter Maurer)
Date: Thu, 28 Jun 2012 09:04:14 +0200
Subject: [Cython] Feature request: generate signature information for use by "inspect"
Message-ID: <20460.494.30787.680552@localhost.localdomain>

Python's "inspect" module is a great help to get valuable information
about a package. Many higher level tools (e.g. the "help" builtin
and "pydoc") are based on it.
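As background: for functions defined in Python source, inspect can recover the signature directly from the code object — it is this metadata that C-implemented functions historically could not provide. A plain-Python illustration:

```python
import inspect

def example(a, b=1, *args, **kwargs):
    """An ordinary Python-level function."""
    return a + b

# The signature is recovered from the function's code object:
sig = inspect.signature(example)
assert str(sig) == '(a, b=1, *args, **kwargs)'

# help() and pydoc build on exactly this kind of introspection data,
# which is why a C-implemented function without it shows up opaque.
assert inspect.getdoc(example) == 'An ordinary Python-level function.'
```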
I have just recognized a deficiency of "cython" generated modules with
respect to "inspect" support:

"inspect" cannot determine the signatures for Python functions defined
in "Cython" source.

I understand that this might be a limitation of Python's "C" interface.
In this case, I suggest enhancing the function's docstring with
signature information.

I now manually transform my docstrings

    def <name>(<args>):
        """
        <description>
        """

into:

    def <name>(<args>):
        """<name>(<args>) -> <return type>:
        <description>
        """

and would be happy to get something similar automatically.

-- Dieter

From stefan_ml at behnel.de Thu Jun 28 09:25:32 2012
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 28 Jun 2012 09:25:32 +0200
Subject: [Cython] Feature request: generate signature information for use by "inspect"
In-Reply-To: <20460.494.30787.680552@localhost.localdomain>
References: <20460.494.30787.680552@localhost.localdomain>
Message-ID: <4FEC06EC.2000107@behnel.de>

Dieter Maurer, 28.06.2012 09:04:
> Python's "inspect" module is a great help to get valuable information
> about a package. Many higher level tools (e.g. the "help" builtin
> and "pydoc") are based on it.
>
> I have just recognized a deficiency of "cython" generated
> modules with respect to "inspect" support:
>
> "inspect" cannot determine the signatures for Python functions
> defined in "Cython" source.
>
> I understand that this might be a limitation of Python's "C"
> interface.

Correct, although Cython goes to great lengths to enable introspection
of Cython implemented functions and classes (admittedly, we could still
do more...)

> In this case, I suggest enhancing the
> function's docstring with signature information.
>
> I now manually transform my docstrings
>
>     def <name>(<args>):
>         """
>         <description>
>         """
>
> into:
>
>     def <name>(<args>):
>         """<name>(<args>) -> <return type>:
>         <description>
>         """
>
> and would be happy to get something similar automatically.

And the time machine strikes again. You can use the "embedsignature"
compiler option for that.

http://docs.cython.org/src/reference/compilation.html?highlight=embedsignature#compiler-directives

Stefan

From dieter at handshake.de Thu Jun 28 11:12:03 2012
From: dieter at handshake.de (Dieter Maurer)
Date: Thu, 28 Jun 2012 11:12:03 +0200
Subject: [Cython] Feature request: generate signature information for use by "inspect"
In-Reply-To: <4FEC06EC.2000107@behnel.de>
References: <20460.494.30787.680552@localhost.localdomain> <4FEC06EC.2000107@behnel.de>
Message-ID: <20460.8163.513094.324883@localhost.localdomain>

Stefan Behnel wrote at 2012-6-28 09:25 +0200:
>Dieter Maurer, 28.06.2012 09:04:
>> ...
>> In this case, I suggest enhancing the
>> function's docstring with signature information.
>>
>> I now manually transform my docstrings
>>
>>     def <name>(<args>):
>>         """
>>         <description>
>>         """
>>
>> into:
>>
>>     def <name>(<args>):
>>         """<name>(<args>) -> <return type>:
>>         <description>
>>         """
>>
>> and would be happy to get something similar automatically.
>
>And the time machine strikes again. You can use the "embedsignature"
>compiler option for that.
>
>http://docs.cython.org/src/reference/compilation.html?highlight=embedsignature#compiler-directives

Thank you! I missed this part of the documentation.

-- Dieter

From robertwb at gmail.com Thu Jun 28 10:59:59 2012
From: robertwb at gmail.com (Robert Bradshaw)
Date: Thu, 28 Jun 2012 01:59:59 -0700
Subject: [Cython] Automatic C++ conversions
Message-ID:

I've been looking at how painful it is to constantly convert between
Python objects and strings in C++. Yes, it's easy to write a utility,
but this should be as natural (if not more so, as the length is
explicit) as bytes <-> char*. Several other of the libcpp classes
(vector, map) have natural Python analogues too.

What would people think about making it possible to declare these in a
C++ file? Being able to make arbitrary mappings anywhere between types
is contextless global state that I'd rather avoid, but perhaps special
methods defined on the class such as

    cdef extern from "<string>" namespace "std":
        cdef cppclass string:
            def __object__(string s):
                return s.c_str()[:s.size()]
            def __create__(object o):
                return string(o, len(o))
            ...

(names open to suggestions) Then one could write

    cdef extern from *:
        string c_func(string)

    def f(x):
        return c_func(x)

- Robert

From stefan_ml at behnel.de Thu Jun 28 11:54:49 2012
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 28 Jun 2012 11:54:49 +0200
Subject: [Cython] Automatic C++ conversions
In-Reply-To:
References:
Message-ID: <4FEC29E9.9030609@behnel.de>

Robert Bradshaw, 28.06.2012 10:59:
> I've been looking at how painful it is to constantly convert between
> Python objects and strings in C++.

You mean std::string (as I think it's called)? Can't we just special
case that in the same way that we special case char* and friends?
Basically just one type more in that list.
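Robert's parenthetical — that the length is explicit — is the heart of the std::string case: a NUL-terminated char* cannot round-trip arbitrary bytes, while a (pointer, length) pair can. The effect can be demonstrated from Python with ctypes; this illustrates the C-level behaviour, not Cython's actual coercion code:

```python
import ctypes

raw = b"a\x00b"  # bytes with an embedded NUL
buf = ctypes.create_string_buffer(raw, len(raw))

# Reading NUL-terminated, as a plain char* coercion must
# (ctypes.string_at without a size does a strlen): data is lost.
assert ctypes.string_at(buf) == b"a"

# Reading with an explicit length, as a std::string coercion can
# (think string(ptr, size)): all three bytes survive.
assert ctypes.string_at(buf, len(raw)) == b"a\x00b"
```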
And it would give you efficient encoding/decoding more or less for free.

I mean, well, it would likely break existing code to start doing that (in
the same way that we broke code by enabling type inference for convertible
pointers), but as long as it helps more than it breaks ...

> Yes, it's easy to write a utility,
> but this should be as natural (if not more so, as the length is
> explicit) as bytes <-> char*. Several other of the libcpp classes
> (vector, map) have natural Python analogues too.

And you would want to enable coercion to those, too? Have a vector copy
into a Python list automatically? (Although that's trivially done with a
list comprehension, maybe the other way is more interesting...)

I think, as long as there is one obvious mapping for a given type, I
wouldn't mind letting Cython apply it automatically.

> What would people think about making it possible to declare these in a
> C++ file? Being able to make arbitrary mappings anywhere between types
> is contextless global state that I'd rather avoid, but perhaps special
> methods defined on the class such as
>
>     cdef extern from "<string>" namespace "std":
>         cdef cppclass string:
>             def __object__(string s):
>                 return s.c_str()[:s.size()]
>             def __create__(object o):
>                 return string(o, len(o))
>             ...
>
> (names open to suggestions) Then one could write
>
>     cdef extern from *:
>         string c_func(string)
>
>     def f(x):
>         return c_func(x)

Admittedly, it fits somewhat more naturally into C++ classes than generally
into C, although we could allow the same thing in ctypedefs.

However, I'm reluctant to introduce something like this as long as we can
get away with built-in auto-coercion.
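For concreteness, the protocol Robert sketches can be modeled in plain Python. Everything here is hypothetical — __object__ and __create__ are his tentative names, and nothing like this exists in Cython:

```python
# Toy model of the proposed per-class coercion protocol:
# __object__ converts the wrapped value to a Python object,
# __create__ builds the wrapper from a Python object.
class CppString:
    def __init__(self, data=b""):
        self._data = bytes(data)

    def __object__(self):
        # C++ -> Python, length-explicit (think s.c_str(), s.size())
        return self._data

    @classmethod
    def __create__(cls, o):
        # Python -> C++ (think string(<char*>o, len(o)))
        return cls(o)

def to_py(value):
    """What an auto-generated coercion call site would expand to."""
    return value.__object__() if hasattr(value, "__object__") else value

s = CppString.__create__(b"abc")
assert to_py(s) == b"abc"
assert to_py(b"plain") == b"plain"  # non-wrapped values pass through
```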
Stefan

From robertwb at gmail.com Thu Jun 28 12:07:13 2012
From: robertwb at gmail.com (Robert Bradshaw)
Date: Thu, 28 Jun 2012 03:07:13 -0700
Subject: [Cython] Automatic C++ conversions
In-Reply-To: <4FEC29E9.9030609@behnel.de>
References: <4FEC29E9.9030609@behnel.de>
Message-ID:

On Thu, Jun 28, 2012 at 2:54 AM, Stefan Behnel wrote:
> Robert Bradshaw, 28.06.2012 10:59:
>> I've been looking at how painful it is to constantly convert between
>> Python objects and strings in C++.
>
> You mean std::string (as I think it's called)? Can't we just special case
> that in the same way that we special case char* and friends?

Yes, we could. If we do that it'd make sense to special case list and
vector and pair and map and set as well, though perhaps those are
special enough to hard code them, and it makes the language simpler to
not have more special methods.

> Basically just
> one type more in that list. And it would give you efficient
> encoding/decoding more or less for free.
>
> I mean, well, it would likely break existing code to start doing that (in
> the same way that we broke code by enabling type inference for convertible
> pointers), but as long as it helps more than it breaks ...

I don't think it'd be backwards incompatible; currently it's just an error.

>> Yes, it's easy to write a utility,
>> but this should be as natural (if not more so, as the length is
>> explicit) as bytes <-> char*. Several other of the libcpp classes
>> (vector, map) have natural Python analogues too.
>
> And you would want to enable coercion to those, too? Have a vector copy
> into a Python list automatically? (Although that's trivially done with a
> list comprehension, maybe the other way is more interesting...)
>
> I think, as long as there is one obvious mapping for a given type, I
> wouldn't mind letting Cython apply it automatically.
>
>
>> What would people think about making it possible to declare these in a
>> C++ file?
>> Being able to make arbitrary mappings anywhere between types
>> is contextless global state that I'd rather avoid, but perhaps special
>> methods defined on the class such as
>>
>>     cdef extern from "<string>" namespace "std":
>>         cdef cppclass string:
>>             def __object__(string s):
>>                 return s.c_str()[:s.size()]
>>             def __create__(object o):
>>                 return string(o, len(o))
>>             ...
>>
>> (names open to suggestions) Then one could write
>>
>>     cdef extern from *:
>>         string c_func(string)
>>
>>     def f(x):
>>         return c_func(x)
>
> Admittedly, it fits somewhat more naturally into C++ classes than generally
> into C, although we could allow the same thing in ctypedefs.
>
> However, I'm reluctant to introduce something like this as long as we can
> get away with built-in auto-coercion.
>
> Stefan

From stefan_ml at behnel.de Thu Jun 28 14:10:07 2012
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 28 Jun 2012 14:10:07 +0200
Subject: [Cython] Automatic C++ conversions
In-Reply-To:
References: <4FEC29E9.9030609@behnel.de>
Message-ID: <4FEC499F.5050505@behnel.de>

Robert Bradshaw, 28.06.2012 12:07:
> On Thu, Jun 28, 2012 at 2:54 AM, Stefan Behnel wrote:
>> Robert Bradshaw, 28.06.2012 10:59:
>>> I've been looking at how painful it is to constantly convert between
>>> Python objects and strings in C++.
>>
>> You mean std::string (as I think it's called)? Can't we just special case
>> that in the same way that we special case char* and friends?
>
> Yes, we could.

Then I think it makes sense to do that. Basically, the std::string type
would set its is_string flag and then we need the actual coercion code
for it.
> If we do that it'd make sense to special case list and
> vector and pair and map and set as well, though perhaps those are
> special enough to hard code them, and it makes the language simpler to
> not have more special methods.

Ok, then it's

    std::string <=> bytes
    std::vector <=> list
    std::map <=> dict
    std::set <=> set

Potentially also:

    std::pair => tuple (maybe 2-tuple => std::pair with a runtime length test?)

What about allowing list(<C++ container>) etc.? As long as the item type
can be coerced at compile time, this should be doable:

    <C++ container> => Python iterator

and it would even be easy to implement in Cython code using a generator
function. The other direction (Python iterator => <C++ container>) would
be trickier but could also be made to work when the C++ item type on the
LHS of the assignment that triggers the coercion is known at compile time.

We might want to look for a way to make these coercions a "thing" in the
code (maybe through a registry or dedicated class) rather than adding
special casing code everywhere.

I think a CEP would be a good way to specify the above coercions. I also
think that this is large enough a feature to openly ask for sponsorship.

>> Basically just
>> one type more in that list. And it would give you efficient
>> encoding/decoding more or less for free.
>>
>> I mean, well, it would likely break existing code to start doing that (in
>> the same way that we broke code by enabling type inference for convertible
>> pointers), but as long as it helps more than it breaks ...
>
> I don't think it'd be backwards incompatible; currently it's just an error.

Ah, right, sorry. I got confused. Assignments to an untyped variable
inherit the type of the RHS, so only typed assignments would be impacted,
and those are currently errors, sure. Nothing in the way then.

Stefan

From stefan_ml at behnel.de Fri Jun 29 07:45:05 2012
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 29 Jun 2012 07:45:05 +0200
Subject: [Cython] [cython-users] C++: how to handle failures of 'new'?
In-Reply-To:
References: <4FECA49B.8090404@behnel.de>
Message-ID: <4FED40E1.1040301@behnel.de>

[moving this to cython-devel as it's getting technical]

Robert Bradshaw, 28.06.2012 21:46:
> On Thu, Jun 28, 2012 at 11:38 AM, Stefan Behnel wrote:
>> currently, when I write "new CppClass()" in Cython, it generates a straight
>> call to the "new" operator. It doesn't do any error handling. And the
>> current documentation doesn't even mention this case.
>>
>> Is there a "standard" way to handle this? It seems that C++ has different
>> ways to deal with failures here but raises an exception by default. Would
>> you declare the constructor(s) with an "except +MemoryError"? Is there a
>> reason Cython shouldn't be doing this automatically (if nothing else was
>> declared)?
>
> I think it certainly makes sense to declare the default constructor as
> "except +" (and std::bad_alloc should become MemoryError),

Right. The code in the constructor can raise other exceptions that must
also be handled properly. An explicit "except +" will handle that.

> but whether
> to implicitly annotate declared constructors is less clear, especially
> as there's no way to un-annotate them.

I agree, but sadly, it's the default behaviour that is wrong. I'm sure we
made lots of users run into this trap already. I fixed the documentation
for now, but the bottom line is that we require users to take care of
proper declarations themselves. Otherwise, the code that we generate is
incorrect, although it's 100% certain that an allocation error can occur,
even if the constructor code doesn't raise any exceptions itself.

Apparently, changing the behaviour of the "new" operator requires a special
annotation "std::nothrow", which then returns NULL on allocation failures.
You can pass that from Cython by hacking up a cname, e.g.

    Rectangle "(std::nothrow) Rectangle" (int w, int h)

I'm sure there are users out there who figured this out (I mean, I did...)
and use it in their code, so I agree that this isn't easy to handle
because Cython simply wouldn't know what the actual error behaviour is
for a given constructor and how to correctly detect an error.

This problem applies only to heap allocation in that form. However, stack
allocation and the new exttype field allocation suffer from similar
problems when the default constructor raises an exception. Exttype fields
are a particularly nasty case because the user has no control over the
allocation. A C++ exception in the C++ class constructor would terminate
the exttype constructor unexpectedly and thus leak resources (in the best
case - no idea how CPython reacts if you throw a C++ exception through its
type instantiation code).

Similarly, a C++ exception in the constructor of a stack allocated object
would then originate from the function entry code and potentially hit the
Python function wrapper etc. Again, potentially leaking resources or worse.

To me, this sounds like we should do something about it. At least for the
implicit calls to the default constructor, we should generate "except +"
code automatically because there is no other way to handle them safely.

Stefan

From robertwb at gmail.com Fri Jun 29 11:08:21 2012
From: robertwb at gmail.com (Robert Bradshaw)
Date: Fri, 29 Jun 2012 02:08:21 -0700
Subject: [Cython] [cython-users] C++: how to handle failures of 'new'?
In-Reply-To: <4FED40E1.1040301@behnel.de>
References: <4FECA49B.8090404@behnel.de> <4FED40E1.1040301@behnel.de>
Message-ID:

On Thu, Jun 28, 2012 at 10:45 PM, Stefan Behnel wrote:
> [moving this to cython-devel as it's getting technical]
>
> Robert Bradshaw, 28.06.2012 21:46:
>> On Thu, Jun 28, 2012 at 11:38 AM, Stefan Behnel wrote:
>>> currently, when I write "new CppClass()" in Cython, it generates a straight
>>> call to the "new" operator. It doesn't do any error handling. And the
>>> current documentation doesn't even mention this case.
>>>
>>> Is there a "standard" way to handle this?
>>> It seems that C++ has different
>>> ways to deal with failures here but raises an exception by default. Would
>>> you declare the constructor(s) with an "except +MemoryError"? Is there a
>>> reason Cython shouldn't be doing this automatically (if nothing else was
>>> declared)?
>>
>> I think it certainly makes sense to declare the default constructor as
>> "except +" (and std::bad_alloc should become MemoryError),
>
> Right. The code in the constructor can raise other exceptions that must
> also be handled properly. An explicit "except +" will handle that.
>
>
>> but whether
>> to implicitly annotate declared constructors is less clear, especially
>> as there's no way to un-annotate them.
>
> I agree, but sadly, it's the default behaviour that is wrong. I'm sure we
> made lots of users run into this trap already. I fixed the documentation
> for now, but the bottom line is that we require users to take care of
> proper declarations themselves. Otherwise, the code that we generate is
> incorrect, although it's 100% certain that an allocation error can occur,
> even if the constructor code doesn't raise any exceptions itself.

This is always the case.

> Apparently, changing the behaviour of the "new" operator requires a special
> annotation "std::nothrow", which then returns NULL on allocation failures.
> You can pass that from Cython by hacking up a cname, e.g.
>
>     Rectangle "(std::nothrow) Rectangle" (int w, int h)
>
> I'm sure there are users out there who figured this out (I mean, I did...)
> and use it in their code, so I agree that this isn't easy to handle because
> Cython simply wouldn't know what the actual error behaviour is for a given
> constructor and how to correctly detect an error.
>
> This problem applies only to heap allocation in that form. However, stack
> allocation and the new exttype field allocation suffer from similar
> problems when the default constructor raises an exception.
> Exttype fields
> are a particularly nasty case because the user has no control over the
> allocation. A C++ exception in the C++ class constructor would terminate
> the exttype constructor unexpectedly and thus leak resources (in the best
> case - no idea how CPython reacts if you throw a C++ exception through its
> type instantiation code).

If the default constructor raises an exception then it should be declared
(to not do so is an error on the user's part). New raising bad_alloc is a
bit of a special case, but doesn't apply to the stack or exttype
allocations.

> Similarly, a C++ exception in the constructor of a stack allocated object
> would then originate from the function entry code and potentially hit the
> Python function wrapper etc. Again, potentially leaking resources or worse.
>
> To me, this sounds like we should do something about it. At least for the
> implicit calls to the default constructor, we should generate "except +"
> code automatically because there is no other way to handle them safely.

If no constructor is declared, it should be "except +" just to be safe,
but otherwise I don't see how this is any different than forgetting to
declare exceptions on any other function.

Unfortunately catching exceptions (with custom per-object handling) on a
set of stack allocated objects seems difficult if not impossible (without
resorting to ugly hacks like using placement new everywhere).

- Robert

From dieter at handshake.de Fri Jun 29 11:25:53 2012
From: dieter at handshake.de (Dieter Maurer)
Date: Fri, 29 Jun 2012 11:25:53 +0200
Subject: [Cython] Potential bug: hole in "C <-> Python" conversion
Message-ID: <20461.29857.232429.210278@localhost.localdomain>

I have

    cdef extern from *:
        ctypedef char const_unsigned_char "const unsigned char"

    cdef const_unsigned_char *c_data = data

which leads to "Cannot convert Python object to 'const_unsigned_char *'",
while "cdef char *c_data = data" works.
Should the "ctypedef char const_unsigned_char" not ensure
that "char" and "const_unsigned_char" are used as synonyms?

-- Dieter

From stefan_ml at behnel.de Fri Jun 29 11:42:26 2012
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 29 Jun 2012 11:42:26 +0200
Subject: [Cython] Potential bug: hole in "C <-> Python" conversion
In-Reply-To: <20461.29857.232429.210278@localhost.localdomain>
References: <20461.29857.232429.210278@localhost.localdomain>
Message-ID: <4FED7882.1050008@behnel.de>

Dieter Maurer, 29.06.2012 11:25:
> I have
>
>     cdef extern from *:
>         ctypedef char const_unsigned_char "const unsigned char"

This is an incorrect declaration. "char" != "unsigned char".

>     cdef const_unsigned_char *c_data = data
>
> leads to "Cannot convert Python object to 'const_unsigned_char *'"
> while "cdef char *c_data = data" works.
>
> Should the "ctypedef char const_unsigned_char" not ensure
> that "char" and "const_unsigned_char" are used as synonyms?

I assume you are not using the latest Cython (0.17pre) from github, are
you? It should have a fix for this.

Also note that libc.string contains declarations for "const char*" and
friends.

Stefan

From dieter at handshake.de Fri Jun 29 12:18:46 2012
From: dieter at handshake.de (Dieter Maurer)
Date: Fri, 29 Jun 2012 12:18:46 +0200
Subject: [Cython] Potential bug: hole in "C <-> Python" conversion
In-Reply-To: <4FED7882.1050008@behnel.de>
References: <20461.29857.232429.210278@localhost.localdomain> <4FED7882.1050008@behnel.de>
Message-ID: <20461.33030.605969.344092@localhost.localdomain>

Stefan Behnel wrote at 2012-6-29 11:42 +0200:
>Dieter Maurer, 29.06.2012 11:25:
>> I have
>>
>>     cdef extern from *:
>>         ctypedef char const_unsigned_char "const unsigned char"
>
>This is an incorrect declaration. "char" != "unsigned char".

You are right. I cheat to get Cython to convert between "unsigned char*"
and "bytes" in the same way as it does for "char *".
For this conversion, there is no real difference between "char *" and
"unsigned char *" (apart from a C level warning about a pointer of a bad
type passed to "PyString_FromStringAndSize").

>> cdef const_unsigned_char *c_data = data
>>
>> leads to "Cannot convert Python object to 'const_unsigned_char *'"
>> while "cdef char *c_data = data" works.
>>
>> Should the "ctypedef char const_unsigned_char" not ensure
>> that "char" and "const_unsigned_char" are used as synonyms?
>
>I assume you are not using the latest Cython (0.17pre) from github, are
>you? It should have a fix for this.

You are right. I am using the "cython" version which comes with my
operating system ("cython 0.13").

Very good, if the latest "Cython" behaves better :-)

>Also note that libc.string contains declarations for "const char*" and friends.

Unformatunately, I need "const unsigned char*" and "const xmlChar *"
(where "xmlChar" is defined as "unsigned char"). I used the "libc.string"
definitions as a blueprint for mine.

-- Dieter

From stefan_ml at behnel.de Fri Jun 29 13:07:27 2012
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 29 Jun 2012 13:07:27 +0200
Subject: [Cython] Potential bug: hole in "C <-> Python" conversion
In-Reply-To: <20461.33030.605969.344092@localhost.localdomain>
References: <20461.29857.232429.210278@localhost.localdomain> <4FED7882.1050008@behnel.de> <20461.33030.605969.344092@localhost.localdomain>
Message-ID: <4FED8C6F.5030204@behnel.de>

Dieter Maurer, 29.06.2012 12:18:
> Stefan Behnel wrote at 2012-6-29 11:42 +0200:
>> Also note that libc.string contains declarations for "const char*" and friends.
>
> Unformatunately

Nice word, took me a while to make my brain split the characters
correctly. ;)

> I need "const unsigned char*" and "const xmlChar *"
> (where "xmlChar" is defined as "unsigned char").

Ah, right, libxml2 - an excellent example.
lxml is still suffering from the decision of its initial author to ignore
C compiler warnings ("for now") and use plain char* instead. Lesson
learned: DON'T DO THAT!

I recently started cleaning that up (which is why Cython now understands
and coerces "unsigned char*" as well), but you wouldn't believe how much
work it is to get "const" right after the fact if you have a sufficiently
large code base. The current (udiff) patch in my patch queue is some 3000
lines and still growing, but at least the compiler warnings look like
they'd soon fit on a single page. That's about the point where I need to
start tackling the really tough problems.

> I used the "libc.string" definitions as a blueprint for mine.

Sure, as long as the types are correct. lxml will have them declared in
tree.pxd at some point.

BTW, you might want to upgrade to a more recent Cython in any case. 0.13
is almost two years old and lacks a lot of nice language features. lxml
2.4 will use Cython 0.17.

Stefan

From stefan_ml at behnel.de Fri Jun 29 14:02:29 2012
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 29 Jun 2012 14:02:29 +0200
Subject: [Cython] Potential bug: hole in "C <-> Python" conversion
In-Reply-To: <4FED8C6F.5030204@behnel.de>
References: <20461.29857.232429.210278@localhost.localdomain> <4FED7882.1050008@behnel.de> <20461.33030.605969.344092@localhost.localdomain> <4FED8C6F.5030204@behnel.de>
Message-ID: <4FED9955.1070307@behnel.de>

Stefan Behnel, 29.06.2012 13:07:
> Dieter Maurer, 29.06.2012 12:18:
>> I need "const unsigned char*" and "const xmlChar *"
>> (where "xmlChar" is defined as "unsigned char").
>
> Ah, right, libxml2 - an excellent example. lxml is still suffering from the
> decision of its initial author to ignore C compiler warnings ("for now")
> and use plain char* instead. Lesson learned: DON'T DO THAT!

I added a doc section about using "const" with "char*".
https://sage.math.washington.edu:8091/hudson/job/cython-docs/doclinks/1/src/tutorial/strings.html#dealing-with-const

Stefan

From robertwb at gmail.com Sat Jun 30 00:38:29 2012
From: robertwb at gmail.com (Robert Bradshaw)
Date: Fri, 29 Jun 2012 15:38:29 -0700
Subject: [Cython] Automatic C++ conversions
In-Reply-To: <4FEC499F.5050505@behnel.de>
References: <4FEC29E9.9030609@behnel.de> <4FEC499F.5050505@behnel.de>
Message-ID:

On Thu, Jun 28, 2012 at 5:10 AM, Stefan Behnel wrote:
> Robert Bradshaw, 28.06.2012 12:07:
>> On Thu, Jun 28, 2012 at 2:54 AM, Stefan Behnel wrote:
>>> Robert Bradshaw, 28.06.2012 10:59:
>>>> I've been looking at how painful it is to constantly convert between
>>>> Python objects and strings in C++.
>>>
>>> You mean std::string (as I think it's called)? Can't we just special case
>>> that in the same way that we special case char* and friends?
>>
>> Yes, we could.
>
> Then I think it makes sense to do that. Basically, the std::string type
> would set its is_string flag and then we need the actual coercion code
> for it.

I just leveraged the object <-> char* conversion in our utility code.

>> If we do that it'd make sense to special case list and
>> vector and pair and map and set as well, though perhaps those are
>> special enough to hard code them, and it makes the language simpler to
>> not have more special methods.
>
> Ok, then it's
>
>     std::string <=> bytes
>     std::vector <=> list
>     std::map <=> dict
>     std::set <=> set
>
> Potentially also:
>
>     std::pair => tuple (maybe 2-tuple => std::pair with a runtime length test?)

I implemented

    std::string <=> bytes
    std::map <=> dict
    iterable => std::vector => list
    iterable => std::list => list
    iterable => std::set => set
    2-iterable => std::pair => 2-tuple

> What about allowing list(<C++ container>) etc.? As long as the item type
> can be coerced at compile time, this should be doable:
>
>     <C++ container> => Python iterator
>
> and it would even be easy to implement in Cython code using a generator
> function.
The tricky part is memory management; one would have to make sure the
iterable is valid as long as the Python object is around (whereas it's
usually bound to the lifetime of its container).

Even more useful, however, would be supporting the "for ... in" syntax
for C++ iterators, which I plan to implement soon if no one beats me
to it.

> The other direction (Python iterator => <C++ container>) would be
> trickier but could also be made to work when the C++ item type on the LHS
> of the assignment that triggers the coercion is known at compile time.

Yes, this would actually probably be easier.

> We might want to look for a way to make these coercions a "thing" in the
> code (maybe through a registry or dedicated class) rather than adding
> special casing code everywhere.

Perhaps, but that's a rather vague idea with less immediate benefit.
The list of obvious cases to support turns out to be rather clear and
small. (We already have the from/to_py_function framework.)

> I think a CEP would be a good way to specify the above coercions.

User-extensibility would be a larger topic and would certainly deserve
a CEP, though I'm not claiming we want to support it.

> I also think that this is large enough a feature to openly ask for sponsorship.

That depends on the CEP.

- Robert

From stefan_ml at behnel.de Sat Jun 30 01:06:16 2012
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 30 Jun 2012 01:06:16 +0200
Subject: [Cython] Automatic C++ conversions
In-Reply-To:
References: <4FEC29E9.9030609@behnel.de> <4FEC499F.5050505@behnel.de>
Message-ID: <4FEE34E8.9050807@behnel.de>

Robert Bradshaw, 30.06.2012 00:38:
> I implemented
>
>     std::string <=> bytes
>     std::map <=> dict
>     iterable => std::vector => list
>     iterable => std::list => list
>     iterable => std::set => set
>     2-iterable => std::pair => 2-tuple

Very cool.

>> What about allowing list(<C++ container>) etc.? As long as the item type
>> can be coerced at compile time, this should be doable:
>>
>>     <C++ container> => Python iterator
>>
>> and it would even be easy to implement in Cython code using a generator
>> function.
>
> The tricky part is memory management; one would have to make sure the
> iterable is valid as long as the Python object is around (whereas it's
> usually bound to the lifetime of its container).

Ok, that's a problem then. We won't normally have any control over the
container. That makes for-in a much more interesting solution.

> Even more useful, however, would be supporting the "for ... in" syntax
> for C++ iterators, which I plan to implement soon if no one beats me
> to it.

Yes, that'll be a warmly appreciated feature, I guess. Please go ahead. :)

>> The other direction (Python iterator => <C++ container>) would be
>> trickier but could also be made to work when the C++ item type on the LHS
>> of the assignment that triggers the coercion is known at compile time.
>
> Yes, this would actually probably be easier.

I'm not currently sure about the details, at least the memory management
should be easy. But given that we have the container coercions now, this
might be a feature of minor interest anyway.

>> We might want to look for a way to make these coercions a "thing" in the
>> code (maybe through a registry or dedicated class) rather than adding
>> special casing code everywhere.
>
> Perhaps, but that's a rather vague idea with less immediate benefit.
> The list of obvious cases to support turns out to be rather clear and
> small. (We already have the from/to_py_function framework.)

Right. From your code, it turned out to be substantially more local than
I thought.

>> I think a CEP would be a good way to specify the above coercions.
>
> User-extensibility would be a larger topic and would certainly deserve
> a CEP, though I'm not claiming we want to support it.
>
>> I also think that this is large enough a feature to openly ask for sponsorship.
>
> That depends on the CEP.

I think we can continue to postpone this until we actually find a use
case where it provides a substantial benefit over what we have now.
Similar feature requests have come up several times in the past, but so
far, we always got away without it.

Stefan

From robertwb at gmail.com Sat Jun 30 01:20:21 2012
From: robertwb at gmail.com (Robert Bradshaw)
Date: Fri, 29 Jun 2012 16:20:21 -0700
Subject: [Cython] Automatic C++ conversions
In-Reply-To: <4FEE34E8.9050807@behnel.de>
References: <4FEC29E9.9030609@behnel.de> <4FEC499F.5050505@behnel.de> <4FEE34E8.9050807@behnel.de>
Message-ID:

On Fri, Jun 29, 2012 at 4:06 PM, Stefan Behnel wrote:
> Robert Bradshaw, 30.06.2012 00:38:
>> I implemented
>>
>>     std::string <=> bytes
>>     std::map <=> dict
>>     iterable => std::vector => list
>>     iterable => std::list => list
>>     iterable => std::set => set
>>     2-iterable => std::pair => 2-tuple
>
> Very cool.
>
>>> What about allowing list(<C++ container>) etc.? As long as the item type
>>> can be coerced at compile time, this should be doable:
>>>
>>>     <C++ container> => Python iterator
>>>
>>> and it would even be easy to implement in Cython code using a generator
>>> function.
>>
>> The tricky part is memory management; one would have to make sure the
>> iterable is valid as long as the Python object is around (whereas it's
>> usually bound to the lifetime of its container).
>
> Ok, that's a problem then. We won't normally have any control over the
> container. That makes for-in a much more interesting solution.
>
>> Even more useful, however, would be supporting the "for ... in" syntax
>> for C++ iterators, which I plan to implement soon if no one beats me
>> to it.
>
> Yes, that'll be a warmly appreciated feature, I guess. Please go ahead. :)
>
>>> The other direction (Python iterator => <C++ container>) would be
>>> trickier but could also be made to work when the C++ item type on the LHS
>>> of the assignment that triggers the coercion is known at compile time.
>>
>> Yes, this would actually probably be easier.
> > I'm not currently sure about the details, at least the memory management > should be easy. But given that we have the container coercions now, this > might be a feature of minor interest anyway. > > >>> We might want to look for a way to make these coercions a "thing" in the >>> code (maybe through a registry or dedicated class) rather than adding >>> special casing code everywhere. >> >> Perhaps, but that's a rather vague idea with less immediate benefit. >> The list of obvious cases to support turns out to be rather clear and >> small. (We already have the from/to_py_function framework.) > > Right. From your code, it turned out to be substantially more local than I > thought. And kudos to Mark for templatized cython utility code so I didn't have to re-implement all that iterating in C. >>> I think a CEP would be a good way to specify the above coercions. >> >> Though user-extensibility would be a larger topic and certainly >> deserve a CEP, though I'm not claiming we want to support it. >> >>> I also think that this is large enough a feature to openly ask for sponsorship. >> >> That depends on the CEP. > > I think we can continue to postpone this until we actually find a use case > where it provides a substantial benefit over what we have now. Similar > feature requests have come up several times in the past, but so far, we > always got away without it. 100% agree with you here. 
- Robert From d.s.seljebotn at astro.uio.no Sat Jun 30 12:57:49 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sat, 30 Jun 2012 12:57:49 +0200 Subject: [Cython] Hash-based vtables In-Reply-To: References: <4FCD100B.7000008@astro.uio.no> <4FCFC441.40703@astro.uio.no> <4FCFCD49.9030802@astro.uio.no> <4FD0808B.5080300@astro.uio.no> <4FD083F9.2030006@astro.uio.no> <4FD26ADA.5060401@astro.uio.no> <4FD2E313.6040208@astro.uio.no> <6c423841-b888-478d-8b89-148f3e9bd60e@email.android.com> <4FD45424.9040909@astro.uio.no> <4FD45E31.8060506@astro.uio.no> <4FD72199.7010803@astro.uio.no> <4FD77AAC.6080905@astro.uio.no> Message-ID: <4FEEDBAD.2000507@astro.uio.no> My time is rather limited but I'm slowly trying to get another SEP 200 in place. Something that hit me, when I tried to make up my mind about whether to have (key, ptr) entries or (key, flags, ptr), is that the fast hash table entries can actually be arbitrary size. So we could make the table itself void *table[n] and then n would be a power of 2 (TBD: benchmark cost of allowing other sizes). Since we have the d[i] displacements, it's no problem at all to construct displacements to account for variable-size entries. Proposal: C-source for an un-initialized table (signature string is placeholder and not up for discussion now): { "3:method:foo:i4i4->i4", (void*)EXCEPT_STAR_FLAG, &foo_method, "2:numpy:SHAPE", &get_shape_method, "2:fieldoffset:barfield", (void*)5, 0 /*padding to n=2^k*/ } I.e. all keys are prepended by the number of slots they use. So methods get to use 3 sizeof(void*) slots since they need the flags, but entries that don't need flags use only 2 slots. (In this case, "numpy:SHAPE" is a protocol defined by NumPy and so doesn't need any flags; or the flags are stored under "numpy:FLAGS" by that protocol.) Then, PyExtensibleType_Ready parses this and rearranges the table to a perfect hash-table. 
As part of that, it parses the string literal keys and interns them, so that the number of slots becomes available in a more coder-friendly manner: typedef struct { uint64_t hash; /* lower 64 bits of md5 */ uint32_t strlen; /* we allow \0 in key */ uint8_t nslots; /* set to 3 for first example */ char *key; /* set to "method:foo:i4i4->i4" */ } fasttable_key_t; Then, the interned keys for the table are the fasttable_key_t* pointers. Storing the hash inside the key has two pros: - Caching the md5 work (provided the interner uses a faster hash function to go from string to Lookup would happen like this: typedef struct { fasttable_key_t *key; uintptr_t flags; void *funcptr; } method_t; (method_t*)PyCustomSlots_Find(mykey, mykey->hash); /* or, faster: */ (method_t*)PyCustomSlots_Find(mykey, 0x45343453453fabaULL); If you want to scan the table linearly (to avoid having to bother with getting an interned key), you would scan a table of void*, and for every entry cast the key to fasttable_key_t* and check nslots for how much to skip to get to the next entry. Too complicated? Dag From d.s.seljebotn at astro.uio.no Sat Jun 30 13:01:07 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sat, 30 Jun 2012 13:01:07 +0200 Subject: [Cython] Hash-based vtables In-Reply-To: <4FEEDBAD.2000507@astro.uio.no> References: <4FCD100B.7000008@astro.uio.no> <4FCFCD49.9030802@astro.uio.no> <4FD0808B.5080300@astro.uio.no> <4FD083F9.2030006@astro.uio.no> <4FD26ADA.5060401@astro.uio.no> <4FD2E313.6040208@astro.uio.no> <6c423841-b888-478d-8b89-148f3e9bd60e@email.android.com> <4FD45424.9040909@astro.uio.no> <4FD45E31.8060506@astro.uio.no> <4FD72199.7010803@astro.uio.no> <4FD77AAC.6080905@astro.uio.no> <4FEEDBAD.2000507@astro.uio.no> Message-ID: <4FEEDC73.4040900@astro.uio.no> On 06/30/2012 12:57 PM, Dag Sverre Seljebotn wrote: > My time is rather limited but I'm slowly trying to get another SEP 200 > in place. 
> > Something that hit me, when I tried to make up my mind about whether to > have (key, ptr) entries or (key, flags, ptr), is that the fast hash > table entries can actually be arbitrary size. So we could make the table > itself > > void *table[n] > > and then n would be a power of 2 (TBD: benchmark cost of allowing other > sizes). Since we have the d[i] displacements, it's no problem at all to > construct displacements to account for variable-size entries. > > Proposal: > > C-source for an un-initialized table (signature string is placeholder > and not up for discussion now): > > { "3:method:foo:i4i4->i4", (void*)EXCEPT_STAR_FLAG, &foo_method, > "2:numpy:SHAPE", &get_shape_method, > "2:fieldoffset:barfield", (void*)5, 0 /*padding to n=2^k*/ } > > I.e. all keys are prepended by the number of slots they use. So methods > get to use 3 sizeof(void*) slots since they need the flags, but entries > that don't need flags use only 2 slots. (In this case, "numpy:SHAPE" is > a protocol defined by NumPy and so doesn't need any flags; or the flags > are stored under "numpy:FLAGS" by that protocol.) > > Then, PyExtensibleType_Ready parses this and rearranges the table to a > perfect hash-table. As part of that, it parses the string literal keys > and interns them, so that the number of slots becomes available in a > more coder-friendly manner: > > typedef { > uint64_t hash; /* lower-64 bit of md5 */ > uint32_t strlen; /* we allow \0 in key */ > uint8_t nslots; /* set to 3 for first example */ > char *key; /* set to "method:foo:i4i4->i4" */ > } fasttable_key_t; > > Then, the interned keys for the table is the fasttable_key_t*. > > Storing the hash inside the key has two pros: > - Caching the md5 work (provided the interner uses a faster hash > function to go from string to Sorry. 
Storing the hash inside the key has two pros: - Caching the md5 work (provided the interner uses a faster hash function to go from string to key "object") - You don't always have to store both key and hash in global variables (see below), you can dereference the key for the hash if you want to. Dag > > Lookup would happen like this: > > typedef { > fasttable_key_t *key; > uintptr_t flags; > void *funcptr > } method_t; > > (method_t*)PyCustomSlots_Find(mykey, mykey->hash); > /* or, faster: */ > (method_t*)PyCustomSlots_Find(mykey, 0x45343453453fabaULL); > > If you want to scan the table linearly (to avoid having to bother with > getting an interned key), you would scan a table of void*, and for every > entry cast the key to fasttable_key_t* and check nslots for how much to > skip to get to the next entry. > > Too complicated? > > Dag From d.s.seljebotn at astro.uio.no Sat Jun 30 13:19:25 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sat, 30 Jun 2012 13:19:25 +0200 Subject: [Cython] Hash-based vtables In-Reply-To: <4FEEDC73.4040900@astro.uio.no> References: <4FCD100B.7000008@astro.uio.no> <4FCFCD49.9030802@astro.uio.no> <4FD0808B.5080300@astro.uio.no> <4FD083F9.2030006@astro.uio.no> <4FD26ADA.5060401@astro.uio.no> <4FD2E313.6040208@astro.uio.no> <6c423841-b888-478d-8b89-148f3e9bd60e@email.android.com> <4FD45424.9040909@astro.uio.no> <4FD45E31.8060506@astro.uio.no> <4FD72199.7010803@astro.uio.no> <4FD77AAC.6080905@astro.uio.no> <4FEEDBAD.2000507@astro.uio.no> <4FEEDC73.4040900@astro.uio.no> Message-ID: <4FEEE0BD.8080302@astro.uio.no> On 06/30/2012 01:01 PM, Dag Sverre Seljebotn wrote: > On 06/30/2012 12:57 PM, Dag Sverre Seljebotn wrote: >> My time is rather limited but I'm slowly trying to get another SEP 200 >> in place. >> >> Something that hit me, when I tried to make up my mind about whether to >> have (key, ptr) entries or (key, flags, ptr), is that the fast hash >> table entries can actually be arbitrary size. 
So we could make the table >> itself >> >> void *table[n] >> >> and then n would be a power of 2 (TBD: benchmark cost of allowing other >> sizes). Since we have the d[i] displacements, it's no problem at all to >> construct displacements to account for variable-size entries. >> >> Proposal: >> >> C-source for an un-initialized table (signature string is placeholder >> and not up for discussion now): >> >> { "3:method:foo:i4i4->i4", (void*)EXCEPT_STAR_FLAG, &foo_method, >> "2:numpy:SHAPE", &get_shape_method, >> "2:fieldoffset:barfield", (void*)5, 0 /*padding to n=2^k*/ } >> >> I.e. all keys are prepended by the number of slots they use. So methods >> get to use 3 sizeof(void*) slots since they need the flags, but entries >> that don't need flags use only 2 slots. (In this case, "numpy:SHAPE" is >> a protocol defined by NumPy and so doesn't need any flags; or the flags >> are stored under "numpy:FLAGS" by that protocol.) >> >> Then, PyExtensibleType_Ready parses this and rearranges the table to a >> perfect hash-table. As part of that, it parses the string literal keys >> and interns them, so that the number of slots becomes available in a >> more coder-friendly manner: >> >> typedef { >> uint64_t hash; /* lower-64 bit of md5 */ >> uint32_t strlen; /* we allow \0 in key */ >> uint8_t nslots; /* set to 3 for first example */ >> char *key; /* set to "method:foo:i4i4->i4" */ >> } fasttable_key_t; An idea that could be entertained is make the hash e.g. sha-256, and store the entire 256 bits here, so that the interning procedure didn't need to strcmp the entire key string on hash collisions. I imagine that interface definitions etc. could make for rather large string keys, and one would also want to use the sha-256 to point to other interfaces. OTOH, one needs to run through the entire key to construct the cheaper hash needed to go from char* to fasttable_key_t* anyway, so perhaps there's not much point in this, it's only a factor 2x or 3x. 
Dag >> >> Then, the interned keys for the table is the fasttable_key_t*. >> >> Storing the hash inside the key has two pros: >> - Caching the md5 work (provided the interner uses a faster hash >> function to go from string to > > Sorry. > > Storing the hash inside the key has two pros: > > - Caching the md5 work (provided the interner uses a faster hash > function to go from string to key "object") > > - You don't always have to store both key and hash in global variables > (see below), you can dereference the key for the hash if you want to. > > Dag > >> >> Lookup would happen like this: >> >> typedef { >> fasttable_key_t *key; >> uintptr_t flags; >> void *funcptr >> } method_t; >> >> (method_t*)PyCustomSlots_Find(mykey, mykey->hash); >> /* or, faster: */ >> (method_t*)PyCustomSlots_Find(mykey, 0x45343453453fabaULL); >> >> If you want to scan the table linearly (to avoid having to bother with >> getting an interned key), you would scan a table of void*, and for every >> entry cast the key to fasttable_key_t* and check nslots for how much to >> skip to get to the next entry. >> >> Too complicated? >> >> Dag > > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel