[Python-Dev] C-level duck typing

Dag Sverre Seljebotn d.s.seljebotn at astro.uio.no
Thu May 17 20:34:24 CEST 2012


On 05/17/2012 08:13 PM, Dag Sverre Seljebotn wrote:
> Mark Shannon <mark at hotpy.org> wrote:
>> Dag Sverre Seljebotn wrote:
>>> from numpy import sin
>>> # assume sin is a Python callable and that NumPy decides to support
>>> # our spec, so that a "double (*sinfuncptr)(double)" can also be fetched.
>>>
>>> # Our mission: avoid having the user manually import "sin" from C,
>>> # but allow just using the NumPy object and still be fast.
>>>
>>> # define a function to integrate
>>> cpdef double f(double x):
>>>     return sin(x * x)  # guess on signature and use "fastcall"!
>>>
>>> # the integrator
>>> def integrate(func, double a, double b, int n):
>>>     cdef double s = 0
>>>     cdef double dx = (b - a) / n
>>>     for i in range(n):
>>>         # This is also a fastcall, but can be cached so doesn't
>>>         # matter...
>>>         s += func(a + i * dx)
>>>     return s * dx
>>>
>>> integrate(f, 0, 1, 1000000)
>>>
>>> There are two problems here:
>>>
>>> - The "sin" global can be reassigned (monkey-patched) between each call
>>> to "f", no way for "f" to know. Even "sin" could do the reassignment. So
>>> you'd need to check for reassignment to do caching...
>>
>> Since Cython allows static typing why not just declare that func can
>> treat sin as if it can't be monkeypatched?
>
> If you want to manually declare stuff, you can always use a C function
> pointer too...
>
>> Moving the load of a global variable out of the loop does seem to be a
>> rather obvious optimisation, if it were declared to be legal.
>
> In case you didn't notice, there were no global variable loads inside
> the loop...
>
> You can keep chasing this, but there are *always* cases where such
> optimizations don't apply (and you need to save the situation with
> manual typing).
>
> Anyway: We should really discuss Cython on the Cython list. If my
> motivating example wasn't good enough for you there's really nothing I
> can do.
>
>>> Some rough numbers:
>>>
>>> - The tp_flags hack adds about 2 ns of overhead (something similar
>>> with a metaclass; the problem is more how to synchronize that
>>> metaclass across multiple 3rd-party libraries)
>>
>> Does your approach handle subtyping properly?
>
> Not really.
>
>>>
>>> - Dict lookup 20 ns
>>
>> Did you time _PyType_Lookup() ?
>
> No, didn't get around to it yet (and thanks for pointing it out).
> (Though the GIL requirement is an issue too for Cython.)
>
>>> - The sin function is about 35 ns. And "f" is probably only 2-3 ns,
>>> and there could very easily be multiple such functions, defined in
>>> different modules, in a chain, in order to build up a formula.
>>>
>>
>> Such micro timings are meaningless, because the working set often
>> tends to fit in the hardware cache. A level-2 cache miss can take
>> 100s of cycles.

I'm sorry; if my rant wasn't clear: such micro-benchmarks do in fact 
mimic very closely what you'd do if you were to, say, integrate an 
ordinary differential equation. You *do* have a tight loop like that, 
just hammering on floating-point numbers. Making that specific use case 
more convenient was in fact the original motivation that spawned this 
discussion on the NumPy list over a month ago...
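For reference, here is what that tight loop looks like in plain Python. This 
is just a minimal sketch mirroring the integrator from the example above (the 
names f/integrate follow the example; it is not the proposed spec): every trip 
through the loop pays the full generic Python call overhead on func, which is 
exactly the per-call cost the fastcall idea is meant to remove.

```python
import math

def integrate(func, a, b, n):
    """Left Riemann sum: each iteration pays full Python call overhead."""
    s = 0.0
    dx = (b - a) / n
    for i in range(n):
        # In pure Python this call goes through the generic call
        # machinery on every single iteration -- the cost under discussion.
        s += func(a + i * dx)
    return s * dx

def f(x):
    return math.sin(x * x)

result = integrate(f, 0.0, 1.0, 100000)
```

With n in the hundreds of thousands, the dispatch overhead per call is 
multiplied accordingly, which is why shaving tens of nanoseconds per call 
matters here.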

Dag
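(For readers following along: the "tp_flags hack" dispatch pattern under 
discussion can be modeled in plain Python. This is a toy sketch, not the 
actual CPython/Cython mechanism -- the flag name HAS_NATIVE_CALL, the 
native_table attribute, and the object layout are all hypothetical stand-ins 
for whatever a spec would standardize. The point is the shape of the check: 
test one flag on the type, and if it is set, fetch a typed "function pointer" 
and bypass the generic call path.)

```python
# Hypothetical flag bit: "this type exposes a native call table".
HAS_NATIVE_CALL = 0x01

class FastSin:
    """Toy stand-in for an object whose type advertises a C-level entry."""
    flags = HAS_NATIVE_CALL

    def __init__(self, impl):
        # Stand-in for a table of native signatures, e.g. "double (*)(double)".
        self.native_table = {"d_d": impl}
        self._impl = impl

    def __call__(self, x):
        return self._impl(x)  # the generic (slow) path

def fast_call(obj, x):
    # The "C-level duck typing" check: one flag test, then direct dispatch.
    if getattr(type(obj), "flags", 0) & HAS_NATIVE_CALL:
        return obj.native_table["d_d"](x)  # fast path, signature known
    return obj(x)                          # fallback: generic call

import math
sin_obj = FastSin(math.sin)
value = fast_call(sin_obj, 1.0)
```

Subtyping is the weak spot noted above: a subclass inheriting the flag but 
overriding __call__ would silently bypass the override on the fast path, which 
is why the flag check alone does not "handle subtyping properly".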

>
> I find this sort of response arrogant -- do you know the details of
> every use case for a programming language under the sun?
>
> Many Cython users are scientists. And in scientific computing in
> particular you *really* have the whole range of problems and working
> sets. Honestly. In some codes you only really care about the speed of
> the disk controller. In other cases you can spend *many seconds* working
> almost only in L1 or perhaps L2 cache (for instance when integrating
> ordinary differential equations in a few variables, which is not
> entirely different in nature from the example I posted). (Then, those
> many seconds are replicated many million times for different parameters
> on a large cluster, and a 2x speedup translates directly into large
> amounts of saved money.)
>
> Also, with numerical codes you block up the problem so that loads to L2
> are amortized over sufficient FLOPs (when you can).
>
> Every time Cython becomes able to do stuff more easily in this domain,
> people thank us that they didn't have to dig up Fortran but can stay
> closer to Python.
>
> Sorry for going off on a rant. I find that people will give well-meant
> advice about performance, but that advice often just generalizes from
> computer programs in entirely different domains (web apps?), and
> sweeping generalizations have a way of giving the wrong answer.
>
> Dag
>
