[pypy-dev] PyPy doesn't make code written in C faster
Nathan Hurst
njh at njhurst.com
Thu May 30 18:41:02 CEST 2013
On Thu, May 30, 2013 at 10:23:17AM +0200, Armin Rigo wrote:
> Hi all,
>
> Some people learn about PyPy, and the first program they try to
> measure speed with is something like this:
>
> def factorial(n):
>     res = 1
>     for i in range(1, n + 1):
>         res *= i
>     return res
> print factorial(25000)
>
> It may not be completely obvious a priori, but this is as bogus as it
> gets. This is by now only 50% slower in PyPy than in CPython thanks
> to efforts from various people. The issue is of course that it's an
> algo which, in CPython or in PyPy, spends most of its time in C code
> computing with rather large "long" objects. (No, PyPy doesn't contain
> magic to speed up C code 10 times.) In fact, this program spends more
> than 2/3rd of its time in the final repr() of the result! Converting
> a long to base 10 is a quadratic operation.
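To see where the quadratic cost comes from: the schoolbook conversion
does one divmod by 10 per output digit, and each divmod on an n-digit
long costs O(n), so producing n digits costs O(n**2) overall. A minimal
sketch of that algorithm (an illustration only, not CPython's actual
implementation):

def schoolbook_long_to_str(x):
    if x == 0:
        return "0"
    digits = []
    while x > 0:
        x, d = divmod(x, 10)   # O(n) work on an n-digit long
        digits.append(str(d))
    return ''.join(reversed(digits))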
It doesn't have to be quadratic; it's easy to come up with a splitting
algorithm:
def reclongtostr(x):
    if x < 0: return "-" + reclongtostr(-x)
    if x == 0: return "0" # guard: the zero-stripping loop below assumes a non-empty output
    x = long(x) # expect a long
    min_digits = 9 # fits in 32 bits, there may be a better choice for this
    pts = [10**min_digits]
    while pts[-1] < x:
        pts.append(pts[-1]**2)
    pts.pop() # drop the first power that is >= x
    output = []
    def spl(x, i):
        if i < 0: # bottomed out with min_digits sized pieces
            if output or x > 0:
                s = str(x)
                output.append("0"*(min_digits - len(s)) + s) # note that this appends in order
        else:
            top, bot = divmod(x, pts[i]) # split the number
            spl(top, i-1)
            spl(bot, i-1)
    spl(x, len(pts)-1)
    # strip leading zeros, we can probably do this more elegantly
    while output[0][0] == "0":
        output[0] = output[0][1:]
    return ''.join(output)
which benchmarks factorial(25000) like this:
import time

s = time.time()
x = factorial(25000)
print "factorial", time.time() - s

sx = str(x) # give pypy a chance to compile
s = time.time()
sx = str(x)
print "Str time", time.time() - s

rsx = reclongtostr(x) # give pypy a chance to compile
s = time.time()
rsx = reclongtostr(x)
print "my rec time", time.time() - s

print "equal string:", sx == rsx
factorial 0.182402133942
Str time 0.505062818527
my rec time 0.0678248405457
equal string: True
I'm sure a better programmer than I could make this faster by avoiding
saving intermediate results and by various micro-optimisations. But
beating the builtin C implementation by a factor of 7.5 seems a
reasonable outcome for pypy.
I think I could come up with a linear-time, two-pass algorithm working
on int digits if this were important to pypy.
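For what it's worth, a quick sanity check of reclongtostr over boundary
values (this assumes the factorial and reclongtostr definitions above;
the values around 10**9 and 10**18 probe the min_digits block edges):

for v in [0, 1, 9, 10**9 - 1, 10**9, 10**18, factorial(1000)]:
    assert reclongtostr(v) == str(v), v
print "all conversions match"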
> Does it still make sense to add programs like this to our benchmarks?
> So far, our benchmarks are "real-life" examples. The benchmarks like
> above are completely missing the point of PyPy, as they don't stress
> at all the Python interpreter part. There are also other cases where
> PyPy's performance is very bad, like cpyext on an extension module
> with lots of small C API calls. I believe that it would still make
> sense to list such cases in the official benchmark, and have the
> descriptions of the benchmarks explain what's wrong with them.
I agree that you should include them, but I disagree that they are
'wrong'. They measure the overhead of a C call. Why should a C call
be slower in pypy than in cpython? Presumably it could be compiled down
to the appropriate instructions and then out-perform cpython.
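If we wanted to quantify that overhead, something like this would do.
(A sketch only: spam.noop stands for a hypothetical no-op C function in
a cpyext-compiled extension module; substitute whatever small C API
call you care about.)

import time
import spam # hypothetical cpyext-compiled extension exposing a no-op C function

def per_call_overhead(f, n=10**6):
    for _ in xrange(1000): # warm-up, so pypy's JIT gets a chance to compile the loop
        f()
    s = time.time()
    for _ in xrange(n):
        f()
    return (time.time() - s) / n

print "seconds per call:", per_call_overhead(spam.noop)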
Now that the topic of benchmarks has come up, I came across this
benchmark recently:
http://dalkescientific.com/writings/diary/archive/2009/11/15/100000_tasklets.html
The same benchmark took 8.5s on pypy 2beta2 and takes 7.5s on pypy
2.0.1. Is there any obvious reason why pypy's tasklets are so
slow to switch? (Is it the scheduler?) This is important for my
adoption of pypy at work.
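The core of that benchmark is just mass tasklet creation plus one round
of switching; roughly this, using the stackless module (a minimal
sketch in the spirit of the linked post, not its exact code):

import time
import stackless

def worker():
    stackless.schedule() # switch back to the scheduler once

s = time.time()
for _ in xrange(100000):
    stackless.tasklet(worker)() # create and bind 100,000 tasklets
stackless.run() # run until every tasklet has finished
print "create + switch:", time.time() - s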
njh