On Thu, May 30, 2013 at 10:23:17AM +0200, Armin Rigo wrote:

Hi all,

Some people learn about PyPy, and the first program they try to measure speed with is something like this:

    def factorial(n):
        res = 1
        for i in range(1, n + 1):
            res *= i
        return res

    print factorial(25000)

It may not be completely obvious a priori, but this is as bogus as it gets. It is by now only 50% slower in PyPy than in CPython, thanks to efforts from various people. The issue is of course that it's an algorithm which, in CPython or in PyPy, spends most of its time in C code computing with rather large "long" objects. (No, PyPy doesn't contain magic to speed up C code 10 times.) In fact, this program spends more than two thirds of its time in the final repr() of the result! Converting a long to base 10 is a quadratic operation.
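To make the quadratic cost concrete, here is a sketch (my own illustration, not from the original mail) of the naive schoolbook conversion: every divmod by the small base 10 walks all remaining machine words of the big integer, so an n-digit number costs n passes of O(n) work each.

```python
# Naive base-10 conversion: one small divmod per output digit.
# Each divmod is O(words in x), so the whole loop is O(n^2) in the
# number of decimal digits -- the behaviour described above.
def naive_to_decimal(x):
    if x == 0:
        return "0"
    sign = "-" if x < 0 else ""
    x = abs(x)
    digits = []
    while x > 0:
        x, d = divmod(x, 10)  # O(words in x) per step
        digits.append(chr(ord("0") + d))
    return sign + "".join(reversed(digits))
```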

It doesn't have to be quadratic; it's easy to come up with a splitting algorithm:

    def reclongtostr(x):
        if x < 0:
            return "-" + reclongtostr(-x)
        x = long(x)  # expect a long
        min_digits = 9  # fits in 32 bits; there may be a better choice for this
        pts = [10**min_digits]
        while pts[-1] < x:
            pts.append(pts[-1]**2)
        pts.pop()  # remove the first 10**2**i greater than x
        output = []
        def spl(x, i):
            if i < 0:
                # bottomed out with min_digits-sized pieces
                if output or x > 0:
                    s = str(x)
                    # note that this appends in order
                    output.append("0" * (min_digits - len(s)) + s)
            else:
                top, bot = divmod(x, pts[i])  # split the number
                spl(top, i - 1)
                spl(bot, i - 1)
        spl(x, len(pts) - 1)
        # strip leading zeros; we could probably do this more elegantly
        while output[0][0] == "0":
            output[0] = output[0][1:]
        return ''.join(output)

which benchmarks factorial(25000) like this:

    import time

    s = time.time()
    x = factorial(25000)
    print "factorial", time.time() - s

    sx = str(x)  # give pypy a chance to compile
    s = time.time()
    sx = str(x)
    print "Str time", time.time() - s

    rsx = reclongtostr(x)  # give pypy a chance to compile
    s = time.time()
    rsx = reclongtostr(x)
    print "my rec time", time.time() - s

    print "equal string:", sx == rsx

with output:

    factorial 0.182402133942
    Str time 0.505062818527
    my rec time 0.0678248405457
    equal string: True

I'm sure a better programmer than I could make this faster by avoiding saving intermediate results and with various micro-optimisations. But beating the built-in C implementation by a factor of 7.5 seems a reasonable outcome for pypy. I think I could come up with a linear-time, two-pass algorithm working on int digits if this were important to pypy.
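For comparison, here is a more compact variant of the same divide-and-conquer idea (a hypothetical sketch of mine, not code from the thread): instead of precomputing a table of squared powers, recurse on an estimated digit count and split each number roughly in half with one divmod. Correctness does not depend on the digit estimate being exact, since the base case falls back to str() and only the bottom half is zero-padded to its known exact width.

```python
# Divide-and-conquer long-to-decimal, compact form (illustrative sketch).
def split_to_decimal(x):
    if x < 0:
        return "-" + split_to_decimal(-x)

    def rec(x, digits):
        # 'digits' is only a guide for where to split; str() in the
        # base case keeps the result correct even if it is off by a few.
        if digits <= 9:
            return str(x)
        half = digits // 2
        top, bot = divmod(x, 10 ** half)  # split into high/low halves
        if top == 0:
            return rec(bot, half)
        # bot < 10**half, so padding it to exactly 'half' chars is safe
        return rec(top, digits - half) + rec(bot, half).rjust(half, "0")

    # rough decimal digit count from the bit length (log10(2) ~ 0.3)
    return rec(x, max(1, x.bit_length() * 3 // 10 + 1))
```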

Does it still make sense to add programs like this to our benchmarks? So far, our benchmarks are "real-life" examples. Benchmarks like the one above completely miss the point of PyPy, as they don't stress the Python interpreter part at all. There are also other cases where PyPy's performance is very bad, like cpyext on an extension module with lots of small C API calls. I believe that it would still make sense to list such cases in the official benchmarks, and have the descriptions of the benchmarks explain what's wrong with them.

I agree that you should include them; I disagree that they are 'wrong'. They measure the overhead of a C call. Why should a C call be slower in PyPy than in CPython? Presumably it could be compiled down to the appropriate instructions and then out-perform CPython.

Now that the topic of benchmarks has come up, I came across this benchmark recently: http://dalkescientific.com/writings/diary/archive/2009/11/15/100000_tasklets... The same benchmark took 8.5s on pypy 2beta2 and takes 7.5s on pypy 2.0.1. Are there any obvious reasons why pypy's tasklets are so slow to switch? (Is it the scheduler?) This is important for my adoption of pypy at work.

njh
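As a rough point of comparison for switch overhead (my own stand-in, not the linked tasklet benchmark, and it measures plain generator resumption rather than stackless tasklets), one can time the cheapest kind of coroutine switch Python offers:

```python
import time

def worker():
    # a trivial coroutine that does nothing but yield control back
    while True:
        yield

def bench_switches(n):
    # time n resume/suspend round-trips into the generator
    g = worker()
    next(g)  # prime the generator to its first yield
    t0 = time.time()
    for _ in range(n):
        next(g)
    return time.time() - t0
```

Comparing this figure against the per-switch cost implied by the tasklet benchmark would show how much overhead the scheduler itself adds.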