
On 05/30/2013 06:41 PM Nathan Hurst wrote:
On Thu, May 30, 2013 at 10:23:17AM +0200, Armin Rigo wrote:
Hi all,
Some people learn about PyPy, and the first program they try to measure speed with is something like this:
def factorial(n): res = 1 for i in range(1, n + 1): res *= i return res print factorial(25000)
It may not be completely obvious a priori, but this is as bogus as it gets. This is by now only 50% slower in PyPy than in CPython thanks to efforts from various people. The issue is of course that it's an algo which, in CPython or in PyPy, spends most of its time in C code computing with rather large "long" objects. (No, PyPy doesn't contain magic to speed up C code 10 times.) In fact, this program spends more than 2/3rd of its time in the final repr() of the result! Converting a long to base 10 is a quadratic operation.
It doesn't have to be quadratic, it's easy to come up with a splitting algorithm: def reclongtostr(x): if x< 0: return "-"+reclongtostr(-x) x = long(x) # expect a long min_digits = 9 # fits in 32 bits, there may be a better choice for this pts = [10**min_digits] while pts[-1]< x: pts.append(pts[-1]**2) pts.pop() # remove first 10**2**i greater than x output = [] def spl(x,i): if i< 0: # bottomed out with max_digit sized pieces if output or x> 0: s = str(x) output.append("0"*(min_digits - len(s)) + s) # note that this appends in inorder else: top,bot = divmod(x, pts[i]) # split the number spl(top,i-1) spl(bot,i-1) spl(x,len(pts)-1) # strip leading zeros, we can probably do this more elegantly while output[0][0] == "0": output[0] = output[0][1:] return ''.join(output)
which benchmarks factorial(25000) like this:
import time s = time.time() x = factorial(25000) print "factorial", time.time() - s sx = str(x) # give pypy a chance to compile s = time.time() sx = str(x) print "Str time", time.time() - s rsx = reclongtostr(x) # give pypy a chance to compile s = time.time() rsx = reclongtostr(x) print "my rec time", time.time() - s print "equal string:", sx == rsx
factorial 0.182402133942 Str time 0.505062818527 my rec time 0.0678248405457 equal string: True
I'm sure a better programmer than I could make this faster by avoiding saving intermediate results and various micro optimisations. But beating the builtin C implementation by a factor of 7.5 seems a reasonable outcome for pypy.
I think I could come up with a linear time two pass algorithm working on intdigits if this were important to pypy.
Does it still make sense to add programs like this to our benchmarks? So far, our benchmarks are "real-life" examples. The benchmarks like above are completely missing the point of PyPy, as they don't stress at all the Python interpreter part. There are also other cases where PyPy's performance is very bad, like cpyext on an extension module with lots of small C API calls. I believe that it would still make sense to list such cases in the official benchmark, and have the descriptions of the benchmarks explain what's wrong with them.
I agree that you should include them, I disagree that they are 'wrong'. They measure the overhead of a C call. Why should a C call be slower in pypy than cpython? Presumably it could be compiled down to the appropriate instructions and then out-perform cpy.
Now that the topic of benchmarks has come up, I came across this benchmark recently: http://dalkescientific.com/writings/diary/archive/2009/11/15/100000_tasklets...
The same benchmark took 8.5s on pypy 2beta2 and takes 7.5s on pypy 2.0.1. Is there any obvious reasons why pypy's tasklets are so slow to switch? (Is it the scheduler?) This is important for my adoption of pypy at work.
njh
[14:49 ~/wk/py/clp]$ pypy
I remember doing something similar to convert long to string, way back in .. let's see .. googling google comp.lang.python .. oy, 2001, it's been a while ;-) https://groups.google.com/forum/?hl=sv&fromgroups#!searchin/comp.lang.python/richter$20strL.py/comp.lang.python/6HYHojX7ZlA/Wizytwby71QJ (from top, search to first instance of strL.py) I guess it's not too long to post a copy here: .. (Hm, I see I should have imported time from time instead of clock, since the latter has just scheduler tick resolution, and time is high resolution.) ____________________________________________________ A couple of lines wrapped... _______________________________________________________________ # strL.py -- recursive long to decimal string conversion # 2001-07-10 bokr # p10d={} def p10(n): if not p10d.has_key(n): p = p10d[n] = 10L**n return p return p10d[n] def strLR(n,w=0): # w>0 is known width, w<0 is searching guess if w == 0: return [] if w > 0: if w <=9: return [('%0'+chr(w+48)+'d') % n] wr = w/2 nl,nr = divmod(n,p10(wr)) return strLR(nl,w-wr)+strLR(nr,wr) else: nl,nr = divmod(n,p10(-w)) if nl: return strLR(nl, 2*w) + strLR(nr,-w) else: if w >= -9: return ['%d' % n] else: return strLR(nr,w/2) def strL(n): if n<0: return ''.join(['-']+strLR(-n,-9)) else: return ''.join(strLR(n,-9)) from time import clock import sys def main(): def pr(x): print x def psl(x): s = strL(x) sys.stdout.write(s) sys.stdout.write('\n') dt={'str':str,'strL':strL,'repr':repr, 'print':pr, 'printStrL':psl } try: x=long( eval(sys.argv[2]) ) fn=sys.argv[1] fcn=dt[fn] except: sys.stderr.write("usage: %s [str strL repr print printStrL] <const expr>\n" % sys.argv[0]) sys.exit(2) t0=clock() fcn(x) t1=clock() print "%s took %9.6f seconds" % (fn,t1-t0) if __name__ == "__main__": main() ____________________________________________________ Got curious, so I added to your benchmark: from strL import strL ## bokr mod and ## bokr mod: add timing of strL, like above rsb = strL(x) # give pypy a chance to compile s = time.time() rsx = strL(x) print "strL rec time", time.time() - s print "equal string:", sx == rsb then ran it (with old python and pypy on old laptop: Version info: [14:49 ~/wk/py/clp]$ python Python 2.7.2 (default, Jul 8 2011, 23:38:53) [GCC 4.1.2] on linux2 Type "help", "copyright", "credits" or "license" for more information. pypy: /usr/lib/libcrypto.so.0.9.8: no version information available (required by pypy) pypy: /usr/lib/libssl.so.0.9.8: no version information available (required by pypy) Python 2.7.1 (b590cf6de419, Apr 30 2011, 02:00:38) [PyPy 1.5.0-alpha0 with GCC 4.4.3] on linux2 Type "help", "copyright", "credits" or "license" for more information. And now for something completely different: ``nothing is true''
The bench run: [14:49 ~/wk/py/clp]$ python NathanHurst.py factorial 1.36516094208 Str time 2.93479013443 my rec time 1.45956683159 equal string: True strL rec time 1.34501504898 equal string: True [14:50 ~/wk/py/clp]$ pypy NathanHurst.py pypy: /usr/lib/libcrypto.so.0.9.8: no version information available (required by pypy) pypy: /usr/lib/libssl.so.0.9.8: no version information available (required by pypy) factorial 2.29024791718 Str time 3.14243102074 my rec time 1.25054502487 equal string: True strL rec time 1.12671113014 equal string: True [14:50 ~/wk/py/clp]$ ... Hm, wonder why your factorial was slower on pypy .. prob due to old version? Regards, Bengt Richter