[pypy-dev] PyPy doesn't make code written in C faster

Fri May 31 15:01:03 CEST 2013

On 05/30/2013 06:41 PM Nathan Hurst wrote:
> On Thu, May 30, 2013 at 10:23:17AM +0200, Armin Rigo wrote:
>> Hi all,
>>
>> Some people learn about PyPy, and the first program they try to
>> measure speed with is something like this:
>>
>> def factorial(n):
>>      res = 1
>>      for i in range(1, n + 1):
>>          res *= i
>>      return res
>> print factorial(25000)
>>
>> It may not be completely obvious a priori, but this is as bogus as it
>> gets.  This is by now only 50% slower in PyPy than in CPython thanks
>> to efforts from various people.  The issue is of course that it's an
>> algo which, in CPython or in PyPy, spends most of its time in C code
>> computing with rather large "long" objects.  (No, PyPy doesn't contain
>> magic to speed up C code 10 times.)  In fact, this program spends more
>> than 2/3rd of its time in the final repr() of the result!  Converting
>> a long to base 10 is a quadratic operation.
>
> It doesn't have to be quadratic, it's easy to come up with a splitting
> algorithm:
> def reclongtostr(x):
>      if x<  0: return "-"+reclongtostr(-x)
>      x = long(x) # expect a long
>      min_digits = 9 # fits in 32 bits, there may be a better choice for
>      this
>      pts = [10**min_digits]
>      while pts[-1]<  x:
>          pts.append(pts[-1]**2)
>      pts.pop() # remove first 10**2**i greater than x
>      output = []
>      def spl(x,i):
>          if i<  0: # bottomed out with max_digit sized pieces
>              if output or x>  0:
>                  s = str(x)
>                  output.append("0"*(min_digits - len(s)) + s) # note
> 		that this appends in inorder
>          else:
>              top,bot = divmod(x, pts[i]) # split the number
>              spl(top,i-1)
>              spl(bot,i-1)
>      spl(x,len(pts)-1)
>      # strip leading zeros, we can probably do this more elegantly
>      while output[0][0] == "0":
>          output[0] = output[0][1:]
>      return ''.join(output)
>
> which benchmarks factorial(25000) like this:
>
>      import time
>      s = time.time()
>      x = factorial(25000)
>      print "factorial", time.time() - s
>      sx = str(x) # give pypy a chance to compile
>      s = time.time()
>      sx = str(x)
>      print "Str time", time.time() - s
>      rsx = reclongtostr(x) # give pypy a chance to compile
>      s = time.time()
>      rsx = reclongtostr(x)
>      print "my rec time", time.time() - s
>      print "equal string:", sx == rsx
>
> factorial 0.182402133942
> Str time 0.505062818527
> my rec time 0.0678248405457
> equal string: True
>
>
> I'm sure a better programmer than I could make this faster by avoiding
> saving intermediate results and various micro optimisations.  But
> beating the builtin C implementation by a factor of 7.5 seems a
> reasonable outcome for pypy.
>
> I think I could come up with a linear time two pass algorithm working
> on intdigits if this were important to pypy.
>
>> Does it still make sense to add programs like this to our benchmarks?
>> So far, our benchmarks are "real-life" examples.  The benchmarks like
>> above are completely missing the point of PyPy, as they don't stress
>> at all the Python interpreter part.  There are also other cases where
>> PyPy's performance is very bad, like cpyext on an extension module
>> with lots of small C API calls.  I believe that it would still make
>> sense to list such cases in the official benchmark, and have the
>> descriptions of the benchmarks explain what's wrong with them.
>
> I agree that you should include them, I disagree that they are
> 'wrong'.  They measure the overhead of a C call.  Why should a C call
> be slower in pypy than cpython?  Presumably it could be compiled down
> to the appropriate instructions and then out-perform cpy.
>
> Now that the topic of benchmarks has come up, I came across this
> benchmark recently:
> http://dalkescientific.com/writings/diary/archive/2009/11/15/100000_tasklets.html
>
> The same benchmark took 8.5s on pypy 2beta2 and takes 7.5s on pypy
> 2.0.1.  Is there any obvious reasons why pypy's tasklets are so
> slow to switch?  (Is it the scheduler?)  This is important for my
> adoption of pypy at work.
>
> njh

I remember doing something similar to convert long to string,
way back in .. let's see .. googling google comp.lang.python ..
oy, 2001, it's been a while ;-)

https://groups.google.com/forum/?hl=sv&fromgroups#!searchin/comp.lang.python/richter$20strL.py/comp.lang.python/6HYHojX7ZlA/Wizytwby71QJ

(from top, search to first instance of strL.py)

I guess it's not too long to post a copy here:

.. (Hm, I see I should have imported time from time
instead of clock, since the latter has just scheduler
tick resolution, and time is high resolution.)
____________________________________________________

A couple of lines wrapped...
_______________________________________________________________
# strL.py -- recursive long to decimal string conversion
# 2001-07-10 bokr
#
p10d={}
def p10(n):
     if not p10d.has_key(n):
         p = p10d[n] = 10L**n
         return p
     return p10d[n]

def strLR(n,w=0):   # w>0 is known width, w<0 is searching guess
     if w == 0: return []
     if w > 0:
         if w <=9: return [('%0'+chr(w+48)+'d') % n]
         wr = w/2
         nl,nr = divmod(n,p10(wr))
         return strLR(nl,w-wr)+strLR(nr,wr)
     else:
         nl,nr = divmod(n,p10(-w))
         if nl:
             return strLR(nl, 2*w) + strLR(nr,-w)
         else:
             if w >= -9:
                 return ['%d' % n]
             else:
                 return strLR(nr,w/2)

def strL(n):
     if n<0:
         return ''.join(['-']+strLR(-n,-9))
     else:
         return ''.join(strLR(n,-9))

from time import clock
import sys
def main():
     def pr(x):
         print x

     def psl(x):
         s = strL(x)
         sys.stdout.write(s)
         sys.stdout.write('\n')

     dt={'str':str,'strL':strL,'repr':repr, 'print':pr, 'printStrL':psl
}

     try:
         x=long( eval(sys.argv[2]) )
         fn=sys.argv[1]
         fcn=dt[fn]
     except:
         sys.stderr.write("usage: %s [str strL repr print printStrL]
<const expr>\n" % sys.argv[0])
         sys.exit(2)

     t0=clock()
     fcn(x)
     t1=clock()
     print "%s took %9.6f seconds" % (fn,t1-t0)

if __name__ == "__main__":
     main()

____________________________________________________

Got curious, so I added to your benchmark:

     from strL import strL   ## bokr mod
and
     ## bokr mod: add timing of strL, like above
     rsb = strL(x) # give pypy a chance to compile
     s = time.time()
     rsx = strL(x)
     print "strL rec time", time.time() - s
     print "equal string:", sx == rsb

then ran it (with old python and pypy on old laptop:
Version info:
[14:49 ~/wk/py/clp]$ python
Python 2.7.2 (default, Jul  8 2011, 23:38:53)
[GCC 4.1.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
 >>>
[14:49 ~/wk/py/clp]$ pypy
pypy: /usr/lib/libcrypto.so.0.9.8: no version information available (required by pypy)
pypy: /usr/lib/libssl.so.0.9.8: no version information available (required by pypy)
Python 2.7.1 (b590cf6de419, Apr 30 2011, 02:00:38)
[PyPy 1.5.0-alpha0 with GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
And now for something completely different: ``nothing is true''
 >>>>

The bench run:
[14:49 ~/wk/py/clp]$ python NathanHurst.py
factorial 1.36516094208
Str time 2.93479013443
my rec time 1.45956683159
equal string: True
strL rec time 1.34501504898
equal string: True
[14:50 ~/wk/py/clp]$ pypy NathanHurst.py
pypy: /usr/lib/libcrypto.so.0.9.8: no version information available (required by pypy)
pypy: /usr/lib/libssl.so.0.9.8: no version information available (required by pypy)
factorial 2.29024791718
Str time 3.14243102074
my rec time 1.25054502487
equal string: True
strL rec time 1.12671113014
equal string: True
[14:50 ~/wk/py/clp]$

... Hm, wonder why your factorial was slower on pypy .. prob due to old version?

Regards,
Bengt Richter