Monte Carlo Method and pi
Tim Hochberg
tim.hochberg at ieee.org
Tue Jul 13 06:02:56 CEST 2004
Dan Christensen wrote:
> Tim Hochberg <tim.hochberg at ieee.org> writes:
>
>
>>I can more than double the speed of this under psyco by replacing **2
>>with x*x. I have inside information that pow is extra slow on psyco(*)
>
>
> Thanks for pointing this out. This change sped up all of the
> methods I've tried. It's too bad that Python does optimize this,
> but I case pow could be redefined at runtime.
Your welcome.
>>(*) Psyco's support for floating point operations is considerable
>>slower than its support for integer operations. The latter are
>>optimized all the way to assembly when possible, but the latter are,
>>at best, optimized into C function calls that perform the
>>operation.
>
>
> That's unfortunate.
Yes. There's still no psyco support for complex numbers at all, either.
Meaning that they run at standard Python speed. However, I expect that
to change any day now.
>>Thus there is always some function call overhead. Fixing
>>this would require someone who groks both the guts of Psyco and the
>>intracies of x86 floating point machine code to step forward.
>
>
> If someone volunteers, that would be great. I don't know anything
> about x86 floating point; is it a real mess?
From what I understand yes, but I haven't looked at assembly, let alone
machine code for many, many years and I was never particularly adept at
doing anything with it.
>
>>If one takes into the account the speed difference of the two CPUs this
>>puts the both the numarray and psyco solutions within about 50% of the
>>Fortran solution, which I'm impressed by.
>
>
> I've now run a bunch of timings of all the methods that have been
> proposed (except that I used C instead of Fortran) on one machine,
> a 2.66GHz P4. Here are the results, in seconds:
>
> Python 2.3.4 Python 2.4a1
>
> naive loop: 0.657765 0.589741
> naive loop psyco: 0.159085
> naive loop weave: 0.084898 *
> numarray: 0.115775 **
> imap lots: 1.359815 0.994505
> imap fn: 0.979589 0.758308
> custom gen: 0.841540 0.681277
> gen expr: 0.694567
>
> naive loop C: 0.040 *
>
> Only one of the tests used psyco. I did run the others with psyco
> too, but the times were about the same or slower.
FYI, generators are something that is not currently supported by Psyco.
For a more complete list, see
http://psyco.sourceforge.net/psycoguide/unsupported.html. If you
bug^H^H^H, plead with Armin, maybe he'll implement them.
> For me, psyco was about four times slower than C, and numarray was
> almost 3 times slower. I'm surprised by how close psyco and numarray
> were in your runs, and how slow Fortran was in beliavsky's test.
> My C program measures user+sys cpu time for the loop itself, but
> not start-up time for the program. In any case, getting within
> a factor of four of C, with a random number generator that is probably
> better, is pretty good!
I'm using a somewhat old version of numarray, which could be part of it.
>
> One really should compare the random number generators more carefully,
> since they could take a significant fraction of the time. The lines
> with one * use C's rand(). The line with ** uses numarray's random
> array function. Does anyone know what random number generator is used
> by numarray?
>
> By the way, note how well Python 2.4 performs compared with 2.3.4.
> (Psyco, weave, numarray not shown because I don't have them for 2.4.)
>
> I'm still curious about whether it could be possible to get
> really fast loops in Python using iterators and expressions like
> sum(1 + it1 - 2 * it2), where it1 and it2 are iterators that produce
> numbers. Could Python be clever enough to implement that as a
> for loop in C with just two or three C function calls in the loop?
That would require new syntax, would it not? That seems unlikely. It
seems that in 2.4, sum(1+i1-2*i2 for (i1, i2) in izip(i1, i2)) would
work. It also seems that with sufficient work psyco could be made to
optimize that in the manner you suggest. I wouldn't hold your breath though.
-tim
>
> Dan
>
> Here's the code I used:
>
> -----pi.py-----
> from random import random
> from itertools import *
> from math import sqrt
> from operator import pow, sub, add
>
> from timeTest import *
>
> try:
> import weave
> haveWeave = True
> except:
> haveWeave = False
>
> try:
> import numarray
> import numarray.random_array
> haveNumarray = True
> except:
> haveNumarray = False
>
> def v1(n = 500000):
> "naive loop"
> rand = random
> sqr = sqrt
> sm = 0
> for i in range(n):
> r = rand()
> sm += sqr(1-r*r)
> return 4*sm/n
>
> def v1weave(n = 500000):
> "naive loop weave"
>
> support = "#include <stdlib.h>"
>
> code = """
> double sm;
> float rnd;
> srand(1); // seed random number generator
> sm = 0.0;
> for(int i=0;i<n;++i) {
> rnd = rand()/(RAND_MAX+1.0);
> sm += sqrt(1.0-rnd*rnd);
> }
> return_val = 4.0*sm/n;"""
> return weave.inline(code,('n'),support_code=support)
>
> if(haveNumarray):
> def v2(n = 500000):
> "numarray"
> a = numarray.random_array.uniform(0, 1, n)
> return 4 * numarray.sum(numarray.sqrt(1 - a*a)) / n
>
> def v3(n = 500000):
> "imap lots"
> rand = random
> pw = pow
> sqr = sqrt
> sq = lambda x: x*x
> sm = sum(imap(sqr,
> imap(sub, repeat(1),
> imap(sq,
> imap(lambda dummy: rand(), repeat(None, n))))))
> return 4*sm/n
>
> def v4(n = 500000):
> "imap fn"
> rand = random
> pw = pow
> sqr = sqrt
>
> def fn(dummy):
> r = rand()
> return sqr(1-r*r)
>
> sm = sum(imap(fn, repeat(None, n)))
> return 4*sm/n
>
> def custom(n):
> rand = random
> for i in xrange(n):
> yield(sqrt(1-rand()**2))
>
> def v5(n = 500000):
> "custom gen"
> sm = sum(custom(n))
> return 4*sm/n
>
> # The commented out psyco runs don't show significant improvement
> # (or are slower).
>
> if haveWeave:
> v1weave()
>
> timeTest(v1)
> timeTestWithPsyco(v1)
> if haveWeave:
> timeTest(v1weave)
> if haveNumarray:
> timeTest(v2)
> # timeTestWithPsyco(v2)
> timeTest(v3)
> #timeTestWithPsyco(v3)
> timeTest(v4)
> #timeTestWithPsyco(v4)
> timeTest(v5)
> #psyco.bind(custom)
> #timeTestWithPsyco(v5)
>
> -----pi2.py----- (only for Python2.4)
> from random import random
> from math import sqrt
>
> from timeTest import *
>
> # Even if this is not executed, it is parsed, so Python 2.3 complains.
>
> def v6(n = 500000):
> "gen expr"
> rand = random
> sqr = sqrt
> sm = sum(sqr(1-rand()**2) for i in xrange(n))
> return 4*sm/n
>
> timeTest(v6)
>
> -----timeTest.py-----
> import time
>
> try:
> import psyco
> havePsyco = True
> except:
> havePsyco = False
>
> def timeTest(f):
> s = time.time()
> f()
> print "%18s: %f" % (f.__doc__, time.time()-s)
>
> def timeTestWithPsyco(f):
> if havePsyco:
> g = psyco.proxy(f)
> g.__doc__ = f.__doc__ + " psyco"
> g()
> timeTest(g)
>
> -----pi.c-----
> #include <stdlib.h>
>
> #include "/home/spin/C/spin_time.h"
>
> int main()
> {
> double sm;
> float rnd;
> int i;
> int n = 500000;
> spin_time start, end;
>
> spin_time_set(start);
> srand(1); // seed random number generator
> sm = 0.0;
>
> for(i=0;i<n;++i) {
> rnd = rand()/(RAND_MAX+1.0);
> sm += sqrt(1.0-rnd*rnd);
> }
> spin_time_set(end);
> printf("%f\n", 4.0*sm/n);
> printf("%18s: %.3f\n", "naive loop C", spin_time_both(start, end));
> }
>
> -----spin_time.h-----
> #include <time.h>
> #include <sys/times.h>
> #include <stdio.h>
>
> #define FACTOR (1.0/100.0)
>
> typedef struct {
> clock_t real;
> struct tms t;
> } spin_time;
>
> #define spin_time_set(st) st.real = times(&(st.t))
>
> // floating point number of seconds:
> #define spin_time_both(st_start, st_end) \
> ((st_end.t.tms_utime-st_start.t.tms_utime)*FACTOR + \
> (st_end.t.tms_stime-st_start.t.tms_stime)*FACTOR)
More information about the Python-list
mailing list