[pypy-dev] math calls and errno
Douglas McNeil
d.mcneil at qmul.ac.uk
Wed Jul 30 00:51:59 CEST 2008
</lurk>
A while back I noticed that some math-related rpython-produced C was much
slower than it should have been. After I figured out what was going on, I
set it aside, but I see someone mentioned doing some numpy stuff on IRC
today so I dug up the tests.
Note that the following slowdown only applies to translated code, so
ctypes interfaces to numpy itself are immune, and the slowdown doesn't
have much effect on pypy-c, so the audience for this is probably limited.
Summary: in typical cases the way pypy treats errno causes math calls in
rpython to take over a third longer than they should. This can be
repaired by (carefully) inlining the errno functions.
Details follow.
--
A simple rpython target that does nothing but loop and add the results of
sin() is much slower than the same code after being translated by
shedskin, which gave the same time as the equivalent cython code, which
gave the same time as the equivalent handwritten C.
    from math import sin

    def f():
        z = 0.0
        for i in xrange(100000):
            for j in xrange(500):
                z += sin(float(i+j))
        return z
gcc 4.2.2 (-O3 -fomit-frame-pointer):
    rpython:          4.160 s
    shedskin 0.0.28:  3.043 s
    cython:           3.041 s
    C:                3.034 s
The rpython/C gap persisted with the Intel compiler:
icc 10.1 (same flags):
    rpython:          3.296 s
    C:                2.118 s
So the rpython code was clearly doing something the others weren't, and
something both compilers particularly disliked. The obvious suspect was
the error handling, but that should cost on the order of a few percent,
not an extra third or half.
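For reference, the size of the gap can be worked out directly from the
timings above. The sketch below is plain Python arithmetic, not part of
the benchmark; the helper name `overhead` and the dictionary layout are
made up for illustration, and the call count is simply the 100000 x 500
iterations of the loop.

```python
# Timings in seconds, copied from the tables above.
timings = {
    "gcc 4.2.2": {"rpython": 4.160, "c": 3.034},
    "icc 10.1": {"rpython": 3.296, "c": 2.118},
}

# The benchmark calls sin() once per inner-loop iteration.
CALLS = 100000 * 500


def overhead(rpython_s, c_s, calls=CALLS):
    """Return (relative slowdown vs C, extra nanoseconds per call)."""
    extra = rpython_s - c_s
    return extra / c_s, extra / calls * 1e9


for compiler, t in sorted(timings.items()):
    rel, ns = overhead(t["rpython"], t["c"])
    print("%-10s +%.1f%%  (~%.1f ns extra per sin() call)"
          % (compiler, rel * 100, ns))
```

That works out to roughly 37% with gcc and 56% with icc, or a little over
20 ns of extra time per call in both cases.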
It turns out the errno calls aren't being inlined, which makes sense since
they're external to the implement_*.c files, and if you force the inlining
they're often optimized away. It's quite fragile, thanks to volatility. But
if you replace the start of pypy_g_ll_math_ll_math_sin with something like
    volatile int *errno_loc = &errno;
    block0:
        l_v591 = (long)(0L);
        *(errno_loc) = l_v591;
        l_v590 = sin(l_x_1);
        l_v589 = *(errno_loc);
        OP_INT_IS_TRUE(l_v589, l_v593);
        if (l_v593) {
            goto block2;
        }
        l_v597 = l_v590;
        goto block1;
you get
    pypy_g_ll_math_ll_math_sin:
    .L370:
        pushl   %ebx                #
        subl    $8, %esp            #,
        call    __errno_location    #
        fldl    16(%esp)            # l_x_1
        movl    %eax, %ebx          #, D.11955
        movl    $0, (%eax)          #,* D.11955
        fstpl   (%esp)              #
        call    sin                 #
        movl    (%ebx), %eax        #* D.11955, l_v589
        testl   %eax, %eax          # l_v589
        je      .L372               #,
which is at least much improved over the original, and unlike my first few
attempts I think this one correctly survives being optimized under both
gcc and icc. (Of course what's actually executed in the tests is the
version of this which gets inlined into entry_point, but ll_math_sin gets
inlined into entry_point in all versions, so that's not causing the
difference.) And we're still calling __errno_location like we should.
Anyway, now we have:
    gcc 4.2.2   std rpython             4.160 s
    gcc 4.2.2   pure C                  3.034 s
    gcc 4.2.2   errno-inlined rpython   3.068 s
    icc 10.1    std rpython             3.296 s
    icc 10.1    pure C                  2.118 s
    icc 10.1    errno-inlined rpython   2.244 s
and that's much better, especially with gcc.
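The same numbers can be expressed as the share of the rpython/C gap that
the change removes. Again this is only arithmetic on the table above, not
a new measurement; `gap_closed` is a name made up for the sketch.

```python
# (std rpython, pure C, errno-inlined rpython) in seconds, from the table above.
results = {
    "gcc 4.2.2": (4.160, 3.034, 3.068),
    "icc 10.1": (3.296, 2.118, 2.244),
}


def gap_closed(std_s, c_s, inlined_s):
    """Return (fraction of the std-vs-C gap removed, remaining slowdown vs C)."""
    closed = (std_s - inlined_s) / (std_s - c_s)
    remaining = inlined_s / c_s - 1.0
    return closed, remaining


for compiler, (std_s, c_s, inlined_s) in sorted(results.items()):
    closed, remaining = gap_closed(std_s, c_s, inlined_s)
    print("%-10s gap closed: %.0f%%, still %.1f%% slower than C"
          % (compiler, closed * 100, remaining * 100))
```

About 97% of the gap goes away with gcc and about 89% with icc, leaving
the errno-inlined version roughly 1% and 6% behind the handwritten C.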
Standard disclaimers apply: this worked on my linux x86 system, with the
above compiler versions, under an almost new moon.. and gcc is known for
dramatic variations between versions. If it works for anyone else I'd be
pleasantly surprised. That said, icc agrees.
Unfortunately, as you might expect, this has very modest effects on pypy-c
(i.e. barely detectable over the noise, due to the overhead), so I don't
know if there will be interest in modifying the C backend to change it.
It does improve things considerably for rpythonic math module writers,
though. FWIW.
<relurk>
Doug, pypy-math-sig member-in-waiting
--
Queen Mary College, University of London "Still creating worlds..
Mathematical Sciences, Astronomy Unit .. but now with an accent!"