Unable to translate locks

Hi, PyPy hackers,

I am playing with Python translation to C, especially with threads. I am using the start_new_thread function provided by pypy.module.thread.ll_thread. My example (see mytasks.py in the attachment) works fine. But when I tried to use locks (provided by the same module), I got an exception during translation (see mytasks_err.py for the source code and translation_trace.txt for the full trace).

End of the trace:

[translation:ERROR]   File "/home/data/opt/svnpypy/vanilla-dist/pypy/translator/c/database.py", line 156, in getcontainernode
[translation:ERROR]     node = nodefactory(self, T, container, **buildkwds)
[translation:ERROR]   File "/home/data/opt/svnpypy/vanilla-dist/pypy/translator/c/node.py", line 860, in opaquenode_factory
[translation:ERROR]     raise Exception("don't know about %r" % (T,))
[translation:ERROR] Exception: don't know about <struct RPyOpaque_ThreadLock (opaque)>

I have no idea what the problem is; translation of the PyPy interpreter with the thread module enabled works perfectly. Maybe it is caused by the fact that the PyPy interpreter uses some special annotation policy. Is that possible?

I would be thankful for any suggestions or comments on what to do to make the locks work.

Marek Paška

</lurk>

A while back I noticed that some math-related rpython-produced C was much slower than it should have been. After I figured out what was going on, I set it aside, but I see someone mentioned doing some numpy stuff on IRC today so I dug up the tests. Note that the following slowdown only applies to translated code, so ctypes interfaces to numpy itself are immune, and the slowdown doesn't have much effect on pypy-c, so the audience for this is probably limited.

Summary: in typical cases the way pypy treats errno causes math calls in rpython to take over a third longer than they should. This can be repaired by (carefully) inlining the errno functions. Details follow.

--

A simple rpython target that does nothing but loop and add the results of sin() is much slower than the same code after being translated by shedskin, which gave the same time as the equivalent cython code, which gave the same time as the equivalent handwritten C.

    def f():
        z = 0.0
        for i in xrange(100000):
            for j in xrange(500):
                z += sin(float(i+j))
        return z

gcc 4.2.2 (-O3 -fomit-frame-pointer):

    rpython:          4.160 s
    shedskin 0.0.28:  3.043 s
    cython:           3.041 s
    C:                3.034 s

The rpython/C gap persisted with the Intel compiler:

icc 10.1 ("):

    rpython:          3.296 s
    C:                2.118 s

so clearly the rpython code was doing something that the others weren't which the compilers particularly disliked, and the obvious suspect was the error handling. But that should be on the order of a few percent, not an extra third or half.

Turns out the errno calls aren't being inlined -- which makes sense, they're external to the implement_*.c files -- and if you force it they're often optimized away. It's quite fragile, thanks to volatility.

But replacing the start of pypy_g_ll_math_ll_math_sin with something like

    volatile int *errno_loc = &errno;
    block0:
    l_v591 = (long)(0L);
    *(errno_loc) = l_v591;
    l_v590 = sin(l_x_1);
    l_v589 = *(errno_loc);
    OP_INT_IS_TRUE(l_v589, l_v593);
    if (l_v593) {
        goto block2;
    }
    l_v597 = l_v590;
    goto block1;

you get

    pypy_g_ll_math_ll_math_sin:
    .L370:
        pushl %ebx #
        subl $8, %esp #,
        call __errno_location #
        fldl 16(%esp) # l_x_1
        movl %eax, %ebx #, D.11955
        movl $0, (%eax) #,* D.11955
        fstpl (%esp) #
        call sin #
        movl (%ebx), %eax #* D.11955, l_v589
        testl %eax, %eax # l_v589
        je .L372 #,

which is at least much improved over the original, and unlike my first few attempts I think this one correctly survives being optimized under both gcc and icc. (Of course what's actually executed in the tests is the version of this which gets inlined into entry_point, but ll_math_sin gets inlined into entry_point in all versions, so that's not causing the difference.) And we're still calling __errno_location like we should.

Anyway, now we have:

    gcc 4.2.2  std rpython            4.160 s
    gcc 4.2.2  pure C                 3.034 s
    gcc 4.2.2  errno-inlined rpython  3.068 s

    icc 10.1   std rpython            3.296 s
    icc 10.1   pure C                 2.118 s
    icc 10.1   errno-inlined rpython  2.244 s

and that's much better, especially with gcc.

Standard disclaimers apply: this worked on my linux x86 system, with the above compiler versions, under an almost new moon.. and gcc is known for dramatic variations between versions. If it works for anyone else I'd be pleasantly surprised. That said, icc agrees.

Unfortunately, as you might expect, this has very modest effects on pypy-c (i.e. barely detectable over the noise, due to the overhead), so I don't know if there will be interest in modifying the C backend to change it. It does improve things considerably for rpythonic math module writers, though. FWIW.

<relurk>

Doug, pypy-math-sig member-in-waiting

--
Queen Mary College, University of London    "Still creating worlds..
Mathematical Sciences, Astronomy Unit        .. but now with an accent!"
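
For readers who want to try the errno pattern outside of generated code, here is a minimal standalone sketch in C. It is not the code the PyPy C backend emits. The names checked_call and half_of_nonnegative are hypothetical, and half_of_nonnegative is only a stand-in for a libm routine such as sin(), so the example builds without linking libm. The only idea taken from the message above is looking up the address of errno once, through a volatile pointer, and both clearing and re-reading errno through that pointer around the call.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a libm routine such as sin(): it reports a
 * domain error through errno, the way libm routines may do. */
double half_of_nonnegative(double x)
{
    if (x < 0.0) {
        errno = EDOM;          /* signal a domain error */
        return 0.0;
    }
    return x / 2.0;
}

/* Call fn(x) and store the errno value it left behind in *err_out.
 * The address of errno is taken once and reused for both the clear and
 * the read-back. The pointer is volatile so the compiler cannot drop
 * or merge the two accesses around the call. */
double checked_call(double (*fn)(double), double x, int *err_out)
{
    volatile int *errno_loc = &errno;   /* one lookup of errno's address */
    double result;

    assert(fn != NULL && err_out != NULL);

    *errno_loc = 0;                     /* clear before the call */
    result = fn(x);
    *err_out = *errno_loc;              /* read back after the call */
    return result;
}
```

Passing sin from <math.h> as fn (and linking with -lm) gives the shape discussed above. Whether a given compiler then produces a single call to __errno_location, as in the assembly listing, depends on the compiler and its version, so this sketch makes no claim about timings.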

participants (3)
- Armin Rigo
- Douglas McNeil
- Marek Paška