Twisted's deferToThread is much slower on PyPy than on CPython
Hi,

Consider this example program: https://gist.github.com/bra-fsn/1fd481b44590a939e849cb9073ba1a41

# pypy /tmp/deft.py 1
deferToThread: avg 1310.90 us, sync: avg 1.13 us, 1155.26x increase
deferToThread: avg 248.42 us, sync: avg 0.19 us, 1288.78x increase
deferToThread: avg 288.20 us, sync: avg 0.20 us, 1450.80x increase
deferToThread: avg 366.34 us, sync: avg 0.20 us, 1860.96x increase
deferToThread: avg 300.38 us, sync: avg 0.20 us, 1510.52x increase
deferToThread: avg 341.94 us, sync: avg 0.20 us, 1751.63x increase
deferToThread: avg 282.96 us, sync: avg 0.20 us, 1432.18x increase
deferToThread: avg 232.87 us, sync: avg 0.20 us, 1185.64x increase
deferToThread: avg 350.51 us, sync: avg 0.20 us, 1751.06x increase
deferToThread: avg 379.09 us, sync: avg 0.20 us, 1903.13x increase

While this runs, it uses only around 15% CPU. Running this on more threads yields even worse results:

# pypy /tmp/deft.py 8
deferToThread: avg 8236.71 us, sync: avg 0.28 us, 29359.01x increase
deferToThread: avg 8286.90 us, sync: avg 0.32 us, 26029.29x increase
deferToThread: avg 8345.67 us, sync: avg 0.67 us, 12436.32x increase
deferToThread: avg 8685.53 us, sync: avg 0.33 us, 26617.55x increase
deferToThread: avg 8689.54 us, sync: avg 0.48 us, 18093.90x increase
deferToThread: avg 8743.51 us, sync: avg 0.31 us, 28124.92x increase
deferToThread: avg 8760.89 us, sync: avg 0.32 us, 27270.95x increase
deferToThread: avg 8757.82 us, sync: avg 0.32 us, 27791.52x increase
deferToThread: avg 5884.83 us, sync: avg 0.22 us, 26213.80x increase
deferToThread: avg 5927.90 us, sync: avg 0.23 us, 25773.31x increase
deferToThread: avg 6410.40 us, sync: avg 0.23 us, 27943.53x increase
deferToThread: avg 6133.28 us, sync: avg 0.23 us, 26417.04x increase
deferToThread: avg 6182.15 us, sync: avg 0.23 us, 26529.44x increase
deferToThread: avg 6679.73 us, sync: avg 0.22 us, 30022.54x increase
deferToThread: avg 6390.30 us, sync: avg 0.22 us, 28647.95x increase
deferToThread: avg 6509.35 us, sync: avg 0.23 us, 28201.91x increase
deferToThread: avg 6640.61 us, sync: avg 0.23 us, 28499.73x increase
deferToThread: avg 6508.86 us, sync: avg 0.23 us, 28293.32x increase
deferToThread: avg 6394.18 us, sync: avg 0.23 us, 27980.45x increase
deferToThread: avg 6584.74 us, sync: avg 0.23 us, 28983.66x increase
deferToThread: avg 6806.19 us, sync: avg 0.24 us, 28535.81x increase

The process now only uses around 2% CPU.
Running on even more threads:

# pypy /tmp/deft.py 16
deferToThread: avg 27107.54 us, sync: avg 0.29 us, 94991.56x increase
deferToThread: avg 28288.76 us, sync: avg 0.29 us, 96298.54x increase
deferToThread: avg 28365.01 us, sync: avg 3.86 us, 7346.72x increase
deferToThread: avg 28927.80 us, sync: avg 0.33 us, 86955.17x increase
deferToThread: avg 28958.93 us, sync: avg 0.29 us, 101428.44x increase
deferToThread: avg 29037.54 us, sync: avg 0.30 us, 98424.06x increase
deferToThread: avg 29223.17 us, sync: avg 0.29 us, 99730.33x increase
deferToThread: avg 29270.21 us, sync: avg 0.37 us, 78388.17x increase
deferToThread: avg 29312.24 us, sync: avg 0.32 us, 90785.14x increase
deferToThread: avg 29462.58 us, sync: avg 0.29 us, 101798.22x increase
deferToThread: avg 29475.06 us, sync: avg 0.32 us, 91309.64x increase
deferToThread: avg 29618.17 us, sync: avg 0.28 us, 105142.35x increase
deferToThread: avg 29781.02 us, sync: avg 0.32 us, 93284.17x increase
deferToThread: avg 29986.18 us, sync: avg 3.21 us, 9327.75x increase
deferToThread: avg 30258.60 us, sync: avg 0.42 us, 71364.92x increase
deferToThread: avg 30525.20 us, sync: avg 0.28 us, 107570.08x increase

This eats between 0.2-1% CPU.

And now on CPython:

# python /tmp/deft.py 1
deferToThread: avg 316.17 us, sync: avg 1.38 us, 228.71x increase
deferToThread: avg 312.92 us, sync: avg 1.38 us, 226.96x increase
deferToThread: avg 320.22 us, sync: avg 1.39 us, 230.37x increase
deferToThread: avg 317.33 us, sync: avg 1.35 us, 235.24x increase

# python /tmp/deft.py 8
deferToThread: avg 2542.90 us, sync: avg 1.37 us, 1854.14x increase
deferToThread: avg 2544.50 us, sync: avg 1.35 us, 1878.13x increase
deferToThread: avg 2544.47 us, sync: avg 1.36 us, 1864.52x increase
deferToThread: avg 2544.52 us, sync: avg 1.38 us, 1839.01x increase
deferToThread: avg 2544.92 us, sync: avg 1.36 us, 1871.81x increase
deferToThread: avg 2546.71 us, sync: avg 1.39 us, 1830.35x increase
deferToThread: avg 2552.38 us, sync: avg 1.35 us, 1893.17x increase
deferToThread: avg 2552.40 us, sync: avg 1.36 us, 1870.20x increase

# python /tmp/deft.py 16
deferToThread: avg 4745.76 us, sync: avg 1.26 us, 3770.11x increase
deferToThread: avg 4748.67 us, sync: avg 1.24 us, 3817.03x increase
deferToThread: avg 4749.81 us, sync: avg 1.26 us, 3756.39x increase
deferToThread: avg 4749.72 us, sync: avg 1.24 us, 3839.88x increase
deferToThread: avg 4749.87 us, sync: avg 1.28 us, 3709.99x increase
deferToThread: avg 4752.63 us, sync: avg 1.24 us, 3842.90x increase
deferToThread: avg 4752.53 us, sync: avg 1.23 us, 3866.08x increase
deferToThread: avg 4752.55 us, sync: avg 1.23 us, 3855.40x increase
deferToThread: avg 4754.03 us, sync: avg 1.29 us, 3678.09x increase
deferToThread: avg 4754.97 us, sync: avg 1.25 us, 3817.19x increase
deferToThread: avg 4755.45 us, sync: avg 1.32 us, 3593.28x increase
deferToThread: avg 4756.35 us, sync: avg 1.25 us, 3804.18x increase
deferToThread: avg 4756.19 us, sync: avg 1.29 us, 3687.73x increase
deferToThread: avg 4757.19 us, sync: avg 1.23 us, 3860.74x increase
deferToThread: avg 4758.02 us, sync: avg 1.24 us, 3824.33x increase
deferToThread: avg 4759.63 us, sync: avg 1.24 us, 3830.40x increase

The interpreter uses around 100% CPU.

Python 2.7.11
PyPy 5.1.1
FreeBSD 10/amd64

Why deferToThread is so slow at all is another question; here I'm interested in PyPy's abysmal performance.
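The gist itself is not reproduced here; a minimal sketch of this kind of benchmark (timing twisted.internet.threads.deferToThread against a plain synchronous call from several concurrent loops) might look like the following. The function names, iteration count and averaging details are illustrative, not necessarily what the gist does:

# Illustrative sketch of a deferToThread-vs-sync microbenchmark (not the original gist).
import sys
import time

from twisted.internet import defer, reactor, task
from twisted.internet.threads import deferToThread


def work():
    # Trivial payload so the benchmark mostly measures dispatch overhead.
    return 1 + 1


@defer.inlineCallbacks
def measure(iterations=1000):
    start = time.time()
    for _ in range(iterations):
        yield deferToThread(work)
    deferred_avg = (time.time() - start) / iterations * 1e6

    start = time.time()
    for _ in range(iterations):
        work()
    sync_avg = (time.time() - start) / iterations * 1e6

    print("deferToThread: avg %.2f us, sync: avg %.2f us, %.2fx increase"
          % (deferred_avg, sync_avg, deferred_avg / sync_avg))


def main():
    nthreads = int(sys.argv[1]) if len(sys.argv) > 1 else 1
    # Run several measuring loops concurrently, mirroring the thread count
    # passed on the command line in the report above.
    for _ in range(nthreads):
        loop = task.LoopingCall(measure)
        reactor.callWhenRunning(loop.start, 0)
    reactor.run()


if __name__ == '__main__':
    main()

Such a sketch would be run the same way as above, e.g. "pypy deft_sketch.py 8" (hypothetical filename), with one measuring loop per requested thread.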
Hi Nagy, On 3 June 2016 at 09:45, Nagy, Attila <bra@fsn.hu> wrote:
Consider this example program: https://gist.github.com/bra-fsn/1fd481b44590a939e849cb9073ba1a41
I've reduced it to a minimal example and created an issue:
https://bitbucket.org/pypy/pypy/issues/2341/multithreading-locks-leading-to-...

Armin
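The issue URL above is truncated; its title suggests the reduced example involves plain multithreading locks rather than anything Twisted-specific. A guess at what such a minimal lock-handoff reproducer could look like (a sketch only, not the code actually attached to the issue):

# Sketch of a lock handoff between two threads -- the kind of pattern the
# truncated issue title hints at, not the actual reproducer from the tracker.
import threading
import time

lock = threading.Lock()
done = threading.Event()


def worker(iterations=10000):
    # The worker repeatedly waits for the lock the main thread keeps taking.
    for _ in range(iterations):
        with lock:
            pass
    done.set()


def main():
    t = threading.Thread(target=worker)
    start = time.time()
    t.start()
    while not done.is_set():
        # The main thread keeps acquiring and releasing the same lock,
        # forcing constant handoffs to the worker thread.
        with lock:
            pass
    t.join()
    print("elapsed: %.2f s" % (time.time() - start))


if __name__ == '__main__':
    main()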
Hi Nagy, On 3 June 2016 at 09:45, Nagy, Attila <bra@fsn.hu> wrote:
Consider this example program: https://gist.github.com/bra-fsn/1fd481b44590a939e849cb9073ba1a41
I think I fixed this problem in 919e00b3e558 two days ago. Now it seems to always use about 100% CPU and gets performance that is a bit better than CPython, instead of spending all its time sleeping. Yay :-)

I did some tests to measure how well PyPy and CPython perform when running 2 or 3 threads in various microbenchmark-like situations (running pure Python code; acquiring and releasing the same lock; ping-pong between two threads; calling a fast or slow C function). Now in all measured cases PyPy should perform not too badly. Actually, in most cases it was already better than CPython for fairness: for example, when one thread runs pure Python code and the other thread does many calls to a very fast C function, then CPython gives about 0.002% of the time(!) to the second thread and the rest to the first one.

These tests can be found in https://bitbucket.org/arigo/arigo/src/default/hack/pypy-hack/gil-benchmark/ .

A bientôt,

Armin.
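The linked gil-benchmark directory contains the actual scripts; as a rough, self-contained illustration of the fairness scenario described above (one thread running pure Python code while another hammers a very fast C-level call, then comparing how much progress each made), something like the following could be used. The structure and numbers are illustrative and are not taken from that repository:

# Rough fairness check: two competing threads, each counts how many
# iterations it completes in a fixed window (not the linked scripts).
import threading
import time

RUN_FOR = 2.0  # seconds
counts = {"pure_python": 0, "fast_c_call": 0}


def pure_python():
    deadline = time.time() + RUN_FOR
    x = 0
    while time.time() < deadline:
        x += 1  # pure-Python busy work
        counts["pure_python"] += 1


def fast_c_call():
    deadline = time.time() + RUN_FOR
    while time.time() < deadline:
        abs(-1)  # a very fast built-in (C-level) call
        counts["fast_c_call"] += 1


threads = [threading.Thread(target=pure_python),
           threading.Thread(target=fast_c_call)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = sum(counts.values())
for name, n in sorted(counts.items()):
    print("%s: %d iterations (%.3f%% of total)" % (name, n, 100.0 * n / total))

With a fair scheduler, both threads should make a comparable share of progress over the window; an extremely skewed split is the kind of imbalance described in the preceding paragraph.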
Hi, On 07/17/2016 09:48 AM, Armin Rigo wrote:
Hi Nagy,
On 3 June 2016 at 09:45, Nagy, Attila <bra@fsn.hu> wrote:
Consider this example program: https://gist.github.com/bra-fsn/1fd481b44590a939e849cb9073ba1a41

I think I fixed this problem in 919e00b3e558 two days ago. Now it seems to always use about 100% CPU and gets performance that is a bit better than CPython, instead of spending all its time sleeping. Yay :-)
I did some tests to measure how well PyPy and CPython perform when running 2 or 3 threads in various microbenchmark-like situations (running pure Python code; acquiring and releasing the same lock; ping-pong between two threads; calling a fast or slow C function). Now in all measured cases PyPy should perform not too badly. Actually in most cases it was already better than CPython for fairness: for example, when one thread runs pure Python code and the other thread does many calls to a very fast C function, then CPython gives about 0.002% of the time(!) to the second thread and the rest to the first one.
Thank you very much for taking care of this. You rock. :)
participants (2):
- Armin Rigo
- Nagy, Attila