PyPy real-world example: data processing in a Django project. But slow...

Hi, recently I asked my friends to run my sort-of-benchmark on their machines (attached). The goal was to test the speed of different kinds of data access in Python 2 and Python 3, 32-bit and 64-bit. One of my friends sent me the PyPy results -- the script ran fast as hell! Astounding.

At home I have a 64-bit Dell laptop running 32-bit Ubuntu 14.04. I downloaded your binary https://bitbucket.org/pypy/pypy/downloads/pypy2-v5.7.0-linux32.tar.bz2 and confirmed my friend's results, wow.

I develop a large Django project that includes a big amount of background data processing. Every 5 minutes it reads large files, computes, and issues a lot of SQL to PostgreSQL via psycopg2. It heavily uses the memcache daemon between runs. I'd welcome a speedup here very much. So let's give it a try.

I installed psycopg2cffi (via pip in a virtualenv), set up the paths and ran it. The computation printouts were the same, very promising -- taking into account how complicated the project is! The SQL looked right too. My respect for the compatibility!

Unfortunately, the time needed to complete was double that of CPython 2.7 for exactly the same task. You mention you might have some tips for why it's slow. Are you interested in getting in touch? Although I can't really share the code and data with you, I'm offering a real-world example of a significant load that might help PyPy get better.

Thank you,

-- 
: Vlada Macek : http://macek.sandbox.cz : +420 608 978 164
: UNIX && Dev || Training : Python, Django : PGP key 97330EBD
(Disclaimer: The opinions expressed herein are not necessarily those of my employer, not necessarily mine, and probably not necessary.)

Hi Vlada,

Generally speaking, if we can't have a look, there is incredibly little we can do -- "I have a program" can be pretty much anything. It is well known that the Django ORM is very slow (both on PyPy and on CPython) and makes the JIT take forever to warm up. I have absolutely no idea how long your run stays at full CPU, but this is definitely one of your suspects.

On Sun, Mar 26, 2017 at 1:06 PM, Vláďa Macek <macek@sandbox.cz> wrote:

Hi Maciej (and others?),

I know I must be one of many who wanted a gain without pain. :-) I just gave it a try without having an opportunity for deeper profiling, due to my project deadlines. I just thought to get in touch in case I missed something that would be apparent to you from the combination I reported. The ORM might be slow, but I'm comparing interpreters, not ORMs.

Here are my program's final stats for processing the input file (an nginx access log):

CPython 2.7.6 32bit    130.1 secs, 177492 valid lines (866160 invalid), 8021 l/s, max density 72 l/s
pypy2-v5.7.0-linux32   183.0 secs, 177492 valid lines (866160 invalid), 5703 l/s, max density 72 l/s

This is a longer run than what I tried previously, and surely this is not "double time". But it is still significantly slower. Each line is analyzed using a regexp, which I have read is slow in PyPy. Both runs have exactly the same input and output.

Subjectively, the processing debug output really did get gradually faster under PyPy, while CPython ran at a constant speed. Is it normal that the warmup can take minutes? I don't know the details. In production, this processing is run from cron every five minutes. Is it possible to store the warmed-up state between runs? (Note: I have *.pyc files disabled at home using PYTHONDONTWRITEBYTECODE=1.)

I know it's annoying that I don't share code and I'm sorry. With this mail I just wanted to give out some numbers for the possibly curious. PyPy itself is interesting and I hope I'll return to it someday more thoroughly.

Thanks again & have a nice day,

Vláďa

On 27.3.2017 17:21, Maciej Fijalkowski wrote:
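[Editor's note: the project's actual regexp and line format are not shown in the thread. A minimal sketch of this kind of per-line valid/invalid classification, using a hypothetical common-log-style pattern, might look like:]

```python
import re

# Hypothetical pattern for a common nginx access-log format; the
# project's real regexp is not shown in the thread.
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<req>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def classify_lines(lines):
    """Count valid and invalid lines, as in the stats printed above."""
    valid = invalid = 0
    for line in lines:
        # Compiled once at module level: the loop body stays a single
        # stable code path, which is friendlier to PyPy's tracing JIT
        # than re-building the pattern on every iteration.
        if LINE_RE.match(line):
            valid += 1
        else:
            invalid += 1
    return valid, invalid
```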

What I meant is that the ORM is slow *and* it takes forever to warm up. Your code might not run long enough for the ORM code to get warm, and it's also very likely it'll end up slower on PyPy.

One thing you can do is to run:

PYPYLOG=jit-summary:- pypy <your program>

and copy-paste the summary output.

The only way to store the warmed-up state is to keep the process alive (as a daemon) and rerun the work in it. You can see whether it speeds up after two or three runs in one process and make decisions accordingly.

On Thu, Mar 30, 2017 at 2:09 PM, Vláďa Macek <macek@sandbox.cz> wrote:
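[Editor's note: the "keep the process alive" suggestion can be sketched as a simple long-running loop. `job`, the 300-second interval, and `max_runs` are placeholders, not part of the original advice:]

```python
import time

def run_forever(job, interval=300.0, max_runs=None):
    """Call `job` repeatedly inside one long-lived process.

    Because the process never exits between runs, the machine code the
    PyPy JIT compiled during earlier iterations is reused by later ones,
    unlike a fresh process started from cron every five minutes.
    `max_runs` exists only to make the loop testable.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        job()
        runs += 1
        if max_runs is None or runs < max_runs:
            time.sleep(interval)
    return runs
```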

Thanks! I ran it again on a much larger input and let it print the lines/sec speed on every millionth line (either valid or invalid):

SPEED 6588 l/s
SPEED 8208 l/s
SPEED 9172 l/s
SPEED 10351 l/s
SPEED 16946 l/s
SPEED 23263 l/s
662.6 secs, 973701 valid lines (5610778 invalid), 9937 l/s, max density 73 l/s

[1c3dac321147] {jit-summary
Tracing:            2794    8.313955
Backend:            2245    1.946692
TOTAL:                      667.678971
ops:                5768705
recorded ops:       1478597
  calls:            231321
guards:             392450
opt ops:            456372
opt guards:         101057
opt guards shared:  61039
forcings:           0
abort: trace too long:     52
abort: compiling:          0
abort: vable escape:       497
abort: bad loop:           0
abort: force quasi-immut:  0
nvirtuals:          284152
nvholes:            146657
nvreused:           90634
vecopt tried:       0
vecopt success:     0
Total # of loops:   583
Total # of bridges: 1778
Freed # of loops:   140
Freed # of bridges: 189
[1c3dac33785b] jit-summary}

CPython again for comparison on the same input:

SPEED 8819 l/s
SPEED 9625 l/s
SPEED 10285 l/s
SPEED 11384 l/s
SPEED 16428 l/s
SPEED 20588 l/s
596.8 secs, 973701 valid lines (5610778 invalid), 11032 l/s, max density 73 l/s

It's interesting that after 5 million lines the PyPy speed somehow exceeded CPython's. Both runs got faster with time, probably due to my insane level of local caching of values (less SQL required). Anyway, I still wonder whether PyPy was really still warming up all that time...

Thanks,

Vlada

On 31.3.2017 09:58, Maciej Fijalkowski wrote:
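[Editor's note: the thread does not show how the SPEED printouts above are produced. A hypothetical helper of this shape would generate them; the injectable `clock` is only there to make it testable:]

```python
import time

class SpeedMeter:
    """Print a running lines/sec figure every `every` processed lines."""

    def __init__(self, every=1000000, clock=time.time):
        self.every = every
        self.clock = clock
        self.start = clock()  # remember when processing began
        self.count = 0

    def tick(self):
        """Record one processed line; return the rate on reporting lines."""
        self.count += 1
        if self.count % self.every == 0:
            elapsed = self.clock() - self.start
            rate = int(self.count / elapsed) if elapsed > 0 else 0
            print("SPEED %d l/s" % rate)
            return rate
        return None
```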
-- : Vlada Macek : http://macek.sandbox.cz : +420 608 978 164 : UNIX && Dev || Training : Python, Django : PGP key 97330EBD (Disclaimer: The opinions expressed herein are not necessarily those of my employer, not necessarily mine, and probably not necessary.)

participants (2)

- Maciej Fijalkowski
- Vláďa Macek