[pypy-dev] pypy real world example, a django project data processing. but slow...

Vláďa Macek macek at sandbox.cz
Fri Mar 31 07:19:31 EDT 2017


Thanks! I ran it again on a much larger input and let it print the
lines/sec speed on every millionth line (either valid or invalid).

SPEED 6588 l/s
SPEED 8208 l/s
SPEED 9172 l/s
SPEED 10351 l/s
SPEED 16946 l/s
SPEED 23263 l/s

662.6 secs, 973701 valid lines (5610778 invalid), 9937 l/s, max density 73 l/s
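
The SPEED lines above look like per-interval rates rather than a running average, since a running average could not end above the overall 9937 l/s. A minimal sketch of how such a printout could be produced is shown here. The helper name `iter_with_speed` and its parameters are hypothetical and are not taken from the actual project code.

```python
import sys
import time


def iter_with_speed(lines, every=1000000, clock=time.time, out=sys.stdout):
    """Yield each line unchanged; after every `every` lines, write the
    throughput of that interval only (not the cumulative average)."""
    started = clock()
    for count, line in enumerate(lines, 1):
        # The consumer processes the line while the generator is suspended,
        # so the interval measured below includes the processing time.
        yield line
        if count % every == 0:
            now = clock()
            elapsed = now - started
            if elapsed > 0:
                out.write("SPEED %d l/s\n" % (every / elapsed))
            started = now
```

Wrapping the input as `for line in iter_with_speed(open(path)): ...` (with `path` standing for the log file) keeps the measurement out of the parsing code, and the same wrapper runs unchanged on CPython 2.7 and PyPy.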

[1c3dac321147] {jit-summary
Tracing:          2794    8.313955
Backend:          2245    1.946692
TOTAL:              667.678971
ops:                 5768705
recorded ops:        1478597
  calls:             231321
guards:              392450
opt ops:             456372
opt guards:          101057
opt guards shared:    61039
forcings:            0
abort: trace too long:    52
abort: compiling:    0
abort: vable escape:    497
abort: bad loop:     0
abort: force quasi-immut:    0
nvirtuals:           284152
nvholes:             146657
nvreused:            90634
vecopt tried:        0
vecopt success:      0
Total # of loops:    583
Total # of bridges:    1778
Freed # of loops:    140
Freed # of bridges:    189
[1c3dac33785b] jit-summary}

CPython again for comparison on the same input:

SPEED 8819 l/s
SPEED 9625 l/s
SPEED 10285 l/s
SPEED 11384 l/s
SPEED 16428 l/s
SPEED 20588 l/s

596.8 secs, 973701 valid lines (5610778 invalid), 11032 l/s, max density 73 l/s

Interestingly, after 5 million lines PyPy's speed somehow exceeded CPython's.
Both runs got faster over time, probably due to my insane level of
local caching of values (less SQL required).
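
The comparison can be made explicit with a little arithmetic on the figures reported above. This is only a sketch: the variable names are illustrative, and it assumes the n-th SPEED line of each run covers the same stretch of input.

```python
# Figures copied from the two runs reported above.
pypy_speeds = [6588, 8208, 9172, 10351, 16946, 23263]
cpython_speeds = [8819, 9625, 10285, 11384, 16428, 20588]

# PyPy throughput relative to CPython at each checkpoint (> 1.0 means PyPy is ahead).
ratios = [float(p) / c for p, c in zip(pypy_speeds, cpython_speeds)]
first_ahead = next(i for i, r in enumerate(ratios, 1) if r > 1.0)

# Overall throughput and wall-clock ratio for the whole input.
total_lines = 973701 + 5610778
pypy_secs, cpython_secs = 662.6, 596.8
pypy_overall = total_lines / pypy_secs
cpython_overall = total_lines / cpython_secs
slowdown = pypy_secs / cpython_secs

for i, r in enumerate(ratios, 1):
    print("checkpoint %d: PyPy/CPython = %.3f" % (i, r))
print("first checkpoint with PyPy ahead: %d" % first_ahead)
print("overall: PyPy %d l/s, CPython %d l/s, PyPy took %.2fx as long"
      % (pypy_overall, cpython_overall, slowdown))
```

The same calculation on the earlier, shorter run (183.0 s versus 130.1 s) gives a noticeably larger wall-clock ratio, which is consistent with the gap narrowing as the run gets longer.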

Anyway, I'm still not sure whether PyPy was really still warming up all that
time...
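
One way to check this, following the suggestion quoted below to run the job two or three times in one process, could be sketched as follows. The names `time_repeated_runs` and `process_log` are hypothetical; `process_log` is only a stand-in for the real processing job.

```python
import time


def time_repeated_runs(job, runs=3, clock=time.time):
    """Call job() `runs` times in this process; return each duration in seconds."""
    durations = []
    for _ in range(runs):
        started = clock()
        job()
        durations.append(clock() - started)
    return durations


def process_log():
    # Hypothetical stand-in for the real job (parse the log, issue SQL, ...).
    sum(i * i for i in range(100000))


if __name__ == "__main__":
    for number, seconds in enumerate(time_repeated_runs(process_log), 1):
        print("run %d: %.3f s" % (number, seconds))
```

If later runs are clearly faster than the first under PyPy but not under CPython, that would point to JIT warm-up rather than to the caching effect mentioned above. To separate the two, the job would have to process identical input on each run.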

Thanks,

Vlada


On 31.3.2017 09:58, Maciej Fijalkowski wrote:
> What I meant is that the ORM is slow *and* it takes forever to warm up.
> Your code might not run long enough for the ORM to be warm. It's also
> very likely it'll end up slower on pypy. One thing you can do is to
> run PYPYLOG=jit-summary:- pypy <your program> and copy-paste the
> summary output.
>
> The only way to store the warmed up state is to keep the process alive
> (as a daemon) and rerun it further. You can see if it speeds up after
> two or three runs in one process and make decisions accordingly.
>
> On Thu, Mar 30, 2017 at 2:09 PM, Vláďa Macek <macek at sandbox.cz> wrote:
>> Hi Maciej (and others?),
>>
>> I know I must be one of many who wanted a gain without pain. :-) Just gave
>> it a try without having an opportunity for some deeper profiling due to my
>> project deadlines. I just thought to get in touch in case I missed
>> something apparent to you from the combination I reported.
>>
>> ORM might be slow, but I compare interpreters, not ORMs. Here's my
>> program's final stats of processing the input file (nginx access log):
>>
>> CPython 2.7.6 32bit
>> 130.1 secs, 177492 valid lines (866160 invalid), 8021 l/s, max density 72 l/s
>>
>> pypy2-v5.7.0-linux32
>> 183.0 secs, 177492 valid lines (866160 invalid), 5703 l/s, max density 72 l/s
>>
>> This is a longer run than what I tried previously, and surely this is not
>> "double the time". But it is still significantly slower.
>>
>> Each line is analyzed using a regexp, which I read is slow in pypy.
>>
>> Both runs have exactly the same input and output. Subjectively, the
>> debugging output got gradually faster for pypy, while cpython ran at constant
>> speed. Is it normal that the warmup can take minutes? I don't know the details.
>>
>> In production, this processing is run from cron every five minutes. Is it
>> possible to store the warmed-up state between runs? (Note: I have *.pyc
>> files disabled at home using PYTHONDONTWRITEBYTECODE=1.)
>>
>> I know it's annoying I don't share code and I'm sorry. With this mail I
>> just wanted to give out some numbers for the possibly curious.
>>
>> The pypy itself is interesting and I hope I'll return to it someday more
>> thoroughly.
>>
>> Thanks again & have a nice day,
>>
>> Vláďa
>>
>>
>> On 27.3.2017 17:21, Maciej Fijalkowski wrote:
>>> Hi Vlada
>>>
>>> Generally speaking, if we can't have a look, there is incredibly little
>>> we can do. "I have a program" can be pretty much anything.
>>>
>>> It is well known that django ORM is very slow (both on pypy and on
>>> cpython) and makes the JIT take forever to warm up. I have absolutely
>>> no idea how long your run is at full CPU, but this is definitely one
>>> of your suspects.
>>>
>>> On Sun, Mar 26, 2017 at 1:06 PM, Vláďa Macek <macek at sandbox.cz> wrote:
>>>> Hi, recently I asked my friends to run my sort of a benchmark on their
>>>> machines (attached). The goal was to test the speed of different data
>>>> access in python2 and python3, 32bit and 64bit. One of my friends sent me
>>>> the pypy results -- the script ran fast as hell! Astounding.
>>>>
>>>> At home I have a 64bit Dell laptop running 32bit Ubuntu 14.04. I downloaded
>>>> your binary
>>>> https://bitbucket.org/pypy/pypy/downloads/pypy2-v5.7.0-linux32.tar.bz2 and
>>>> confirmed my friend's results, wow.
>>>>
>>>> I develop a large Django project, that includes a big amount of background
>>>> data processing. Reads large files, computes, issues much SQL to postgresql
>>>> via psycopg2, every 5 minutes. Heavily uses memcache daemon between runs.
>>>>
>>>> I'd welcome a speedup here very much.
>>>>
>>>> So let's give it a try. Installed psycopg2cffi (via pip in virtualenv), set
>>>> up the paths and ran. The computation printouts were the same, very
>>>> promising -- taking into account how complicated the project is! The SQL
>>>> looked right too. My respect for the compatibility!
>>>>
>>>> Unfortunately, the time needed to complete was double that of CPython
>>>> 2.7 for exactly the same task.
>>>>
>>>> You mention you might have some tips for why it's slow. Are you interested
>>>> in getting in touch? Although I can't really share the code and data with
>>>> you, I'm offering a real-world example of significant load that might help
>>>> PyPy get better.
>>>>
>>>> Thank you,
>>>>
>>>> --
>>>> : Vlada Macek  :  http://macek.sandbox.cz  : +420 608 978 164
>>>> : UNIX && Dev || Training : Python, Django : PGP key 97330EBD
>>>>
>>>> (Disclaimer: The opinions expressed herein are not necessarily those
>>>> of my employer, not necessarily mine, and probably not necessary.)
>>>>


-- 
: Vlada Macek  :  http://macek.sandbox.cz  : +420 608 978 164
: UNIX && Dev || Training : Python, Django : PGP key 97330EBD

(Disclaimer: The opinions expressed herein are not necessarily those
of my employer, not necessarily mine, and probably not necessary.)


