[pypy-dev] pypy real world example, a django project data processing. but slow...

Maciej Fijalkowski fijall at gmail.com
Fri Mar 31 03:58:19 EDT 2017


What I meant is that the ORM is slow *and* it takes forever to warm up.
Your code might not run long enough for the ORM to be warm. It's also
very likely it'll end up slower on pypy. One thing you can do is to
run PYPYLOG=jit-summary:- pypy <your program> and copy-paste the
summary output.

The only way to keep the warmed-up state is to keep the process alive
(as a daemon) and run the job repeatedly inside it. You can see if it
speeds up after two or three runs in one process and make decisions
accordingly.
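
A minimal sketch of that check, assuming the processing can be wrapped
in a single callable. Here process_log is a hypothetical stand-in, not
your actual code, and the loop inside it is only placeholder work. It
runs the same job several times in one process and prints the time per
run, so you can see whether later runs get faster once the JIT has
warmed up:

```python
import time


def process_log():
    """Hypothetical stand-in for one cron-style processing run."""
    total = 0
    for i in range(200000):
        total += i % 7
    return total


def time_repeated_runs(job, runs=3):
    """Call job `runs` times in this process; return seconds per run."""
    timings = []
    for _ in range(runs):
        start = time.time()
        job()
        timings.append(time.time() - start)
    return timings


if __name__ == "__main__":
    for n, secs in enumerate(time_repeated_runs(process_log), 1):
        print("run %d: %.2f s" % (n, secs))
```

If the second and third runs are not noticeably faster than the first,
a long-lived daemon is unlikely to help much either.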

On Thu, Mar 30, 2017 at 2:09 PM, Vláďa Macek <macek at sandbox.cz> wrote:
> Hi Maciej (and others?),
>
> I know I must be one of many who wanted a gain without pain. :-) Just gave
> it a try without having an opportunity for some deeper profiling due to my
> project deadlines. I just thought to get in touch in case I missed
> something apparent to you from the combination I reported.
>
> ORM might be slow, but I compare interpreters, not ORMs. Here's my
> program's final stats of processing the input file (nginx access log):
>
> CPython 2.7.6 32bit
> 130.1 secs, 177492 valid lines (866160 invalid), 8021 l/s, max density 72 l/s
>
> pypy2-v5.7.0-linux32
> 183.0 secs, 177492 valid lines (866160 invalid), 5703 l/s, max density 72 l/s
>
> This is a longer run than what I tried previously and surely this is not a
> "double time". But it is still significantly slower.
>
> Each line is analyzed using a regexp, which I read is slow in pypy.
>
> Both runs have exactly same input and output. Subjectively, the processing
> debugging output really got faster gradually for pypy, while cpython runs
> at constant speed. Is it normal that the warmup can take minutes? I don't
> know the details.
>
> In production, this processing is run from cron every five minutes. Is it
> possible to store the warmed-up state between runs? (Note: I have *.pyc
> files disabled at home using PYTHONDONTWRITEBYTECODE=1.)
>
> I know it's annoying I don't share code and I'm sorry. With this mail I
> just wanted to give out some numbers for the possibly curious.
>
> The pypy itself is interesting and I hope I'll return to it someday more
> thoroughly.
>
> Thanks again & have a nice day,
>
> Vláďa
>
>
> On 27.3.2017 17:21, Maciej Fijalkowski wrote:
>> Hi Vlada
>>
>> Generally speaking, if we can't have a look there is incredibly little
>> we can do. "I have a program" can be pretty much anything.
>>
>> It is well known that django ORM is very slow (both on pypy and on
>> cpython) and makes the JIT take forever to warm up. I have absolutely
>> no idea how long your run is at full CPU, but this is definitely one
>> of your suspects.
>>
>> On Sun, Mar 26, 2017 at 1:06 PM, Vláďa Macek <macek at sandbox.cz> wrote:
>>> Hi, recently I asked my friends to run my sort of a benchmark on their
>>> machines (attached). The goal was to test the speed of different data
>>> access in python2 and python3, 32bit and 64bit. One of my friends sent me
>>> the pypy results -- the script ran fast as hell! Astounding.
>>>
>>> At home I have a 64bit Dell laptop running 32bit Ubuntu 14.04. I downloaded
>>> your binary
>>> https://bitbucket.org/pypy/pypy/downloads/pypy2-v5.7.0-linux32.tar.bz2 and
>>> confirmed my friend's results, wow.
>>>
>>> I develop a large Django project, that includes a big amount of background
>>> data processing. Reads large files, computes, issues much SQL to postgresql
>>> via psycopg2, every 5 minutes. Heavily uses memcache daemon between runs.
>>>
>>> I'd welcome a speedup here very much.
>>>
>>> So let's give it a try. Installed psycopg2cffi (via pip in virtualenv), set
>>> up the paths and ran. The computation printouts were the same, very
>>> promising -- taking into account how complicated the project is! The SQL
>>> looked right too. My respect on compatibility!
>>>
>>> Unfortunately, the time needed to complete was double in comparison to
>>> CPython 2.7 for exactly the same task.
>>>
>>> You mention you might have some tips for why it's slow. Are you interested
>>> in getting in touch? Although I rather can't share the code and data with
>>> you, I'm offering a real world example of significant load that might help
>>> Pypy get better.
>>>
>>> Thank you,
>>>
>>> --
>>> : Vlada Macek  :  http://macek.sandbox.cz  : +420 608 978 164
>>> : UNIX && Dev || Training : Python, Django : PGP key 97330EBD
>>>
>>> (Disclaimer: The opinions expressed herein are not necessarily those
>>> of my employer, not necessarily mine, and probably not necessary.)
>>>
>

