[pypy-dev] pypy real world example, a django project data processing. but slow...

Vláďa Macek macek at sandbox.cz
Thu Mar 30 08:09:55 EDT 2017


Hi Maciej (and others?),

I know I must be one of many who wanted a gain without pain. :-) I just gave
it a try without an opportunity for deeper profiling, due to my project
deadlines. I thought I'd get in touch in case I missed something that would
be apparent to you from the combination I reported.

The ORM might be slow, but I'm comparing interpreters, not ORMs. Here are my
program's final stats for processing the input file (an nginx access log):

CPython 2.7.6 32bit
130.1 secs, 177492 valid lines (866160 invalid), 8021 l/s, max density 72 l/s

pypy2-v5.7.0-linux32
183.0 secs, 177492 valid lines (866160 invalid), 5703 l/s, max density 72 l/s

This is a longer run than the one I tried previously, and it's no longer
"double time", but it is still significantly slower.

Each line is analyzed using a regexp, which I've read is slow in PyPy.
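For the curious, here is a minimal sketch of the kind of regex parsing involved. The actual pattern and log format are not shown in this thread; this one assumes the default nginx "combined" format, and the names (`LINE_RE`, `parse_line`) are made up for illustration:

```python
import re

# Hypothetical pattern for the default nginx "combined" log format.
# Precompiling the pattern once (rather than per line) matters on
# both interpreters; named groups keep the caller readable.
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_line(line):
    # Return a dict of fields for a valid line, or None for an invalid one.
    m = LINE_RE.match(line)
    return m.groupdict() if m else None

sample = ('127.0.0.1 - - [30/Mar/2017:08:09:55 +0200] '
          '"GET /index.html HTTP/1.1" 200 1234')
```

Calling `parse_line(sample)` here yields the field dict, while a non-matching line yields None, which is one way the valid/invalid counts above could be produced.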

Both runs have exactly the same input and output. Subjectively, the
processing debug output gradually got faster under PyPy, while CPython ran at
a constant speed. Is it normal for the warmup to take minutes? I don't know
the details.

In production, this processing is run from cron every five minutes. Is it
possible to store the warmed-up state between runs? (Note: I have *.pyc
files disabled at home using PYTHONDONTWRITEBYTECODE=1.)

I know it's annoying that I'm not sharing code, and I'm sorry. With this mail
I just wanted to give out some numbers for the possibly curious.

PyPy itself is interesting and I hope I'll someday return to it more
thoroughly.

Thanks again & have a nice day,

Vláďa


On 27.3.2017 17:21, Maciej Fijalkowski wrote:
> Hi Vlada
>
> Generally speaking, if we can't have a look, there is incredibly little
> we can do. "I have a program" can be pretty much anything.
>
> It is well known that django ORM is very slow (both on pypy and on
> cpython) and makes the JIT take forever to warm up. I have absolutely
> no idea how long your run is at full CPU, but this is definitely one
> of your suspects.
>
> On Sun, Mar 26, 2017 at 1:06 PM, Vláďa Macek <macek at sandbox.cz> wrote:
>> Hi, recently I asked my friends to run my sort of a benchmark on their
>> machines (attached). The goal was to test the speed of different data
>> access in python2 and python3, 32bit and 64bit. One of my friends sent me
>> the pypy results -- the script ran fast as hell! Astounding.
>>
>> At home I have a 64bit Dell laptop running 32bit Ubuntu 14.04. I downloaded
>> your binary
>> https://bitbucket.org/pypy/pypy/downloads/pypy2-v5.7.0-linux32.tar.bz2 and
>> confirmed my friend's results, wow.
>>
>> I develop a large Django project that includes a large amount of background
>> data processing. Every 5 minutes it reads large files, computes, and issues
>> a lot of SQL to PostgreSQL via psycopg2. It heavily uses the memcached
>> daemon between runs.
>>
>> I'd welcome a speedup here very much.
>>
>> So let's give it a try. I installed psycopg2cffi (via pip in a virtualenv),
>> set up the paths and ran. The computation printouts were the same, very
>> promising, considering how complicated the project is! The SQL looked right
>> too. My respects on compatibility!
>>
>> Unfortunately, the time needed to complete was double that of CPython 2.7
>> for exactly the same task.
>>
>> You mention you might have some tips for why it's slow. Are you interested
>> in getting in touch? Although I can't really share the code and data with
>> you, I'm offering a real world example of significant load that might help
>> Pypy get better.
>>
>> Thank you,
>>
>> --
>> : Vlada Macek  :  http://macek.sandbox.cz  : +420 608 978 164
>> : UNIX && Dev || Training : Python, Django : PGP key 97330EBD
>>
>> (Disclaimer: The opinions expressed herein are not necessarily those
>> of my employer, not necessarily mine, and probably not necessary.)
>>
