[pypy-dev] which xml libraries? was (Re: PyPy 1.4 released)

Paolo Giarrusso p.giarrusso at gmail.com
Tue Nov 30 00:14:08 CET 2010


Hi all,
thanks to the tips, I verified a 17% slowdown on Mac OS X relative to
Python 2.5 (32-bit), after manually taking the best times. Measuring
from the command line would instead give a 57% slowdown, because of
the lack of warm-up.
As a matter of fact, however, pyexpat is not involved here for PyPy:
in this release (v1.4) it is still implemented through ctypes (in
lib_pypy/pyexpat.py), not in RPython in pypy/rlib/.

Python 2.7 may well be faster, which might explain some of the
remaining difference from Stefan's results.

It looks like both bugs should be easy to fix:
- a file-handle leak in the tested XML module, as suspected;
- an I/O error while opening a module being converted into "file not
found". At least in Java, file-not-found is a specific exception
(FileNotFoundException) that can be distinguished from generic I/O
errors.
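The tested XML module isn't named here, so the following is only a hypothetical sketch of the usual fix for the first bug, written directly against the standard xml.parsers.expat API: close each file deterministically with a `with` block instead of relying on the GC to finalize the file object. The helper name `count_elements` is made up for illustration.

```python
import xml.parsers.expat

def count_elements(path):
    """Count start tags in an XML file, closing the file handle deterministically."""
    counter = [0]

    def start(name, attrs):
        counter[0] += 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    # The with-block closes the file even if parsing raises, so no handle
    # is left open waiting for the garbage collector.
    with open(path, "rb") as f:
        parser.ParseFile(f)
    return counter[0]
```

Called 10^4 times in a loop, this keeps at most one descriptor open at a time, whatever the GC does.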

On Mon, Nov 29, 2010 at 22:29, Piotr Skamruk <piotr.skamruk at gmail.com> wrote:
> simpler would be to set ulimit -n to 65536 (probably in /etc/security/limits.conf)

Thanks, I needed both this and the GC tips, since in a test run of
10^4 iterations I can't call the GC and still get meaningful
results.

[I'm on Mac OS X, though, so ulimit -S -n 10240 is the best one can
do; larger values fail with "Invalid argument", i.e. EINVAL.]
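For reference, the same soft limit can also be raised from inside the benchmark process with the standard `resource` module. This is a minimal sketch assuming a Unix-like system; the helper name `raise_fd_soft_limit` is hypothetical. It clamps the request to the hard limit and gives up quietly if the OS still rejects the value, as Mac OS X does with EINVAL above its per-process maximum.

```python
import resource

def raise_fd_soft_limit(target):
    """Raise the soft RLIMIT_NOFILE towards `target`; return the resulting soft limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)  # the soft limit may never exceed the hard one
    if target > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        except (ValueError, OSError):
            # e.g. EINVAL on Mac OS X for values above the per-process maximum
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```

Usage: `raise_fd_soft_limit(10240)` mirrors `ulimit -S -n 10240`, and like ulimit it only affects the current process and its children.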

I also just discovered that the ImportError on "import linecache"
looks file-handle-related as well: changing the ulimit changes the
iteration count at which the error is triggered, so it's likely an
effect of the same bug. Still, the original error message should be
preserved, and this should be easy to fix.
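A hypothetical sketch of what "preserving the original error" could look like. This is not PyPy's actual import code, and `open_module_file` is a made-up name. Only a genuine ENOENT is reported as "not found"; any other failure, such as EMFILE ("Too many open files") caused by a handle leak, keeps its errno and message in the ImportError.

```python
import errno

def open_module_file(path):
    """Open a module's source file, keeping the real cause of any failure."""
    try:
        return open(path, "rb")
    except (IOError, OSError) as e:
        if e.errno == errno.ENOENT:
            raise ImportError("module file not found: %s" % path)
        # EMFILE, EACCES, EISDIR, ...: report the original error, not "not found"
        raise ImportError("cannot open %s: [Errno %s] %s" % (path, e.errno, e.strerror))
```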

Under these conditions, my best results after warm-up are:

0.358 ms PyPy-JIT-32bit (see below for JIT logs)
0.305 ms CPython-2.5-32bit
0.269 ms CPython-2.6-64bit
0.553 ms PyPy-64bit-noJIT, rev 79307, 21 Nov 2010

This means a 17% slowdown on comparable (32-bit) setups rather than a
2x slowdown; measuring with timeit on the command line instead would
give a 57% slowdown.
All of this is on a very small input file, the one I attached before.

These times are for a total of 1000 iterations, on a 2.6 GHz Core 2 Duo.
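The 17% figure is the relative slowdown t_PyPy / t_CPython - 1 for the two 32-bit rows above. A tiny helper (the name `slowdown` is just for illustration) makes the arithmetic explicit:

```python
def slowdown(t_new, t_ref):
    """Relative slowdown of t_new with respect to t_ref (0.17 means 17% slower)."""
    return t_new / t_ref - 1.0

# Best times from the table above, in ms
pypy_jit_32 = 0.358
cpython25_32 = 0.305
print("%.0f%% slowdown" % (slowdown(pypy_jit_32, cpython25_32) * 100))  # prints 17% slowdown
```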

I don't report the average because:
a) it is hard to get a meaningful figure anyway (I don't want to code
confidence intervals, and automated tools wouldn't call the GC
appropriately);
b) I expect the deviation to be due more to unrelated load on my
laptop (around 12-18% CPU) than to actual spread of the runtime.
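A minimal sketch of the measurement procedure described above, assuming the benchmark can be wrapped in a zero-argument callable (the helper name `best_time` is hypothetical): run several repetitions, call gc.collect() only between them, outside the timed region, and keep the minimum instead of the mean. `timeit.Timer.timeit` also calls gc.disable() around the timed loop; how much that actually suppresses depends on the interpreter.

```python
import gc
import timeit

def best_time(func, number=1000, repeat=10):
    """Best (minimum) time in seconds for `number` calls of `func`.

    Early repetitions double as JIT warm-up; taking the minimum discards
    them, as well as runs slowed down by unrelated machine load.
    """
    timer = timeit.Timer(func)
    best = float("inf")
    for _ in range(repeat):
        gc.collect()  # reclaim leaked objects (e.g. file handles) between runs
        best = min(best, timer.timeit(number=number))
    return best
```

For example, `best_time(run_benchmark) * 1000` gives milliseconds per 1000 iterations, where `run_benchmark` stands for whatever parses the test file once.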

I set PYPYLOG='jit-summary:-' for the PyPy-JIT run and got the output
below. I hope somebody can tell from it whether the JIT is working
correctly.

[f2dd1fbaa1c2] {jit-summary
Tracing:        25      0.163456
Backend:        23      0.017392
Running asm:            191214
Blackhole:              2012
TOTAL:                  502.543032
ops:                    68338
recorded ops:           32764
  calls:                1759
guards:                 18005
opt ops:                2757
opt guards:             696
forcings:               111
abort: trace too long:  2
abort: compiling:       0
abort: vable escape:    0
nvirtuals:              6693
nvholes:                1059
nvreused:               3979
Total # of loops:       18
Total # of bridges:     6
Freed # of loops:       0
Freed # of bridges:     0
[f2dd1fc141a8] jit-summary}
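To compare such summaries across runs, they are easy to turn into numbers. This is a sketch that assumes only the `Name: value...` line layout seen in this one log, not a documented format. `parse_jit_summary` is a hypothetical helper and makes no attempt to interpret the counters.

```python
def parse_jit_summary(text):
    """Map each counter name in a jit-summary section to its list of numbers."""
    summary = {}
    inside = False
    for line in text.splitlines():
        if "{jit-summary" in line:
            inside = True
            continue
        if "jit-summary}" in line:
            break
        if not inside or ":" not in line:
            continue
        # rpartition keeps names such as "abort: trace too long" intact
        key, _, rest = line.strip().rpartition(":")
        values = [float(tok) if "." in tok else int(tok) for tok in rest.split()]
        summary[key.strip()] = values
    return summary
```

On the log above, `parse_jit_summary(log)["Total # of loops"]` would give `[18]`.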

Best regards.

> 2010/11/29 Amaury Forgeot d'Arc <amauryfa at gmail.com>:
>> 2010/11/29 Paolo Giarrusso <p.giarrusso at gmail.com>
>>>
>>> Inspection of the pypy process confirms a leak of file handles to the
>>> XML files. Whether it is GC not being invoked, a missing destructor,
>>> or simply because the code should release file handles, I dunno. Is
>>> there a way to trigger explicit GC to workaround such issues?
>>
>> As usual:
>>     import gc
>>     gc.collect()
>> Calling gc.collect() is indeed a good idea if the code does not explicitly
>> close the files.

-- 
Paolo Giarrusso - Ph.D. Student
http://www.informatik.uni-marburg.de/~pgiarrusso/


