Which XML libraries? (was: Re: PyPy 1.4 released)

Hi, what XML libraries are people using with PyPy? What is working well? cu,

On Sun, Nov 28, 2010 at 9:48 AM, Maciej Fijalkowski <fijall@gmail.com> wrote:

On Sun, Nov 28, 2010 at 11:58 AM, René Dudfield <renesd@gmail.com> wrote:
PyExpat works, although it's slow (ctypes-based implementation). I know genshi has some trouble with it; someone is debugging that now. Besides that, I don't think there are any other working XML libraries (unless someone wrote a pure-Python one).

Cheers,
fijal

Amaury Forgeot d'Arc, 28.11.2010 11:44:
Hmm, reasonable?

$ ./bin/pypy -m timeit -s 'import xml.etree.ElementTree as ET' \
    'ET.parse("ot.xml")'
10 loops, best of 3: 1.27 sec per loop

$ python2.7 -m timeit -s 'import xml.etree.ElementTree as ET' \
    'ET.parse("ot.xml")'
10 loops, best of 3: 486 msec per loop

$ python2.7 -m timeit -s 'import xml.etree.cElementTree as ET' \
    'ET.parse("ot.xml")'
10 loops, best of 3: 33.7 msec per loop

Stefan

On Mon, Nov 29, 2010 at 14:40, Stefan Behnel <stefan_ml@behnel.de> wrote:
Is any JITting expected to trigger with so few iterations? Or does RPython remove the need for that?

I tried increasing the loop count, but I couldn't, because of two different bugs somewhere (in PyPy, I guess). I tried to ensure that at least 1000 iterations were timed, but timeit doesn't work for more than 852 iterations on the attached example (found on my HD):

$ pypy-trunk/pypy/translator/goal/pypy-c -m timeit -n 853 -s 'import xml.etree.ElementTree as ET' 'ET.parse("extensionNames.xml")'
ImportError: No module named linecache

Now, even though linecache is only imported locally, linecache.py does exist (located in the same path as timeit.py, i.e. lib-python/2.5.2/). Furthermore, the same thing works fine inside the interpreter, suggesting that the -m option might be part of the bug:

import timeit
a = timeit.Timer('ET.parse("extensionNames.xml")', 'import xml.etree.ElementTree as ET')
a.timeit(1000)

However, a bigger timing count still doesn't work:

  line 161, in timeit
  File "<timeit-src>", line 6, in inner
  File "/Users/pgiarrusso/Documents/Research/Sorgenti/PyPy/pypy-trunk/lib_pypy/xml/etree/ElementTree.py", line 862, in parse
  File "/Users/pgiarrusso/Documents/Research/Sorgenti/PyPy/pypy-trunk/lib_pypy/xml/etree/ElementTree.py", line 579, in parse
IOError: [Errno 24] Too many open files: 'extensionNames.xml'

Inspection of the pypy process confirms a leak of file handles to the XML files. Whether it is the GC not being invoked, a missing destructor, or simply code that should release its file handles explicitly, I don't know. Is there a way to trigger an explicit GC to work around such issues?

Warning: all of this is with a 32-bit PyPy 1.4 on Mac OS X.

Bye
--
Paolo Giarrusso - Ph.D. Student
http://www.informatik.uni-marburg.de/~pgiarrusso/
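(A minimal sketch of such a work-around, not a fix for the underlying leak: forcing a collection between parses should let the GC finalize, and thus close, the leaked file objects. The file name is the one from the example above; the collection interval is an arbitrary choice.)

    import gc
    import xml.etree.ElementTree as ET

    for i in range(1000):
        ET.parse("extensionNames.xml")
        if i % 100 == 99:
            gc.collect()  # finalize unreachable file objects so their handles get closed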

Hi all, thanks to the tips, I measured on Mac OS X a 17% slowdown vs. Python 2.5 (32-bit), after manually taking the best times. Measuring on the command line would give a 57% slowdown instead, because of the lack of warmup.

As a matter of fact, however, pyexpat is not involved here for PyPy: in v1.4 it is still implemented through ctypes (in lib_pypy/pyexpat.py), not in RPython in pypy/rlib/. Python 2.7 may well be faster than 2.5, which might explain some of the extra difference with Stefan's results.

It looks like the two bugs should be easy to fix:
- a file handle leak in the tested XML module, indeed;
- an IOError while opening a module being converted into a "file not found" style ImportError - at least in Java, file not found is a specific exception which can be distinguished from generic I/O errors.

On Mon, Nov 29, 2010 at 22:29, Piotr Skamruk <piotr.skamruk@gmail.com> wrote:
Simpler would be to set ulimit -n to 65536 (probably in /etc/security/limits.conf).
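(For reference, a sketch of what such an entry could look like with pam_limits on Linux; the "*" domain and the value are just illustrative, not a recommendation:)

    *   soft   nofile   65536
    *   hard   nofile   65536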
Thanks, I needed both this and the GC tips: for a test run of 10^4 iterations I can't call the GC inside the timed region and still get meaningful results. [I'm on Mac OS X, though, so ulimit -S -n 10240 is the best one can do; anything higher gives "Invalid argument", i.e. EINVAL.]

Additionally, I just discovered that the ImportError on "import linecache" looks file-handle-related as well, because changing the ulimit changes the iteration count that triggers the error, so it's likely an effect of the same bug. Still, the original error message should be preserved, and that should be easy to fix.

In these conditions, my best results after warming up are:

0.358 ms  PyPy-JIT-32bit (see below for JIT logs)
0.305 ms  CPython-2.5-32bit
0.269 ms  CPython-2.6-64bit
0.553 ms  PyPy-64bit-noJIT, rev 79307, 21 Nov 2010

which means a 17% slowdown on comparable setups, rather than a 2x slowdown; measuring with timeit on the command line, instead, would give a 57% slowdown. All this is on a very small input file, the one I attached before, for a total of 1000 iterations, on a Core 2 Duo 2.6 GHz. I don't report the average because:
a) it is difficult to get something significant anyway (I don't want to code confidence intervals, and automated tools wouldn't call the GC appropriately);
b) I expect the deviation to be due more to unrelated load on my laptop (around 12-18% CPU) than to the actual spread of the runtime.

I set PYPYLOG='jit-summary:-' before the PyPy-JIT run and got this - I hope somebody can check from it whether the JIT is working successfully:

[f2dd1fbaa1c2] {jit-summary
Tracing:      25      0.163456
Backend:      23      0.017392
Running asm:  191214
Blackhole:    2012
TOTAL:        502.543032
ops:          68338
recorded ops: 32764
calls:        1759
guards:       18005
opt ops:      2757
opt guards:   696
forcings:     111
abort: trace too long: 2
abort: compiling:      0
abort: vable escape:   0
nvirtuals:    6693
nvholes:      1059
nvreused:     3979
Total # of loops:   18
Total # of bridges: 6
Freed # of loops:   0
Freed # of bridges: 0
[f2dd1fc141a8] jit-summary}

Best regards.
-- Paolo Giarrusso - Ph.D. Student http://www.informatik.uni-marburg.de/~pgiarrusso/
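(A minimal sketch of this kind of manual measurement - warm up first, time batches with the GC invoked outside the timed region, and report the best per-iteration time. The batch and repeat counts and the file name are arbitrary illustrative choices, not the exact script used here.)

    import gc
    import time
    import xml.etree.ElementTree as ET

    def batch(n=100):
        t0 = time.time()
        for _ in range(n):
            ET.parse("extensionNames.xml")
        return (time.time() - t0) / n  # seconds per iteration

    for _ in range(3):
        batch()                        # warm-up, gives the JIT a chance to compile
    times = []
    for _ in range(10):
        gc.collect()                   # collect outside the timed region, releasing file handles
        times.append(batch())
    print("best: %.3f ms" % (min(times) * 1000))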

2010/11/30 Paolo Giarrusso <p.giarrusso@gmail.com>
Did you compile PyPy yourself? If the expat development files are present, the translation should build the pyexpat module:

Python 2.5.2 (79656, Nov 29 2010, 21:05:28)
[PyPy 1.4.0] on linux2
-- Amaury Forgeot d'Arc
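(A quick way to check which pyexpat a given binary actually ships - a sketch resting on the assumption that the ctypes fallback, living in lib_pypy/pyexpat.py, carries a __file__ attribute, while a module built into the translated interpreter normally does not:)

    import pyexpat
    # Assumption: built-in (translated) modules lack __file__; the lib_pypy ctypes fallback has one.
    print(getattr(pyexpat, "__file__", "built into the interpreter"))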

On Tue, Nov 30, 2010 at 08:13, Maciej Fijalkowski <fijall@gmail.com> wrote:
My apologies: I self-compiled PyPy, and indeed I get the output you describe. So I guess the ctypes implementation I came across in lib_pypy/pyexpat.py is probably a fallback for when only the library, but not the headers, is present.

Anyway, this does not affect the benchmarks above. Stefan, I still don't get why you complained that pyexpat is slow by showing benchmarks for another module - I guess I don't fully understand your email, but it asks "reasonable?" right after Amaury talks about pyexpat. I'll try to benchmark pyexpat itself soon; a pointer to a reasonable way to call it would make that easier, since I have limited time and mental energy to devote to this, and figuring out a non-stupid way to use it might be non-trivial without learning the library first.

Best regards
--
Paolo Giarrusso - Ph.D. Student
http://www.informatik.uni-marburg.de/~pgiarrusso/
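(For what it's worth, a minimal way to drive pyexpat directly through the standard xml.parsers.expat interface - roughly what cElementTree does under the hood. This assumes PyPy's pyexpat exposes the same API as CPython's; the handlers here deliberately do nothing, and the file name is just the test file from earlier.)

    import xml.parsers.expat

    def start_element(name, attrs):
        pass  # called for each opening tag; attrs is a dict of attributes

    def end_element(name):
        pass  # called for each closing tag

    def char_data(data):
        pass  # called for text content

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.EndElementHandler = end_element
    parser.CharacterDataHandler = char_data
    f = open("extensionNames.xml", "rb")
    try:
        parser.ParseFile(f)
    finally:
        f.close()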

Paolo Giarrusso, 01.12.2010 00:34:
Well, in CPython, I can see little to no reason why anyone would go as low-level as pyexpat when you can use cElementTree. So IMHO the level to compare is what people would normally use, rather than what people could potentially use if they were beaten hard enough.

Stefan
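(Concretely, "what people would normally use" on CPython 2.x is typically spelled with the classic fallback import, so the C accelerator is picked up where available while plain ElementTree - which is what PyPy provides - still works:)

    try:
        import xml.etree.cElementTree as ET  # C accelerator on CPython
    except ImportError:
        import xml.etree.ElementTree as ET   # pure-Python fallback (e.g. on PyPy)

    tree = ET.parse("ot.xml")
    root = tree.getroot()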

On Wed, Dec 1, 2010 at 9:48 AM, Stefan Behnel <stefan_ml@behnel.de> wrote:
Hey. Sure, makes sense :-) I think one of the reasons for some of the slowdown is that calls from C are not JITted if they don't contain loops themselves. Obviously this doesn't explain the whole thing, because looking at the numbers there is something really wrong going on.

Participants (6):
- Amaury Forgeot d'Arc
- Maciej Fijalkowski
- Paolo Giarrusso
- Piotr Skamruk
- René Dudfield
- Stefan Behnel