
What is the recommended HTML parser to run in PyPy?
The typical go-to for Python is lxml, but of course that doesn't work with PyPy.
Has anyone tested any other libraries? Are there any benchmarks?
Thanks,
-Joe

2013/2/20 Joe Hillenbrand <joehillen@gmail.com>
> What is the recommended HTML parser to run in PyPy?
> The typical go-to for Python is lxml, but of course that doesn't work with PyPy.
This is not true anymore. There has been a lot of work on both sides to make lxml work with PyPy. You should try the latest versions.
In addition, there is a port of lxml that uses neither Cython nor the C API: https://github.com/amauryfa/lxml/tree/lxml-cffi
Most of the tests are passing (except objectify), but "setup.py install" does not work yet. It works from the source tree, though.
-- Amaury Forgeot d'Arc
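Here is a minimal sketch of parser-agnostic link extraction for the original question. It assumes you want code that runs whether or not lxml is importable: it uses lxml.html when the import succeeds and otherwise falls back to the standard library's pure-Python html.parser, which needs no C extension under PyPy. The names extract_links and _LinkCollector are hypothetical.

```python
from html.parser import HTMLParser

try:
    import lxml.html  # C-based parser; recent releases are reported to work on PyPy
except ImportError:
    lxml = None  # fall back to the pure-Python parser below


class _LinkCollector(HTMLParser):
    """Pure-Python fallback: records the href of every <a> start tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)


def extract_links(html):
    """Return the href values of all <a> elements, in document order."""
    if lxml is not None:
        root = lxml.html.fromstring(html)
        # xpath returns str subclasses; convert so both branches return plain str.
        return [str(href) for href in root.xpath("//a/@href")]
    collector = _LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


if __name__ == "__main__":
    page = '<html><body><a href="/a">A</a><p><a href="/b">B</a></p></body></html>'
    print(extract_links(page))
```

Keeping both branches behind one function makes it easy to time the same crawl with and without lxml on each interpreter.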

On Wed, Feb 20, 2013 at 8:02 PM, Amaury Forgeot d'Arc <amauryfa@gmail.com> wrote:
> 2013/2/20 Joe Hillenbrand <joehillen@gmail.com>
>> What is the recommended HTML parser to run in PyPy?
>> The typical go-to for Python is lxml, but of course that doesn't work with PyPy.
> This is not true anymore. There has been a lot of work on both sides to make lxml work with PyPy. You should try the latest versions.
> In addition, there is a port of lxml that uses neither Cython nor the C API: https://github.com/amauryfa/lxml/tree/lxml-cffi
> Most of the tests are passing (except objectify), but "setup.py install" does not work yet. It works from the source tree, though.
> -- Amaury Forgeot d'Arc
> _______________________________________________
> pypy-dev mailing list
> pypy-dev@python.org
> http://mail.python.org/mailman/listinfo/pypy-dev
Is it working with released cffi, with in-development cffi, or do you need patches?

2013/2/20 Maciej Fijalkowski <fijall@gmail.com>
> Is it working with released cffi, with in-development cffi, or do you need patches?
I developed it with a nightly build from mid-January and the cffi library that was available at the time. It's now released as cffi 0.5, I think.
I did not test with CPython at all.
At the time, cffi used to return enum values as strings, but I just tested with the latest version of cffi and a PyPy nightly build, and the tests still pass!
Ran 1006 tests in 34.730s
FAILED (failures=1)
and the only failure is::
    self.assertTrue(hasattr(self.etree, '_import_c_api'))
:-)
-- Amaury Forgeot d'Arc
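To show what "no Cython, no C API" looks like in practice, here is a minimal cffi sketch in ABI mode: the C prototype is declared as a string with ffi.cdef, and the symbol is looked up at run time with ffi.dlopen. It calls the C library's strlen rather than libxml2, and the thread does not say which cffi mode lxml-cffi itself uses, so treat this only as an illustration of the mechanism. ffi.dlopen(None) assumes a POSIX system. The import is guarded because on CPython cffi is a separately installed package; make_libc is a hypothetical helper name.

```python
try:
    from cffi import FFI
except ImportError:  # cffi is not installed (a separate package on CPython)
    FFI = None


def make_libc():
    """Return (ffi, lib) for the C standard library, or None if cffi is unavailable.

    The ffi object is returned too, so that it stays alive as long as lib is used.
    """
    if FFI is None:
        return None
    ffi = FFI()
    # Declare the C prototype we want to call, in plain C syntax.
    ffi.cdef("size_t strlen(const char *s);")
    # ABI mode: open the standard C library (POSIX only; None fails on Windows).
    lib = ffi.dlopen(None)
    return ffi, lib


if __name__ == "__main__":
    handle = make_libc()
    if handle is None:
        print("cffi is not installed")
    else:
        ffi, lib = handle
        # A Python bytes object can be passed where C expects const char *.
        print(lib.strlen(b"lxml"))
```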

Great to hear! I just got it working with scrapy. Unfortunately there wasn't any speedup.
A normal crawl in CPython takes:
real 1m32.238s
user 0m56.576s
sys 0m1.208s
In PyPy:
real 1m54.098s
user 1m18.105s
sys 0m1.372s
Thanks for all your hard work.
-Joe
On Wed, Feb 20, 2013 at 11:28 AM, Amaury Forgeot d'Arc <amauryfa@gmail.com> wrote:
> 2013/2/20 Maciej Fijalkowski <fijall@gmail.com>
>> Is it working with released cffi, with in-development cffi, or do you need patches?
> I developed it with a nightly build from mid-January and the cffi library that was available at the time. It's now released as cffi 0.5, I think.
> I did not test with CPython at all.
> At the time, cffi used to return enum values as strings, but I just tested with the latest version of cffi and a PyPy nightly build, and the tests still pass!
> Ran 1006 tests in 34.730s
> FAILED (failures=1)
> and the only failure is::
>     self.assertTrue(hasattr(self.etree, '_import_c_api'))
> :-)
> -- Amaury Forgeot d'Arc
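To put Joe's crawl timings in proportion, the hypothetical helper below converts the `time` output to seconds and prints PyPy/CPython ratios. With the numbers reported above, wall-clock time goes from 92.238 s to 114.098 s (about 1.24 times as long), and user CPU time goes from 56.576 s to 78.105 s (about 1.38 times as long).

```python
import re

# Matches the format printed by the shell's `time` builtin, e.g. "1m32.238s".
_TIME_RE = re.compile(r"^(\d+)m(\d+(?:\.\d+)?)s$")


def to_seconds(value):
    """Convert a `time`-style duration such as '1m32.238s' to seconds."""
    match = _TIME_RE.match(value)
    if match is None:
        raise ValueError(f"unrecognised duration: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Figures copied from the crawl timings reported in the thread.
cpython = {"real": "1m32.238s", "user": "0m56.576s", "sys": "0m1.208s"}
pypy = {"real": "1m54.098s", "user": "1m18.105s", "sys": "0m1.372s"}

for key in ("real", "user", "sys"):
    ratio = to_seconds(pypy[key]) / to_seconds(cpython[key])
    print(f"{key}: PyPy took {ratio:.2f}x as long as CPython")
```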

On Fri, Feb 22, 2013 at 8:39 AM, Joe Hillenbrand <joehillen@gmail.com> wrote:
> Great to hear! I just got it working with scrapy. Unfortunately there wasn't any speedup.
> A normal crawl in CPython takes:
> real 1m32.238s
> user 0m56.576s
> sys 0m1.208s
> In PyPy:
> real 1m54.098s
> user 1m18.105s
> sys 0m1.372s
> Thanks for all your hard work.
> -Joe
lxml-cffi is known to be slower than regular lxml. You'll probably get speedups once you start doing non-trivial logic in Python. For what it's worth, cffi is still missing a lot of trivial optimizations (and one non-trivial one), so there is a lot of room for improvement.
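One way to check this on your own workload is to time the parsing step separately from the Python-side processing of its results, then run the same script under CPython and under PyPy. The sketch below is a hypothetical harness: the stdlib html.parser stands in for whichever parser you actually use, and a word count stands in for the "non-trivial logic".

```python
import timeit
from collections import Counter
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects text nodes; a stand-in for the parser under test."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def parse_phase(html):
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.chunks


def logic_phase(chunks):
    """Hypothetical pure-Python post-processing: count word frequencies."""
    counts = Counter()
    for chunk in chunks:
        for word in chunk.lower().split():
            counts[word] += 1
    return counts


def best_time(func, *args, number=10, repeat=3):
    """Best average seconds per call over `repeat` runs of `number` calls."""
    timer = timeit.Timer(lambda: func(*args))
    return min(timer.repeat(repeat=repeat, number=number)) / number


if __name__ == "__main__":
    page = "<html><body>" + "<p>PyPy runs lxml via cffi</p>" * 2000 + "</body></html>"
    chunks = parse_phase(page)
    print(f"parse: {best_time(parse_phase, page):.6f} s per call")
    print(f"logic: {best_time(logic_phase, chunks):.6f} s per call")
```

Taking the minimum over repeats reduces noise from other processes. PyPy's JIT also needs some warm-up, so a short run can understate its steady-state speed.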

Hi all,
Just so everybody knows, the plan is to release CFFI 0.6 at the latest when we do the PyPy 2.0 release, and to include it fully inside PyPy too. (The idea is to avoid "pip install cffi", which would get a potentially incompatible version: PyPy includes the "_cffi_backend" module, which only works with a specific version of CFFI.)
A bientôt,
Armin.
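Here is a sketch of the mismatch Armin describes. The pure-Python cffi package and the _cffi_backend extension are meant to be used at the same version, and PyPy provides the backend itself. The helper below (a hypothetical name) reports both versions so that a mismatch is easy to spot. It assumes both modules expose a `__version__` string and returns None when cffi is not installed.

```python
import platform


def cffi_version_report():
    """Describe the installed cffi pieces, or return None if they are absent.

    Assumes both `cffi` and `_cffi_backend` expose a `__version__` string.
    """
    try:
        import _cffi_backend
        import cffi
    except ImportError:
        return None
    return {
        "implementation": platform.python_implementation(),
        "cffi": cffi.__version__,
        "_cffi_backend": _cffi_backend.__version__,
        # On PyPy the backend comes with the interpreter, so a pip-installed
        # cffi package of a different version would not match it.
        "match": cffi.__version__ == _cffi_backend.__version__,
    }


if __name__ == "__main__":
    print(cffi_version_report())
```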

Are we also planning to bundle ply and cparser?
Alex
On Wed, Feb 20, 2013 at 11:29 AM, Armin Rigo <arigo@tunes.org> wrote:
> Hi all,
> Just so everybody knows, the plan is to release CFFI 0.6 at the latest when we do the PyPy 2.0 release, and to include it fully inside PyPy too. (The idea is to avoid "pip install cffi", which would get a potentially incompatible version: PyPy includes the "_cffi_backend" module, which only works with a specific version of CFFI.)
> A bientôt,
> Armin.
-- "I disapprove of what you say, but I will defend to the death your right to say it." -- Evelyn Beatrice Hall (summarizing Voltaire) "The people's good is the highest law." -- Cicero

On Wed, Feb 20, 2013 at 9:29 PM, Armin Rigo <arigo@tunes.org> wrote:
> Hi all,
> Just so everybody knows, the plan is to release CFFI 0.6 at the latest when we do the PyPy 2.0 release, and to include it fully inside PyPy too. (The idea is to avoid "pip install cffi", which would get a potentially incompatible version: PyPy includes the "_cffi_backend" module, which only works with a specific version of CFFI.)
> A bientôt,
> Armin.
One thing we have to consider is how you write setup.py (or requirements.txt) when you need to install cffi on CPython but not on PyPy.
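Here is a sketch of one answer for setup.py. It assumes the package is installed by the same interpreter that will run it (for example, when building from a source distribution). The idea is to ask platform.python_implementation() and list cffi as a requirement only outside PyPy. The helper name and the package name "example" are hypothetical. Current packaging tools also accept PEP 508 environment markers, so the same condition can be written declaratively, e.g. `cffi; platform_python_implementation != "PyPy"`.

```python
import platform


def runtime_requirements(implementation=None):
    """Return install_requires entries for the given Python implementation.

    PyPy ships its own cffi backend, so a pip-installed cffi could mismatch it;
    other implementations (e.g. CPython) get cffi from PyPI instead.
    """
    if implementation is None:
        implementation = platform.python_implementation()
    if implementation == "PyPy":
        return []
    return ["cffi"]


# In setup.py this would feed setuptools, e.g.:
#     setup(name="example", install_requires=runtime_requirements(), ...)
if __name__ == "__main__":
    print(runtime_requirements())
```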
Participants (5):
- Alex Gaynor
- Amaury Forgeot d'Arc
- Armin Rigo
- Joe Hillenbrand
- Maciej Fijalkowski