lxml running on PyPy
Hi,

here's a little status update regarding lxml on PyPy.

I got the basics of lxml.etree working so far, mostly by patching up Cython and tracking down bugs in PyPy's cpyext (CPython C-API compatibility) layer. I'm still getting crashes during error reporting and haven't looked much at XPath or XSLT yet. But given that those do not have much Python interaction per se, I don't expect major surprises on that front.

The results are very encouraging, given that PyPy lacks support for many of the tweaks and hacks that are possible in CPython. Here's a little parser benchmark:

    $ python2.7 -m timeit -s 'import lxml.etree as et' 'et.parse("hamlet.xml")'
    100 loops, best of 3: 4.61 msec per loop

    $ pypy -m timeit -s 'import lxml.etree as et' 'et.parse("hamlet.xml")'
    100 loops, best of 3: 5.74 msec per loop

Pretty acceptable. That makes lxml the fastest XML parser that currently exists for PyPy.

And here's a worst case benchmark for element proxy instantiation and iteration, likely the most heavily tuned parts of lxml when running in CPython:

    $ python2.7 -m timeit -s 'import lxml.etree as et; \
        t=et.parse("hamlet.xml")' 'list(t.iter())'
    100 loops, best of 3: 2.71 msec per loop

    $ pypy -m timeit -s 'import lxml.etree as et; \
        t=et.parse("hamlet.xml")' 'list(t.iter())'
    10 loops, best of 3: 28.2 msec per loop

That's about a factor of 10. Sounds huge, but it's actually not bad, considering the amount of extra work that has to be done for PyPy here. It certainly doesn't render it unusable, we are still talking milliseconds after all. And so far, no tuning has gone into it, so it's not the final word. I'm pretty optimistic.

BTW, if you're interested in improvements on this front, you can help get this done faster by using the "donate" button on lxml's project home page. Any donation will help in freeing some of my time for this.

Stefan
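For readers who want to repeat the two measurements above from a script rather than the shell, the following is a minimal sketch. It assumes a small synthetic document in place of "hamlet.xml" (which is not bundled here), falls back to the standard library's ElementTree if lxml is not installed, and uses a hypothetical helper named `best_of`. The numbers it prints are not comparable to the ones in the message.

```python
import io
import timeit

try:
    import lxml.etree as et  # the library under discussion, if installed
except ImportError:
    import xml.etree.ElementTree as et  # stdlib stand-in (assumption)

# Synthetic stand-in for "hamlet.xml": one root element with 1000 children.
SAMPLE = b"<PLAY>" + b"<LINE>To be, or not to be</LINE>" * 1000 + b"</PLAY>"


def best_of(func, repeat=3, number=20):
    """Return the best per-loop time in seconds over `repeat` runs."""
    timings = timeit.repeat(func, repeat=repeat, number=number)
    return min(timings) / number


def parse_sample():
    # Parse from an in-memory buffer instead of a file on disk.
    return et.parse(io.BytesIO(SAMPLE))


tree = parse_sample()
parse_time = best_of(parse_sample)
iter_time = best_of(lambda: list(tree.iter()))

print("parse: %.3f msec per loop" % (parse_time * 1e3))
print("iter:  %.3f msec per loop" % (iter_time * 1e3))
```

Running the same script under both interpreters gives a rough equivalent of the `-m timeit` comparison, with the caveat that the synthetic input is much more regular than a real play.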
Very nice, this work is really appreciated.

Some days ago, I played with some algorithms on PyPy and CPython that got their primary data from larger XML files. Whereas the algorithms (combinatorics by recursion-heavy list processing) got a nice speedup of about a factor of 5 (from 1 sec down to 200 msec on my test data), the initial XML parsing + find()/findall() processing jumped from 5 msec to 200 msec.

So, lxml on PyPy would be awesome!

--dirk

On 22.04.2012 at 00:46, Stefan Behnel <stefan_ml@behnel.de> wrote:
Dirk Rothe, 22.04.2012 07:48:
I guess you switched to plain ElementTree? PyPy doesn't do all that badly here, but it's several times slower than the highly tuned C implementations in CPython:

http://blog.behnel.de/index.php?p=210

For tree iteration in lxml (and the related .find*() methods), PyPy already seems to be pretty close to what I get in CPython. Sure, it depends on how many results you get back because passing them through the interface to PyPy isn't very fast, but at least the internal tree traversal speed isn't impacted. Examples:

Complete traversal, one hit:

    $ python2.7 -m timeit -s 'import lxml.etree as et; \
        t=et.parse("hamlet.xml")' \
        'list(t.iter("PLAY"))'
    1000 loops, best of 3: 382 usec per loop

    $ pypy -m timeit -s 'import lxml.etree as et; t=et.parse("hamlet.xml")' \
        'list(t.iter("PLAY"))'
    1000 loops, best of 3: 284 usec per loop

Complete traversal, tons of hits:

    $ python2.7 -m timeit -s 'import lxml.etree as et; \
        t=et.parse("hamlet.xml")' \
        'list(t.iter("LINE"))'
    1000 loops, best of 3: 1.94 msec per loop

    $ pypy -m timeit -s 'import lxml.etree as et; t=et.parse("hamlet.xml")' \
        'list(t.iter("LINE"))'
    100 loops, best of 3: 7.48 msec per loop

Surprisingly enough, I get very unreliable results for PyPy here. Rerunning the above several times gives me this as the best result:

    $ pypy -m timeit -s 'import lxml.etree as et; t=et.parse("hamlet.xml")' \
        'list(t.iter("LINE"))'
    100 loops, best of 3: 3.71 msec per loop

So it seems that it *can* be pretty close to CPython for that as well.

But your use case reminds me of iterparse(). There will certainly be some substantial overhead involved in running iterparse in PyPy. Currently, it seems to be about a factor of 15:

    $ pypy -m timeit -s 'import lxml.etree as et' \
        't=list(et.iterparse("hamlet.xml"))'
    10 loops, best of 3: 157 msec per loop

    $ python2.7 -m timeit -s 'import lxml.etree as et' \
        't=list(et.iterparse("hamlet.xml"))'
    100 loops, best of 3: 10.8 msec per loop

Needs some work and a bit of profiling, I guess...
> So, lxml on PyPy would be awesome!

You can support the progress.

Stefan
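The three access patterns timed in this message (a tag-filtered traversal with one hit, one with many hits, and iterparse()) can be sketched on a tiny in-memory document. This is an illustration only: the document below is made up, and the standard library's ElementTree is used as a stand-in when lxml is not installed. Both libraries offer `iter(tag)` and `iterparse()` with the behaviour used here.

```python
import io

try:
    import lxml.etree as et  # the library under discussion, if installed
except ImportError:
    import xml.etree.ElementTree as et  # stdlib stand-in (assumption)

# Made-up miniature of a play; the thread used "hamlet.xml".
SAMPLE = (
    b"<PLAY><TITLE>Hamlet</TITLE>"
    b"<ACT><LINE>one</LINE><LINE>two</LINE></ACT>"
    b"<ACT><LINE>three</LINE></ACT>"
    b"</PLAY>"
)

tree = et.parse(io.BytesIO(SAMPLE))

# Complete traversal, one hit: only the root element matches.
plays = list(tree.iter("PLAY"))

# Complete traversal, many hits: every LINE element becomes a Python object,
# which is the part that has to cross the C/Python boundary.
lines = list(tree.iter("LINE"))

# iterparse() yields an (event, element) pair for each closing tag by default,
# so every element in the document crosses that boundary during parsing.
events = list(et.iterparse(io.BytesIO(SAMPLE)))
end_tags = [elem.tag for _event, elem in events]

print(len(plays), len(lines), len(events))
```

The more elements a call hands back to Python, the more the interface overhead described in the message should matter, which is consistent with iterparse() showing the largest slowdown in the timings above.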
On Sun, 2012-04-22 at 00:46 +0200, Stefan Behnel wrote:
Hi Stefan

After being very excited about PyPy a while ago, I've realised that I'm unlikely to be able to make extensive use of it any time soon, mostly due to dependencies. I've been trying to follow the developments there, and saw your interactions on the mailing list and followed with keen interest.

This is just a quick message to say thank you for the hard work on this front, and the continued support for lxml over the years.

Please say when there are released versions that would be worthwhile trying out, and I might be able to run some tests here with our own code. Nothing fancy, but with some use of XPath.

Thanks again.

Friedel

--
Recently on my blog:
http://translate.org.za/blogs/friedel/en/content/survey-about-usability-virt...
On Sun, 2012-04-22 at 00:46 +0200, Stefan Behnel wrote:
Hi Stefan

PyPy 1.9 was just released. Does it contain the necessary changes for me to give this a try? Are the cython changes that are needed in a released version of cython?

Friedel

--
Recently on my blog:
http://translate.org.za/blogs/friedel/en/content/localisation-guide-now-avai...
F Wolff, 08.06.2012 22:54:
Yes. Please do, I'd be happy about any feedback. There are still features that crash and I need to sort that out, but many things work and are worth a try.
> Are the cython changes that are needed in a released version of cython?

No, that would be 0.17. But you can get the archive from GitHub, which is not that different from a release.

Stefan
On Sat, 2012-06-09 at 11:51 +0200, Stefan Behnel wrote:
I installed cython from git master without a problem. Trying the same with lxml with pip didn't work. So I tried 'make inplace', and got the error below. I've only ever installed from released versions where cython is not needed, so I'm not familiar with what I need to do to make this work. Any pointers will be appreciated.

    $ make inplace
    pypy-1.9/bin/pypy setup.py build_ext -i
    Building lxml version 2.4.dev.
    Building with Cython 0.17pre.
    Using build configuration of libxslt 1.1.24
    Building against libxml2/libxslt in the following directory: /usr/lib
    running build_ext
    cythoning src/lxml/lxml.etree.pyx to src/lxml/lxml.etree.c
    warning: src/lxml/xmlerror.pxi:569:26: local variable 'args' referenced before assignment
    building 'lxml.etree' extension
    creating build/temp.linux-i686-2.7
    creating build/temp.linux-i686-2.7/src
    creating build/temp.linux-i686-2.7/src/lxml
    cc -fPIC -Wimplicit -I/usr/include/libxml2 -I/home/fwolff/download/python/lxml-lxml-2260f8d/src/lxml//include -I/home/fwolff/download/python/pypy-1.9/include -c src/lxml/lxml.etree.c -o build/temp.linux-i686-2.7/src/lxml/lxml.etree.o -w
    src/lxml/lxml.etree.c: In function ‘__pyx_f_4lxml_5etree_9_ErrorLog_connect’:
    src/lxml/lxml.etree.c:31030: error: ‘xmlStructuredErrorContext’ undeclared (first use in this function)
    src/lxml/lxml.etree.c:31030: error: (Each undeclared identifier is reported only once
    src/lxml/lxml.etree.c:31030: error: for each function it appears in.)
    error: command 'cc' failed with exit status 1
    make: *** [inplace] Error 1

Thanks for any help!

Friedel

--
Recently on my blog:
http://translate.org.za/blogs/friedel/en/content/localisation-guide-now-avai...
F Wolff, 09.06.2012 14:35:
This is neither related to PyPy nor Cython. It's due to my rewrite of the error reporting code (intended to make it less intrusive when mixed with other users of libxml2), for which I accidentally used a feature that was only added in libxml2 2.7.4.

That's too bad because this feature is rather crucial to the way error reporting should work. Guess I'll have to undo those changes, at least conditionally. I don't see how to make them work in those really old libxml2 versions that the current lxml releases still work with (even 2.7.4 has been out for almost three years now).

Thanks for the report anyway, better to do it now than when trying to get out the release. I pushed a quick fix that makes it compile again, so that you can test it.

Stefan
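The compile error above is a C-level problem, so a conditional fix would presumably be decided at compile time against libxml2's version macro. As a rough Python-level illustration of the same version gate, the sketch below compares a version tuple against 2.7.4, the release named in the message. The function name `has_structured_error_context` is hypothetical; `lxml.etree.LIBXML_VERSION` is the version tuple lxml reports for the libxml2 it runs against, and it is only consulted here if lxml happens to be installed.

```python
# First libxml2 release with the feature, according to the message above.
REQUIRED_LIBXML2 = (2, 7, 4)


def has_structured_error_context(libxml_version):
    """Return True if the given (major, minor, patch) tuple is new enough.

    Hypothetical helper for illustration; not part of lxml.
    """
    return tuple(libxml_version[:3]) >= REQUIRED_LIBXML2


try:
    from lxml import etree
    runtime_version = etree.LIBXML_VERSION  # e.g. (2, 7, 8)
except ImportError:
    runtime_version = None  # lxml not installed; nothing to inspect

if runtime_version is not None:
    print("libxml2 %s new enough: %s" % (
        ".".join(map(str, runtime_version)),
        has_structured_error_context(runtime_version),
    ))
```

With this gate, 2.6.20 (the old minimum mentioned later in the thread) would take the fallback path, while 2.7.4 and 2.7.8 would use the new error reporting.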
Arfrever Frehtes Taifersar Arahesis, 09.06.2012 16:02:
Well, yes, I'm aware of that. Raising it from 2.6.20 straight to 2.7.4, or rather 2.7.8 due to bugs in earlier versions, would be quite a step though, and it would definitely make some users unhappy. 2.6.x is still pretty widely used, from my experience.

Stefan
On Sat, 2012-06-09 at 15:32 +0200, Stefan Behnel wrote:
Thanks for the quick reply! Now I was able to build and install it successfully. Thanks!

I tried to run the test suite of the Translate Toolkit with this new lxml under pypy. It is not able to complete the test run due to crashes. I'll try to isolate a few issues and get it down to small test cases. Here are the first few:

=============================================================

Here lxml under cpython gives:

    u'\xa0'

=============================================================

Here lxml under cpython gives:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "lxml.etree.pyx", line 2440, in lxml.etree.fromstring (src/lxml/lxml.etree.c:23985)
      File "parser.pxi", line 1510, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:63925)
      File "parser.pxi", line 1389, in lxml.etree._parseDoc (src/lxml/lxml.etree.c:62857)
      File "parser.pxi", line 931, in lxml.etree._BaseParser._parseDoc (src/lxml/lxml.etree.c:60016)
      File "parser.pxi", line 542, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:56659)
      File "parser.pxi", line 628, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:57504)
      File "parser.pxi", line 568, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:56902)
    lxml.etree.XMLSyntaxError: xmlParseCharRef: invalid xmlChar value 16, line 1, column 9

=============================================================

Here lxml under cpython gives:

    <lxml.etree.DTD object at 0xb76f6c2c>

=============================================================

I hope it helps! I realise some of this might be pypy bugs, but it is impossible for me to say, so I'll start here.

Keep well
Friedel

--
Recently on my blog:
http://translate.org.za/blogs/friedel/en/content/localisation-guide-now-avai...
F Wolff, 09.06.2012 20:14:
> Thanks for the quick reply! Now I was able to build and install it
> successfully. Thanks!

Cool.

> I tried to run the test suite of the Translate Toolkit with this new lxml
> under pypy. It is not able to complete the test run due to crashes.

I can imagine.

Yes, I didn't have time to look into XPath at all yet. There are definitely still problems to work around at that front. It's the next thing to work on, I guess.

This looks like a bug in PyPy's cpyext emulation of PyFile_AsFile(), definitely worth filing a bug report with them. Given how new cpyext still is, it's not entirely surprising that it's somewhat badly tested, especially when it comes to unexpected input (such as a StringIO object in this case).

A quick way to work around this in lxml would be to disable the use of PyFile_AsFile(), as done for Py3 already. Or maybe it's better to add an explicit check if the input value actually is a file object before calling that function.

> I hope it helps! I realise some of this might be pypy bugs, but it is
> impossible for me to say, so I'll start here.

Thanks for testing. I've been seriously head under water for a while now, but I hope to find some time soon to tackle a couple of the main issues that are left.

Stefan
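The second workaround mentioned above, checking that the input really is a file before taking the C-level fast path, can be illustrated in plain Python. This is only a sketch of the idea, not lxml's actual check: the helper name `has_os_level_file` is hypothetical, and it tests for a usable file descriptor rather than calling any C-API function.

```python
import io
import tempfile


def has_os_level_file(source):
    """Return True if `source` is backed by a real OS file descriptor.

    In-memory objects such as StringIO either lack fileno() or raise
    when it is called, so they fall through to the generic read() path.
    """
    fileno = getattr(source, "fileno", None)
    if fileno is None:
        return False
    try:
        fileno()
    except (AttributeError, OSError, ValueError):
        return False
    return True


# An in-memory buffer must not take the fast path...
print(has_os_level_file(io.StringIO(u"<root/>")))

# ...while a real (temporary) file on disk may.
with tempfile.TemporaryFile() as real_file:
    print(has_os_level_file(real_file))
```

A check of this kind keeps unexpected inputs away from the code path that crashed, at the cost of one extra call per parse.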
Stefan Behnel, 09.06.2012 22:17:
I pushed a couple of changes to the master branch on GitHub that fix or work around the most visible issues. Most importantly, threading is now disabled when compiling under PyPy as that's not currently supported by cpyext (and it only half-way fakes it).

Both XPath and XSLT seem to work now, as does validation. There are still crashes in the test suite, but they are definitely getting fewer. Most test failures are now in the doctests due to different exceptions and/or error messages in PyPy etc.

Stefan
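How a build script might switch threading off under PyPy can be sketched as follows. This is an assumption-laden illustration, not the change that was pushed: the function names are hypothetical, and the macro name `WITHOUT_THREADING` is assumed here for the example. Only `platform.python_implementation()` is a standard library call.

```python
import platform


def running_on_pypy():
    """Return True when the interpreter running the build is PyPy."""
    return platform.python_implementation() == "PyPy"


def build_defines(enable_threading=True):
    """Return C preprocessor defines for the extension build.

    Threading support is dropped when explicitly disabled or when
    building under PyPy. The macro name is an assumption.
    """
    defines = []
    if running_on_pypy() or not enable_threading:
        defines.append(("WITHOUT_THREADING", None))
    return defines


print(platform.python_implementation(), build_defines())
```

The returned list has the shape that the `define_macros` argument of a setuptools `Extension` expects, so it could be passed through unchanged.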
On Mon, 2012-06-18 at 19:58 +0200, Stefan Behnel wrote:
I rebuilt, and things are looking substantially better. I'm able to complete the test run without a crash! Thanks! Most failures now seem to be because of the first issue I mentioned in my previous mail, simplified to something like this:
On the same line in xpath.pxi, I also saw at least once: SystemError: <StackOverflow object at 0x905d4ec>
Thanks for the work on this.

Friedel

--
Recently on my blog:
http://translate.org.za/blogs/friedel/en/content/localisation-guide-now-avai...
F Wolff, 19.06.2012 11:22:
I've seen these, too, and I've seen similar PyPy bugs that have already been worked around in Cython weeks ago, so no idea where these have crept back up from. I'll take a look.
Did you use yesterday's really, really latest GitHub Cython? There's a bug in PyPy I needed to work around that is related to iteration in extension type subclasses (i.e. the exact case above).

Stefan
On Tue, 2012-06-19 at 12:22 +0200, Stefan Behnel wrote:
No, I didn't, sorry. I only updated lxml, not cython. I rebuilt both now.

For cython: b3379294b1d0982363186577b06482e8d9285158
For lxml: a1690ce2106897481ebda6339ccfb6e73236f2a2

The failures are still inconsistent, and the errors relating to .iterchildren still seem to be there. I didn't think it would be useful to repeat the tests multiple times to see the distribution of the number of failures, but it does seem as if there might be a few fewer on average for one of the files I looked at. I'm not sure that is useful to know, though :-/

Friedel

--
Recently on my blog:
http://translate.org.za/blogs/friedel/en/content/localisation-guide-now-avai...
participants (4)
- Arfrever Frehtes Taifersar Arahesis
- Dirk Rothe
- F Wolff
- Stefan Behnel