[lxml-dev] Installing lxml 2.0beta1 via easy_install requires Cython; also, question about lxml.html.clean.clean_html
I attempted to install lxml 2.0beta1 via easy_install (easy_install lxml==2.0beta1), and it didn't work. After a bunch of experimentation, I discovered that the C files that are supposed to be present in the download were not present. After installing a patched version of Cython 0.9.6.10b (patched according to the directions I found on this list) lxml successfully installed. But I was very surprised at this requirement.

Also, I'm not sure, but I think the lxml.html.clean.clean_html() function might not be working properly? I followed the example at http://codespeak.net/lxml/dev/lxmlhtml.html#cleaning-up-html but got different results. I expected this:

<html>
  <body>
    <div>
      <style>/* deleted */</style>
      <a href="">a link</a>
      <a href="#">another link</a>
      <p>a paragraph</p>
      <div>secret EVIL!</div>
      of EVIL!
      Password: annoying EVIL!
      <a href="evil-site">spam spam SPAM!</a>
      <img src="evil!">
    </div>
  </body>
</html>

But got this:

<div><style>/* deleted */</style><body>
<a href="">a link</a> <a href="#">another link</a> <p>a paragraph</p> <div>secret EVIL!</div> of EVIL!
Password: annoying EVIL!<a href="evil-site">spam spam SPAM!</a> <img src="evil!"></body></div>
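To see exactly what clean_html() serializes, without any doctest normalization in between, the result can simply be printed with repr(). A minimal sketch, assuming an lxml install where lxml.html.clean is importable (recent lxml releases moved the cleaner into the separate lxml_html_clean package, so the import is guarded). The SAMPLE markup is a hypothetical, shortened stand-in for the documentation example, not the original input.

```python
# Print the raw output of lxml.html.clean.clean_html for a small sample.
try:
    from lxml.html.clean import clean_html
except ImportError:  # lxml missing, or the cleaner lives in lxml_html_clean
    clean_html = None

# Hypothetical sample input: a shortened stand-in for the documentation example.
SAMPLE = """<html>
  <head><script type="text/javascript">alert('evil')</script></head>
  <body>
    <a href="javascript:alert('evil')">a link</a>
    <p onclick="alert('evil')">a paragraph</p>
  </body>
</html>"""


def cleaned(markup):
    """Return clean_html(markup) as a string, or None if the cleaner is unavailable."""
    if clean_html is None:
        return None
    return clean_html(markup)


if __name__ == "__main__":
    # repr() shows the exact serialization, including where <body> ended up.
    print(repr(cleaned(SAMPLE)))
```

With the default cleaner settings, scripts, on* event attributes and javascript: links should be gone; the interesting part for this report is where the <body> tag lands in the printed string.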
Hi,

Jon Rosebaugh wrote:
I attempted to install lxml 2.0beta1 via easy_install (easy_install lxml==2.0beta1), and it didn't work. After a bunch of experimentation, I discovered that the C files that are supposed to be present in the download were not present. After installing a patched version of Cython 0.9.6.10b (patched according to the directions I found on this list) lxml successfully installed.
Hmm, it shouldn't be that hard. The tgz I downloaded has the .c files, so installing without Cython should work just fine. I just removed my local Cython install and did an "easy_install lxml" (which downloaded, built and installed 2.0beta1) and also an "easy_install lxml-2.0beta1.tar.gz". Both worked just fine. Maybe you had an older version of Cython installed? If that's found, it will be used - and obviously fail.
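The build behaviour Stefan describes (use Cython if it is importable, otherwise compile the pre-generated .c files shipped in the tarball) can be sketched as follows. This is a hypothetical, simplified helper for illustration, not lxml's actual setup.py; the function names and the src/lxml layout are assumptions.

```python
import importlib.util
import os


def cython_available():
    """True if some Cython package can be imported in this environment."""
    return importlib.util.find_spec("Cython") is not None


def extension_sources(module, src_dir="src/lxml", use_cython=None):
    """Pick the source file for one extension module (hypothetical helper).

    With Cython available, build from the .pyx file; otherwise fall back to
    the generated .c file a release tarball is expected to contain. As noted
    in the thread, any importable Cython gets used, even one too old to
    translate the .pyx sources.
    """
    if use_cython is None:
        use_cython = cython_available()
    ext = ".pyx" if use_cython else ".c"
    return [os.path.join(src_dir, module + ext)]


if __name__ == "__main__":
    print(extension_sources("etree"))
```

Checking for importability alone is the weak spot: a version check on Cython would be needed to avoid the "old Cython found, build fails" case.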
Also, I'm not sure, but I think the lxml.html.clean.clean_html() function might not be working properly? I followed the example at http://codespeak.net/lxml/dev/lxmlhtml.html#cleaning-up-html but got different results. I expected this:

<html>
  <body>
    <div>
      <style>/* deleted */</style>
      <a href="">a link</a>
      <a href="#">another link</a>
      <p>a paragraph</p>
      <div>secret EVIL!</div>
      of EVIL!
      Password: annoying EVIL!
      <a href="evil-site">spam spam SPAM!</a>
      <img src="evil!">
    </div>
  </body>
</html>
But got this: <div><style>/* deleted */</style><body>
<a href="">a link</a> <a href="#">another link</a> <p>a paragraph</p> <div>secret EVIL!</div> of EVIL!
Password: annoying EVIL!<a href="evil-site">spam spam SPAM!</a> <img src="evil!"></body></div>
That one should work, too. I just ran lxmlhtml.txt as doctest (which admittedly wasn't included in the test suite before) and it just worked. Same for test_clean.txt. What's the version of libxml2 you are using? Can you try running the test suite and see if that works for you? Stefan
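The libxml2 version Stefan asks about can be read from Python itself. A small sketch, assuming a current lxml: lxml.etree exposes the runtime and compile-time library versions as tuples (the same values the test runner prints in its header). The import is guarded so the snippet also runs where lxml is absent; the function name is hypothetical.

```python
def lxml_version_report():
    """Return lxml/libxml2/libxslt version tuples, or None if lxml is not installed."""
    try:
        from lxml import etree
    except ImportError:
        return None
    return {
        "lxml": etree.LXML_VERSION,
        "libxml used": etree.LIBXML_VERSION,
        "libxml compiled": etree.LIBXML_COMPILED_VERSION,
        "libxslt used": etree.LIBXSLT_VERSION,
        "libxslt compiled": etree.LIBXSLT_COMPILED_VERSION,
    }


if __name__ == "__main__":
    report = lxml_version_report()
    if report is None:
        print("lxml is not installed")
    else:
        for name, version in report.items():
            print(f"{name}: {version}")
```

A mismatch between the "used" and "compiled" tuples is worth noting in a bug report, since it means lxml was built against a different library than the one loaded at runtime.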
On Jan 12, 2008 2:46 AM, Stefan Behnel wrote:
Hi,
Jon Rosebaugh wrote:
I attempted to install lxml 2.0beta1 via easy_install (easy_install lxml==2.0beta1), and it didn't work. After a bunch of experimentation, I discovered that the C files that are supposed to be present in the download were not present. After installing a patched version of Cython 0.9.6.10b (patched according to the directions I found on this list) lxml successfully installed.
Hmm, it shouldn't be that hard. The tgz I downloaded has the .c files, so installing without Cython should work just fine. I just removed my local Cython install and did an "easy_install lxml" (which downloaded, built and installed 2.0beta1) and also an "easy_install lxml-2.0beta1.tar.gz". Both worked just fine.
The tgz linked from the website (http://codespeak.net/lxml/dev/index.html#download -> http://codespeak.net/lxml/dev/lxml-2.0beta1.tgz) gives me a 404, so I used http://cheeseshop.python.org/packages/source/l/lxml/lxml-2.0beta1.tar.gz. When I tried just running 'easy_install lxml' without Cython installed, I got compilation errors which I was able to reproduce yesterday, but not today, so I dunno. Best guess I have is some environmental oddity related to macports that went away when I tried it in a new terminal window. (I tend to re-use the same eight over and over again.)
Maybe you had an older version of Cython installed? If that's found, it will be used - and obviously fail.
Nope, I had never installed it before yesterday.
That one should work, too. I just ran lxmlhtml.txt as doctest (which admittedly wasn't included in the test suite before) and it just worked. Same for test_clean.txt.
What's the version of libxml2 you are using? Can you try running the test suite and see if that works for you?
I used libxml2 2.6.30_0 and libxslt 1.1.22_0, both of which are the latest versions in macports. I tried running the test suite with 'make test' and 'python test.py', and got the same results. test_clean seems to pass, but I got the same strange result as I got yesterday when I try the example in the python interpreter. The test suite fails with 14 errors.

jon@euterpe:/tmp/lxml-2.0beta1$ python test.py
TESTED VERSION: 2.0.beta1
    Python: (2, 5, 1, 'final', 0)
    lxml.etree: (2, 0, -99, 0)
    libxml used: (2, 6, 30)
    libxml compiled: (2, 6, 30)
    libxslt used: (1, 1, 22)
    libxslt compiled: (1, 1, 22)

======================================================================
ERROR: test_feed_parser_error_broken (lxml.tests.test_elementtree.ElementTreeTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/unittest.py", line 260, in run
    testMethod()
  File "/private/tmp/lxml-2.0beta1/src/lxml/tests/test_elementtree.py", line 3014, in test_feed_parser_error_broken
    ParseError = self.etree.ParseError
AttributeError: 'module' object has no attribute 'ParseError'

======================================================================
ERROR: test_feed_parser_error_close_empty (lxml.tests.test_elementtree.ElementTreeTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/unittest.py", line 260, in run
    testMethod()
  File "/private/tmp/lxml-2.0beta1/src/lxml/tests/test_elementtree.py", line 3000, in test_feed_parser_error_close_empty
    ParseError = self.etree.ParseError
AttributeError: 'module' object has no attribute 'ParseError'

======================================================================
ERROR: test_feed_parser_error_close_incomplete (lxml.tests.test_elementtree.ElementTreeTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/unittest.py", line 260, in run
    testMethod()
  File "/private/tmp/lxml-2.0beta1/src/lxml/tests/test_elementtree.py", line 3005, in test_feed_parser_error_close_incomplete
    ParseError = self.etree.ParseError
AttributeError: 'module' object has no attribute 'ParseError'

======================================================================
ERROR: test_feed_parser_error_position (lxml.tests.test_elementtree.ElementTreeTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/unittest.py", line 260, in run
    testMethod()
  File "/private/tmp/lxml-2.0beta1/src/lxml/tests/test_elementtree.py", line 3028, in test_feed_parser_error_position
    ParseError = self.etree.ParseError
AttributeError: 'module' object has no attribute 'ParseError'

======================================================================
ERROR: test_fromstringlist (lxml.tests.test_elementtree.ElementTreeTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/unittest.py", line 260, in run
    testMethod()
  File "/private/tmp/lxml-2.0beta1/src/lxml/tests/test_elementtree.py", line 523, in test_fromstringlist
    fromstringlist = self.etree.fromstringlist
AttributeError: 'module' object has no attribute 'fromstringlist'

======================================================================
ERROR: test_fromstringlist_characters (lxml.tests.test_elementtree.ElementTreeTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/unittest.py", line 260, in run
    testMethod()
  File "/private/tmp/lxml-2.0beta1/src/lxml/tests/test_elementtree.py", line 531, in test_fromstringlist_characters
    fromstringlist = self.etree.fromstringlist
AttributeError: 'module' object has no attribute 'fromstringlist'

======================================================================
ERROR: test_fromstringlist_single (lxml.tests.test_elementtree.ElementTreeTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/unittest.py", line 260, in run
    testMethod()
  File "/private/tmp/lxml-2.0beta1/src/lxml/tests/test_elementtree.py", line 538, in test_fromstringlist_single
    fromstringlist = self.etree.fromstringlist
AttributeError: 'module' object has no attribute 'fromstringlist'

======================================================================
ERROR: test_iter (lxml.tests.test_elementtree.ElementTreeTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/unittest.py", line 260, in run
    testMethod()
  File "/private/tmp/lxml-2.0beta1/src/lxml/tests/test_elementtree.py", line 1467, in test_iter
    list(a.iter()))
AttributeError: _ElementInterface instance has no attribute 'iter'

======================================================================
ERROR: test_parse_encoding_8bit_explicit (lxml.tests.test_elementtree.ElementTreeTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/unittest.py", line 260, in run
    testMethod()
  File "/private/tmp/lxml-2.0beta1/src/lxml/tests/test_elementtree.py", line 2605, in test_parse_encoding_8bit_explicit
    self.assertRaises(self.etree.ParseError,
AttributeError: 'module' object has no attribute 'ParseError'

======================================================================
ERROR: test_parse_encoding_8bit_override (lxml.tests.test_elementtree.ElementTreeTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/unittest.py", line 260, in run
    testMethod()
  File "/private/tmp/lxml-2.0beta1/src/lxml/tests/test_elementtree.py", line 2622, in test_parse_encoding_8bit_override
    self.assertRaises(self.etree.ParseError,
AttributeError: 'module' object has no attribute 'ParseError'

======================================================================
ERROR: test_tostring_method_html (lxml.tests.test_elementtree.ElementTreeTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/unittest.py", line 260, in run
    testMethod()
  File "/private/tmp/lxml-2.0beta1/src/lxml/tests/test_elementtree.py", line 2376, in test_tostring_method_html
    tostring(html, method="html"))
TypeError: tostring() got an unexpected keyword argument 'method'

======================================================================
ERROR: test_tostring_method_text (lxml.tests.test_elementtree.ElementTreeTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/unittest.py", line 260, in run
    testMethod()
  File "/private/tmp/lxml-2.0beta1/src/lxml/tests/test_elementtree.py", line 2393, in test_tostring_method_text
    tostring(a, method="text"))
TypeError: tostring() got an unexpected keyword argument 'method'

======================================================================
ERROR: test_write_method_html (lxml.tests.test_elementtree.ElementTreeTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/unittest.py", line 260, in run
    testMethod()
  File "/private/tmp/lxml-2.0beta1/src/lxml/tests/test_elementtree.py", line 737, in test_write_method_html
    tree.write(f, method="html")
TypeError: write() got an unexpected keyword argument 'method'

======================================================================
ERROR: test_write_method_text (lxml.tests.test_elementtree.ElementTreeTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/unittest.py", line 260, in run
    testMethod()
  File "/private/tmp/lxml-2.0beta1/src/lxml/tests/test_elementtree.py", line 759, in test_write_method_text
    tree.write(f, method="text")
TypeError: write() got an unexpected keyword argument 'method'

----------------------------------------------------------------------
Ran 1092 tests in 12.936s

FAILED (errors=14)
Hi,

Jon Rosebaugh wrote:
The tgz linked from the website (http://codespeak.net/lxml/dev/index.html#download -> http://codespeak.net/lxml/dev/lxml-2.0beta1.tgz) gives me a 404
Ah, thanks. I uploaded it to the /lxml directory and forgot to set a link from /lxml/dev...
When I tried just running 'easy_install lxml' without Cython installed, I got compilation errors which I was able to reproduce yesterday, but not today
:) Well, good to know that it works now. Regarding the missing files, maybe you ran "make clean" somewhere between your tests; that deletes the .c files (they are generated, so a developer calling "make clean" usually wants them gone).
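Whether a downloaded source archive still contains the generated C files can be checked without building anything. A stdlib-only sketch; the rule that every .pyx file should have a sibling .c file is an assumption about the release layout, and both function names are hypothetical.

```python
import tarfile


def missing_c_sources(names):
    """Return the .pyx paths in `names` that have no sibling .c file."""
    present = set(names)
    return sorted(
        name for name in present
        if name.endswith(".pyx") and name[: -len(".pyx")] + ".c" not in present
    )


def check_archive(archive_path):
    """List .pyx members of a source tarball that lack a generated .c file."""
    with tarfile.open(archive_path) as tar:  # default mode "r" autodetects gzip
        return missing_c_sources(tar.getnames())


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:
        print(path, check_archive(path) or "all .c files present")
```

An empty result means the archive should build without Cython; a non-empty one points at exactly the files a "make clean" (or a broken release) removed.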
What's the version of libxml2 you are using? Can you try running the test suite and see if that works for you?
I used libxml2 2.6.30_0 and libxslt 1.1.22_0, both of which are the latest versions in macports.
... and they should work just fine - except for <embed> tags, which are broken in libxml2 2.6.29/30 (and fixed in 2.6.31). But that wasn't your problem here.
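The <embed> caveat amounts to a simple version check. A minimal sketch; the affected releases (2.6.29 and 2.6.30, fixed in 2.6.31) come from this thread, and the version tuple passed in would typically be lxml.etree.LIBXML_VERSION. The constant and function names are hypothetical.

```python
# libxml2 releases that, per this thread, mishandle <embed> tags (fixed in 2.6.31).
BROKEN_EMBED_VERSIONS = {(2, 6, 29), (2, 6, 30)}


def embed_handling_broken(libxml_version):
    """True if the given libxml2 version tuple is one of the affected releases."""
    return tuple(libxml_version[:3]) in BROKEN_EMBED_VERSIONS


if __name__ == "__main__":
    print(embed_handling_broken((2, 6, 30)))
```

Comparing against an explicit set rather than a range keeps the check honest: it only flags the releases the thread actually names.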
I tried running the test suite with 'make test' and 'python test.py', and got the same results. test_clean seems to pass, but I got the same strange result as I got yesterday when I try the example in the python interpreter.
Ah, I think I know what's happening. It's the special doctest support for HTML output. To compare the results in the doctest, we parse the expected output with the HTML parser, which also fixes up the output that you see in the console and turns it into usable HTML. That keeps us from seeing that the cleanup actually produces garbage... I'll look into it.
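How a normalizing comparison can hide structural damage is easy to illustrate with a toy, stdlib-only checker. This is a hypothetical analogue, not lxml's doctest output checker: it flattens markup into tag/text events and, in lenient mode, ignores where <html> and <body> appear, so a shortened version of the misplaced <body> from the report compares equal to the expected output.

```python
from html.parser import HTMLParser


class _EventCollector(HTMLParser):
    """Flatten markup into (kind, ...) events, skipping tags named in `ignore`."""

    def __init__(self, ignore=()):
        super().__init__()
        self.ignore = set(ignore)
        self.events = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.ignore:
            self.events.append(("start", tag, tuple(attrs)))

    def handle_endtag(self, tag):
        if tag not in self.ignore:
            self.events.append(("end", tag))

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace, drop blank runs
        if text:
            self.events.append(("text", text))


def events(markup, ignore=()):
    collector = _EventCollector(ignore)
    collector.feed(markup)
    collector.close()
    return collector.events


def lenient_equal(expected, got):
    """Toy check: ignores whitespace and where <html>/<body> appear."""
    wrappers = {"html", "body"}
    return events(expected, wrappers) == events(got, wrappers)


def strict_equal(expected, got):
    """Same flattening, but <html>/<body> placement must match as well."""
    return events(expected) == events(got)


# Shortened versions of the expected and actual output from the report.
EXPECTED = "<html><body><div><p>a paragraph</p></div></body></html>"
GOT = "<div><body> <p>a paragraph</p></body></div>"

if __name__ == "__main__":
    print(lenient_equal(EXPECTED, GOT))  # the misplaced <body> goes unnoticed
    print(strict_equal(EXPECTED, GOT))
```

The trade-off is the usual one: lenient comparison makes doctests robust against harmless serialization differences, but a test for a function whose job is to produce well-structured markup also needs a strict check on the raw string.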
The test suite fails with 14 errors. ====================================================================== ERROR: test_feed_parser_error_broken (lxml.tests.test_elementtree.ElementTreeTestCase) [...]
Those are fine: you don't have a suitable ElementTree version installed (lxml 2.0 aims for compatibility with ET 1.3, which has not been released yet). I actually thought I had disabled those tests for older ET versions...
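With today's unittest, tests that need the newer ElementTree API can be skipped by feature detection instead of erroring out. A sketch only: unittest.skipUnless needs Python 2.7/3.1 or later, so this would not have run on the Python 2.5 from the report, and it checks the standard library's xml.etree.ElementTree rather than whatever ElementTree lxml's test suite compares against. The helper name and the chosen feature names are illustrative assumptions.

```python
import unittest
import xml.etree.ElementTree as ET

# A few ElementTree 1.3 names that the failing tests in this thread rely on.
ET13_NAMES = ("ParseError", "fromstringlist")


def has_et13_api(module):
    """True if `module` exposes the ElementTree 1.3 names checked here."""
    return all(hasattr(module, name) for name in ET13_NAMES)


class ET13CompatTests(unittest.TestCase):
    etree = ET  # module under test; an older ElementTree would lack ET13_NAMES

    @unittest.skipUnless(has_et13_api(ET), "needs the ElementTree 1.3 API")
    def test_fromstringlist(self):
        root = self.etree.fromstringlist(["<a>", "<b/>", "</a>"])
        self.assertEqual([el.tag for el in root.iter()], ["a", "b"])

    @unittest.skipUnless(has_et13_api(ET), "needs the ElementTree 1.3 API")
    def test_tostring_method_text(self):
        root = self.etree.fromstring("<a>x<b>y</b>z</a>")
        text = self.etree.tostring(root, method="text", encoding="unicode")
        self.assertEqual(text, "xyz")


if __name__ == "__main__":
    unittest.main()
```

Feature detection via hasattr() covers missing names such as ParseError or fromstringlist; a missing keyword argument like method= cannot be detected that way and would need a version check or a try/except around a trial call.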
participants (2)
- Jon Rosebaugh
- Stefan Behnel