Re: [lxml-dev] Failing lxml.html tests

(I had sent this to Stefan, but maybe someone on the list will recognize something about what's going on here)

Stefan Behnel wrote:
There were some other lxml eggs on the path, but I think the trunk was above them since I'd just run "python setup.py develop". I tried python setup.py install, just in case there was something weird about develop, but same thing (either way it seems to be running out of the checkout, so presumably test.py is fixing up the path somehow). Here's the version information test.py gave:

webdev$ python test.py
TESTED VERSION: 2.0.alpha4-47900
    Python:           (2, 4, 4, 'candidate', 1)
    lxml.etree:       (2, 0, -196, 47900)
    libxml used:      (2, 6, 26)
    libxml compiled:  (2, 6, 26)
    libxslt used:     (1, 1, 17)
    libxslt compiled: (1, 1, 17)

--
Ian Bicking : ianb@colorstudy.com : http://blog.ianbicking.org

-------- Original Message --------
Stefan said something about activating keyword-only argument functionality (which raises TypeErrors), as introduced in Python 3. Could this be it?

Holger

Hi, Ian Bicking wrote:
That version contains every relevant change, and the current trunk works for me in Python 2.4.4. I can't see what might have gone wrong here. To see if it's the kw-only arguments, you can remove the '*' from the argument list in ElementTree.write() and retest.

Stefan
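For readers unfamiliar with the keyword-only syntax mentioned above, here is a minimal sketch in present-day Python. The functions `write`, `write_old` and `call_outcome` are hypothetical stand-ins, not lxml's actual code. The sketch contrasts two different TypeErrors: the one a bare `*` produces when an argument is passed positionally, and the one raised when a function does not know the keyword at all.

```python
def write(file, *, method="xml"):
    """Hypothetical stand-in: `method` is keyword-only because of the bare `*`."""
    return "wrote %s as %s" % (file, method)


def write_old(file):
    """Hypothetical stand-in for an older signature without `method`."""
    return "wrote %s" % file


def call_outcome(func, *args, **kwargs):
    """Return the call's result, or the TypeError text if the call is rejected."""
    try:
        return func(*args, **kwargs)
    except TypeError as exc:
        return "TypeError: %s" % exc


# Passing `method` by keyword is accepted.
ok = call_outcome(write, "out.xml", method="html")

# Passing it positionally is rejected by the keyword-only marker.
positional = call_outcome(write, "out.xml", "html")

# A function that lacks the parameter rejects the keyword itself.
unexpected = call_outcome(write_old, "out.xml", method="html")

print(ok)
print(positional)
print(unexpected)
```

The wording of the third message ("unexpected keyword argument") is the kind reported for `tostring()` and `write()` in the test output in this thread, so removing the `*` is a useful experiment but may not be the whole story.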

Stefan Behnel wrote:
I made a new checkout, did python setup.py develop, and retested, and the errors seem even weirder now. Many are for method, but there's a bunch of others too (though still most pass). I attached the test output.

--
Ian Bicking : ianb@colorstudy.com : http://blog.ianbicking.org

======================================================================
ERROR: /home/ianb/src/lxml/src/lxml/html/tests/feedparser-data/entry_content_applet.data
----------------------------------------------------------------------
Traceback (most recent call last):
  File "unittest.py", line 260, in run
    testMethod()
  File "/home/ianb/src/lxml/src/lxml/html/tests/test_feedparser_data.py", line 60, in runTest
    transformed = Cleaner(**kw).clean_html(self.input)
  File "/home/ianb/src/lxml/src/lxml/html/clean.py", line 445, in clean_html
    self(doc)
  File "/home/ianb/src/lxml/src/lxml/html/clean.py", line 317, in __call__
    if self.allow_element(el):
  File "/home/ianb/src/lxml/src/lxml/html/clean.py", line 371, in allow_element
    url = el.get(self._tag_link_attrs[el.tag])
  File "lxml.etree.pyx", line 960, in lxml.etree._Element.get
  File "apihelpers.pxi", line 248, in lxml.etree._getAttributeValue
  File "apihelpers.pxi", line 1024, in lxml.etree._getNsTag
  File "apihelpers.pxi", line 971, in lxml.etree._utf8
TypeError: Argument must be string or unicode.

======================================================================
ERROR: test_feed_parser (lxml.tests.test_elementtree.ElementTreeTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "unittest.py", line 260, in run
    testMethod()
  File "/home/ianb/src/lxml/src/lxml/tests/test_elementtree.py", line 2987, in test_feed_parser
    parser = self.etree.XMLParser()
AttributeError: 'module' object has no attribute 'XMLParser'

======================================================================
ERROR: test_feed_parser_error_broken (lxml.tests.test_elementtree.ElementTreeTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "unittest.py", line 260, in run
    testMethod()
  File "/home/ianb/src/lxml/src/lxml/tests/test_elementtree.py", line 3017, in test_feed_parser_error_broken
    ParseError = self.etree.ParseError
AttributeError: 'module' object has no attribute 'ParseError'

======================================================================
ERROR: test_feed_parser_error_close_empty (lxml.tests.test_elementtree.ElementTreeTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "unittest.py", line 260, in run
    testMethod()
  File "/home/ianb/src/lxml/src/lxml/tests/test_elementtree.py", line 3003, in test_feed_parser_error_close_empty
    ParseError = self.etree.ParseError
AttributeError: 'module' object has no attribute 'ParseError'

======================================================================
ERROR: test_feed_parser_error_close_incomplete (lxml.tests.test_elementtree.ElementTreeTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "unittest.py", line 260, in run
    testMethod()
  File "/home/ianb/src/lxml/src/lxml/tests/test_elementtree.py", line 3008, in test_feed_parser_error_close_incomplete
    ParseError = self.etree.ParseError
AttributeError: 'module' object has no attribute 'ParseError'

======================================================================
ERROR: test_fromstringlist (lxml.tests.test_elementtree.ElementTreeTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "unittest.py", line 260, in run
    testMethod()
  File "/home/ianb/src/lxml/src/lxml/tests/test_elementtree.py", line 534, in test_fromstringlist
    fromstringlist = self.etree.fromstringlist
AttributeError: 'module' object has no attribute 'fromstringlist'

======================================================================
ERROR: test_fromstringlist_characters (lxml.tests.test_elementtree.ElementTreeTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "unittest.py", line 260, in run
    testMethod()
  File "/home/ianb/src/lxml/src/lxml/tests/test_elementtree.py", line 542, in test_fromstringlist_characters
    fromstringlist = self.etree.fromstringlist
AttributeError: 'module' object has no attribute 'fromstringlist'

======================================================================
ERROR: test_fromstringlist_single (lxml.tests.test_elementtree.ElementTreeTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "unittest.py", line 260, in run
    testMethod()
  File "/home/ianb/src/lxml/src/lxml/tests/test_elementtree.py", line 549, in test_fromstringlist_single
    fromstringlist = self.etree.fromstringlist
AttributeError: 'module' object has no attribute 'fromstringlist'

======================================================================
ERROR: test_parse_encoding_8bit_explicit (lxml.tests.test_elementtree.ElementTreeTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "unittest.py", line 260, in run
    testMethod()
  File "/home/ianb/src/lxml/src/lxml/tests/test_elementtree.py", line 2603, in test_parse_encoding_8bit_explicit
    XMLParser = self.etree.XMLParser
AttributeError: 'module' object has no attribute 'XMLParser'

======================================================================
ERROR: test_parse_encoding_8bit_override (lxml.tests.test_elementtree.ElementTreeTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "unittest.py", line 260, in run
    testMethod()
  File "/home/ianb/src/lxml/src/lxml/tests/test_elementtree.py", line 2618, in test_parse_encoding_8bit_override
    XMLParser = self.etree.XMLParser
AttributeError: 'module' object has no attribute 'XMLParser'

======================================================================
ERROR: test_parser_target_attrib (lxml.tests.test_elementtree.ElementTreeTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "unittest.py", line 260, in run
    testMethod()
  File "/home/ianb/src/lxml/src/lxml/tests/test_elementtree.py", line 3071, in test_parser_target_attrib
    parser = self.etree.XMLParser(target=Target())
AttributeError: 'module' object has no attribute 'XMLParser'

======================================================================
ERROR: test_parser_target_data (lxml.tests.test_elementtree.ElementTreeTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "unittest.py", line 260, in run
    testMethod()
  File "/home/ianb/src/lxml/src/lxml/tests/test_elementtree.py", line 3095, in test_parser_target_data
    parser = self.etree.XMLParser(target=Target())
AttributeError: 'module' object has no attribute 'XMLParser'

======================================================================
ERROR: test_parser_target_tag (lxml.tests.test_elementtree.ElementTreeTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "unittest.py", line 260, in run
    testMethod()
  File "/home/ianb/src/lxml/src/lxml/tests/test_elementtree.py", line 3048, in test_parser_target_tag
    parser = self.etree.XMLParser(target=Target())
AttributeError: 'module' object has no attribute 'XMLParser'

======================================================================
ERROR: test_parser_version (lxml.tests.test_elementtree.ElementTreeTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "unittest.py", line 260, in run
    testMethod()
  File "/home/ianb/src/lxml/src/lxml/tests/test_elementtree.py", line 2979, in test_parser_version
    parser = etree.XMLParser()
AttributeError: 'module' object has no attribute 'XMLParser'

======================================================================
ERROR: test_tostring_method_html (lxml.tests.test_elementtree.ElementTreeTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "unittest.py", line 260, in run
    testMethod()
  File "/home/ianb/src/lxml/src/lxml/tests/test_elementtree.py", line 2377, in test_tostring_method_html
    tostring(html, method="html"))
TypeError: tostring() got an unexpected keyword argument 'method'

======================================================================
ERROR: test_tostring_method_text (lxml.tests.test_elementtree.ElementTreeTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "unittest.py", line 260, in run
    testMethod()
  File "/home/ianb/src/lxml/src/lxml/tests/test_elementtree.py", line 2394, in test_tostring_method_text
    tostring(a, method="text"))
TypeError: tostring() got an unexpected keyword argument 'method'

======================================================================
ERROR: test_write_method_html (lxml.tests.test_elementtree.ElementTreeTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "unittest.py", line 260, in run
    testMethod()
  File "/home/ianb/src/lxml/src/lxml/tests/test_elementtree.py", line 755, in test_write_method_html
    tree.write(f, method="html")
TypeError: write() got an unexpected keyword argument 'method'

======================================================================
ERROR: test_write_method_text (lxml.tests.test_elementtree.ElementTreeTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "unittest.py", line 260, in run
    testMethod()
  File "/home/ianb/src/lxml/src/lxml/tests/test_elementtree.py", line 777, in test_write_method_text
    tree.write(f, method="text")
TypeError: write() got an unexpected keyword argument 'method'

----------------------------------------------------------------------
Ran 1058 tests in 2.127s

FAILED (errors=18)

TESTED VERSION: 2.0.alpha5-48163
    Python:           (2, 4, 4, 'candidate', 1)
    lxml.etree:       (2, 0, -195, 48163)
    libxml used:      (2, 6, 26)
    libxml compiled:  (2, 6, 26)
    libxslt used:     (1, 1, 17)
    libxslt compiled: (1, 1, 17)

Ian Bicking wrote:
Hmm, there really must be something wrong with your setup. You have Cython 0.9.6.7 installed, I assume? I only get three errors, all in the HTML tests. The first one is because one of the entries in _tag_link_attrs is a list, not sure about the others. Anyway, you can run the HTML tests by calling "test.py -vv html", that should get you over the failing tests for now. I'll see how far I get with a clean checkout myself. Have you tried importing etree by hand and checked if the failing methods work there?

Stefan

======================================================================
ERROR: /home/sbehnel/source/Python/lxml/lxml-HEAD/src/lxml/html/tests/feedparser-data/entry_content_applet.data
----------------------------------------------------------------------
Traceback (most recent call last):
  File "unittest.py", line 260, in run
    testMethod()
  File "/home/sbehnel/source/Python/lxml/lxml-HEAD/src/lxml/html/tests/test_feedparser_data.py", line 60, in runTest
    transformed = Cleaner(**kw).clean_html(self.input)
  File "/home/sbehnel/source/Python/lxml/lxml-HEAD/src/lxml/html/clean.py", line 445, in clean_html
    self(doc)
  File "/home/sbehnel/source/Python/lxml/lxml-HEAD/src/lxml/html/clean.py", line 317, in __call__
    if self.allow_element(el):
  File "/home/sbehnel/source/Python/lxml/lxml-HEAD/src/lxml/html/clean.py", line 371, in allow_element
    url = el.get(self._tag_link_attrs[el.tag])
  File "lxml.etree.pyx", line 965, in lxml.etree._Element.get
  File "apihelpers.pxi", line 248, in lxml.etree._getAttributeValue
  File "apihelpers.pxi", line 1024, in lxml.etree._getNsTag
  File "apihelpers.pxi", line 971, in lxml.etree._utf8
TypeError: Argument must be string or unicode.

======================================================================
FAIL: Doctest: test_clean.txt
----------------------------------------------------------------------
Traceback (most recent call last):
  File "unittest.py", line 260, in run
    testMethod()
  File "doctest.py", line 2112, in runTest
    raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: Failed doctest test for test_clean.txt
  File "/home/sbehnel/source/Python/lxml/lxml-HEAD/src/lxml/html/tests/test_clean.txt", line 0

----------------------------------------------------------------------
File "/home/sbehnel/source/Python/lxml/lxml-HEAD/src/lxml/html/tests/test_clean.txt", line 127, in test_clean.txt
Failed example:
    print tostring(fromstring(doc_embed))
Expected:
    <html>
    <body>
    <div>
    <embed src="http://www.youtube.com/v/183tVH1CZpA" type="application/x-shockwave-flash"></embed>
    <embed src="http://anothersite.com/v/another"></embed>
    <script src="http://www.youtube.com/example.js"></script>
    <script src="/something-else.js"></script>
    </div>
    </body>
    </html>
Got:
    <html>
    <body>
    <div>
    <embed src="http://www.youtube.com/v/183tVH1CZpA" type="application/x-shockwave-flash">
    <embed src="http://anothersite.com/v/another">
    <script src="http://www.youtube.com/example.js"></script>
    <script src="/something-else.js"></script>
    </embed>
    </embed>
    </div>
    </body>
    </html>
Diff:
    <html>
    <body>
    <div>
    <embed src="http://www.youtube.com/v/183tVH1CZpA" type="application/x-shockwave-flash">
    -<embed src="http://anothersite.com/v/another">
    <script src="http://www.youtube.com/example.js"></script>
    <script src="/something-else.js"></script>
    </embed>
    </embed>
    +<embed src="http://anothersite.com/v/another"></embed>
    +<script src="http://www.youtube.com/example.js"></script>
    +<script src="/something-else.js"></script>
    </div>
    </body>
    </html>
----------------------------------------------------------------------
File "/home/sbehnel/source/Python/lxml/lxml-HEAD/src/lxml/html/tests/test_clean.txt", line 141, in test_clean.txt
Failed example:
    print Cleaner(host_whitelist=['www.youtube.com'], whitelist_tags=None).clean_html(doc_embed)
Expected:
    <html>
    <body>
    <div>
    <embed src="http://www.youtube.com/v/183tVH1CZpA" type="application/x-shockwave-flash"></embed>
    <script src="http://www.youtube.com/example.js"></script>
    </div>
    </body>
    </html>
Got:
    <html>
    <body>
    <div>
    <embed src="http://www.youtube.com/v/183tVH1CZpA" type="application/x-shockwave-flash">
    <script src="http://www.youtube.com/example.js"></script>
    </embed>
    </div>
    </body>
    </html>
Diff:
    <html>
    <body>
    <div>
    <embed src="http://www.youtube.com/v/183tVH1CZpA" type="application/x-shockwave-flash">
    -<script src="http://www.youtube.com/example.js"></script>
    </embed>
    +<script src="http://www.youtube.com/example.js"></script>
    </div>
    </body>
    </html>
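The first error above comes from passing a list where a single attribute name is expected. A minimal sketch of one way to tolerate both shapes follows; it is not the fix that went into lxml. The mapping `TAG_LINK_ATTRS`, its entries, and the helper names are assumptions made for illustration, and a plain dict stands in for an element's attributes.

```python
# Hypothetical mapping: most tags have one link attribute, some have several.
TAG_LINK_ATTRS = {
    "script": "src",
    "applet": ["code", "object"],
}


def link_attr_names(tag, tag_link_attrs):
    """Return the link attribute names for `tag`, always as a list."""
    entry = tag_link_attrs.get(tag)
    if entry is None:
        return []
    if isinstance(entry, str):
        return [entry]
    return list(entry)


def link_values(tag, attrib, tag_link_attrs):
    """Collect the values of all link attributes that are present in `attrib`."""
    names = link_attr_names(tag, tag_link_attrs)
    return [attrib[name] for name in names if name in attrib]


script_links = link_values("script", {"src": "/a.js"}, TAG_LINK_ATTRS)
applet_links = link_values(
    "applet", {"code": "Foo.class", "width": "10"}, TAG_LINK_ATTRS
)

print(script_links)
print(applet_links)
```

Normalising to a list at the lookup site means the caller never hands a list to an attribute getter that expects a single name.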

Stefan Behnel wrote:
I only get one, the _tag_link_attrs issue, which I just fixed. It's possible one of these weird errors is preventing another error from occurring, though... I guess not, since all the errors I now get are in lxml.tests.test_elementtree.
They do work there (at least the method argument that I tested). So it's just in the test environment where it's acting weird. Which is odd.

--
Ian Bicking : ianb@colorstudy.com : http://blog.ianbicking.org
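When a module behaves differently by hand than under the test runner, it can help to check which file was actually imported in each environment and which names it lacks. The helper below is a minimal sketch; `inspect_module` is a hypothetical name, and the standard-library `json` module is used only so the example runs without lxml installed.

```python
import importlib


def inspect_module(module_name, expected_attrs):
    """Import a module and report its file location and any missing attributes."""
    module = importlib.import_module(module_name)
    missing = [name for name in expected_attrs if not hasattr(module, name)]
    # Built-in modules may have no __file__, so fall back to None.
    return getattr(module, "__file__", None), missing


# Demonstration with a standard-library module and one deliberately bogus name.
location, missing = inspect_module("json", ["loads", "dumps", "no_such_name"])

print(location)
print(missing)
```

For the situation in this thread one would call it with `"lxml.etree"` and names such as `XMLParser`, `ParseError` and `fromstringlist`, once from an interactive session and once from inside the test run, and compare the reported locations. A differing path would suggest that an older egg is being picked up.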

Ian Bicking wrote:
The others only occur with libxml2 2.6.29 and later. These versions handle the "embed" tag as a special tag that does not need closing. However, a parse-serialise-parse cycle for such HTML alters the document here: it omits the closing tag and then reparses the following tags as children. So this is a bug in libxml2. I'll report it there. For the time being - maybe there's a way to work around that?

Stefan
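The parse-serialise-parse problem described above can be detected with a small round-trip check. This is a sketch of the check only, not a workaround for the libxml2 behaviour. The function name `roundtrip_is_stable` is hypothetical, and the parse/serialise callables used here are toy stand-ins so that the example runs without lxml.

```python
def roundtrip_is_stable(markup, parse, serialize):
    """Return True if a second parse/serialise cycle reproduces the first output."""
    once = serialize(parse(markup))
    twice = serialize(parse(once))
    return once == twice


# Toy stand-ins: splitting on whitespace and joining again is idempotent.
stable = roundtrip_is_stable("<a>  <b>", str.split, " ".join)


def growing_parse(text):
    """Toy parser that adds a token on every pass, so each cycle changes the output."""
    return text.split() + ["x"]


unstable = roundtrip_is_stable("<a> <b>", growing_parse, " ".join)

print(stable)
print(unstable)
```

With lxml installed, `lxml.html.fromstring` and `lxml.html.tostring` could be passed as the two callables to test a document containing `embed` tags against a given libxml2 version.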

participants (3)
- Ian Bicking
- jholg@gmx.de
- Stefan Behnel