[lxml-dev] Can't get lxml to work with Python 3 (installation with errors, importing works, usage doesn't)
Hmmm, looks like these two files are not used in the test suite.
Anyway, unless you really want to use them, you'll be fine with all the rest.
So, are those two files actually part of the test suite or do they belong to the library and just aren't tested in the test suite for Python 3? Either way, this is a bug which should be reported, isn't it?
That's because you are passing a byte string as path. Might be a bug in Py3, don't know. Passing a unicode string instead will help.
Does this maybe have something to do with essentially parsing a byte stream?
Unless "self.rfile" *is* the byte stream - no. XML is a sequence of bytes anyway. In case you mixed up the interface and self.rfile actually contains the data instead of a filename, wrap it in a BytesIO instance.
Path? Filename? Doesn't iterparse expect a file-like object to read() from? See http://codespeak.net/lxml/tutorial.html#event-driven-parsing.
self.rfile is such a file-like object, with read access to the bytes coming in through the socket (as described here: http://docs.python.org/release/3.1.1/library/socketserver.html).
Anyway, xml.etree does its thing perfectly here and accepts the byte stream without complaining. The stream carries ASCII characters as of now and will be UTF-8 later on. We need proper namespace support, though, so xml.etree is not an option. As lxml promises to be 99% compatible with xml.etree, something must be wrong here. Should I file a bug report?
--
Simon Hirscher
http://simonhirscher.de
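For reference, a minimal sketch of the usage under discussion: iterparse reading from a file-like object, with raw bytes wrapped in a BytesIO instance as suggested above. The payload, the namespace URI, and the helper name collect_end_tags are made up for illustration. The import falls back to xml.etree.ElementTree when lxml is not installed, which should behave the same way for this simple case.

```python
from io import BytesIO

try:
    from lxml import etree
except ImportError:
    # Fallback so the sketch still runs without lxml installed.
    import xml.etree.ElementTree as etree

# Hypothetical payload standing in for the bytes arriving on the socket.
data = b'<stream xmlns="urn:example:ns"><msg>hello</msg><msg>world</msg></stream>'


def collect_end_tags(source):
    """Return the tag of every element whose end tag has been parsed."""
    return [elem.tag for _event, elem in etree.iterparse(source, events=("end",))]


# BytesIO turns the raw bytes into a file-like object with read().
tags = collect_end_tags(BytesIO(data))
print(tags)
```

The tags come back in Clark notation ({namespace}localname), so the namespace of each element stays visible while parsing.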
codethief, 21.05.2010 12:28:
Hmmm, looks like these two files are not used in the test suite.
Anyway, unless you really want to use them, you'll be fine with all the rest.
So, are those two files actually part of the test suite or do they belong to the library and just aren't tested in the test suite for Python 3?
The latter.
Either way, this is a bug which should be reported, isn't it?
Sure, thanks for doing that.
That's because you are passing a byte string as path. Might be a bug in Py3, don't know. Passing a unicode string instead will help.
Does this maybe have something to do with essentially parsing a byte stream?
Unless "self.rfile" *is* the byte stream - no. XML is a sequence of bytes anyway. In case you mixed up the interface and self.rfile actually contains the data instead of a filename, wrap it in a BytesIO instance.
Path? Filename? Doesn't iterparse expect a file-like object to read() from?
Both.
self.rfile is such a file-like object with read access to the bytes coming in through the socket (as described here: http://docs.python.org/release/3.1.1/library/socketserver.html).
Ok, so I misinterpreted the error trace. The respective code in lxml.etree needs to be somewhat flexible and ended up a bit too fragile. So, if you pass a "non-standard" file-like object, it might end up raising an exception like this.
Here's a fix. Take care to use Cython 0.12.1 when you build from patched lxml 2.2.x sources.

=== src/lxml/apihelpers.pxi
==================================================================
--- src/lxml/apihelpers.pxi (revision 5591)
+++ src/lxml/apihelpers.pxi (local)
@@ -1571,17 +1571,24 @@
     Returns None if not a file object.
     """
-    # file instances have a name attribute
-    filename = getattr(source, u'name', None)
-    if filename is not None:
-        return os_path_abspath(filename)
     # urllib2 provides a geturl() method
-    geturl = getattr(source, u'geturl', None)
-    if geturl is not None:
-        return geturl()
+    try:
+        return source.geturl()
+    except:
+        pass
     # gzip file instances have a filename attribute (before Py3k)
-    filename = getattr(source, u'filename', None)
-    if filename is not None:
-        return os_path_abspath(filename)
+    try:
+        filename = source.filename
+        if _isString(filename):
+            return os_path_abspath(filename)
+    except:
+        pass
+    # file instances have a name attribute
+    try:
+        filename = source.name
+        if _isString(filename):
+            return os_path_abspath(filename)
+    except:
+        pass
     # can't determine filename
     return None

Stefan
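The lookup order in the patch can be sketched in plain Python as follows. This is an illustration, not the lxml source: get_filename, FakeSocketFile and NamedFile are hypothetical names, isinstance(value, str) stands in for lxml's internal _isString check, and os.path.abspath stands in for os_path_abspath. The integer name attribute on FakeSocketFile is an assumption, chosen to mimic a "non-standard" file-like object.

```python
import os.path


def get_filename(source):
    """Return a URL or absolute path for source, or None if unknown."""
    # URL-backed objects provide a geturl() method.
    try:
        return source.geturl()
    except Exception:
        pass
    # Try 'filename' first, then 'name', and only accept string values.
    for attr in ("filename", "name"):
        try:
            value = getattr(source, attr)
            if isinstance(value, str):  # stand-in for lxml's _isString
                return os.path.abspath(value)
        except Exception:
            pass
    # Can't determine a filename.
    return None


class FakeSocketFile:
    """Hypothetical file-like object whose 'name' is not a string."""
    name = 7


class NamedFile:
    """Hypothetical file-like object with an ordinary string name."""
    name = "data.xml"


print(get_filename(FakeSocketFile()))
print(get_filename(NamedFile()))
```

With the string check in place, an object like FakeSocketFile falls through to None instead of having its non-string name handed to the path functions.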
Participants (2): codethief, Stefan Behnel