[lxml-dev] Can't get lxml to work with Python 3 (installation with errors, importing works, usage doesn't)
Hey developers :)

we're already excited about using lxml in our project, though can't quite get it to work. First of all: We (me and another guy) are both running Ubuntu 10.04 x64 and Python 3. It began when easy_installing the package:

--
$ sudo easy_install3 lxml
Searching for lxml
Reading http://pypi.python.org/simple/lxml/
Reading http://codespeak.net/lxml
Best match: lxml 2.2.6
Downloading http://codespeak.net/lxml/lxml-2.2.6.tgz
Processing lxml-2.2.6.tgz
Running lxml-2.2.6/setup.py -q bdist_egg --dist-dir /tmp/easy_install-X0lzwT/lxml-2.2.6/egg-dist-tmp-Sv2xWe
Building lxml version 2.2.6.
NOTE: Trying to build without Cython, pre-generated 'src/lxml/lxml.etree.c' needs to be available.
Using build configuration of libxslt 1.1.26
Building against libxml2/libxslt in the following directory: /usr/lib
  File "build/bdist.linux-x86_64/egg/lxml/html/_diffcommand.py", line 37
    print 'Error: you must give two files'
                                         ^
SyntaxError: invalid syntax

  File "build/bdist.linux-x86_64/egg/lxml/html/_html5builder.py", line 80
    root = html.fromstring(u''.join(buf))
                             ^
SyntaxError: invalid syntax

  File "/usr/local/lib/python3.1/dist-packages/lxml-2.2.6-py3.1-linux-x86_64.egg/lxml/html/_diffcommand.py", line 37
    print 'Error: you must give two files'
                                         ^
SyntaxError: invalid syntax

  File "/usr/local/lib/python3.1/dist-packages/lxml-2.2.6-py3.1-linux-x86_64.egg/lxml/html/_html5builder.py", line 80
    root = html.fromstring(u''.join(buf))
                             ^
SyntaxError: invalid syntax

Adding lxml 2.2.6 to easy-install.pth file

Installed /usr/local/lib/python3.1/dist-packages/lxml-2.2.6-py3.1-linux-x86_64.egg
Processing dependencies for lxml
Finished processing dependencies for lxml
--

It looks like python (v3?) is complaining about v2 features? Anyway, importing works (at least, it seems so):

$ python3
>>> from lxml import etree
However, we can't get our code to work:

--
import socketserver
from lxml import etree
#from xml.etree import ElementTree as etree

import settings

class CloudiaServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
    pass

class RequestHandler(socketserver.StreamRequestHandler):
    def handle(self):
        for event, element in etree.iterparse(self.rfile):
            if event != 'end':
                continue
            print(element.tag)
            print(str(element.attrib))
            print(element.text)
            element.clean()

if __name__ == '__main__':
    server = CloudiaServer((settings.host, settings.port), RequestHandler)
    server.serve_forever()
--

As soon as our client connects and sends b'<foo bar="quux">text</foo>', the server process outputs:

--
Exception happened during processing of request from ('127.0.0.1', 39620)
Traceback (most recent call last):
  File "/usr/lib/python3.1/socketserver.py", line 558, in process_request_thread
    self.finish_request(request, client_address)
  File "/usr/lib/python3.1/socketserver.py", line 320, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/lib/python3.1/socketserver.py", line 614, in __init__
    self.handle()
  File "/home/codethief/projekte/cloudia/server/src/server.py", line 13, in handle
    for event, element in etree.iterparse(self.rfile):
  File "iterparse.pxi", line 378, in lxml.etree.iterparse.__init__ (src/lxml/lxml.etree.c:85769)
  File "apihelpers.pxi", line 1503, in lxml.etree._getFilenameForFile (src/lxml/lxml.etree.c:22366)
  File "/usr/lib/python3.1/posixpath.py", line 363, in abspath
    if not isabs(path):
  File "/usr/lib/python3.1/posixpath.py", line 61, in isabs
    return s.startswith(sep)
AttributeError: 'int' object has no attribute 'startswith'
--

Does this maybe have something to do with essentially parsing a byte stream?

--
Simon Hirscher
http://simonhirscher.de
Excuse me, I forgot to add that replacing lxml with xml.etree, which can be seen at the beginning of the code, works like a charm. (Of course, only if you comment out the element.clean() line.)

--
Simon Hirscher
http://simonhirscher.de
codethief, 21.05.2010 01:35:
> --
> $ sudo easy_install3 lxml
> [...]
> Building against libxml2/libxslt in the following directory: /usr/lib
>   File "build/bdist.linux-x86_64/egg/lxml/html/_diffcommand.py", line 37
>     print 'Error: you must give two files'
>                                          ^
> SyntaxError: invalid syntax
>
>   File "build/bdist.linux-x86_64/egg/lxml/html/_html5builder.py", line 80
>     root = html.fromstring(u''.join(buf))
>                              ^
> SyntaxError: invalid syntax
Hmmm, looks like these two files are not used in the test suite. Anyway, unless you really want to use them, you'll be fine with all the rest.
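As a side note on why the installer prints those SyntaxError lines: the print statement in the first quoted file is Python 2 syntax, which a Python 3 compiler rejects. The sketch below is only an illustration using the built-in compile(); the helper name compiles_under_this_python is made up for this example and is not part of lxml or easy_install.

```python
def compiles_under_this_python(source):
    """Return True if the running interpreter accepts `source` as valid syntax."""
    try:
        compile(source, "<snippet>", "exec")
    except SyntaxError:
        return False
    return True


# The statement form from _diffcommand.py versus its Python 3 function form.
py2_print = "print 'Error: you must give two files'"
py3_print = "print('Error: you must give two files')"

print(compiles_under_this_python(py2_print))
print(compiles_under_this_python(py3_print))
```

Under Python 3 the first call reports a syntax failure and the second succeeds. The u'' literal from the second file was likewise rejected by Python 3.1, although later Python 3 releases accept it again.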
> for event, element in etree.iterparse(self.rfile):
> [...]
>   File "iterparse.pxi", line 378, in lxml.etree.iterparse.__init__ (src/lxml/lxml.etree.c:85769)
>   File "apihelpers.pxi", line 1503, in lxml.etree._getFilenameForFile (src/lxml/lxml.etree.c:22366)
>   File "/usr/lib/python3.1/posixpath.py", line 363, in abspath
>     if not isabs(path):
>   File "/usr/lib/python3.1/posixpath.py", line 61, in isabs
>     return s.startswith(sep)
> AttributeError: 'int' object has no attribute 'startswith'
That's because you are passing a byte string as path. Might be a bug in Py3, don't know. Passing a unicode string instead will help.
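For anyone trying to narrow this down: the traceback shows an int, not a string, reaching posixpath.isabs. One possible source, offered here as an assumption and not confirmed anywhere in this thread, is the name attribute of a socket-backed file object, which in CPython is the integer file descriptor rather than a path. The sketch below uses socket.socketpair() as a stand-in for the server's client connection; StreamRequestHandler builds self.rfile with makefile('rb') in the same way.

```python
import socket

# A connected pair of sockets stands in for the server's client connection.
server_side, client_side = socket.socketpair()

# Same kind of object as self.rfile in a StreamRequestHandler.
rfile = server_side.makefile("rb")

# For socket-backed files this is the file descriptor number, not a filename.
name = rfile.name
print(type(name), name)

rfile.close()
server_side.close()
client_side.close()
```

If lxml 2.2.6 does look at that attribute when guessing a filename for the stream, passing the int on to os.path.abspath would match the error above.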
> Does this maybe have something to do with essentially parsing a byte stream?
Unless "self.rfile" *is* the byte stream - no. XML is a sequence of bytes anyway. In case you mixed up the interface and self.rfile actually contains the data instead of a filename, wrap it in a BytesIO instance.

Stefan
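The BytesIO suggestion could look roughly like the sketch below. It assumes the complete XML payload is already in memory as bytes (in the handler that would mean reading it from self.rfile first, which is not shown), and the function name parse_payload is made up for the example. The import falls back to xml.etree, mirroring the commented-out line in the posted code. It also calls element.clear(), which is the Element method name in both libraries, where the posted code has element.clean().

```python
import io

try:
    from lxml import etree
except ImportError:
    # Same fallback as the commented-out import in the original post.
    from xml.etree import ElementTree as etree


def parse_payload(payload):
    """Return (tag, attributes, text) for each element that ends in `payload`."""
    results = []
    # BytesIO turns the in-memory bytes into a file-like object for iterparse.
    for event, element in etree.iterparse(io.BytesIO(payload), events=("end",)):
        results.append((element.tag, dict(element.attrib), element.text))
        element.clear()  # release the element's content once it has been handled
    return results


parsed = parse_payload(b'<foo bar="quux">text</foo>')
print(parsed)
```

Requesting only "end" events also makes the explicit event check from the posted loop unnecessary.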
participants (2)
- codethief
- Stefan Behnel