Hello,

I ran into an issue when trying to parse a UTF-8 file with a BOM (Python 2.7). It worked fine up to version 3.2.5, but it no longer works starting with version 3.3.0-beta1.

This is the error I get when calling etree.iterparse(path, tag='item'):

      File "iterparse.pxi", line 166, in lxml.etree.iterparse.__next__ (src/lxml/lxml.etree.c:116372)
    XMLSyntaxError: Document is empty, line 1, column 1

I had a look at tests/test_elementtree.py and saw that the test is different from what it used to be years ago:

    def test_encoding_utf8_bom(self):
        utext = _str('Søk på nettet')
        uxml = (_str('<?xml version="1.0" encoding="UTF-8"?>') +
                _str('<p>%s</p>') % utext)
        bom = _bytes('\\xEF\\xBB\\xBF').decode("unicode_escape").encode("latin1")
        xml = bom + uxml.encode("utf-8")
        tree = etree.XML(xml)
        self.assertEqual(utext, tree.text)

On the mailing list I only managed to find this thread:

    http://article.gmane.org/gmane.comp.python.lxml.devel/2967/match=bom

but it's not relevant because it's from 2007.

That said, lxml is amazing :)

Thank you,
Stefano
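For anyone hitting the same error, one possible stopgap is to skip the BOM before the parser sees the data. The sketch below is only an illustration, not a fix for lxml itself and not something the lxml project has confirmed; the helper name `open_skipping_bom` and the sample document are hypothetical. It uses only the standard library, so it runs without lxml installed.

```python
import codecs
import os
import tempfile


def open_skipping_bom(path):
    """Open `path` in binary mode, positioned just past a UTF-8 BOM if one is present.

    Hypothetical helper for illustration; not part of lxml.
    """
    f = open(path, "rb")
    head = f.read(len(codecs.BOM_UTF8))
    if head != codecs.BOM_UTF8:
        # No BOM: rewind so the caller sees the file from the start.
        f.seek(0)
    return f


def demo():
    """Write a small BOM-prefixed XML file and read it back without the BOM."""
    payload = b'<?xml version="1.0" encoding="UTF-8"?><root><item>a</item></root>'
    fd, path = tempfile.mkstemp(suffix=".xml")
    try:
        os.write(fd, codecs.BOM_UTF8 + payload)
        os.close(fd)
        f = open_skipping_bom(path)
        try:
            return f.read()
        finally:
            f.close()
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

Since etree.iterparse accepts a file-like object as well as a path, the object returned by the helper could be passed in place of `path`, for example `etree.iterparse(open_skipping_bom(path), tag='item')`. Whether this avoids the "Document is empty" error on 3.3.0-beta1 is an assumption that would need checking against that version.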