[lxml-dev] decoding unicode strings

hi,

the lxml.etree.XMLID function does not accept unicode strings when the xml declaration is present at the beginning of the xml document. however, not all soap clients send the xml declaration, so sometimes i must rely on the charset given in the http headers to decode the string. my solution was this:

    try:
        root, xmlids = etree.XMLID(xml_string.decode(http_charset))
    except ValueError as e:
        logger.debug('%s -- falling back to str decoding.', e)
        root, xmlids = etree.XMLID(xml_string)

is this the proper way to check whether an xml document candidate has an xml declaration at the beginning?

thanks,
burak

Burak Arslan, 29.07.2010 10:57:
The correct way to do it is to pass a parser that uses an explicitly defined encoding. However, this parameter is currently missing from the XMLID() functions. You can use parseid() instead, which accepts this argument. I also fixed this in SVN so that the upcoming 2.3 release will support the 'parser' parameter in XMLID() as well.

Stefan
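The parseid() suggestion above could look roughly like the following. This is a minimal sketch, not code from the thread: the helper name parse_with_http_charset and the sample document are hypothetical, and the sample declares "id" as an ID attribute in an internal DTD subset on the assumption that parseid() collects DTD-style IDs. Note that parseid() reads from a file or file-like object and returns an ElementTree, so the bytes are wrapped in BytesIO and the root is fetched with getroot().

```python
from io import BytesIO

from lxml import etree


def parse_with_http_charset(xml_bytes, http_charset):
    """Parse raw bytes with an encoding taken from the HTTP headers.

    Hypothetical helper: the parser is told the encoding explicitly,
    so the document does not need to carry an XML declaration.
    """
    parser = etree.XMLParser(encoding=http_charset)
    # parseid() expects a file or file-like source, not a string.
    tree, xmlids = etree.parseid(BytesIO(xml_bytes), parser)
    return tree.getroot(), xmlids


# Sample input (an assumption for illustration): no XML declaration,
# encoded as ISO-8859-1, with "id" declared as an ID attribute.
SAMPLE = (
    u'<!DOCTYPE root [<!ATTLIST item id ID #IMPLIED>]>'
    u'<root><item id="a">caf\xe9</item></root>'
).encode('iso-8859-1')

root, xmlids = parse_with_http_charset(SAMPLE, 'iso-8859-1')
```

With this approach the raw bytes are never decoded by hand, so the try/except fallback from the original question is not needed.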

Participants (2): Burak Arslan, Stefan Behnel