ElementTree.XML(string XML) and ElementTree.fromstring(string XML) not working

Stefan Behnel stefan_ml at behnel.de
Sat Jun 27 02:06:01 EDT 2009


Kee Nethery wrote:
> On Jun 25, 2009, at 11:39 PM, Stefan Behnel wrote:
>> parsing a document from a string does not have its own function,
>> because it is trivial to write
>>
>>     tree = parse(BytesIO(some_byte_string))
> 
> :-) Trivial for someone familiar with the language. For a newbie like
> me, that step was non-obvious.

I actually meant the code complexity, not the fact that you need to know
BytesIO to do the above.
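
For reference, a minimal sketch of that one-liner in context, assuming Python 3 and the standard-library ElementTree. The sample document and the variable names are made up for illustration:

```python
from io import BytesIO
import xml.etree.ElementTree as ET

# Hypothetical input: an XML document as a UTF-8 encoded byte string.
some_byte_string = b"<root><item>caf\xc3\xa9</item></root>"

# Wrap the bytes in a file-like object so that parse() can read from it.
tree = ET.parse(BytesIO(some_byte_string))
root = tree.getroot()

# The parser decodes the bytes; element text comes back as a unicode string.
item_text = root.find("item").text
```

The wrapping in BytesIO is the only extra step compared to parsing from a file.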


>> If what you meant is actually parsing from a byte string, this is easily
>> done using BytesIO(), or StringIO() in Py2.x (x<6).
> 
> Yes, thanks! Looks like BytesIO is a v.3.x enhancement.

It should be available in 2.6 AFAIR, in the io module.
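
If the code has to run on several Python versions, something like the following fallback import might do. It is only a sketch, assuming that StringIO.StringIO is an acceptable stand-in for byte strings on older 2.x releases:

```python
try:
    from io import BytesIO  # Python 2.6 and later, including 3.x
except ImportError:
    # Older Python 2.x: byte strings are plain str, so StringIO works.
    from StringIO import StringIO as BytesIO

import xml.etree.ElementTree as ET

# Build the byte string without a b"" literal, which older 2.x rejects.
xml_bytes = "<root/>".encode("ascii")

root = ET.parse(BytesIO(xml_bytes)).getroot()
```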


> Looks like the
> StringIO does what I need since all I'm doing is pulling the unicode
> string into et.parse.

As I said, this won't work, unless you are either

a) passing a unicode string with plain ASCII characters in Py2.x
or
b) confusing UTF-8 and Unicode
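
To illustrate the difference in b), here is a small sketch in Python 3 syntax (the sample text is arbitrary). A unicode string is a sequence of characters, while UTF-8 is one way of encoding those characters as bytes:

```python
text = "caf\u00e9"           # unicode string: four characters
data = text.encode("utf-8")  # UTF-8 encoded byte string: five bytes

# The non-ASCII character takes two bytes in UTF-8.
char_count = len(text)
byte_count = len(data)

# Decoding the bytes with the right codec gives the unicode string back.
round_trip = data.decode("utf-8")
```

For plain ASCII text the two have the same length, which is why case a) happens to work in Py2.x.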


>>> theXmlDataTree = et.parse(makeThisUnicodeStringLookLikeAFileSoParseWillDealWithIt(theXmlData))
>>
>> This will not work because ET cannot parse from unicode strings (unless
>> they only contain plain ASCII characters and you happen to be using
>> Python 2.x). lxml can parse from unicode strings, but it requires that
>> the XML must not have an encoding declaration (which would render it
>> non well-formed). This is convenient for parsing HTML, but usually less
>> convenient for XML.
> 
> Right. For my example, if the data is coming in as UTF-8, I believe I can do:
>    theXmlDataTree = et.parse(StringIO.StringIO(theXmlData), encoding='utf-8')

Yes, although in this case you are not parsing a unicode string but a UTF-8
encoded byte string. Plus, passing 'UTF-8' as encoding to the parser is
redundant, as it is the default for XML.
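
A quick sketch of what "default" means here, assuming Python 3 and the standard-library ElementTree, with made-up sample documents. Without an encoding declaration the parser treats the bytes as UTF-8; with a declaration, it follows the declaration:

```python
from io import BytesIO
import xml.etree.ElementTree as ET

# No XML declaration: the parser assumes UTF-8.
utf8_doc = "<a>caf\u00e9</a>".encode("utf-8")

# Declared encoding: the parser decodes according to the declaration.
latin1_doc = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><a>caf\u00e9</a>'
).encode("iso-8859-1")

utf8_text = ET.parse(BytesIO(utf8_doc)).getroot().text
latin1_text = ET.parse(BytesIO(latin1_doc)).getroot().text
```

In both cases the byte strings differ, but the parsed text is the same unicode string.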

Stefan


