ElementTree.XML(string XML) and ElementTree.fromstring(string XML) not working

Stefan Behnel stefan_ml at behnel.de
Fri Jun 26 08:39:33 CEST 2009


Hi,

Kee Nethery wrote:
> Why isn't et.parse the only way to do this? Why have XML or fromstring
> at all?

Well, use cases. XML() is an alias for fromstring(), because it's
convenient (and quite readable) to write

   section = XML('<section id="XYZ"><title>A to Z</title></section>')
   section.append(paragraphs)

for XML literals in source code. fromstring() is there because, when you
want to parse a fragment from a string that you got from some other source,
that function name says exactly what you are doing, as in

	el = fromstring(some_string)

If you want to parse a document from a file or file-like object, use
parse(). Three use cases, three functions. The fourth use case of parsing a
document from a string does not have its own function, because it is
trivial to write

	tree = parse(BytesIO(some_byte_string))
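
As a rough sketch, the string-based use cases above could look like this
with the standard library's xml.etree.ElementTree. The sample strings and
variable names are made up for illustration; only XML(), fromstring() and
parse() come from the discussion.

```python
from io import BytesIO
from xml.etree.ElementTree import XML, fromstring, parse

# Use case 1: an XML literal written directly in source code.
section = XML('<section id="XYZ"><title>A to Z</title></section>')

# Use case 2: a fragment in a string that came from somewhere else
# (hypothetical input, hard-coded here).
some_string = '<para>Hello</para>'
el = fromstring(some_string)
section.append(el)

# Fourth use case: a complete document held in a byte string.
# Wrapping it in BytesIO makes it look like a file to parse().
some_byte_string = b'<?xml version="1.0" encoding="UTF-8"?><doc><item/></doc>'
tree = parse(BytesIO(some_byte_string))
root = tree.getroot()
```

Note that XML() and fromstring() hand back the top-level Element directly,
while parse() returns an ElementTree, so getroot() is only needed in the
last case.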

I do not argue that fromstring() should necessarily return an Element, as
parsing fragments is more likely for literals than for strings that come
from somewhere else. However, given that the use case of parsing a document
from a string is so easily handled with parse(), I find it OK to give the
second use case its own function, simply because having to write

	tree = fromstring(some_string)
	fragment_top_element = tree.getroot()

just to get at a fragment absolutely would not cut it.


> Why not enhance parse and deprecate XML and fromstring with
> something like:
>
> formPostData = cgi.FieldStorage()
> theXmlData = formPostData['theXml'].value
> theXmlDataTree =
> et.parse(makeThisUnicodeStringLookLikeAFileSoParseWillDealWithIt(theXmlData))

This will not work because ET cannot parse from unicode strings (unless
they contain only plain ASCII characters and you happen to be using Python
2.x). lxml can parse from unicode strings, but it requires that the XML
not carry an encoding declaration (which would render it non-well-formed).
This is convenient for parsing HTML, but usually less so for XML.

If what you meant is actually parsing from a byte string, this is easily
done using BytesIO(), or StringIO() in Py2.x (x<6).
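
A minimal sketch of that byte-string route, assuming the form data arrives
as text and is encoded as UTF-8 to match its encoding declaration. The
sample document and the variable names are invented for illustration:

```python
from io import BytesIO
import xml.etree.ElementTree as et

# Hypothetical stand-in for the value taken from the form post.
the_xml_text = (u'<?xml version="1.0" encoding="UTF-8"?>'
                u'<order><name>Zo\u00eb</name></order>')

# Encode to bytes so the encoding declaration is actually true of the data,
# then wrap the bytes in a file-like object for parse().
the_xml_bytes = the_xml_text.encode('utf-8')
tree = et.parse(BytesIO(the_xml_bytes))

# Text content comes back decoded, non-ASCII characters included.
name = tree.getroot().findtext('name')
```

On Python 2.x before 2.6, where io.BytesIO is not available,
StringIO.StringIO would take its place in the same spot.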

Stefan


