Ignoring XML Namespaces with cElementTree
Carl Banks
pavlovevidence at gmail.com
Sat May 1 06:33:57 EDT 2010
On Apr 29, 10:12 pm, Stefan Behnel <stefan... at behnel.de> wrote:
> dmtr, 30.04.2010 04:57:
>
> > I'm referring to xmlns/URI prefixes. Here's a code example:
> > from xml.etree.cElementTree import iterparse
> > from cStringIO import StringIO
> > xml = """<root xmlns="http://www.very_long_url.com"><child/></root>"""
> > for event, elem in iterparse(StringIO(xml)): print event, elem
>
> > The output is:
> > end <Element '{http://www.very_long_url.com}child' at 0xb7ddfa58>
> > end <Element '{http://www.very_long_url.com}root' at 0xb7ddfa40>
>
> > I don't want these "{http://www.very_long_url.com}" in front of my
> > tags.
>
> > They create performance disaster on large files
>
> I seriously doubt that they do.
I don't know what kind of XML files you deal with, but for me a large
XML file is gigabyte-sized (obviously I don't use ElementTree for
those).
Even for tens-of-megabyte files, the string ops needed to expand tags
with namespaces are going to be a pretty decent penalty; remember that
ElementTree does nothing lazily.
> > (first cElementTree
> > adds them, then I have to remove them in python).
>
> I think that's your main mistake: don't remove them. Instead, use the fully
> qualified names when comparing.
Unless you have multiple namespaces or are working with a defined
schema or something, it's useless boilerplate.
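For reference, comparing against fully qualified names looks roughly like the sketch below. It is written for Python 3 (xml.etree.ElementTree and io.BytesIO rather than cElementTree and cStringIO), and the names NS, CHILD_TAG and child_count are made up for illustration; only the URI and the XML come from the example quoted above.

```python
from io import BytesIO
from xml.etree.ElementTree import iterparse

# Build the qualified names once, then compare against them directly.
NS = "{http://www.very_long_url.com}"   # namespace URI from the quoted example
CHILD_TAG = NS + "child"                # hypothetical constant name

xml = b'<root xmlns="http://www.very_long_url.com"><child/></root>'

# Count <child> elements without ever stripping the namespace prefix.
child_count = sum(
    1 for event, elem in iterparse(BytesIO(xml)) if elem.tag == CHILD_TAG
)
print(child_count)
```

With a single namespace, that NS constant ends up glued to every tag you mention, which is the boilerplate in question.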
It'd be a nice feature if ElementTree let users optionally ignore a
namespace. Unfortunately, it doesn't.
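Lacking that option, the usual workaround is to strip the prefix yourself as elements come out of iterparse. Here is a minimal sketch, assuming Python 3 and a helper I've called iterparse_local (a hypothetical name, not part of ElementTree):

```python
from io import BytesIO
from xml.etree.ElementTree import iterparse

def iterparse_local(source):
    """Yield (event, elem) pairs with any '{uri}' prefix removed from elem.tag."""
    for event, elem in iterparse(source):
        # Namespaced tags look like '{uri}local'; keep only the local part.
        if isinstance(elem.tag, str) and elem.tag.startswith("{"):
            elem.tag = elem.tag.split("}", 1)[1]
        yield event, elem

xml = b'<root xmlns="http://www.very_long_url.com"><child/></root>'
tags = [elem.tag for event, elem in iterparse_local(BytesIO(xml))]
print(tags)
```

Note that this is still the "add them in C, remove them in Python" pattern the original poster complained about: the parser builds the qualified tag first, and the stripping costs one extra string operation per element.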
Carl Banks