[lxml-dev] iterparse and namespaces
Hi, I've got a small problem with iterparse and namespaces. I have this code:
    >>> from StringIO import StringIO
    >>> from lxml import etree
    >>> print etree.__version__
    2.1.5
    >>> xml = """<root xmlns="http://www.example.com"><a>1</a><a>2</a></root>"""
    >>> a1 = etree.iterparse(StringIO(xml), tag="a")
    >>> a1.next()
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "iterparse.pxi", line 515, in lxml.etree.iterparse.__next__ (src/lxml/lxml.etree.c:77014)
    StopIteration
    >>> a2 = etree.iterparse(StringIO(xml), tag="{http://www.example.com}a")
    >>> a2.next()
    (u'end', <Element {http://www.example.com}a at -488456c4>)
Is it possible to get all "a" tags without passing the namespace? I mean, a2 works, but would it be possible to make a1 work too?

Regards,
Piotr Furman
Piotr Furman wrote:
Is that possible to get all tags "a" without passing namespace? I mean, a2 works, but could it be possible to make a1 working too?
    ..., tag="{*}a"

might work.

Stefan
Hi,
Is that possible to get all tags "a" without passing namespace? I mean, a2 works, but could it be possible to make a1 working too?
..., tag="{*}a"
might work.
I thought so too but it doesn't :)
    >>> for e in etree.iterparse(StringIO(xml), tag="{*}a"): print e
    ...
The wildcard works for element names, not for namespaces:
    >>> for e in etree.iterparse(StringIO(xml), tag="{http://www.example.com}*"): print e
    ...
    (u'end', <Element {http://www.example.com}a at 2d7510>)
    (u'end', <Element {http://www.example.com}a at 2d7570>)
    (u'end', <Element {http://www.example.com}root at 2675a0>)
Same holds for .iter().

Is there a usecase for "give me every element x from whatever namespace"?

Holger
jholg@gmx.de wrote:
Is that possible to get all tags "a" without passing namespace? I mean, a2 works, but could it be possible to make a1 working too?
..., tag="{*}a"
might work.
I thought so too but it doesn't :)
Thanks for checking. :)
Is there a usecase for "give me every element x from whatever namespace"?
Yep, I guess that's why it doesn't work. :) Maybe the OP can give us a clearer idea about the background of this request. Stefan
On Tue, Mar 17, 2009 at 03:48:54PM +0100, Stefan Behnel wrote:
jholg@gmx.de wrote:
Is there a usecase for "give me every element x from whatever namespace"?
Yep, I guess that's why it doesn't work. :)
While I realize this is a different problem than the original poster had...

Suppose you have to interoperate with an XML generator that changes the namespace based on an unrelated version number of the supporting platform, and you have no way of knowing what namespaces a document will use. Further, you do know that the semantic content of the tags is unchanged.

That situation led me to want an XML toolkit that would let me throw away namespace data, because stupid people have done stupid things with XML namespaces. And I have to live with it, whether it's right or not.

I ended up solving the problem by search-and-replacing an XSLT sheet with heuristically gleaned version information and using that XSLT to create data structures I could actually do something with. Poetic justice, I'd say, that XML's structured approach can lead to a problem solvable only by ad-hoc parsing of a serialized XML doc :)

(Though in retrospect, I think I could have used lxml's nsmap members to glean the namespace information and build unversioned data structures without the really ugly intermediate transform.)

--
Ross Vandegrift
ross@kallisti.us

"If the fight gets hot, the songs get hotter. If the going gets tough, the songs get tougher." --Woody Guthrie
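A minimal sketch of the "glean the namespace from the document" idea, assuming every element lives in the root element's namespace. The two sample documents, their versioned namespace URIs, and the helper names `namespace_of` and `texts_of` are hypothetical. The namespace is read from the root's tag rather than from nsmap so that the sketch also runs on the standard-library ElementTree when lxml is not installed.

```python
try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    # Assumption: the standard-library ElementTree is a close enough stand-in.
    import xml.etree.ElementTree as etree

# Hypothetical documents: same structure, different "versioned" namespaces.
DOC_V1 = b'<root xmlns="http://www.example.com/v1"><a>1</a><a>2</a></root>'
DOC_V2 = b'<root xmlns="http://www.example.com/v2"><a>1</a><a>2</a></root>'


def namespace_of(elem):
    """Return the namespace URI embedded in elem.tag, or None if there is none."""
    tag = elem.tag
    if tag.startswith("{"):
        return tag[1:].split("}", 1)[0]
    return None


def texts_of(document, local_name):
    """Collect the text of every `local_name` element in the root's namespace."""
    root = etree.fromstring(document)
    ns = namespace_of(root)
    # Build the qualified name once, from whatever namespace the document uses.
    tag = "{%s}%s" % (ns, local_name) if ns else local_name
    return [elem.text for elem in root.iter(tag)]


v1 = texts_of(DOC_V1, "a")
v2 = texts_of(DOC_V2, "a")
print(v1, v2)
```

Both calls give the same result although the namespace differs, so the calling code never has to hard-code a version-specific URI.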
<jholg <at> gmx.de> writes:
Is there a usecase for "give me every element x from whatever namespace"?
Holger
Thanks for the answer. My use case is that I have an XML file with only one namespace, defined on the root. I guess that if there were more namespaces in one file it wouldn't make sense, but as long as there is only one, I don't care about it and just want all the specified elements. So I have at least two choices: either remove xmlns from the files, or iterate over all elements and filter out those I don't need.

PF
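The second choice, iterating over all elements and filtering, might look roughly like the sketch below. It is written for Python 3, and `local_name` and `iter_by_local_name` are hypothetical helper names, not part of lxml. It only relies on the `{namespace}name` form of `elem.tag`, so it falls back to the standard-library ElementTree when lxml is not available.

```python
from io import BytesIO

try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    # Assumption: the standard-library ElementTree is a close enough stand-in.
    import xml.etree.ElementTree as etree

XML = b'<root xmlns="http://www.example.com"><a>1</a><b>x</b><a>2</a></root>'


def local_name(tag):
    """Return the tag without its '{namespace}' part (hypothetical helper)."""
    return tag.rsplit("}", 1)[-1]


def iter_by_local_name(source, name):
    """Yield every element whose local name is `name`, whatever its namespace."""
    for _event, elem in etree.iterparse(source, events=("end",)):
        if local_name(elem.tag) == name:
            yield elem


texts = [elem.text for elem in iter_by_local_name(BytesIO(XML), "a")]
print(texts)
```

The trade-off is that every element is handed to Python code before being filtered, which the `tag` argument of iterparse avoids.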
uh? Then I really don't get it. If there is only one namespace that contains all elements, then why can't you just look for the tags in exactly that namespace? That will give you all tags with that name. Stefan
Stefan Behnel <stefan_ml <at> behnel.de> writes:
uh? Then I really don't get it. If there is only one namespace that contains all elements, then why can't you just look for the tags in exactly that namespace? That will give you all tags with that name.
Stefan
Sure I can, but my real data is a little bigger than this sample. Each element found with iterparse has many other tags I'd like to retrieve using the "iter" method. Here again, I would have to add the namespace for each tag. If I do this in several lines, the code will be ugly. It would also be harder to maintain if somebody changed the xmlns one day. So it would be nice if iterparse could accept a wildcard as the namespace, but I see it can be solved another way.

PF
Piotr Furman wrote:
Sure I can, but my real data is little bigger than this sample. Each element found with iterparse has many other tags I'd like to retrieve, using "iter" method. Here again, for each tag I would have to add namespace. If I do this in several lines code will be ugly. It would be also harder to maintain, if for some reason somebody would change xmlns one day.
So it would be nice if iterparse could accept wildcard as namespace
Note that this would not give you a namespace-free tag name on the element, so you'd still have to use qualified names in a couple of places. It's really best to assign the qualified names to variables and to work with those. Stefan
Stefan Behnel <stefan_ml <at> behnel.de> writes:
Note that this would not give you a namespace-free tag name on the element, so you'd still have to use qualified names in a couple of places. It's really best to assign the qualified names to variables and to work with those.
Stefan
Agreed, something like

    ns = "http://www.example.com"
    etree.iterparse(xml, tag="{%s}a" % ns)

will probably be the best way. Thanks for the answers.
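Stefan's advice to assign the qualified names to variables could be sketched as follows. The element names `item`, `name` and `price` and the helper `q` are made up for illustration. The loop compares `elem.tag` explicitly so that the sketch also runs on the standard-library ElementTree; with lxml, the same `ITEM` string can be passed as `tag=ITEM` to iterparse, as a2 in the original post does.

```python
from io import BytesIO

try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    # Assumption: the standard-library ElementTree is a close enough stand-in.
    import xml.etree.ElementTree as etree

NS = "http://www.example.com"  # the only place the namespace URI is written


def q(local):
    """Build a '{namespace}local' qualified name (hypothetical helper)."""
    return "{%s}%s" % (NS, local)


# Qualified names are computed once and reused everywhere below.
ITEM, NAME, PRICE = q("item"), q("name"), q("price")

XML = (
    b'<root xmlns="http://www.example.com">'
    b'<item><name>n1</name><price>3</price></item>'
    b'<item><name>n2</name><price>5</price></item>'
    b'</root>'
)

rows = []
for _event, elem in etree.iterparse(BytesIO(XML), events=("end",)):
    if elem.tag != ITEM:
        continue
    # Child lookups reuse the same variables instead of repeating the URI.
    rows.append((elem.findtext(NAME), elem.findtext(PRICE)))

print(rows)
```

If the xmlns ever changes, only the `NS` line needs to be edited.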
On Tue, 2009-03-17 at 14:59 +0000, Piotr Furman wrote:
So I have at least two choices, either remove xmlns from files, or iterate over all elements and filter out those I don't need.
PF
If your concern is that the namespaces are unwieldy, you can also declare them so that you can use a more readable prefix. For example (taken from live code):

    ns = {
        'mets': 'http://www.loc.gov/METS/',
        'mods': 'http://www.loc.gov/mods/v3',
    }
    timestamp_set = excerpt_filestruct.xpath(
        'mets:fptr[@FILEID="DIGITAL_ACCESS_COPY"]/mets:area',
        namespaces=ns)

Cheers,
Cliff
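Applied to the sample document from the start of the thread, the prefix-map approach might look like the sketch below. The prefix `ex` is an arbitrary choice. It uses `findall()` instead of `xpath()`, since `findall()` accepts the same kind of prefix mapping and also exists in the standard-library ElementTree, which serves as a fallback here.

```python
try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    # Assumption: the standard-library ElementTree is a close enough stand-in.
    import xml.etree.ElementTree as etree

XML = b'<root xmlns="http://www.example.com"><a>1</a><a>2</a></root>'

# Arbitrary, readable prefix mapped to the document's default namespace.
NSMAP = {"ex": "http://www.example.com"}

root = etree.fromstring(XML)
# "ex:a" is expanded to "{http://www.example.com}a" using the mapping.
values = [elem.text for elem in root.findall("ex:a", namespaces=NSMAP)]
print(values)
```

The prefix used in the query does not have to match anything in the document; only the namespace URI it maps to matters.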
_______________________________________________ lxml-dev mailing list lxml-dev@codespeak.net http://codespeak.net/mailman/listinfo/lxml-dev
Hi, Use local-name() in an xpath:
    >>> doc = etree.XML('''<x><a xmlns="http://foo"/><a xmlns="http://bar"/></x>''')
    >>> doc.xpath("//*[local-name() = 'a']")
    [<Element {http://foo}a at 1710270>, <Element {http://bar}a at 1710210>]
HTH,
Laurence
participants (6)
- J. Cliff Dyer
- jholg@gmx.de
- Laurence Rowe
- Piotr Furman
- Ross Vandegrift
- Stefan Behnel