[lxml-dev] space in attribute name: xpath expression?
Hi everybody,

It does not seem possible to create, with fromstring() or ET.XML, a tree containing attribute names with spaces. But it is possible to add an attribute whose name contains a space afterwards; see the example below.

###################
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import lxml.etree as ET

root = ET.XML("<root><foo attri='bar'>data</foo></root>")
foo_elem = root.xpath("//foo")
foo_elem[0].set("tu tu", "22")
print(ET.tostring(root))
###################

We obtain:

<root><foo attri="bar" tu tu="22">data</foo></root>

It seems a bad idea to have spaces in attribute names. I have not found a way to make an XPath query work; for example, both of the following raise an error:

print(root.xpath("//*[@tu tu=22]"))
print(root.xpath(r"//*[@tu\ tu=22]"))

From another point of view, we would often like to define attribute names as they are, i.e. English expressions with spaces. How do you proceed? Put underscores in the attribute names, and then remove them when displaying the tree (for example in a graphical widget)? Or define the correspondence between the attribute names and the English names in some part of the XML file (for example, the attribute names could be tags, associated with some text that would contain the English names)?

Thanks
-- 
python -c "print ''.join([chr(154 - ord(c)) for c in '*9(9&(18%.\ 9&1+,\'Z4(55l4('])"

"When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong." (first law of AC Clarke)
TP wrote:
> It does not seem possible to create, with fromstring() or ET.XML, a tree containing attribute names with spaces.
I do hope it isn't.
> But it is possible to add an attribute whose name contains a space afterwards; see the example below.
> ###################
> #!/usr/bin/env python
> # -*- coding: utf-8 -*-
> import lxml.etree as ET
>
> root = ET.XML("<root><foo attri='bar'>data</foo></root>")
> foo_elem = root.xpath("//foo")
> foo_elem[0].set("tu tu", "22")
> print(ET.tostring(root))
> ###################
>
> We obtain:
> <root><foo attri="bar" tu tu="22">data</foo></root>
Hmmm, ok, that looks like a bug to me. lxml should validate attribute names on the way in, just like tag names are validated.
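Until such validation is in place (or when stuck on an older lxml), a caller can check names before calling set(). The sketch below is an illustration under stated assumptions, not lxml's own check: it uses the standard library's ElementTree and lets its expat-based parser decide whether a name is well-formed by parsing a dummy element carrying only that one attribute. The helpers is_valid_attribute_name() and checked_set() are hypothetical, and prefixed names such as xml:lang are reported invalid, because ElementTree stores them in {uri}local form.

```python
import xml.etree.ElementTree as ET


def is_valid_attribute_name(name):
    """Return True if `name` is a well-formed, unprefixed XML attribute name.

    Sketch only: the stdlib's expat-based parser judges the name by parsing
    a dummy element that carries exactly this one attribute.
    """
    try:
        elem = ET.fromstring('<x %s="v"/>' % name)
    except (ET.ParseError, ValueError):
        # ParseError: not well-formed (e.g. "tu tu", "1abc", "").
        # ValueError: text that cannot even be encoded for the parser.
        return False
    # Reject names that smuggle in extra markup, e.g. 'a="1" b', which
    # parses fine but produces a different set of attributes.
    return set(elem.attrib) == {name}


def checked_set(elem, name, value):
    """Hypothetical wrapper: refuse invalid names instead of serialising them."""
    if not is_valid_attribute_name(name):
        raise ValueError("Invalid attribute name %r" % (name,))
    elem.set(name, value)


root = ET.fromstring("<root><foo attri='bar'>data</foo></root>")
foo = root.find("foo")
checked_set(foo, "tu_tu", "22")      # accepted
try:
    checked_set(foo, "tu tu", "22")  # rejected before it reaches the tree
except ValueError as exc:
    print(exc)
```

Going through a real parser avoids hand-writing the XML 1.0 Name production, including its long list of permitted non-ASCII ranges, at the cost of one tiny parse per call.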
> From another point of view, we would often like to define attribute names as they are, i.e. English expressions with spaces.
How do you know that they will only ever be "English" expressions? What about Farsi and Chinese?
> How do you proceed? Put underscores in the attribute names, and then remove them when displaying the tree (for example in a graphical widget)?
Separating data from representation is a very good and common design choice, so these two concerns are completely orthogonal: the attribute name stored in the XML need not be the name you display. You can use '_' or '-' to separate words, or you can use a prefixed MD5 hash as the attribute name and map it to the display name through a separate lookup table. The choices are endless.
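The prefixed-hash option might look roughly like this. It is a sketch using the standard library's ElementTree rather than lxml; the prefix "a", the helper attr_name_for() and the in-memory NAME_TABLE dict are illustrative assumptions. The letter prefix is what keeps the name valid, since a hex digest may begin with a digit and XML names cannot; MD5 serves only as a short, stable identifier here, not for security. Because the display name never enters the document, it may contain spaces and any script.

```python
import hashlib
import xml.etree.ElementTree as ET

# Hypothetical lookup table: generated attribute name -> display name.
# In practice this would be persisted somewhere, e.g. in a separate file.
NAME_TABLE = {}


def attr_name_for(label, prefix="a"):
    """Map a free-form display label to a stable, XML-safe attribute name."""
    # The prefix keeps the name valid: a hex digest may start with a digit.
    name = prefix + hashlib.md5(label.encode("utf-8")).hexdigest()
    NAME_TABLE[name] = label
    return name


root = ET.fromstring("<root><foo attri='bar'>data</foo></root>")
foo = root.find("foo")
foo.set(attr_name_for("tu tu"), "22")
foo.set(attr_name_for("数据 来源"), "sensor")  # labels may use any script

# Display side: translate names back, falling back to the raw name.
for name, value in foo.attrib.items():
    print(NAME_TABLE.get(name, name), "=", value)
```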
> Or define the correspondence between the attribute names and the English names in some part of the XML file (for example, the attribute names could be tags, associated with some text that would contain the English names)?
By "tags" you mean "references", I assume. Maybe even references into a separate XML file (one per language) that defines the presentational names. Without knowing more about your application, this sounds like a reasonable thing to do.

Stefan
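One way to read the one-file-per-language suggestion is sketched below. The label-file format (a <labels> root with <label ref="..."> children) and the sample labels are invented for illustration, and the files are inlined as strings to keep the sketch self-contained; a real application would more likely load something like labels-fr.xml with ET.parse(). The data document itself keeps only valid names such as tu_tu.

```python
import xml.etree.ElementTree as ET

# Hypothetical label files, one per language, inlined for the example.
LABEL_FILES = {
    "en": "<labels><label ref='attri'>attribute</label>"
          "<label ref='tu_tu'>tu tu</label></labels>",
    "fr": "<labels><label ref='attri'>attribut</label>"
          "<label ref='tu_tu'>tu tu</label></labels>",
}


def load_labels(xml_text):
    """Build an {attribute name: display name} dict from one label file."""
    root = ET.fromstring(xml_text)
    return {el.get("ref"): el.text for el in root.iter("label")}


def display_attributes(elem, labels):
    """Pair each attribute value with its label, or its raw name if unlabelled."""
    return [(labels.get(name, name), value) for name, value in elem.attrib.items()]


data = ET.fromstring("<root><foo attri='bar' tu_tu='22'>data</foo></root>")
print(display_attributes(data.find("foo"), load_labels(LABEL_FILES["fr"])))
```

Attributes without a label fall back to their raw name, so a missing translation degrades gracefully instead of raising.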
Hi,
> import lxml.etree as ET
>
> root = ET.XML("<root><foo attri='bar'>data</foo></root>")
> foo_elem = root.xpath("//foo")
> foo_elem[0].set("tu tu", "22")
> print(ET.tostring(root))
> ###################
XML does not allow whitespace in attribute names. Since at least version 2.0, lxml refuses to set such names through the API:
import lxml.etree as ET
root = ET.XML("<root><foo attri='bar'>data</foo></root>")
foo_elem = root.xpath("//foo")
foo_elem[0].set("tu tu", "22")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "lxml.etree.pyx", line 646, in lxml.etree._Element.set (src/lxml/lxml.etree.c:9638)
  File "apihelpers.pxi", line 411, in lxml.etree._setAttributeValue (src/lxml/lxml.etree.c:31508)
  File "apihelpers.pxi", line 1323, in lxml.etree._attributeValidOrRaise (src/lxml/lxml.etree.c:38843)
ValueError: Invalid attribute name u'tu tu'
print(ET.__version__)
2.1.5
> From another point of view, we would often like to define attribute names as they are, i.e. English expressions with spaces. How do you proceed? Put underscores in the attribute names, and then remove them when displaying the tree (for example in a graphical widget)? Or define the correspondence between the attribute names and the English names in some part of the XML file (for example, the attribute names could be tags, associated with some text that would contain the English names)?
Yes, why not use a valid separator like _ or . and split words accordingly for display? Of course, you'd have to make sure that your separator does not normally show up in your expressions.

Holger
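The separator approach could be sketched as follows with the standard library's ElementTree, under two stated assumptions: a label differs from a valid XML name only by its spaces, and the separator "_" never occurs inside a label (the sketch raises ValueError rather than guessing when it does). The helper names are hypothetical. Because the stored name is an ordinary XML name, a predicate query of the kind that failed in the original post works.

```python
import xml.etree.ElementTree as ET

SEPARATOR = "_"  # assumption: never used inside a display label


def label_to_name(label):
    """Turn a display label such as 'tu tu' into an attribute name."""
    if SEPARATOR in label:
        raise ValueError("label %r already contains %r" % (label, SEPARATOR))
    return label.replace(" ", SEPARATOR)


def name_to_label(name):
    """Inverse mapping, used only when presenting the tree."""
    return name.replace(SEPARATOR, " ")


root = ET.fromstring("<root><foo attri='bar'>data</foo></root>")
root.find("foo").set(label_to_name("tu tu"), "22")
print(ET.tostring(root, encoding="unicode"))

# The stored name is a plain XML name, so a predicate query works.
for elem in root.findall(".//*[@tu_tu='22']"):
    print(elem.tag, [name_to_label(n) for n in elem.attrib])
```

ElementTree's findall() understands only a subset of XPath; with lxml, root.xpath("//*[@tu_tu='22']") is the direct equivalent.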
jholg@gmx.de wrote:
> foo_elem[0].set("tu tu", "22")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
>   File "lxml.etree.pyx", line 646, in lxml.etree._Element.set (src/lxml/lxml.etree.c:9638)
>   File "apihelpers.pxi", line 411, in lxml.etree._setAttributeValue (src/lxml/lxml.etree.c:31508)
>   File "apihelpers.pxi", line 1323, in lxml.etree._attributeValidOrRaise (src/lxml/lxml.etree.c:38843)
> ValueError: Invalid attribute name u'tu tu'
> print(ET.__version__)
> 2.1.5
On my computer:
print(ET.__version__)
1.3.6
(I use Kubuntu 8.04.) So the bug seems to have been fixed in newer versions.
> Yes, why not use a valid separator like _ or . and split words accordingly for display? Of course, you'd have to make sure that your separator does not normally show up in your expressions.
Thanks for your opinion on the subject.

Julien
participants (3)
- jholg@gmx.de
- Stefan Behnel
- TP