[lxml-dev] etree._Element.items(): Really "arbitrary order" or rather "in document order"?

Hi,

I have designed this Loop element that determines its loop counter variable from the first attribute. Example:

    <xm:Loop i="0x16..0x21" format="0x%02X" j="i+1">
      <something base="{i}" incremented="{j}"/>
    </xm:Loop>

I implemented this loop a year ago, finding the loop counter attribute using the following code:

    # 'el' is the etree._Element <xm:Loop/> from above
    loop_counter = None
    format = None
    variables = {}
    for name, value in el.attrib.iteritems():
        if name == "format":
            format = value
            continue
        if loop_counter is None:
            # the first attribute that is not 'format'
            loop_counter = name
        variables[name] = value

I looked at the lxml implementation and it comes down to the _collectAttributes() function, defined in src/lxml/apihelpers.pxi. At that point I had to give up, I found it to be quicker to write this email than to find my way around libxml2 docs and code.

I wanted to determine whether attributes parsed by libxml2 will remain in document order, regardless of what lxml's API documents (lxml.etree._Element.items() -> "arbitrary order"). Because, the code above worked without a hitch so far, always delivering attributes in document order.

Is there an lxml or libxml2 version that will *not* keep attributes in document order? Why is it documented as being in "arbitrary order"? Do you want to be API-compatible for future changes that might break the order?

Is libxml2 silent about this? What does its parser do? It was just observation that led me to believe document order is maintained, but I'd like to have the "proof" behind that observation. I don't know where to continue.

Thanks,
Felix Rabe

Hi, Praktikant3 - SAG, 27.10.2009 13:47:
> I wanted to determine whether attributes parsed by libxml2 will remain in document order, regardless of what lxml's API documents (lxml.etree._Element.items() -> "arbitrary order").
Yes, they will. The parser in libxml2 actually guarantees that (at least, according to the source comments).
> Is there an lxml or libxml2 version that will *not* keep attributes in document order? Why is it documented as being in "arbitrary order"?
Because 1) ElementTree does not guarantee that it's document order and 2) lxml does not guarantee it either and 3) document order usually *is* an arbitrary order, except for canonical XML. Writing code that relies on a specific order of attributes within an element is bound to fail in most cases. Note that the interface is a dict-like mapping object. I do not guarantee that it will always stay that way. For example, it might become a dict subclass one day or the .items() method might return a dict view in Py3, or whatever.
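One way to follow this advice is to pick the loop counter by what its value looks like instead of by its position. The following is only a sketch: the helper name `split_loop_attributes` is made up for illustration, and it assumes (this is not stated in the original post) that the loop counter is the only attribute whose value uses the `start..end` range syntax.

```python
try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    # Fallback so the sketch also runs without lxml; the calls used below
    # (fromstring, Element.items) exist in both libraries.
    import xml.etree.ElementTree as etree


def split_loop_attributes(el):
    """Return (loop_counter, fmt, variables) without relying on attribute order.

    Assumption (hypothetical, not from the original post): the loop counter
    is the only attribute whose value contains the ".." range syntax.
    """
    attrs = dict(el.items())  # treat the attributes as a plain mapping
    fmt = attrs.pop("format", None)
    counters = [name for name, value in attrs.items() if ".." in value]
    if len(counters) != 1:
        raise ValueError(
            "expected exactly one range attribute, found %r" % sorted(counters)
        )
    return counters[0], fmt, attrs


# Same attributes as in the example, but with the counter deliberately last.
loop = etree.fromstring('<Loop format="0x%02X" j="i+1" i="0x16..0x21"/>')
loop_counter, fmt, variables = split_loop_attributes(loop)
```

Here `loop_counter` is found regardless of where `i` appears in the start tag, so the code keeps working if the attributes are reordered by an editor or a serializer.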
> Do you want to be API-compatible for future changes that might break the order?
Sure, works so far.
> Is libxml2 silent about this? What does its parser do? It was just observation that led me to believe document order is maintained, but I'd like to have the "proof" behind that observation. I don't know where to continue.
libxml2 stores attributes as tree nodes, more specifically as an ordered linked list of attribute nodes, and the parser puts them into that list one after the other, in document order. That should be enough of a "proof" that it works that way.

Stefan
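The behaviour described above can be checked on a given installation with a few lines. This is a minimal sketch, not a guarantee: it only shows what the installed parser does, and the namespace URI `urn:example:xm` is a placeholder invented for the example, since the original post does not show the declaration for the `xm` prefix.

```python
try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    # Fallback so the sketch also runs without lxml.
    import xml.etree.ElementTree as etree

# The Loop element from the original post; the xmlns:xm URI is a placeholder.
XML = (
    '<xm:Loop xmlns:xm="urn:example:xm" i="0x16..0x21" format="0x%02X" j="i+1">'
    '<something base="{i}" incremented="{j}"/>'
    '</xm:Loop>'
)

loop = etree.fromstring(XML)

# Attribute names in the order the API hands them out after parsing.
names = [name for name, _value in loop.items()]

# The namespace declaration is not reported as an attribute.
document_order = ["i", "format", "j"]
in_document_order = names == document_order
```

If `in_document_order` is true, the parsed attributes came back in the order they were written. As noted earlier in the thread, the API still documents the order as arbitrary, so a check like this is an observation about one version rather than a contract.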

Hi Stefan,

Thanks! That helped clarify it a lot.

- Felix

-----Original Message-----
From: Stefan Behnel [mailto:stefan_ml@behnel.de]
Sent: Tuesday, 27 October 2009 17:49
To: Praktikant3 - SAG
Cc: lxml-dev@codespeak.net
Subject: Re: [lxml-dev] etree._Element.items(): Really "arbitrary order" or rather "in document order"?

[...]