[BangPypers] (no subject)

Tue Sep 24 14:41:30 CEST 2013

On Tue, Sep 24, 2013 at 6:04 PM, Dhananjay Nene
<dhananjay.nene at gmail.com> wrote:
> On Tue, Sep 24, 2013 at 5:48 PM, Vineet Naik <naikvin at gmail.com> wrote:
>> Hi,
>>
>> On Tue, Sep 24, 2013 at 10:38 AM, bab mis <babmis at outlook.com> wrote:
>>
>>> Hi, is there any XML parser that gives the same kind of data structure as
>>> the YAML parser gives in Python? I tried xml.dom.minidom, but it doesn't
>>> give a proper data structure; every time I need to read element by element
>>> and build the dict myself.
>>>
>>
>> You can try xmltodict[1]. It also retains the node attributes and makes
>> them accessible using the '@' prefix (see the example in the repo's README).
>>
>> [1]: https://github.com/martinblech/xmltodict
>
> Being curious, I immediately took a look and tried the following:
>
> import xmltodict
>
> doc1 = xmltodict.parse("""
> <mydocument has="an attribute">
>   <and>
>     <many>elements</many>
>     <many>more elements</many>
>   </and>
>   <plus a="complex">
>     element as well
>   </plus>
> </mydocument>
> """)
>
> doc2 = xmltodict.parse("""
> <mydocument has="an attribute">
>   <and>
>     <many>more elements</many>
>   </and>
>   <plus a="complex">
>     element as well
>   </plus>
> </mydocument>
> """)
> print(doc1['mydocument']['and'])
> print(doc2['mydocument']['and'])
>
> The output was:
> OrderedDict([(u'many', [u'elements', u'more elements'])])
> OrderedDict([(u'many', u'more elements')])
>
> The only difference is that there is only one "many" node inside the
> "and" node in doc2. Do you see an issue here? (At least I do.) The
> output structure is a function of the cardinality of the inner nodes:
> it changes shape from a list of many elements not to a list of one,
> but to just the one element (the list is thrown away). That can make
> things rather unpredictable, since you cannot tell upfront whether
> the presence of just one node inside a parent node is what the XML
> schema requires or merely what happens in that particular instance.
>
> I do think the problem is tractable, so long as one clearly documents
> the specific constraints the underlying XML will satisfy, constraints
> that make transformations to lists or dicts safe. Trying to make it
> easy without clearly documenting the constraints can lead to
> violations of the principle of least surprise, like the one above.
>
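A minimal sketch of the "document the constraints" idea, using only the standard library's xml.etree.ElementTree rather than xmltodict. The caller passes the set of tag names the schema allows to repeat (an assumption the caller is responsible for documenting), and those tags always map to lists, even when only one occurrence is present. The names `parse_with_schema` and `_convert` are hypothetical, the '@' and '#text' keys only mimic xmltodict's convention, and mixed content is not handled. (Releases of xmltodict newer than the one discussed here also accept a `force_list` argument to `parse` for the same purpose.)

```python
import xml.etree.ElementTree as ET

def _convert(elem, repeatable):
    # Leaf with no attributes: just its stripped text.
    if len(elem) == 0 and not elem.attrib:
        return (elem.text or '').strip()
    # Attributes get an '@' prefix, mimicking xmltodict's convention.
    node = {'@' + name: value for name, value in elem.attrib.items()}
    for child in elem:
        value = _convert(child, repeatable)
        if child.tag in repeatable:
            # Declared repeatable: always a list, even with one occurrence.
            node.setdefault(child.tag, []).append(value)
        elif child.tag in node:
            # Repeated but not declared repeatable: the documented
            # constraint is violated, so fail loudly instead of reshaping.
            raise ValueError('<%s> repeated but not declared repeatable' % child.tag)
        else:
            node[child.tag] = value
    # Only the text before the first child is kept (no mixed content).
    text = (elem.text or '').strip()
    if text:
        node['#text'] = text
    return node

def parse_with_schema(xml_text, repeatable=frozenset()):
    """Hypothetical helper: XML string -> nested dicts/lists of fixed shape."""
    root = ET.fromstring(xml_text.strip())
    return {root.tag: _convert(root, repeatable)}

doc2 = parse_with_schema("""
<mydocument has="an attribute">
  <and>
    <many>more elements</many>
  </and>
  <plus a="complex">
    element as well
  </plus>
</mydocument>
""", repeatable={'many'})
print(doc2['mydocument']['and'])  # {'many': ['more elements']}
```

Raising on an undeclared repeat, rather than silently switching to a list, makes a violated constraint visible instead of quietly changing the output shape.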
It gets even more interesting, e.g. below:

doc3 = xmltodict.parse("""
<mydocument has="an attribute">
  <and>
    <many>elements</many>
  </and>
  <plus a="complex">
    element as well
  </plus>
  <and>
    <many>more elements</many>
  </and>
</mydocument>
""")

print(doc3['mydocument']['and'])

leads to the output:

[OrderedDict([(u'many', u'elements')]), OrderedDict([(u'many', u'more elements')])]

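Definitely not what would naively be expected: the two "and" nodes, which
are separated by the "plus" node, are merged into a single list under one
key, and their position relative to "plus" is lost.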
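For comparison, a sketch that uses the standard library's xml.etree.ElementTree directly (not xmltodict) on the same doc3 input. Iterating over an element keeps sibling order, and `findall()` returns a list whether it matches zero, one or many elements, so neither cardinality nor interleaving changes the shape. The trade-off is navigating Element objects instead of plain dicts.

```python
import xml.etree.ElementTree as ET

DOC3 = """
<mydocument has="an attribute">
  <and>
    <many>elements</many>
  </and>
  <plus a="complex">
    element as well
  </plus>
  <and>
    <many>more elements</many>
  </and>
</mydocument>
"""

root = ET.fromstring(DOC3.strip())

# Iterating over an Element yields its children in document order,
# so the <plus> between the two <and> siblings is not lost.
sibling_tags = [child.tag for child in root]
print(sibling_tags)  # ['and', 'plus', 'and']

# findall() always returns a list, regardless of how many elements match.
many_per_and = [[m.text for m in node.findall('many')] for node in root.findall('and')]
print(many_per_and)  # [['elements'], ['more elements']]

# Attributes are read with get().
print(root.get('has'))  # an attribute
```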