[BangPypers] (no subject)
Dhananjay Nene
dhananjay.nene at gmail.com
Tue Sep 24 14:41:30 CEST 2013
On Tue, Sep 24, 2013 at 6:04 PM, Dhananjay Nene
<dhananjay.nene at gmail.com> wrote:
> On Tue, Sep 24, 2013 at 5:48 PM, Vineet Naik <naikvin at gmail.com> wrote:
>> Hi,
>>
>> On Tue, Sep 24, 2013 at 10:38 AM, bab mis <babmis at outlook.com> wrote:
>>
>>> Hi, is there any XML parser which gives the same kind of data structure as a
>>> YAML parser gives in Python? I tried xml.dom.minidom but it does not give a
>>> proper data structure; every time I need to read element by element and
>>> create the dict.
>>>
>>
>> You can try xmltodict[1]. It also retains the node attributes and makes
>> them accessible using the '@' prefix (see the example in the README of the repo).
>>
>> [1]: https://github.com/martinblech/xmltodict
>
> Being curious, I immediately took a look and tried the following:
>
> import xmltodict
>
> doc1 = xmltodict.parse("""
> <mydocument has="an attribute">
> <and>
> <many>elements</many>
> <many>more elements</many>
> </and>
> <plus a="complex">
> element as well
> </plus>
> </mydocument>
> """)
>
> doc2 = xmltodict.parse("""
> <mydocument has="an attribute">
> <and>
> <many>more elements</many>
> </and>
> <plus a="complex">
> element as well
> </plus>
> </mydocument>
> """)
> print(doc1['mydocument']['and'])
> print(doc2['mydocument']['and'])
>
> The output was:
> OrderedDict([(u'many', [u'elements', u'more elements'])])
> OrderedDict([(u'many', u'more elements')])
>
> The only difference is that there is only one "many" node inside the
> "and" node in doc2. Do you see an issue here? (At least I do.) The
> output structure is a function of the cardinality of the inner nodes:
> with two "many" nodes you get a list, but with one you get the bare
> element rather than a list of one. That makes things rather
> unpredictable, since you cannot tell upfront whether a single node
> inside a parent node is what the XML schema mandates or just happens
> to be the case in that particular instance.
>
> I do think the problem is tractable so long as one clearly documents
> the specific constraints which the underlying XML will satisfy,
> constraints which make transformations to lists or dicts safe.
> Trying to make it easy without clearly documenting the constraints
> could lead to violations of the principle of least surprise like the
> one above.
>
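For what it's worth, consuming code can guard against this shape change by
normalising values before iterating over them. A minimal sketch, assuming
nothing about xmltodict beyond the two outputs shown above; as_list is a
hypothetical helper name, not part of the library, and plain dicts stand in
for the OrderedDicts:

```python
def as_list(value):
    """Return value as a list, whatever shape the parser produced."""
    if value is None:
        return []          # node was absent
    if isinstance(value, list):
        return value       # several sibling nodes: already a list
    return [value]         # a single node: wrap it


# Shapes copied from the two outputs above (plain dicts, for illustration).
and1 = {'many': ['elements', 'more elements']}
and2 = {'many': 'more elements'}

for node in (and1, and2):
    for text in as_list(node.get('many')):
        print(text)
```

This only hides the symptom at the call site; the caller still has to know
which nodes may repeat, which is exactly the constraint that needs documenting.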
It gets even more interesting, e.g. below:
doc3 = xmltodict.parse("""
<mydocument has="an attribute">
<and>
<many>elements</many>
</and>
<plus a="complex">
element as well
</plus>
<and>
<many>more elements</many>
</and>
</mydocument>
""")
print(doc3['mydocument']['and'])
leads to the output:
[OrderedDict([(u'many', u'elements')]), OrderedDict([(u'many', u'more elements')])]
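Definitely not what would be naively expected: the two "and" nodes, which
are not adjacent in the document, are collected into a single list, so their
position relative to "plus" is lost.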
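For comparison, the standard library's xml.etree.ElementTree keeps children in
document order, and findall() returns a list whatever the cardinality. A rough
sketch using the same document as doc3 (the variable names here are mine,
purely for illustration):

```python
import xml.etree.ElementTree as ET

DOC3 = """
<mydocument has="an attribute">
<and>
<many>elements</many>
</and>
<plus a="complex">
element as well
</plus>
<and>
<many>more elements</many>
</and>
</mydocument>
"""

root = ET.fromstring(DOC3)

# Iterating an element yields its direct children in document order.
tags = [child.tag for child in root]
print(tags)

# findall() returns a list even when only one child matches.
plus_nodes = root.findall('plus')
print(len(plus_nodes))

# Text of every "many" node, wherever it sits in the tree.
many_texts = [m.text for m in root.iter('many')]
print(many_texts)
```

The price is that you walk the tree yourself instead of getting a ready-made
dict, but the shape of what you get back does not depend on how many nodes
happen to be present.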