[BangPypers] (no subject)

Tue Sep 24 14:49:23 CEST 2013

On Tue, Sep 24, 2013 at 6:11 PM, Dhananjay Nene
<dhananjay.nene at gmail.com> wrote:
> On Tue, Sep 24, 2013 at 6:04 PM, Dhananjay Nene
> <dhananjay.nene at gmail.com> wrote:
>> On Tue, Sep 24, 2013 at 5:48 PM, Vineet Naik <naikvin at gmail.com> wrote:
>>> Hi,
>>>
>>> On Tue, Sep 24, 2013 at 10:38 AM, bab mis <babmis at outlook.com> wrote:
>>>
>>>> Hi, is there any XML parser which gives the same kind of data structure
>>>> as the yaml parser gives in python? I tried minidom but it does not give
>>>> a proper data structure; every time I need to read element by element
>>>> and create the dict.
>>>>
>>>
>>> You can try xmltodict[1]. It also retains the node attributes and makes
>>> them accessible using the '@' prefix (see the example in the repo's README).
>>>
>>> [1]: https://github.com/martinblech/xmltodict
>>
>> Being curious I immediately took a look and tried the following :
>>
>> import xmltodict
>>
>> doc1 = xmltodict.parse("""
>> <mydocument has="an attribute">
>>   <and>
>>     <many>elements</many>
>>     <many>more elements</many>
>>   </and>
>>   <plus a="complex">
>>     element as well
>>   </plus>
>> </mydocument>
>> """)
>>
>> doc2 = xmltodict.parse("""
>> <mydocument has="an attribute">
>>   <and>
>>     <many>more elements</many>
>>   </and>
>>   <plus a="complex">
>>     element as well
>>   </plus>
>> </mydocument>
>> """)
>> print(doc1['mydocument']['and'])
>> print(doc2['mydocument']['and'])
>>
>> The output was :
>> OrderedDict([(u'many', [u'elements', u'more elements'])])
>> OrderedDict([(u'many', u'more elements')])
>>
>> The only difference is that doc2 has only one "many" node inside the
>> "and" node. Do you see an issue here? (At least I do.) The output
>> structure is a function of the cardinality of the inner nodes: it
>> changes shape from a list of many elements to a bare element, rather
>> than a list of one (throwing away the list). That makes things rather
>> unpredictable, since you cannot tell upfront whether a single node
>> inside a parent node is all the xml schema allows or just happens to
>> be the case in that particular instance.
>>
>> I do think the problem is tractable so long as one clearly documents
>> the specific constraints which the underlying XML will satisfy,
>> constraints which make transformations to lists or dicts safe.
>> Trying to make it easy without clearly documenting the constraints
>> could lead to violations of the principle of least surprise like
>> the one above.
>>
> It gets even more interesting, e.g. below:
>
> doc3 = xmltodict.parse("""
> <mydocument has="an attribute">
>   <and>
>     <many>elements</many>
>   </and>
>   <plus a="complex">
>     element as well
>   </plus>
>   <and>
>     <many>more elements</many>
>   </and>
> </mydocument>
> """)
>
> print(doc3['mydocument']['and'])
>
> leads to the output :
>
> [OrderedDict([(u'many', u'elements')]),
>  OrderedDict([(u'many', u'more elements')])]
>
> Definitely not what would be naively expected.

Correction:

print(doc3['mydocument'])

prints

OrderedDict([(u'@has', u'an attribute'),
             (u'and', [OrderedDict([(u'many', u'elements')]),
                       OrderedDict([(u'many', u'more elements')])]),
             (u'plus', OrderedDict([(u'@a', u'complex'),
                                    ('#text', u'element as well')]))])

which just trashed the original ordering: an "and", followed by a "plus",
followed by another "and".
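
For comparison, here is a minimal sketch of a conversion whose shape does
not depend on cardinality and which keeps sibling order. It uses the
standard library's xml.etree.ElementTree rather than xmltodict. The helper
names element_to_node and children_named, and the
"tag"/"attrs"/"text"/"children" layout, are my own illustration and not
part of any library. It also assumes no mixed content (text that follows
a child element is ignored).

```python
import xml.etree.ElementTree as ET


def element_to_node(elem):
    """Convert an Element to a plain dict with a fixed, predictable shape."""
    return {
        "tag": elem.tag,
        "attrs": dict(elem.attrib),
        # Leading text only; tail text after child elements is ignored here.
        "text": (elem.text or "").strip(),
        # Always a list, in document order, even for zero or one child.
        "children": [element_to_node(child) for child in elem],
    }


def children_named(node, tag):
    """Return the direct children with the given tag, always as a list."""
    return [child for child in node["children"] if child["tag"] == tag]


DOC3 = """
<mydocument has="an attribute">
  <and>
    <many>elements</many>
  </and>
  <plus a="complex">
    element as well
  </plus>
  <and>
    <many>more elements</many>
  </and>
</mydocument>
"""

root = element_to_node(ET.fromstring(DOC3))

# Sibling order survives: and, plus, and.
print([child["tag"] for child in root["children"]])

# A lookup by tag returns a list whether there are two matches or one.
print(len(children_named(root, "and")))
print(len(children_named(root, "plus")))
```

The trade-off is that access becomes more verbose than
doc['mydocument']['and'], but the caller never has to check whether it
got a list or a single item.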

Dhananjay

-- 
----------------------------------------------------------------------------------------------------------------------------------
http://blog.dhananjaynene.com twitter: @dnene google plus:
http://gplus.to/dhananjaynene