[BangPypers] parsing xml

Anand Chitipothu anandology at gmail.com
Fri Jul 29 07:45:16 CEST 2011


2011/7/29 Venkatraman S <venkat83 at gmail.com>:
> On Fri, Jul 29, 2011 at 10:47 AM, Anand Chitipothu <anandology at gmail.com> wrote:
>
>> 2011/7/28 Venkatraman S <venkat83 at gmail.com>:
>> > parsing using minidom is one of the slowest. if you just want to extract
>> > the distance and assuming that it (the tag) will always be consistent,
>> > then i would always suggest regexp. xml parsing is a pain.
>>
>> regexp is a bad solution to parse xml.
>>
>> minidom is the fastest solution if you consider programmer time
>> instead of execution time.  Minidom is available in the standard
>> library, so you don't have to add another dependency and worry about
>> PyPI downtimes and lxml compilation failures.
>>
>> I don't think there will be a significant performance difference between
>> regexp and minidom unless you are doing it a million times.
>>
>>
> Well, i have clearly mentioned my assumptions, i.e., when you treat the XML
> as a 'string' and do not want to retrieve anything else in a 'structured
> manner'. I am a speed-maniac and crave speed; so if the assumption is valid,
> i can vouch for the fact that regexp would be the faster and neater solution.
> I have done some speed experiments on this in the past (results of which i
> do not have handy), and this is what i found.
>
> XP asks you to implement the best solution with the least effort and i think
> in this case regexp is a winner. Thoughts can vary though.

A regexp can at best be a dirty hack, not the best solution for XML parsing.
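
To make the trade-off concrete, here is a minimal sketch. The XML sample,
the "distance" tag layout and both function names are assumptions made up
for illustration; they are not the data from the original question. It shows
minidom still finding the value once the tag gains an attribute, while a
regexp written for the bare tag silently stops matching.

```python
import re
from xml.dom import minidom

# Hypothetical sample document; the real XML from the thread is not shown.
XML = """<route>
  <leg>
    <distance unit="km">12.5</distance>
  </leg>
</route>"""


def distance_minidom(xml_text):
    """Return the text of the first <distance> element using a real parser."""
    doc = minidom.parseString(xml_text)
    node = doc.getElementsByTagName("distance")[0]
    # Join the plain text children of the element.
    return "".join(child.data for child in node.childNodes
                   if child.nodeType == child.TEXT_NODE)


def distance_regexp(xml_text):
    """Return the first <distance> text via regexp; assumes a bare tag."""
    match = re.search(r"<distance>(.*?)</distance>", xml_text)
    return match.group(1) if match else None


print(distance_minidom(XML))  # 12.5
print(distance_regexp(XML))   # None, the attribute breaks the pattern
```

The regexp can of course be patched to allow attributes, but each such patch
is one more assumption about the input that the parser already handles.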

Anand

