[BangPypers] parsing xml

Noufal Ibrahim noufal at gmail.com
Fri Jul 29 10:31:19 CEST 2011


Venkatraman S <venkat83 at gmail.com> writes:

> On Fri, Jul 29, 2011 at 12:20 PM, Noufal Ibrahim <noufal at gmail.com> wrote:
>
>> I agree and I try my best to do the same thing. However, I differentiate
>> between micro optimisations like rewriting parts in C and XML and top
>> level optimisations like good design and the right data structures.
>>
>>
> Using regexp is micro optimization?

For parsing XML, yes it is. 

It's not the first thing I'd use and it's something I'd consider only
after I've exhausted everything else and have reason to believe that my
application is not fast enough just because I'm using an XML parser
instead of a regexp.

There are some places where I would use regexps instead of a parser
upfront though. Mostly related to streams of bad XML data (particularly
while screen scraping) but even then, a fault tolerant parser would do
better than regexps. 
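
To make the "fault tolerant parser" point concrete, here is a minimal
sketch using only the standard library. The markup in BAD_MARKUP and the
names LinkCollector, collect_links and strict_parse_ok are made up for
illustration; they are assumptions, not anything from the thread. The
markup has unclosed tags and an unquoted attribute, which a strict XML
parser rejects while the tolerant html.parser still reports the tags it
sees.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sloppy markup: unclosed <a> and <b>, one unquoted
# attribute value, one upper-case attribute name.
BAD_MARKUP = "<p>first <a href=one.html>one <b>bold <a HREF='two.html'>two</p>"


class LinkCollector(HTMLParser):
    """Collect href values from <a> start tags, however messy the input."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names for us.
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)


def collect_links(markup):
    """Run the tolerant parser over markup and return the hrefs found."""
    collector = LinkCollector()
    collector.feed(markup)
    collector.close()
    return collector.links


def strict_parse_ok(markup):
    """Return True only if markup is well-formed XML."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True
```

Calling collect_links(BAD_MARKUP) still yields both links, while
strict_parse_ok(BAD_MARKUP) reports that the same input is not
well-formed XML. A regexp could pull out the hrefs too, but it would
have to spell out every quoting and casing variant by hand.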


[...]

> IMHO, regexps are much more powerful and fault tolerant than XML parsing.
> XMLs are brittle.

If you say so. I don't have much more to say on this. 

There was an interesting "exchange" a while ago between Eric Raymond and
John Graham-Cumming on using regexps vs. a regular parser while
screen scraping to fetch data out of a forge site. Here's the link. You
might find it interesting.

http://blog.jgc.org/2009/11/parsing-html-in-python-with.html#links

>> If performance is *this* important to you, why don't you code your
>> entire application in assembly hand crafting it for a certain processor,
>> amount of memory and hard disk platter speed? Why use Python at all? The
>> reason is because Python is "fast enough" for most things. You can get
>> better performance moving to lower level routines but it's often not
>> necessary and the costs it entails are usually not worth it. Better a
>> fast enough stable app than a super fast one that occasionally segfaults
>> and loses data.
>>
>
> Not sure how this point is relevant; the amount of performance you
> need is dependent on the nature of the application you develop.

Yup. And I put it to you that switching from a regular XML parser to a
regexp based one will not give you a sufficient speed boost to justify
the higher maintenance costs in most cases. 
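
Whether the speed boost is worth it is something to measure rather than
guess. Below is a rough sketch of how one might time the two approaches
with timeit. The document in DOC, the pattern in ITEM_RE and the helper
names are all invented for this example, and the numbers it prints will
depend on your machine and your real data.

```python
import re
import timeit
import xml.etree.ElementTree as ET

# Hypothetical, perfectly regular document with 1000 <item> elements.
DOC = "<items>" + "".join(
    '<item id="%d">value %d</item>' % (i, i) for i in range(1000)
) + "</items>"

# A regexp tied to the exact layout of DOC above.
ITEM_RE = re.compile(r'<item id="\d+">([^<]*)</item>')


def with_parser():
    """Extract the item texts using a real XML parser."""
    return [item.text for item in ET.fromstring(DOC).iter("item")]


def with_regexp():
    """Extract the item texts using the layout-specific regexp."""
    return ITEM_RE.findall(DOC)


def compare(repeat=20):
    """Return total seconds spent in each approach over `repeat` runs."""
    return {
        "parser": timeit.timeit(with_parser, number=repeat),
        "regexp": timeit.timeit(with_regexp, number=repeat),
    }


if __name__ == "__main__":
    for name, seconds in compare().items():
        print("%s: %.4f s" % (name, seconds))
```

Both functions return the same list here only because DOC is perfectly
regular. Whatever difference the timings show has to be weighed against
keeping the regexp in sync with every change in the input format.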


> For a webapp, XML parsing is a very important factor that the developer
> *must* consider while designing.

And your advice is to use regexps to do this? 


[...]


-- 
~noufal
http://nibrahim.net.in

A verbal contract isn't worth the paper it's written on. Include me out. -Samuel Goldwyn

