[BangPypers] parsing xml

Gora Mohanty gora at mimirtech.com
Thu Jul 28 21:53:19 CEST 2011


On Thu, Jul 28, 2011 at 10:37 PM, Venkatraman S <venkat83 at gmail.com> wrote:
> parsing using minidom is one of the slowest. if you just want to extract the
> distance and assuming that it(the tag) will always be consistent, then i
> would always suggest regexp. xml parsing is a pain.
[...]

Strongly disagree. IMHO, regexps are the wrong solution
for parsing XML (or any kind of well-structured text), as
they end up becoming intolerably complex, and they do not
degrade gracefully on broken XML.
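
As a rough illustration of that point, here is a minimal sketch
comparing a naive regexp with a real parser. The sample documents,
the "distance" tag layout, the "unit" attribute, and both helper
function names are made up for the example; it uses the standard
library's xml.etree.ElementTree only so that it runs without extra
installs.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical inputs: tag and attribute names are assumptions for illustration.
DOC_PLAIN = "<route><distance>12.5</distance></route>"
DOC_ATTR = '<route><distance unit="km">12.5</distance></route>'
DOC_BROKEN = "<route><distance>12.5</route>"  # missing closing tag

# The "simple" regexp only matches a tag with no attributes at all.
NAIVE = re.compile(r"<distance>(.*?)</distance>")


def distance_by_regex(text):
    """Return the distance as a float, or None if the pattern does not match."""
    match = NAIVE.search(text)
    return float(match.group(1)) if match else None


def distance_by_parser(text):
    """Return the distance as a float, or None if the XML is malformed or has no distance."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        # Malformed XML is reported explicitly instead of being half-matched.
        return None
    node = root.find(".//distance")
    if node is None or not node.text:
        return None
    return float(node.text)
```

The regexp quietly stops matching as soon as the tag gains an
attribute, while the parser version keeps working and rejects the
malformed document outright. Making the regexp cope with attributes,
whitespace, namespaces, and so on is where the complexity piles up.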

I have not compared speeds myself, but there are blogs
that go into that. In my experience, the cleanest, most
efficient, and most feature-rich Python XML library is
lxml. For people used to BeautifulSoup, lxml has a
BeautifulSoup parser, and lxml itself is significantly
more efficient than BeautifulSoup.
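
For anyone who wants to try lxml, a small sketch of extracting every
distance from a document follows. The sample XML and the function name
are invented for the example. lxml.etree mirrors most of the
ElementTree API, so the sketch falls back to the standard library if
lxml is not installed. The BeautifulSoup-backed parser lives in
lxml.html.soupparser and needs BeautifulSoup installed as well, so it
is not used here.

```python
try:
    from lxml import etree  # third-party package, assumed to be installed
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback with a similar API

# Hypothetical sample document; the element names are assumptions.
SAMPLE = (
    "<routes>"
    "<route><distance>12.5</distance></route>"
    "<route><distance>3</distance></route>"
    "</routes>"
)


def all_distances(xml_text):
    """Collect the text of every <distance> element as floats, in document order."""
    root = etree.fromstring(xml_text)
    return [float(el.text) for el in root.iter("distance")]
```

Calling all_distances(SAMPLE) walks the whole tree, so it does not
matter how deeply the distance elements are nested.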

Regards,
Gora

