Comparison of parsers in python?

TerryP bigboss1964 at gmail.com
Sun Sep 20 05:39:16 CEST 2009


Peng Yu wrote:
> This is more or less just a list of parsers. I would like some detailed
> guidelines on which one to choose for various parsing problems.
>
> Regards,
> Peng


It depends on the parsing problem.

Obviously you're not going to use an INI parser to work with XML, or
vice versa. Likewise, some formats can be parsed in different ways; XML
parsers, for example, are often built around a SAX (event-driven,
streaming) or DOM (whole document loaded as an in-memory tree) model.
The differences between them (hit Wikipedia) can affect the performance
of your application more than learning an XML parser's API can affect
the hair on your head.
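To make the SAX/DOM difference concrete, here is a minimal sketch using only standard-library modules (xml.dom.minidom and xml.sax). The sample document, the TitleHandler class and the titles_dom/titles_sax helpers are hypothetical, made up for illustration. The DOM version builds the whole tree before you look at it; the SAX version sees the document as a stream of start/characters/end events and keeps only what it needs.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document, for illustration only.
DOC = b"""<catalog>
  <book id="1"><title>alpha</title></book>
  <book id="2"><title>beta</title></book>
</catalog>"""


def titles_dom(data):
    """DOM: parse the whole document into an in-memory tree, then walk it."""
    tree = minidom.parseString(data)
    return [node.firstChild.data for node in tree.getElementsByTagName("title")]


class TitleHandler(xml.sax.ContentHandler):
    """SAX: react to events as the parser streams through the input."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buf = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buf = []

    def characters(self, content):
        # Text may arrive in several chunks, so buffer it until the end tag.
        if self._in_title:
            self._buf.append(content)

    def endElement(self, name):
        if name == "title":
            self._in_title = False
            self.titles.append("".join(self._buf))


def titles_sax(data):
    handler = TitleHandler()
    xml.sax.parseString(data, handler)
    return handler.titles


print(titles_dom(DOC))  # ['alpha', 'beta']
print(titles_sax(DOC))  # ['alpha', 'beta']
```

Both return the same list here; the difference shows up on large inputs, where the SAX handler never holds the whole tree in memory but has to track its own state.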

For flat data, a simple Unix-style rc or DOS-style INI file will often
suffice, and writing a parser is fairly trivial; in fact, writing a
config file parser is an excellent learning exercise to get a feel
for a given language's standard I/O, string handling, and type
conversion features. These kinds of parsers tend to be pretty quick
because of their simplicity, and writing a small but extremely fast
one can be enjoyable at times; one of these days I need to do it in
x86 assembly just for the hell of it. Python includes an INI parser in
the standard library (the ConfigParser module, renamed configparser in
Python 3).
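As a sketch of that learning exercise, here is a tiny hand-rolled INI reader next to the standard library's configparser (Python 3 naming). parse_ini and the sample config are hypothetical; the reader deliberately handles only sections, key = value lines and ; or # comments, none of configparser's extras such as continuation lines or interpolation.

```python
import configparser

# Hypothetical config file contents.
SAMPLE = """
; a hypothetical config file
[server]
host = example.org
port = 8080

[client]
retries = 3
"""


def parse_ini(text):
    """Minimal INI reader: returns {section: {key: value}} with string values.
    Keys are lowercased to match configparser's default behaviour."""
    data, section = {}, None
    for lineno, raw in enumerate(text.splitlines(), 1):
        line = raw.strip()
        if not line or line[0] in ";#":
            continue  # blank line or comment
        if line.startswith("[") and line.endswith("]"):
            section = data.setdefault(line[1:-1].strip(), {})
        elif "=" in line and section is not None:
            key, _, value = line.partition("=")
            section[key.strip().lower()] = value.strip()
        else:
            raise ValueError(f"line {lineno}: cannot parse {raw!r}")
    return data


mine = parse_ini(SAMPLE)

cp = configparser.ConfigParser()
cp.read_string(SAMPLE)

print(int(mine["server"]["port"]), cp.getint("server", "port"))  # 8080 8080
```

configparser also does the type conversion for you (getint, getfloat, getboolean), while the hand-rolled version leaves every value as a string and makes the caller convert it.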

XML serves well for hierarchical data models and is often handy, but
writing code around its parsers can be a royal pain (IMHO anyway!).
Popular parsers for XML include expat and libxml2; there is also a
more "Pythonic" wrapper for libxml2/libxslt called lxml, and Python
itself comes with XML parsers in the standard library (xml.sax,
xml.dom.minidom, xml.etree.ElementTree). Other formats such as JSON,
YAML, heck, even S-expressions could be used and parsed. Some programs
only parse enough to slurp up code and eval() it (not always smart,
but sometimes useful).
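A small sketch of the JSON route and of a safer cousin of "slurp it up and eval it". This swaps in ast.literal_eval, which is not what the programs mentioned above necessarily do: it accepts only Python literals (strings, numbers, tuples, lists, dicts, sets, booleans, None) and refuses anything that would execute code. The sample strings and the safe_literal helper are hypothetical.

```python
import ast
import json

# Hypothetical record, once as JSON and once as a Python literal.
json_text = '{"name": "spam", "sizes": [1, 2, 3], "active": true}'
py_text = "{'name': 'spam', 'sizes': [1, 2, 3], 'active': True}"

from_json = json.loads(json_text)


def safe_literal(text):
    """Parse a Python literal without executing code; return None if text is
    anything else (a function call, a name lookup, ...)."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return None


from_py = safe_literal(py_text)
print(from_json == from_py)                       # True
print(safe_literal("__import__('os').getcwd()"))  # None: a call, rejected
```

Plain eval() on the last string would actually import os and run the call, which is why eval-based "parsing" is only reasonable for input you fully trust.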

In general, the issues to consider when selecting a parser for a given
format are speed, size, and time: how long does it take to process the
data set, how much memory (size) does it consume, and how much bloody
time will it take to learn the API ;).
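The first two of those can be measured rather than guessed. A minimal sketch, assuming a synthetic XML document stands in for your real data: the hypothetical measure() helper takes the best wall-clock time over a few runs with time.perf_counter and the peak traced allocation with tracemalloc, here comparing two standard-library XML parsers. Note that tracemalloc only traces memory allocated through Python's allocators, so memory a C library allocates on its own may not be counted.

```python
import time
import tracemalloc
import xml.etree.ElementTree as ET
from xml.dom import minidom


def measure(parse, data, repeat=3):
    """Return (best wall-clock seconds, peak traced bytes) for parse(data)."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        parse(data)
        best = min(best, time.perf_counter() - start)
    tracemalloc.start()
    parse(data)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return best, peak


# Synthetic document; substitute a representative sample of your real data.
data = "<root>" + "".join(f'<item id="{i}">value {i}</item>' for i in range(5000)) + "</root>"

for name, parser in [("minidom", minidom.parseString), ("ElementTree", ET.fromstring)]:
    seconds, peak = measure(parser, data)
    print(f"{name:12s} {seconds * 1000:8.1f} ms  {peak / 1024:10.0f} KiB peak")
```

The third cost, learning time, has no benchmark; the fairest proxy is writing the same small task against each candidate API.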


The best way to choose a parser is to experiment with several, test
(and profile!) them according to the project's needs, then pick the
one you like best out of those that are suitable for the task.
Profiling can be very important.
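For the profiling step, the standard library's cProfile and pstats show where the time goes inside a parse rather than just the total. A minimal sketch; the synthetic document and the choice of xml.dom.minidom as the parser under test are assumptions for illustration.

```python
import cProfile
import io
import pstats
from xml.dom import minidom

# Synthetic input; profile with data that looks like your project's.
data = "<root>" + "<item>x</item>" * 5000 + "</root>"

profiler = cProfile.Profile()
profiler.enable()
minidom.parseString(data)
profiler.disable()

# Collect the report in a string so it can be printed or saved.
out = io.StringIO()
stats = pstats.Stats(profiler, stream=out).sort_stats("cumulative")
stats.print_stats(10)  # top 10 entries by cumulative time
report = out.getvalue()
print(report)
```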


--
TerryP


