Python and Regular Expressions

Wed Apr 7 21:25:36 EDT 2010

On Apr 7, 3:52 am, Chris Rebert <c... at rebertia.com> wrote:

> Regular expressions != Parsers

True, but lots of parsers *use* regular expressions in their
tokenizers.  In fact, if you have a pure Python parser, you can often
get huge performance gains by rearranging your code slightly so that
you can use regular expressions in your tokenizer, because that
effectively gives you access to a fast, specialized C library that is
built into practically every Python interpreter on the planet.
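
To make that concrete, here is a minimal sketch of the kind of
tokenizer I mean.  The token names and the tiny expression language
are made up for illustration (TOKEN_SPEC, TOKEN_RE and tokenize are
hypothetical names, not from any particular parser).  The point is
only that a single compiled pattern lets the re module do the
character-level scanning in C:

```python
import re

# Hypothetical token set for a tiny expression language (an assumption
# made for this example, not taken from any real parser).
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),   # integer or decimal literal
    ("NAME",   r"[A-Za-z_]\w*"),    # identifier
    ("OP",     r"[-+*/=()]"),       # single-character operator or paren
    ("SKIP",   r"\s+"),             # whitespace, discarded below
    ("ERROR",  r"."),               # any other single character
]

# One alternation of named groups, so a single C-level scan classifies
# every token; match.lastgroup tells us which alternative matched.
TOKEN_RE = re.compile("|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC))


def tokenize(text):
    """Yield (kind, value) pairs for each token in text."""
    for match in TOKEN_RE.finditer(text):
        kind = match.lastgroup
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise ValueError("unexpected character %r at offset %d"
                             % (match.group(), match.start()))
        yield kind, match.group()
```

Calling list(tokenize("x = 3.5 * (y + 2)")) gives a flat list of
pairs starting with ('NAME', 'x'), ('OP', '='), ('NUMBER', '3.5').
Everything above the token level is still up to your parser.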

> Every time someone tries to parse nested structures using regular
> expressions, Jamie Zawinski kills a puppy.

And yet, if you are parsing stuff in Python, and your parser doesn't
use some specialized C code for tokenization (which will probably be
regular expressions unless you are using mxTextTools or some other
specialized C tokenizer code), your nested structure parser will be
dog slow.
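
As a rough sketch of that division of labor (the names
NESTED_TOKEN_RE and parse_nested and the parenthesized-list syntax
are my own assumptions for illustration), the regular expression
below only recognizes individual tokens, while ordinary Python code
with an explicit stack handles the nesting:

```python
import re

# The regex only finds the next token: "(", ")", or a run of other
# non-space characters.  It knows nothing about nesting.
NESTED_TOKEN_RE = re.compile(
    r"\s*(?:(?P<open>\()|(?P<close>\))|(?P<atom>[^\s()]+))"
)


def parse_nested(text):
    """Parse text like "(a (b c))" into nested Python lists."""
    text = text.rstrip()
    stack = [[]]          # stack[0] collects the top-level items
    pos = 0
    while pos < len(text):
        match = NESTED_TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError("cannot tokenize at offset %d" % pos)
        pos = match.end()
        kind = match.lastgroup
        if kind == "open":
            stack.append([])               # start a new nested list
        elif kind == "close":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            finished = stack.pop()
            stack[-1].append(finished)     # attach it to its parent
        else:
            stack[-1].append(match.group("atom"))
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return stack[0]
```

With this, parse_nested("(a (b c) (d (e)))") returns
[['a', ['b', 'c'], ['d', ['e']]]].  The nesting lives entirely in
the Python stack, and the re module never has to match a balanced
structure.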

Now, for some applications, the speed just doesn't matter, and for
people who don't yet know the difference between regexps and parsing,
pointing them at PyParsing is certainly doing them a valuable service.

But that's twice today that I've seen people warned off regular
expressions without a cogent explanation that the re module is good
at what it does, but really only handles the very lowest level of a
parsing problem.

My 2 cents is that something like PyParsing is absolutely great for
people who want a simple parser without a lot of work.  But suppose
they use PyParsing and then find that, for their particular
application, it isn't fast enough.  If all they remember is that
somebody told them not to use regular expressions, they will wrongly
conclude that pure Python is too painfully slow for any real-world
task.

Regards,
Pat