The Regex Story

Patrick Maupin pmaupin at gmail.com
Fri Apr 9 00:16:19 EDT 2010


On Apr 8, 9:32 pm, Dotan Cohen <dotanco... at gmail.com> wrote:
> > Regexes do have their uses. It's a case of knowing when they are the
> > best approach and when they aren't.
>
> Agreed. The problems begin when the "when they aren't" is not recognised.

Arguing against this is like arguing against motherhood and apple
pie.  The same argument can validly be made for any Python construct,
any C construct, etc.  This argument is so broad and vague that it's
completely meaningless.  Platitudes don't help people learn how to
code.  Even constant measuring of speed doesn't really help people
start learning how to code -- it just shows them that there are a lot
of OCD people in this profession.

The great thing about Python is that a lot of people, with differing
ambitions, capabilities, amounts of time to invest, and backgrounds
can pick it up and just start using it.

If somebody asks "how do I use re for this" then IMO the *best*
possible response is to tell them how to use re for this (unless
"this" is *difficult* or *impossible* to do with re, in which case you
shouldn't answer the question unless you've had your coffee and you're
in a good mood).  You might also gently explain that there are other
techniques that might, in some cases, be easier to code or read.  But
performance?  It's all fine and dandy for the experienced coders to
discuss the finer points of different techniques (which, BTW, are
usually all predicated on using the current CPython implementation,
and might in some cases be completely wrong with one of the new JITs
under development), but you have to trust people to know their own
needs!  If somebody says "this is too slow -- how do I speed it up?"
then that's really the time to strut your stuff and show that you know
how to milk the language for all it's worth.  Until then, just tell
them what they want to know, perhaps with a small disclaimer that it's
probably not the most efficient or elegant or whatever way to solve
their problem.  The process of learning a computer language is one of
breaking through a series of brick walls, and in many cases people
will learn things faster if you help give them the tools to get past
their mental roadblocks.

The thing that Lie and I were reacting to was the visceral "don't do
that" that seems to crop up whenever somebody asks how to do something
with re.  There are a lot of good use cases for re.  Arguably,
something like mxTextTools or some other low-level text processor
would be better for a few of the cases, but they're not part of the
standard library and re is.

One of the great things about Python is that a lot of useful programs
can be written just using Python and the standard library.  No C, no
third-party binary libraries, etc.  It's not just batteries included
-- it's everything included!

I've written C extensions, both bare, and wrapped with Pyrex, and I've
used third-party extension modules, and while that's OK, it's much
better to have some Python source code in a repository that you can
pull down to any kind of system and just RUN. And look at.  And learn
from.

Many useful programs need to do text processing.  Often, the built-in
string functions are sufficient.  But sometimes they are not.
Discouraging somebody from learning re is doing them a disservice,
because, for the things it is really good at, it is the *only* thing
in the standard library that IS really good.
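
To make that boundary concrete, here is a minimal sketch (assuming
Python 3; the sample strings and the helper name split_fields are made
up for illustration and are not from this thread).  A single fixed
delimiter is a job for the string methods; a mix of delimiters is
where re starts to earn its keep.

```python
import re

# Case 1: one fixed delimiter. The built-in string method is enough.
line = "alpha,beta,gamma"
print(line.split(","))

# Case 2: fields separated by any mix of commas, semicolons, and
# whitespace. str.split() accepts only one separator, so re.split()
# is the natural standard-library tool here.
def split_fields(text):
    """Split text on runs of commas, semicolons, or whitespace."""
    # Leading or trailing separators produce empty strings; drop them.
    return [field for field in re.split(r"[,;\s]+", text) if field]

print(split_fields("alpha, beta;gamma   delta"))
```

You could do the second case with chained replace() and split() calls,
but it gets harder to read with every delimiter you add.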

Yes, you can construct regular expressions and example texts that will
exhibit horrible worst-case performance.  But there are a lot of ways
to shoot yourself in the foot performance-wise in Python (as in any
language), and most of them don't require you to use *any* library
functions, much less the dreaded re module.
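
For anyone who hasn't seen the worst case, a small sketch of it
(assuming Python 3 and a backtracking engine like the one in CPython's
re; the patterns and the helper timed_match are illustrative choices
of mine, and the timings will vary by machine and version):

```python
import re
import time

def timed_match(pattern, text):
    """Return (matched, seconds) for re.match(pattern, text)."""
    start = time.perf_counter()
    matched = re.match(pattern, text) is not None
    return matched, time.perf_counter() - start

# Nested quantifiers: on a near-miss input the engine tries many ways
# of dividing the run of a's between the inner and outer '+' before
# it can conclude that nothing matches.
SLOW = r"(a+)+$"
# Accepts the same strings, without the nesting.
FAST = r"a+$"

for n in (10, 14, 18):
    text = "a" * n + "b"  # the trailing 'b' guarantees a failed match
    _, slow_s = timed_match(SLOW, text)
    _, fast_s = timed_match(FAST, text)
    print(f"n={n:2d}  nested: {slow_s:.4f}s  flat: {fast_s:.6f}s")
```

The point is not that re is dangerous, only that the bad cases tend to
come from a recognizable shape (a quantifier wrapped around another
quantifier) and usually have a simple rewrite.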

Often, when I see people give advice that is (I don't want to say
"knee-jerk", because the advice usually has a good foundation, so
let's say) "terse" and "unexplained", or maybe even an "admonishment",
it makes me feel that perhaps the person giving the advice doesn't
really trust Python.

I don't remember where I first read it, or heard it, but one of the
core strengths of Python is how easy it is to throw away code and
replace it with something better.  So, trust Python to help people get
something going, and then (if they need or want to!) to make it
better.

Just my 2 cents worth.

Pat


