[Tutor] Amazing power of Regular Expressions...

Michael Sparks ms at cerenity.org
Sun Nov 5 22:29:17 CET 2006


On Sunday 05 November 2006 15:02, Kent Johnson wrote:
...
> Regular expressions are an extremely powerful and useful tool that every
> programmer should master and then put away and not use when there is an
> alternative :-)

<eyebrow>

There's always an alternative to a regular expression, so are you really 
suggesting we *never* use a regex? (Seriously, though, I doubt you are, but 
taken in this context, that's how it looks.)

The most pathological example of regex avoidance I've seen in a while
is this:

def isPlain(text):
    plaindict = {'-': True, '.': True, '1': True, '0': True, '3': True, 
      '2': True, '5': True, '4': True, '7': True, '6': True, '9': True,
      '8': True, 'A': True, 'C': True, 'B': True, 'E': True, 'D': True,
      'G': True, 'F': True, 'I': True, 'H': True, 'K': True, 'J': True,
      'M': True, 'L': True, 'O': True, 'N': True, 'Q': True, 'P': True,
      'S': True, 'R': True, 'U': True, 'T': True, 'W': True, 'V': True,
      'Y': True, 'X': True, 'Z': True, '_': True, 'a': True, 'c': True,
      'b': True, 'e': True, 'd': True, 'g': True, 'f': True, 'i': True,
      'h': True, 'k': True, 'j': True, 'm': True, 'l': True, 'o': True,
      'n': True, 'q': True, 'p': True, 's': True, 'r': True, 'u': True,
      't': True, 'w': True, 'v': True, 'y': True, 'x': True, 'z': True}

    for c in text:
        if plaindict.get(c, False) == False:
            return False
    return True

(sadly this is from real code - in defence of the person
 who wrote it, they weren't even *aware* of regexes)

That's equivalent to the regular expression:
    ^[0-9A-Za-z_.-]*$

Now, which is clearer? If you learn to read & write regular expressions, then 
the short regular expression is the clearest form. It's also quicker.
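For comparison with `isPlain`, here is a minimal Python sketch of the regex version. The names `PLAIN_RE`, `is_plain` and `is_plain_anchored` are illustrative, not from the original code. One caveat when carrying the pattern above into Python: `$` also matches just before a trailing newline, so `re.match` with `^...$` accepts `"abc\n"`, which `isPlain` rejects. `re.fullmatch` (Python 3.4 and later) or a `\Z` anchor gives the exact equivalent.

```python
import re

# Same character set as isPlain's dictionary: digits, ASCII letters, '_', '.', '-'.
# The '-' goes last in the class so it is a literal, not a range.
PLAIN_RE = re.compile(r"[0-9A-Za-z_.-]*")

def is_plain(text):
    # fullmatch anchors at both ends, so a trailing newline is rejected,
    # exactly as isPlain does.
    return PLAIN_RE.fullmatch(text) is not None

# The same test with explicit anchors: unlike $, \Z only matches at the very end.
PLAIN_ANCHORED_RE = re.compile(r"^[0-9A-Za-z_.-]*\Z")

def is_plain_anchored(text):
    return PLAIN_ANCHORED_RE.match(text) is not None
```

Both functions treat the empty string as plain, just as `isPlain` does (its loop simply never runs).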

I'm not someone who advocates coding-by-regex, as happens rather heavily in 
Perl (I like Perl about as much as Python), but to say "don't use them if 
there's an alternative" is a little strong. Aside from the argument that "you 
now have two problems" (which always applies if you think all problems can be 
hit with the same hammer), solving *everything* with regexes is often slower, 
because people then apply one after another, after another. The most 
pathological example I've seen applied over 1000 regexes to a piece of text, 
one after another, and then the author wondered why their code was slow...
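When the patterns are fixed strings, one way to avoid running hundreds of regexes in sequence is to fold them into a single alternation and make one pass over the text. This is a sketch of that general technique, not the fix applied in the case described above; `build_multi_replacer` and the sample mapping are hypothetical.

```python
import re

def build_multi_replacer(replacements):
    """Return a function that applies every literal replacement in one pass.

    `replacements` maps literal (non-empty) strings to their substitutes.
    """
    # Longest keys first, so "foobar" wins over "foo" at the same position
    # (Python's alternation takes the first branch that matches, not the longest).
    keys = sorted((k for k in replacements if k), key=len, reverse=True)
    if not keys:
        return lambda text: text
    # re.escape makes each key match literally; "|" joins them into one pattern.
    pattern = re.compile("|".join(re.escape(k) for k in keys))

    def replace(text):
        # A single scan of the text; each match is looked up in the mapping.
        return pattern.sub(lambda m: replacements[m.group(0)], text)

    return replace


fix = build_multi_replacer({"colour": "color", "favour": "favor", "teh": "the"})
print(fix("teh colour of favour"))  # prints: the color of favor
```

Unlike applying substitutions one after another, replaced text is never re-scanned by a later pattern, so check that this matches the behaviour you want.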

JWZ's (Jamie Zawinski's) quote is aimed more at people who try to solve every 
problem with regexes (which is where you end up with 10-line monstrosities in 
Perl with 5 levels of backtracking).

Also, it's worth remembering that there's more than one definition of what a 
regex is (awk, Perl, Python, and various C libraries all have slightly 
differing rules and syntax, even if they often share a common base). Rather 
than say there's one true way, bear in mind that regexes are little more than 
a shorthand for structured parsing. With that in mind, it's worth recasting 
JWZ's point as:

If your reaction to seeing a problem is "this looks like it can be solved 
using a regex", you should ask yourself: has someone else already hit this 
problem, and have they come up with a specialised pattern matcher for it? If 
not, why not?

In this case (parsing a date), that *should* have led the poster to the 
specialised parser:
     time.strptime(date, '%d/%m/%Y')
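To make the contrast concrete, a short sketch; the sample `date` value and the helper `parse_with_regex` are hypothetical. Besides being shorter, `time.strptime` validates the values as well as the layout, which a bare regex does not.

```python
import re
import time

date = "05/11/2006"  # hypothetical input in day/month/year order

# The specialised parser: returns a time.struct_time, and raises ValueError
# if the string doesn't fit the format or isn't a real calendar date.
parsed = time.strptime(date, "%d/%m/%Y")
print(parsed.tm_year, parsed.tm_mon, parsed.tm_mday)  # prints: 2006 11 5

# A hand-rolled regex only checks the *shape* of the string.
DATE_RE = re.compile(r"([0-9]{2})/([0-9]{2})/([0-9]{4})\Z")

def parse_with_regex(text):
    m = DATE_RE.match(text)
    if m is None:
        raise ValueError("not in dd/mm/yyyy form: %r" % (text,))
    day, month, year = (int(g) for g in m.groups())
    return year, month, day

# The regex happily accepts 31 February; strptime rejects it with ValueError.
print(parse_with_regex("31/02/2006"))  # prints: (2006, 2, 31)
```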

File globs are another good example of a specialised form of pattern matcher.
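In Python, the `fnmatch` module is one such matcher (and `glob` builds on it for filesystem paths). A short sketch with made-up file names; `fnmatch.translate` shows that the glob is turned into a regex internally, which you would otherwise have to write yourself.

```python
import fnmatch
import re

names = ["report.txt", "notes.md", "data_2006.csv", "readme.TXT"]

# fnmatchcase is case-sensitive on every platform; fnmatch.fnmatch would
# normalise case on case-insensitive systems such as Windows.
txt_files = [n for n in names if fnmatch.fnmatchcase(n, "*.txt")]
print(txt_files)  # prints: ['report.txt']

# The glob is compiled to a regex internally; translate() exposes it.
pattern = fnmatch.translate("*.txt")
print(bool(re.match(pattern, "report.txt")))      # prints: True
print(bool(re.match(pattern, "report.txt.bak")))  # prints: False
```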

Using a regex when it's appropriate is good. Finding a more appropriate 
specialised pattern matcher? Even better. Avoiding regexes in the way I've 
shown above, just because there's an alternative to using a regex? Bad: it's 
slow and unclear.

:-)


Michael.

