[Tutor] Amazing power of Regular Expressions...

Kent Johnson kent37 at tds.net
Sun Nov 5 22:46:27 CET 2006


Michael Sparks wrote:
> On Sunday 05 November 2006 15:02, Kent Johnson wrote:
> ...
>> Regular expressions are an extremely powerful and useful tool that every
>> programmer should master and then put away and not use when there is an
>> alternative :-)
> 
> <eyebrow>
> 
> There's always an alternative to a regular expression, so are you really
> suggesting *never* using a regex? (Seriously though, I doubt you are, but
> taken in this context, that's how it looks.)

OK, maybe a bit overstated. I use regexes regularly ;) and, as I wrote, I
think every programmer should know how to use them.

Kent

> 
> The most pathological example of regex avoidance I've seen in a while
> is this:
> 
> def isPlain(text):
>     plaindict = {'-': True, '.': True, '1': True, '0': True, '3': True, 
>       '2': True, '5': True, '4': True, '7': True, '6': True, '9': True,
>       '8': True, 'A': True, 'C': True, 'B': True, 'E': True, 'D': True,
>       'G': True, 'F': True, 'I': True, 'H': True, 'K': True, 'J': True,
>       'M': True, 'L': True, 'O': True, 'N': True, 'Q': True, 'P': True,
>       'S': True, 'R': True, 'U': True, 'T': True, 'W': True, 'V': True,
>       'Y': True, 'X': True, 'Z': True, '_': True, 'a': True, 'c': True,
>       'b': True, 'e': True, 'd': True, 'g': True, 'f': True, 'i': True,
>       'h': True, 'k': True, 'j': True, 'm': True, 'l': True, 'o': True,
>       'n': True, 'q': True, 'p': True, 's': True, 'r': True, 'u': True,
>       't': True, 'w': True, 'v': True, 'y': True, 'x': True, 'z': True}
> 
>     for c in text:
>         if plaindict.get(c, False) == False:
>             return False
>     return True
> 
> (sadly this is from real code - in defence of the person
>  who wrote it, they weren't even *aware* of regexes)
> 
> That's equivalent to the regular expression:
>     * ^[0-9A-Za-z_.-]*$
> 
> Now, which is clearer? Once you can read and write regular expressions, the
> short regular expression is the clearer form. It's also quicker.
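For anyone who wants to see the two side by side, here is a minimal sketch of the regex version in Python. The names `is_plain` and `PLAIN_RE` are mine, not from the original code. One caveat: in Python's `re`, `$` also matches just before a trailing newline, so the sketch uses `fullmatch()` to stay strictly equivalent to the dict-based loop.

```python
import re

# Same character set as the dict version: digits, ASCII letters, '_', '.', '-'.
# No anchors are needed because fullmatch() requires the whole string to match.
PLAIN_RE = re.compile(r'[0-9A-Za-z_.-]*')

def is_plain(text):
    """Return True if every character of text is in the allowed set."""
    return PLAIN_RE.fullmatch(text) is not None

print(is_plain("spam_eggs-1.0"))   # True
print(is_plain("spam eggs"))       # False (space is not allowed)
print(is_plain("spam\n"))          # False; re.match with '^...$' would accept this
```

Like the loop, this returns True for an empty string, since `*` allows zero characters.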
> 
> I'm not someone who advocates coding-by-regex, as happens rather heavily in
> Perl (I like Perl about as much as Python), but to say "don't use them if
> there's an alternative" is a little strong. Aside from the argument that "you
> now have two problems" (which always applies if you think all problems can be
> hit with the same hammer), solving *everything* with regexes is often slower,
> since people then apply one after another, after another. The most
> pathological example I've seen applied over 1000 regexes to a piece
> of text, one after another, and then the author wondered why their
> code was slow...
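One way to avoid that kind of slowdown, when the many regexes are plain literal substitutions, is to fold them into a single alternation and make one pass over the text. This is a sketch of that idea, not what the code in question did; the `REPLACEMENTS` table and `replace_all` are made up for illustration.

```python
import re

# Hypothetical table of literal substitutions (stand-in for the ~1000 patterns).
REPLACEMENTS = {
    "colour": "color",
    "centre": "center",
    "grey": "gray",
}

# Build one pattern: longest keys first so a longer literal wins over a
# shorter one that is its prefix; re.escape() keeps the keys literal.
_keys = sorted(REPLACEMENTS, key=len, reverse=True)
_PATTERN = re.compile("|".join(re.escape(k) for k in _keys))

def replace_all(text):
    """Apply every substitution in a single scan of text."""
    return _PATTERN.sub(lambda m: REPLACEMENTS[m.group(0)], text)

print(replace_all("the grey colour at the centre"))
# the gray color at the center
```

Note that this is not identical to running the substitutions in sequence: replaced text is never re-scanned, so one substitution cannot feed another.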
> 
> JWZ's quote is aimed more at people who try to solve every problem with
> regexes (which is how you end up with 10-line monstrosities in Perl with 5
> levels of backtracking).
> 
> Also, it's worth bearing in mind that there's more than one definition of what
> regexes are (awk, Perl, Python, and various C libraries all have slightly
> differing rules and syntax, even if they often share a common base). Rather
> than say there's one true way, it helps to see regexes as little more than a
> shorthand for structured parsing. With that in mind, it's worth recasting
> JWZ's point as:
> 
> If your reaction to seeing a problem is "this looks like it can be solved
> using a regex", you should think to yourself: has someone else already hit
> this problem, and have they come up with a specialised pattern matcher for
> it? If not, why not?
> 
> In this case that *should* have led the poster to the discovery of the
> specialised parser:
>      time.strptime(date, '%d/%m/%Y')
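As a rough illustration of why the specialised parser is the better fit, the sketch below compares it with a naive regex for the same day/month/year layout. `DATE_RE` and `parse_date` are names I've made up, and the naive pattern is deliberately simplistic.

```python
import re
import time

# Naive shape check: two digits, slash, two digits, slash, four digits.
DATE_RE = re.compile(r'\d{2}/\d{2}/\d{4}')

def parse_date(date):
    """Return a time.struct_time for a dd/mm/yyyy string, or None if invalid."""
    try:
        return time.strptime(date, '%d/%m/%Y')
    except ValueError:
        return None

parsed = parse_date("05/11/2006")
print(parsed.tm_year, parsed.tm_mon, parsed.tm_mday)   # 2006 11 5

# The regex only checks the shape, so it accepts an impossible day...
print(DATE_RE.fullmatch("32/01/2006") is not None)     # True
# ...while strptime validates the fields and rejects it.
print(parse_date("32/01/2006"))                        # None
```

The regex could be tightened to reject day 32, but every such rule makes it longer, and strptime already has them.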
> 
> File globs are another good example of a specialised form of pattern matcher.
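For file globs, Python's standard `fnmatch` module is the specialised matcher. A small sketch, with a made-up list of file names:

```python
import fnmatch

# Hypothetical file names, just for illustration.
names = ["setup.py", "README.txt", "test_re.py", "notes.TXT"]

# fnmatchcase() compares case-sensitively on every platform.
py_files = [n for n in names if fnmatch.fnmatchcase(n, "*.py")]
print(py_files)                                          # ['setup.py', 'test_re.py']

# '?' matches exactly one character.
print(fnmatch.fnmatchcase("test_re.py", "test_??.py"))   # True

# translate() shows the regex the glob is shorthand for; the exact
# output varies between Python versions, so it is not reproduced here.
print(fnmatch.translate("*.py"))
```

The glob `*.py` says what is meant more directly than the regex it translates to, which is the point about specialised matchers.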
> 
> Using a regex when it's appropriate is good. Finding a more appropriate
> specialised pattern matcher? Even better. Avoiding a regex in the way
> I've shown above, just because an alternative exists? Bad: it's
> slow and unclear.
> 
> :-)
> 
> 
> Michael.