[Tutor] Amazing power of Regular Expressions...
Kent Johnson
kent37 at tds.net
Sun Nov 5 22:46:27 CET 2006
Michael Sparks wrote:
> On Sunday 05 November 2006 15:02, Kent Johnson wrote:
> ...
>> Regular expressions are an extremely powerful and useful tool that every
>> programmer should master and then put away and not use when there is an
>> alternative :-)
>
> <eyebrow>
>
> There's always an alternative to a regular expression, so are you really
> suggesting *never* use a regex? (seriously though, I doubt you are, but taken
> in this context, that's how it looks).
OK, maybe a bit overstated. I use regexes regularly ;) and as I wrote I
think every programmer should know how to use them.
Kent
>
> The most pathological example of regex avoidance I've seen in a while
> is this:
>
> def isPlain(text):
>     plaindict = {'-': True, '.': True, '1': True, '0': True, '3': True,
>         '2': True, '5': True, '4': True, '7': True, '6': True, '9': True,
>         '8': True, 'A': True, 'C': True, 'B': True, 'E': True, 'D': True,
>         'G': True, 'F': True, 'I': True, 'H': True, 'K': True, 'J': True,
>         'M': True, 'L': True, 'O': True, 'N': True, 'Q': True, 'P': True,
>         'S': True, 'R': True, 'U': True, 'T': True, 'W': True, 'V': True,
>         'Y': True, 'X': True, 'Z': True, '_': True, 'a': True, 'c': True,
>         'b': True, 'e': True, 'd': True, 'g': True, 'f': True, 'i': True,
>         'h': True, 'k': True, 'j': True, 'm': True, 'l': True, 'o': True,
>         'n': True, 'q': True, 'p': True, 's': True, 'r': True, 'u': True,
>         't': True, 'w': True, 'v': True, 'y': True, 'x': True, 'z': True}
>
>     for c in text:
>         if plaindict.get(c, False) == False:
>             return False
>     return True
>
> (sadly this is from real code - in defence of the person
> who wrote it, they weren't even *aware* of regexes)
>
> That's equivalent to the regular expression:
> * ^[0-9A-Za-z_.-]*$
>
> Now, which is clearer? If you learn to read & write regular expressions, then
> the short regular expression is the clearest form. It's also quicker.
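
For comparison, here is a minimal sketch of how the regex version might look in Python. The names PLAIN_RE and is_plain are made up for illustration. One assumption to flag: the sketch anchors with \Z rather than $, because in Python $ also matches just before a trailing newline, which the dictionary loop above would reject.

```python
import re

# Character class taken from the message above. \Z (an assumption, used
# instead of $) only matches at the very end of the string, so a trailing
# newline is rejected just as the dictionary-based loop rejects it.
PLAIN_RE = re.compile(r'[0-9A-Za-z_.-]*\Z')

def is_plain(text):
    """Return True if text holds only ASCII letters, digits, '_', '.' and '-'."""
    # match() anchors at the start of the string, so no leading ^ is needed.
    return PLAIN_RE.match(text) is not None
```

Like the original loop, this treats the empty string as plain.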
>
> I'm not someone who advocates coding-by-regex, as happens rather heavily in
> perl (I like perl about as much as python), but to say "don't use them if
> there's an alternative" is a little strong. Aside from the argument that "you
> now have two problems" (which always applies if you think all problems can be
> hit with the same hammer), solving *everything* with regex is often slower.
> (since people then do one after another, after another - the most
> pathological example I've seen applied over 1000 regexes to a piece
> of text, one after another, and then the author wondered why their
> code was slow...)
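
One possible remedy for the "one regex after another" pattern, when the patterns are simple alternatives, is to join them into a single alternation so the text is scanned once. This is only a sketch: the word list and function names are hypothetical, and it is not necessarily what the code mentioned above should have done.

```python
import re

# Hypothetical list of literal words to look for; not from the original post.
words = ["spam", "eggs", "ham"]

def find_separately(text):
    """Slow approach: scan the whole text once per word."""
    return [w for w in words if re.search(re.escape(w), text)]

# Combined approach: one compiled alternation, one scan of the text.
combined = re.compile("|".join(re.escape(w) for w in words))

def find_combined(text):
    """Return the distinct words found, in sorted order."""
    return sorted(set(combined.findall(text)))
```

Note that an alternation tries its branches left to right, so words that are prefixes of one another need more care than this sketch takes.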
>
> JWZ's quote is more aimed at people who think about solving every problem with
> regexes (and where you end up with 10 line monstrosities in perl with 5
> levels of backtracking).
>
> Also, it's worth bearing in mind that there's more than one definition of what
> regexes are (awk, perl, python, and various C libraries all have slightly
> differing rules and syntax, even if they often share a common base). Rather
> than say there's one true way, remember that regexes are little more than a
> shorthand for structured parsing; with that in mind, JWZ's point can be
> recast as:
>
> If your reaction to seeing a problem is "this looks like it can be solved
> using a regex", you should think to yourself: has someone else already hit
> this problem and have they come up with a specialised pattern matcher for it
> already? If not, why not?
>
> In this case that *should* have led the poster to the discovery of the
> specialised parser:
> time.strptime(date, '%d/%m/%Y')
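
As a rough illustration of that call, the sketch below wraps it in a small helper. The name parse_date is made up here; the format string is the one quoted above. Besides extracting the fields, strptime raises ValueError for input that does not fit the format, which a hand-written regex would have to check separately.

```python
import time

def parse_date(date):
    """Parse a 'dd/mm/yyyy' string and return (year, month, day).

    Raises ValueError if the string does not match the format.
    """
    parsed = time.strptime(date, '%d/%m/%Y')
    return parsed.tm_year, parsed.tm_mon, parsed.tm_mday
```

For example, parse_date('05/11/2006') gives (2006, 11, 5), while a month of 13 is rejected with ValueError.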
>
> File globs are another good example of a specialised form of pattern matcher.
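
A small sketch of glob-style matching with the standard fnmatch module follows. The file names are hypothetical and only there to show the matching.

```python
import fnmatch

# Hypothetical file names, purely for illustration.
names = ["notes.txt", "report.pdf", "todo.txt", "README"]

# '*' matches any run of characters, '?' matches exactly one character.
txt_files = fnmatch.filter(names, "*.txt")
is_readme = fnmatch.fnmatchcase("README", "READ??")
```

Here txt_files holds the two .txt names and is_readme is True; the equivalent regex would need the dot escaped and explicit anchoring.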
>
> Using a regex when it's appropriate is good. Finding a more appropriate
> specialised pattern matcher? Even better. Avoiding using regexes in the way
> I've shown above, because it's an alternative to using a regex? Bad, it's
> slow and unclear.
>
> :-)
>
>
> Michael.