[Tutor] Help on RE

Steven D'Aprano steve at pearwood.info
Sun Jan 23 04:10:35 CET 2011


tee chwee liong wrote:
> thanks for making me understand more on re. re is a confusing topic as i'm starting on python. 

I quote the great Jamie Zawinski, a world-class programmer and hacker:

     Some people, when confronted with a problem, think "I know, I'll
     use regular expressions." Now they have two problems.


Zawinski doesn't mean that you should never use regexes. But they should 
be used only when necessary, for problems that are difficult enough to 
require a dedicated domain-specific language for solving search problems.

Because that's what regexes are: they're a programming language for text 
searching. They're not a full-featured programming language like Python 
(technically, they are not Turing Complete) but nevertheless they are a 
programming language. A programming language with a complicated, 
obscure, hideously ugly syntax (and people complain about Forth!). Even 
the creator of Perl, Larry Wall, has complained about regex syntax, and 
lists 19 serious faults with regular expressions:

http://dev.perl.org/perl6/doc/design/apo/A05.html

Most people turn to regexes much too quickly, using them to solve 
problems that are either too small to need regexes, or too large. Using 
regexes for solving your problem is like using a chainsaw for peeling an 
orange.
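
As a small illustration, suppose the task is only to recognise the "+++" separator lines in data like the sample below. Plain string methods handle it. The regex version works too, but it brings in a second language to read. Both helper names are hypothetical, chosen for this sketch.

```python
import re

def is_separator(line):
    # A separator line is made only of '+' characters, ignoring surrounding whitespace.
    stripped = line.strip()
    return stripped != "" and stripped.strip("+") == ""

def is_separator_re(line):
    # The same test written as a regular expression, for comparison.
    return re.fullmatch(r"\s*\++\s*", line) is not None

print(is_separator("+++++++++"))       # True
print(is_separator("0  1  5  10 -1"))  # False
```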

Your data is very simple, and doesn't need regexes. It looks like this:


Platform: PC
Tempt : 25
TAP0 :0
TAP1 :1
+++++++++++++++++++++++++++++++++++++++++++++
Port Chnl Lane EyVt EyHt
+++++++++++++++++++++++++++++++++++++++++++++
0  1  1  75  55
0  1  2  10 35
0  1  3  25 35
0  1  4  35 25
0  1  5  10 -1
+++++++++++++++++++++++++++++++++++++++++++++
Time: 20s


The part you care about is the table of numbers; each line looks like this:

0  1  5  10 -1

The easiest way to parse this line is this:

numbers = [int(word) for word in line.split()]
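
Here is that line applied to the sample row. With no arguments, str.split() splits on any run of whitespace, so the uneven spacing doesn't matter. int() accepts the leading minus sign in "-1" and raises ValueError for a word that isn't an integer, such as a header word like "Port".

```python
line = "0  1  5  10 -1"

# split() with no argument splits on runs of whitespace and drops empty strings
words = line.split()
print(words)    # ['0', '1', '5', '10', '-1']

# int() converts each word, including the negative value
numbers = [int(word) for word in words]
print(numbers)  # [0, 1, 5, 10, -1]

# A non-numeric word makes int() raise ValueError
try:
    int("Port")
except ValueError as exc:
    print(exc)  # invalid literal for int() with base 10: 'Port'
```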

All you need then is a way of telling whether a line belongs to the table 
or is a header. That's easy: int() raises ValueError for any word that 
isn't an integer, such as "Port" or "Time:", so just catch the exception 
and skip that line.

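template = "Port=%d, Channel=%d, Lane=%d, EyVT=%d, EyHT=%d"
# `lines` is any iterable of text lines, e.g. an open file object.
for line in lines:
    try:
        numbers = [int(word) for word in line.split()]
    except ValueError:
        # Header or separator line: int() can't convert it, so skip it.
        continue
    if len(numbers) != 5:
        # Blank lines give an empty list; skip anything that isn't a table row.
        continue
    print(template % tuple(numbers))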
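
Here is a self-contained sketch of the same approach. It assumes the data is held in one string so that it can be run as-is. In practice you would iterate over the open file instead. The function name parse_rows is hypothetical. The sketch also skips blank lines and any line without exactly five numbers, because a blank line splits into an empty list, which isn't a table row.

```python
SAMPLE = """\
Platform: PC
Tempt : 25
TAP0 :0
TAP1 :1
+++++++++++++++++++++++++++++++++++++++++++++
Port Chnl Lane EyVt EyHt
+++++++++++++++++++++++++++++++++++++++++++++
0  1  1  75  55
0  1  2  10 35
0  1  3  25 35
0  1  4  35 25
0  1  5  10 -1
+++++++++++++++++++++++++++++++++++++++++++++
Time: 20s
"""

def parse_rows(lines):
    """Yield each table row as a tuple of five ints, skipping everything else."""
    for line in lines:
        try:
            numbers = [int(word) for word in line.split()]
        except ValueError:
            # Header, separator or "key: value" line: not every word is an integer.
            continue
        if len(numbers) == 5:
            yield tuple(numbers)

template = "Port=%d, Channel=%d, Lane=%d, EyVT=%d, EyHT=%d"
rows = list(parse_rows(SAMPLE.splitlines()))
for row in rows:
    print(template % row)
```

With a real file, `with open(filename) as f: rows = list(parse_rows(f))` should behave the same way, since iterating over a file yields its lines (filename here is a placeholder).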


Too easy. Adding regexes just makes it slow, fragile, and difficult.
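
To make "fragile" concrete: a plausible first-attempt pattern such as \d+ silently drops the minus sign from "-1", while split() plus int() gets it right. The patterns below are illustrative guesses, not taken from the original question.

```python
import re

line = "0  1  5  10 -1"

# A first-attempt regex: \d+ matches only digits, so the '-' is silently lost
naive = [int(m) for m in re.findall(r"\d+", line)]

# The pattern has to be patched to allow an optional sign
patched = [int(m) for m in re.findall(r"-?\d+", line)]

# The split() approach needs no patching: int() parses "-1" directly
simple = [int(word) for word in line.split()]

print(naive)    # [0, 1, 5, 10, 1]
print(patched)  # [0, 1, 5, 10, -1]
print(simple)   # [0, 1, 5, 10, -1]
```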


My advice is, any time you think you might need regexes, you probably don't.


-- 
Steven


