[Tutor] Help on RE
Steven D'Aprano
steve at pearwood.info
Sun Jan 23 04:10:35 CET 2011
tee chwee liong wrote:
> thanks for making me understand more on re. re is a confusing topic as i'm starting on python.
I quote the great Jamie Zawinski, a world-class programmer and hacker:
Some people, when confronted with a problem, think 'I know, I'll
use regular expressions.' Now they have two problems.
Zawinski doesn't mean that you should never use regexes. But they should
be used only when necessary, for problems that are difficult enough to
require a dedicated domain-specific language for solving search problems.
Because that's what regexes are: they're a programming language for text
searching. They're not a full-featured programming language like Python
(technically, they are not Turing Complete) but nevertheless they are a
programming language. A programming language with a complicated,
obscure, hideously ugly syntax (and people complain about Forth!). Even
the creator of Perl, Larry Wall, has complained about regex syntax and
lists 19 serious faults with regular expressions:
http://dev.perl.org/perl6/doc/design/apo/A05.html
Most people turn to regexes much too quickly, using them to solve
problems that are either too small to need regexes, or too large. Using
regexes for solving your problem is like using a chainsaw for peeling an
orange.
Your data is very simple, and doesn't need regexes. It looks like this:
    Platform: PC
    Tempt : 25
    TAP0 :0
    TAP1 :1
    +++++++++++++++++++++++++++++++++++++++++++++
    Port  Chnl  Lane  EyVt  EyHt
    +++++++++++++++++++++++++++++++++++++++++++++
    0     1     1     75    55
    0     1     2     10    35
    0     1     3     25    35
    0     1     4     35    25
    0     1     5     10    -1
    +++++++++++++++++++++++++++++++++++++++++++++
    Time: 20s
The part you care about is the table of numbers; each line looks like this:
    0 1 5 10 -1
The easiest way to parse such a line is:
    numbers = [int(word) for word in line.split()]
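To see what that one-liner does, here is a quick check on the sample row and on the header line. str.split() with no argument splits on any run of whitespace, and int() accepts a leading minus sign, so "-1" comes through fine; a header word such as "Port" makes int() raise ValueError instead.

```python
# Parse one row of the table: split on whitespace, convert each word to int.
line = "0 1 5 10 -1"
numbers = [int(word) for word in line.split()]
print(numbers)  # [0, 1, 5, 10, -1]

# A header line fails at the first non-numeric word.
try:
    [int(word) for word in "Port Chnl Lane EyVt EyHt".split()]
except ValueError as err:
    print("not a table row:", err)
```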
All you need then is a way of telling whether you have a line in the
table or a header. That's easy: int() raises ValueError for any word that
isn't a number, so just catch the exception and skip that line.
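    template = "Port=%d, Channel=%d, Lane=%d, EyVT=%d, EyHT=%d"
    for line in lines:  # lines: the log text, one string per line
        try:
            numbers = [int(word) for word in line.split()]
        except ValueError:
            continue  # header, separator or label line: skip it
        if len(numbers) != 5:
            continue  # blank line (or wrong column count): nothing to print
        print(template % tuple(numbers))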
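Putting it together, here is a minimal self-contained sketch that runs the same loop over the sample data shown above. The function name parse_eye_table, its columns parameter, and the choice to return tuples rather than print directly are illustrative assumptions, not part of the original. The length check also skips blank lines, which would otherwise yield an empty list and break the % formatting.

```python
SAMPLE = """\
Platform: PC
Tempt : 25
TAP0 :0
TAP1 :1
+++++++++++++++++++++++++++++++++++++++++++++
Port Chnl Lane EyVt EyHt
+++++++++++++++++++++++++++++++++++++++++++++
0 1 1 75 55
0 1 2 10 35
0 1 3 25 35
0 1 4 35 25
0 1 5 10 -1
+++++++++++++++++++++++++++++++++++++++++++++
Time: 20s
"""


def parse_eye_table(text, columns=5):
    """Return one tuple of ints per table row, skipping every other line.

    Hypothetical helper: the name and the columns parameter are illustrative.
    """
    rows = []
    for line in text.splitlines():
        try:
            numbers = [int(word) for word in line.split()]
        except ValueError:
            continue  # header, separator or "Name: value" line
        if len(numbers) == columns:  # also drops blank lines
            rows.append(tuple(numbers))
    return rows


template = "Port=%d, Channel=%d, Lane=%d, EyVT=%d, EyHT=%d"
for row in parse_eye_table(SAMPLE):
    print(template % row)
```

No regex appears anywhere: str.split() and int() do all the work, and the try/except is the whole "is this a table row?" test.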
Too easy. Adding regexes just makes it slow, fragile, and difficult.
My advice is, any time you think you might need regexes, you probably don't.
--
Steven