[Tutor] Regex's and "best practice"
Carl D Cravens
raven at phoenyx.net
Fri Nov 21 20:58:45 EST 2003
Hello. I'm a Unix sysadmin and developer with a background in C (a few
years ago) and more recently Perl and shell scripting. I'm rather
proficient with Perl, but I get frustrated with it when I have to deal
with references or try to build modules.
I've recently started learning Python... I'm just about finished with the
Tutorial, and I've been converting a 90-line Perl script I'd written a few
years ago to see how Python handles it.
Now, this script doesn't show off Python's strengths... all it does is
read through a standard Unix mailbox and print out From: and Subject: in a
compact format. (Actually, basically the same index format as Pine.) So
it's doing a lot of string manipulation and pattern matching.
Here's my question. The From: line can appear in basically four forms,
and I have a little chain of s///'s that try to find the "real name", and
barring that, use the address, stripping out extra junk. Here's the Perl
snippet... (the "From: " has been stripped and the remainder is in $line)
## three formats to deal with (a bare address falls through)
## <address> , Name <address>, (Name) address
$line =~ s/(^<)(.*)(>$)/$2/ ||
$line =~ s/<.*>// ||
$line =~ s/(.*)(\()(.*)(\))(.*)/$3/;
And here's the Python translation I came up with...
import re
idpatt = [ re.compile( r'(^<)(.*)(>$)' ),
           re.compile( r'<.*>' ),
           re.compile( r'(.*)(\()(.*)(\))(.*)' ) ]
idrepl = ['\\2', '', '\\3']
oldline = line
for idpt,idrp in zip( idpatt, idrepl ):
    line = idpt.sub( idrp, line )
    if oldline != line:
        break
Not nearly as compact and simple as the Perl statement.
Is this pretty much the best I can do? The ORs were very convenient in
Perl... in Python, I have to do a relatively large amount of work to get
the same effect. (And I don't think it's necessary... I could let Python
evaluate all three sub()'s, even when further evaluation won't find
anything to match.) That would reduce the last block to...
for idpt,idrp in zip( idpatt, idrepl ):
    line = idpt.sub( idrp, line )
...which doesn't look so bad, but it keeps processing the sub()'s even
after a match has been made.
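For comparison, here is a minimal sketch of a variant that stops at the first rule that actually substitutes something. It relies on the compiled pattern's subn() method, which returns the new string together with a substitution count, so no copy of the old line is needed. The names ID_RULES and clean_from are hypothetical, and the capture groups are trimmed to the one each rule needs, so treat it as one possible arrangement rather than the definitive idiom.

```python
import re

# (pattern, replacement) pairs, tried in order like the Perl || chain.
# ID_RULES and clean_from are illustrative names, not from the script.
ID_RULES = [
    (re.compile(r'^<(.*)>$'), r'\1'),      # <address>      -> address
    (re.compile(r'<.*>'), ''),             # Name <address> -> Name
    (re.compile(r'.*\((.*)\).*'), r'\1'),  # (Name) address -> Name
]

def clean_from(line):
    """Apply the first rule that matches, then stop."""
    for pattern, repl in ID_RULES:
        # subn() returns (new_string, number_of_substitutions)
        line, count = pattern.subn(repl, line)
        if count:
            break
    return line

print(clean_from('<raven@phoenyx.net>'))
print(clean_from('(Carl D Cravens) raven@phoenyx.net'))
```

A bare address matches none of the rules and comes back unchanged. As in the Perl, the second rule leaves any whitespace that surrounded the <address> part.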
Is there something obvious I'm missing, or is this a fair solution to the
problem?
Thanks!
--
Carl D Cravens (raven at phoenyx.net)
Talk is cheap because supply inevitably exceeds demand.