[Tutor] Regex's and "best practice"

Carl D Cravens raven at phoenyx.net
Fri Nov 21 20:58:45 EST 2003


Hello.  I'm a Unix sysadmin and developer with a background in C (a few
years ago) and more recently Perl and shell scripting.  I'm rather
proficient with Perl, but I get frustrated with it when I have to deal
with references and in trying to build modules.

I've recently started learning Python... I'm just about finished with the
Tutorial, and I've been converting a 90-line Perl script I'd written a few
years ago to see how Python handles it.

Now, this script doesn't show off Python's strengths...  all it does is
read through a standard Unix mailbox and print out From: and Subject: in a
compact format.  (Actually, basically the same index format as Pine.)  So
it's doing a lot of string manipulation and pattern matching.

Here's my question.  The From: line can appear in basically four forms,
and I have a little chain of s///'s that try to find the "real name", and
barring that, use the address, stripping out extra junk.  Here's the Perl
snippet... (the "From: " has been stripped and the remainder is in $line)

## three formats to deal with (a bare address falls through)
## <address> , Name <address>, (Name) address
$line =~ s/(^<)(.*)(>$)/$2/ ||
    $line =~ s/<.*>// ||
    $line =~ s/(.*)(\()(.*)(\))(.*)/$3/;


And here's what I have in Python so far:

import re

idpatt = [ re.compile( r'(^<)(.*)(>$)' ),
           re.compile( r'<.*>' ),
           re.compile( r'(.*)(\()(.*)(\))(.*)' ) ]

idrepl = [r'\2', '', r'\3']

oldline = line
for idpt,idrp in zip( idpatt, idrepl ):
    line = idpt.sub( idrp, line )
    if oldline != line:
        break

Not nearly as compact and simple as the Perl statement.
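
One possible way to tighten the loop is subn(), which returns the number of substitutions made along with the new string, much like Perl's s/// returning true on a match. The following is only a minimal sketch: the helper name clean_from and the example.com sample addresses are made up for illustration, and the final strip() is an addition that is not part of the original script.

```python
import re

# Same three patterns and replacements as above, written as raw strings
# and paired up so each pattern sits next to its replacement.
ID_RULES = [
    (re.compile(r'(^<)(.*)(>$)'), r'\2'),           # <address>
    (re.compile(r'<.*>'), ''),                       # Name <address>
    (re.compile(r'(.*)(\()(.*)(\))(.*)'), r'\3'),   # (Name) address
]

def clean_from(line):
    """Apply the first rule that matches (hypothetical helper name)."""
    for pattern, repl in ID_RULES:
        # subn() returns (new_string, number_of_substitutions_made)
        new_line, count = pattern.subn(repl, line)
        if count:
            # strip() is an assumption here, to drop leftover whitespace
            return new_line.strip()
    return line.strip()   # bare address: no rule matched

# Made-up sample inputs covering the four forms
for sample in ['<raven@example.com>',
               'Carl D Cravens <raven@example.com>',
               '(Carl D Cravens) raven@example.com',
               'raven@example.com']:
    print(clean_from(sample))
```

Testing the count rather than comparing strings also covers the case where a pattern matches but the replacement happens to leave the text unchanged.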

Is this pretty much the best I can do?  The OR's were very convenient in
Perl... in Python, I have to do a relatively large amount of work to get the
same effect.  (And I don't think it's necessary... I could let Python
evaluate all three sub()'s, even when further evaluation won't find
anything to match.)  That would reduce the last block to...

for idpt,idrp in zip( idpatt, idrepl ):
    line = idpt.sub( idrp, line )

...which doesn't look so bad, but it keeps processing the sub()'s even
after a match has been made.
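
One caveat worth checking before dropping the early exit: the two loops may not always agree. The sketch below compares them on a made-up input that has both a parenthesized comment and an angle-bracket address. The function names first_match_only and apply_all and the sample string are hypothetical, chosen only for this illustration.

```python
import re

idpatt = [re.compile(r'(^<)(.*)(>$)'),
          re.compile(r'<.*>'),
          re.compile(r'(.*)(\()(.*)(\))(.*)')]
idrepl = [r'\2', '', r'\3']

def first_match_only(line):
    """Stop at the first substitution that changes the line."""
    for idpt, idrp in zip(idpatt, idrepl):
        new_line = idpt.sub(idrp, line)
        if new_line != line:
            return new_line
    return line

def apply_all(line):
    """Run every substitution in order, with no early exit."""
    for idpt, idrp in zip(idpatt, idrepl):
        line = idpt.sub(idrp, line)
    return line

sample = 'Carl (home) <raven@example.com>'   # made-up test input
print(repr(first_match_only(sample)))
print(repr(apply_all(sample)))
```

On this input the first loop stops after removing the bracketed address and leaves 'Carl (home) ', while the second goes on to apply the parenthesis rule and ends up with 'home'. For the four forms listed earlier the two versions give the same result, so whether the difference matters depends on what actually shows up in the mailbox.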

Is there something obvious I'm missing, or is this a fair solution to the
problem?

Thanks!

--
Carl D Cravens (raven at phoenyx.net)
Talk is cheap because supply inevitably exceeds demand.


