How would I write this perl script in python?

David Bolen db3l at fitlinxx.com
Mon Nov 5 19:27:57 EST 2001


Aaron Ginn <aaron.ginn at motorola.com> writes:

> I'm not sure how to isolate the different parts of the regexp in
> Python as I have done in Perl by using parentheses and then referring
> to the parts as $1, $2, $3, etc.  I've used Python's re module many
> times, but mostly only for searching.  It seems I always revert to
> Perl when I have a complicated (or not so complicated) search and
> replace task to perform.

Within the regex itself you identify groups just as in Perl (with
parentheses).  However, match results are not set to predefined
variables (e.g., $2) but rather are available through the resulting
match object returned by the regex comparison.
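As a rough illustration of that mapping, here is a small sketch. The sample line and the variable names are made up for the example and are not taken from your script:

```python
import re

# Hypothetical sample line; the real input format may differ.
line = "vdd   1.250   -0.300"

match = re.search(r'^(\w+\s+)(\-*\d+\.\d+)(\s+)(\-*\d+\.\d+)$', line)
if match:
    name = match.group(1)         # what Perl would call $1
    first = match.group(2)        # what Perl would call $2
    all_groups = match.groups()   # ($1, $2, $3, $4) as a tuple of strings
```

Note that every group comes back as a string, so any arithmetic needs an explicit conversion first.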

To reproduce the rest of the script's behavior, you'd have to name the
line read from the file explicitly rather than depending on Perl to
work on the current line ($_).  Also, you might want to consider
compiling your pattern for performance.

The most literal translation of your script could be:

    import re, fileinput

    for line in fileinput.input():

        match = re.search(r'^(\w+\s+)(\-*\d+\.\d+)(\s+)(\-*\d+\.\d+)$',line)
        if match:
            # group() returns a string, so convert before subtracting
            val = float(match.group(2)) - 0.775
            print("%s%s%s%s" % (match.group(1),val,
                                match.group(3),match.group(4)))
        else:
            print("")

Note the use of a raw string for the regex pattern to avoid needing to
quote the backslashes.  I don't believe you use anything in the
pattern that behaves differently between Perl and the Python re
module.
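To make the raw string point concrete, here is a minimal sketch comparing the two spellings of the same pattern. The variable names and the sample text are only for illustration:

```python
import re

plain = '\\d+\\.\\d+'   # ordinary string: each backslash has to be doubled
raw = r'\d+\.\d+'       # raw string: backslashes are passed through as-is

same = (plain == raw)   # both spellings produce the same pattern text
found = re.search(raw, "offset 3.14").group(0)
```

Both variables hold the identical pattern; the raw form is just easier to read and harder to get wrong.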

Note also that the Perl code can silently run through lines that don't
actually match the pattern fully, and may produce empty values for $n
(which may, in turn, make your val be -0.775 if group 2 didn't match).
The above Python code will generate exceptions in such cases, so if
you are depending on Perl's silent behavior in that case you'd either
need to do some more error checking in the Python code, or enclose the
code within the "if match" block in a try/except clause to deal with
any match group errors, even if only to skip past them.
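One way that extra checking might look is sketched below. The helper name shift_line and its offset parameter are hypothetical, introduced only for this example, and returning None to mean "skip this line" is just one possible convention:

```python
import re

# Same pattern as above, compiled once.
PATTERN = re.compile(r'^(\w+\s+)(\-*\d+\.\d+)(\s+)(\-*\d+\.\d+)$')

def shift_line(line, offset=0.775):
    """Return the line with its first number reduced by offset, or None to skip."""
    match = PATTERN.search(line)
    if match is None:
        return None          # the line did not match at all
    try:
        val = float(match.group(2)) - offset
    except (TypeError, ValueError):
        return None          # e.g. "--1.5" satisfies \-* but is not a valid number
    return "%s%s%s%s" % (match.group(1), val, match.group(3), match.group(4))
```

For instance, shift_line("abc --1.5 2.0") returns None instead of raising, because the pattern accepts repeated minus signs that float() rejects.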

The fileinput module will handle taking input from command line
filenames or stdin if no filenames are given.  It hasn't necessarily
been the highest performing option (at least until the most recent
Python releases), but it's the closest match to Perl's null
filehandle (<>).
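If you want to try fileinput without setting up command line arguments, a sketch like the following may help. It writes a throwaway temporary file (purely an assumption for the sake of a self-contained example) and passes it explicitly through the files argument:

```python
import fileinput
import os
import tempfile

# Create a throwaway file so the example is self-contained.
fd, path = tempfile.mkstemp(text=True)
with os.fdopen(fd, "w") as handle:
    handle.write("first\nsecond\n")

seen = []
# With no arguments, fileinput.input() uses sys.argv[1:] or stdin;
# passing files= explicitly reads a known list of files instead.
for line in fileinput.input(files=[path]):
    seen.append((fileinput.lineno(), line.rstrip("\n")))

fileinput.close()
os.remove(path)
```

fileinput.lineno() gives the cumulative line number across all files, which is roughly what Perl's $. gives you with <>.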

You could also use re.match rather than a "^" at the start of your
pattern since match only matches at the beginning of the string.  If
you combined that with compiling the pattern for performance, you'd
have:

    import re, fileinput

    re_pat = re.compile(r'(\w+\s+)(\-*\d+\.\d+)(\s+)(\-*\d+\.\d+)$')

    for line in fileinput.input():

        match = re_pat.match(line)
        if match:
            # group() returns a string, so convert before subtracting
            val = float(match.group(2)) - 0.775
            print("%s%s%s%s" % (match.group(1),val,
                                match.group(3),match.group(4)))
        else:
            print("")


--
-- David
-- 
/-----------------------------------------------------------------------\
 \               David Bolen            \   E-mail: db3l at fitlinxx.com  /
  |             FitLinxx, Inc.            \  Phone: (203) 708-5192    |
 /  860 Canal Street, Stamford, CT  06902   \  Fax: (203) 316-5150     \
\-----------------------------------------------------------------------/


