Python vs. Perl, which is better to learn?

Mark McEahern marklists at mceahern.com
Wed May 1 09:58:50 EDT 2002


[James J. Besemer]
> Superior probably is too strong a word, though regex in Perl is a
> little easier to get to right out of the box.  I.e, regex is part
> of Perl's "builtins".

Thank you for your response.  I'm not terribly familiar with Python's re
module, but I'll see if I can duplicate the examples you provided...

> PERL regex example:
>
>         # No import, compile, function or object syntax.
>         # Implied match is with the "current" thingy ( $_, IIRC)

As others have remarked, I prefer the explicitness of Python's re
module--you have to import it to use it.

>
>        action() if /regex/ ;    # perform action() if regex match

Well, literally speaking, as far as I know there's no Python equivalent to
this.  To get a sense of why, you can type "import this" into the Python
interactive interpreter.  ;-)

Nonetheless, you can do the equivalent fairly easily by explicitly
specifying the string you want to match:

import re
s = "Explicit is better than implicit."
pat = re.compile("licit")
if pat.search(s):
    action()

> To match with a specific object, say, a variable you use the "=~"
> operator.
>
>     action( $a) if $a =~ /regex/ ; # perform action if regex matches $a

I think the same Python example above covers this situation.

> Note that the statement for substitute is like the "vi" command:
>
>     $a =~ s/old/new/gi  if $a =~ /pattern1/;
>
>         # substitute "new" for "old" in $a if $a matches pattern1
>         #    g suffix for global replacement
>         #    i suffix for case insensitive comparison

You can use re.sub() to replace up to count occurrences of a pattern:

import re
s = "Explicit is better than implicit."
pat = re.compile("licit")    # plays the role of pattern1
old = re.compile("e", re.IGNORECASE)
new = "f"
count = 1
if pat.search(s):
    s2 = re.sub(old, new, s, count)
    print s2

Not specifying count means replace 'em all.
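
Here is a minimal sketch of the difference between passing count and leaving
it out. It is written in Python 3 syntax (print as a function), which is an
assumption on my part, and the variable names first_only and everything are
just for illustration:

```python
import re

s = "Explicit is better than implicit."
old = re.compile("e", re.IGNORECASE)   # IGNORECASE plays the role of Perl's "i" suffix

# count=1 stops after the first match.
first_only = old.sub("f", s, count=1)

# Leaving count out replaces every match, like Perl's "g" suffix.
everything = old.sub("f", s)

print(first_only)
print(everything)
```

Note that the replacement text is inserted literally, so the capital "E" at
the start becomes a lowercase "f".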

> The basic regex operators are similar to Python's, though Perl
> adds some extras such as
>
>     {n,m}    # preceding pattern matches at least n but no more than m times

Python has that too:

  http://www.python.org/doc/current/lib/re-syntax.html
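
A quick sketch of {n,m} in Python, assuming Python 3 syntax; the test strings
are made up for illustration:

```python
import re

# Two or three "a" characters and nothing else.
pat = re.compile(r"^a{2,3}$")

results = [bool(pat.search(x)) for x in ("a", "aa", "aaa", "aaaa")]
print(results)
```

Only "aa" and "aaa" should match.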

> A successful match sets a flurry of global variables:
>
>     $& = the matched portion of the input string
>
>     $` = everything before the match
>
>     $' = everything after the match

Thank Crom Python doesn't do that.  As someone else mentioned, these things
are tucked away into match objects, if you want them, rather than being
squirted into your namespace.
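
For example, the three Perl variables above can be recovered from a match
object roughly like this. This is a sketch in Python 3 syntax, and the names
matched, before, and after are my own:

```python
import re

s = "Explicit is better than implicit."
m = re.search("better", s)
if m:
    matched = m.group(0)      # like $&: the matched portion
    before = s[:m.start()]    # like $`: everything before the match
    after = s[m.end():]       # like $': everything after the match
    print(repr(before), repr(matched), repr(after))
```

Nothing is set unless you ask for it, and it all lives on m rather than in
globals.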

> Parentheses in the regex break the matching pattern into "groups" and the
> portions of the string corresponding to each group may be accessed via:
>
>     $1, $2, ...
>
> E.g.,
>
>     s/^([^ ]*) *([^ ]*)/$2 $1/;    # reverse order of 2 words

This switches the first and last words of a sentence.  I didn't bother
putting the period back in there or sentence-casing the new first word.

import re
s = "Explicit is better than implicit."
pat = re.compile(r"^(\w+)(.* )(\w+)\.$")
m = pat.search(s)
if m:
    print "%s%s%s" % m.group(3, 2, 1)
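
A closer analogue of the quoted Perl substitution is re.sub() with
backreferences in the replacement string. This sketch assumes the Perl
pattern was meant to close its first group right after the first word, uses
Python 3 syntax, and runs on a made-up two-word input:

```python
import re

s = "hello world"

# \1 and \2 in the replacement refer to the first and second groups,
# much like $1 and $2 in Perl.
swapped = re.sub(r"^([^ ]*) *([^ ]*)", r"\2 \1", s)
print(swapped)
```

Raw strings (the r prefix) keep the backslashes from being interpreted by
Python before the re module sees them.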

>     if( /Time: (..):(..):(..)/ ){    # extract hh:mm:ss fields
>         $hours = $1;
>         $min = $2;
>         $sec = $3;
>     }

Here's a case where Python's ability to name groups is interesting:

import re, time
t = time.asctime()
pat = re.compile(r"(?P<hour>\d{2}):(?P<minute>\d{2}):(?P<second>\d{2})")
m = pat.search(t)
if m:
    print m.group('hour')
    print m.group('minute')
    print m.group('second')
else:
    print "Not found."
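
If you prefer something closer to the Perl version, with positional groups
assigned to variables, a sketch might look like this. It uses Python 3
syntax, and the input line is a made-up example:

```python
import re

line = "Time: 09:58:50"    # hypothetical input

m = re.search(r"Time: (..):(..):(..)", line)
if m:
    # groups() returns all captured groups as a tuple, in order.
    hours, minutes, seconds = m.groups()
    print(hours, minutes, seconds)
else:
    print("Not found.")
```

The captured values are strings; convert with int() if you need numbers.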

Cheers,

// mark





