scanf style parsing

Tim Hammerquist tim at vegeta.ath.cx
Thu Sep 27 06:48:24 EDT 2001


It seems to me that Bruce Dawson <comments at cygnus-software.com> said:
> For Perl hackers it is easy to figure out
> regexp, but for us old C/C++ types, it's *tough*

It's not usually easy to learn regexps, no matter what your background.
I come from C/C++ roots (Turbo C++ 3.0) and TRS-80 BASIC before that,
and I certainly had no idea what regex's were really for until I looked
at Perl.

I struggled with regex's for months. I even had to take some time away
from Perl and regex's to calm down and not be so intimidated.  Many
Pythonistas I've heard in this ng had a lot of difficulty with regular
expressions, and with *good reason*.

Of course, by the time I finally grasped regular expressions, I would
be looking at my mail and catch myself mentally writing a regex to
parse my gas bill!  This is part of why Perlers are a bit overzealous
with regex's.  Python's syntax tames this pretty quick tho, and that's a
good thing.

Regex's are useful and powerful.  But they're also very easy to abuse.
I've actually seen the following Perl code:

    if ($filename =~ /\.txt$/) { ... }

Which would be roughly equivalent to:

    import re
    m = re.search(r'\.txt$', filename)
    if m:
        ...

or, much better:

    if filename[-4:] == '.txt':
        ...

I think another reason for Perlers overusing regex's is Perl's shortage
of convenient string indexing operators.

The equivalent of the last Python code in Perl is:

    if (substr($filename, -4) eq '.txt') { ... }
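
For what it's worth, the Python slice comparison can also be spelled with
the string method endswith(), which saves you from counting characters.
A minimal sketch, assuming a Python new enough to have string methods
(2.0 or later); is_text_file is just an illustrative name of mine:

```python
def is_text_file(filename):
    """Return True if filename ends with '.txt' (case-sensitive)."""
    # Same test as filename[-4:] == '.txt', without the hard-coded length.
    return filename.endswith('.txt')
```

Like the slice, this is case-sensitive, so 'NOTES.TXT' would not pass.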

But don't think regex's are disposable just because Python's string type
is more convenient.  Consider the following:

    # perl
    if ($filename =~ /\.([ps]?html?|cgi|php[\d]?|pl)$/) { ... }
    # python
    import re
    re_web_files = re.compile(r'\.([ps]?html?|cgi|php[\d]?|pl)$')
    m = re_web_files.search(filename)
    if m:
        ...

This is a very complicated (but relatively efficient) way to match files
with any of the following extensions (plus .php followed by any other
single digit):
    .htm    .html   .shtm   .shtml  .phtm   .phtml
    .cgi
    .php    .php2   .php3   .php4
    .pl
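
If you want to convince yourself the pattern really covers that list, a
quick check along these lines should do. This is only a sketch: the
'index' base name, the listed_extensions list and the matches() helper are
made up for the illustration.

```python
import re

re_web_files = re.compile(r'\.([ps]?html?|cgi|php[\d]?|pl)$')

# The extensions listed above, copied by hand.
listed_extensions = ['.htm', '.html', '.shtm', '.shtml', '.phtm', '.phtml',
                     '.cgi', '.php', '.php2', '.php3', '.php4', '.pl']

def matches(filename):
    """Return True if the filename ends in one of the web extensions."""
    return re_web_files.search(filename) is not None

# True only if every listed extension is accepted by the pattern.
all_listed_match = all(matches('index' + ext) for ext in listed_extensions)
```

Note that the trailing $ is what keeps something like 'index.html.bak'
from matching.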

Even with Python's less convenient class implementation of regex's (as
opposed to Perl's operator implementation), that's not a bad example, and
it doesn't show even half of the power of regular expressions.

If you don't need a regex, don't feel obligated. (You very rarely *need*
a regex, but workarounds can get pretty ugly.)
Use them sparingly and they can save your butt. They did mine. <wink>
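
To give a rough idea of what a regex-free workaround might look like for
the web-file case, here is one possible sketch using os.path.splitext and
an explicit set. WEB_EXTENSIONS and is_web_file are names I made up, and
the set only holds the extensions spelled out in the list, so unlike the
regex it would not accept, say, .php5.

```python
import os.path

# Hypothetical explicit list; every extension has to be written out by hand.
WEB_EXTENSIONS = frozenset([
    '.htm', '.html', '.shtm', '.shtml', '.phtm', '.phtml',
    '.cgi',
    '.php', '.php2', '.php3', '.php4',
    '.pl',
])

def is_web_file(filename):
    """Return True if the filename's last extension is in WEB_EXTENSIONS."""
    # splitext returns (root, ext), where ext includes the leading dot.
    ext = os.path.splitext(filename)[1]
    return ext in WEB_EXTENSIONS
```

It works, but every new variant means another entry in the set, which is
where the regex starts to earn its keep.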

-- 
In 1968 it took the computing power of 2 C-64's to fly a rocket to the moon.
Now, in 1998 it takes the Power of a Pentium 200 to run Microsoft Windows 98.
Something must have gone wrong.


