[Tutor] Regular expression on python

Steven D'Aprano steve at pearwood.info
Tue Apr 14 14:21:25 CEST 2015


On Tue, Apr 14, 2015 at 10:00:47AM +0200, Peter Otten wrote:
> Steven D'Aprano wrote:

> > I swear that Perl has been a blight on an entire generation of
> > programmers. All they know is regular expressions, so they turn every
> > data processing problem into a regular expression. Or at least they
> > *try* to. As you have learned, regular expressions are hard to read,
> > hard to write, and hard to get correct.
> > 
> > Let's write some Python code instead.
[...]

> The tempter took possession of me and dictated:
> 
> >>> pprint.pprint(
> ... [(k, int(v)) for k, v in
> ... re.compile(r"(.+?):\s+(\d+)(?:\s+\(.*?\))?\s*").findall(line)])
> [('Input Read Pairs', 2127436),
>  ('Both Surviving', 1795091),
>  ('Forward Only Surviving', 17315),
>  ('Reverse Only Surviving', 6413),
>  ('Dropped', 308617)]

Nicely done :-)

I didn't say that it *couldn't* be done with a regex, only that it is 
harder to read, write, etc. Regexes are good tools, but they aren't the 
only tool. As a beginner, which would you rather debug: the extract() 
function I wrote, or r"(.+?):\s+(\d+)(?:\s+\(.*?\))?\s*" ?

Oh, and for the record, your solution is roughly 4-5 times faster than 
the extract() function on my computer. If I knew the requirements were 
not likely to change (that is, the maintenance burden was likely to be 
low), I'd be quite happy to use your regex solution in production code, 
although I would probably want to write it out in verbose mode just in 
case the requirements did change:


r"""(?x)    (?# verbose mode)
    (.+?):  (?# capture one or more characters, followed by a colon)
    \s+     (?# one or more whitespace)
    (\d+)   (?# capture one or more digits)
    (?:     (?# don't capture ... )
      \s+       (?# one or more whitespace)
      \(.*?\)   (?# anything inside round brackets)
      )?        (?# ... and optional)
    \s*     (?# ignore trailing spaces)
    """


That's a hint to people learning regular expressions: start in verbose 
mode, then "de-verbose" it if you must.
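
For anyone who wants to try this, here is a minimal sketch of the same 
pattern compiled with the re.VERBOSE flag and ordinary # comments, which 
some people find easier to type than (?#...) groups. The sample line is 
an assumption: the original input isn't shown in this message, so I made 
one up to match the output quoted above. The names PATTERN, COMPACT and 
pairs are just for illustration.

```python
import re

# Verbose form: whitespace in the pattern is ignored and "#" starts a comment.
PATTERN = re.compile(r"""
    (.+?):        # capture one or more characters (non-greedy), then a colon
    \s+           # one or more whitespace
    (\d+)         # capture one or more digits
    (?:           # don't capture ...
        \s+       #   one or more whitespace
        \(.*?\)   #   anything inside round brackets
    )?            # ... and the whole group is optional
    \s*           # ignore trailing spaces
    """, re.VERBOSE)

# The "de-verbosed" one-liner from the quoted message.
COMPACT = re.compile(r"(.+?):\s+(\d+)(?:\s+\(.*?\))?\s*")

# Hypothetical sample input, invented to match the quoted output.
line = ("Input Read Pairs: 2127436 Both Surviving: 1795091 (84.38%) "
        "Forward Only Surviving: 17315 (0.81%) "
        "Reverse Only Surviving: 6413 (0.30%) Dropped: 308617 (14.51%)")

# Convert each (label, digits) match into (label, int).
pairs = [(k, int(v)) for k, v in PATTERN.findall(line)]

# Both spellings should find exactly the same matches.
assert PATTERN.findall(line) == COMPACT.findall(line)
```

Comparing the two patterns on real data like this is a cheap way to check 
that nothing got lost when you de-verbose.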


-- 
Steve

