[Tutor] Regular expression on python
Steven D'Aprano
steve at pearwood.info
Tue Apr 14 14:21:25 CEST 2015
On Tue, Apr 14, 2015 at 10:00:47AM +0200, Peter Otten wrote:
> Steven D'Aprano wrote:
> > I swear that Perl has been a blight on an entire generation of
> > programmers. All they know is regular expressions, so they turn every
> > data processing problem into a regular expression. Or at least they
> > *try* to. As you have learned, regular expressions are hard to read,
> > hard to write, and hard to get correct.
> >
> > Let's write some Python code instead.
[...]
> The tempter took possession of me and dictated:
>
> >>> pprint.pprint(
> ... [(k, int(v)) for k, v in
> ... re.compile(r"(.+?):\s+(\d+)(?:\s+\(.*?\))?\s*").findall(line)])
> [('Input Read Pairs', 2127436),
> ('Both Surviving', 1795091),
> ('Forward Only Surviving', 17315),
> ('Reverse Only Surviving', 6413),
> ('Dropped', 308617)]
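To try the one-liner above on its own, here is a minimal runnable sketch. The input line is a hypothetical sample in the shape the regex expects: a label, a colon, a count, and an optional note in round brackets. The real line from the original question is not quoted in this message, and the percentages in the sample are made up to fit the counts shown above.

```python
import re
from pprint import pprint

# Hypothetical input line; the real line from the thread is not quoted here.
line = ("Input Read Pairs: 2127436 Both Surviving: 1795091 (84.38%) "
        "Forward Only Surviving: 17315 (0.81%) "
        "Reverse Only Surviving: 6413 (0.30%) Dropped: 308617 (14.51%)")

# (.+?)            the label, matched lazily up to the first ": <digits>"
# \s+(\d+)         the count
# (?:\s+\(.*?\))?  an optional note in round brackets, not captured
# \s*              trailing whitespace before the next label
pattern = re.compile(r"(.+?):\s+(\d+)(?:\s+\(.*?\))?\s*")

# findall() returns (label, count) string tuples; convert counts to int.
pairs = [(label, int(count)) for label, count in pattern.findall(line)]
pprint(pairs)
```

Because the pattern has two capturing groups, findall() yields a 2-tuple per match, which is why the list comprehension can unpack each one directly.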
Nicely done :-)
I didn't say that it *couldn't* be done with a regex. Only that it is
harder to read, write, etc. Regexes are good tools, but they aren't the
only tool and as a beginner, which would you rather debug? The extract()
function I wrote, or r"(.+?):\s+(\d+)(?:\s+\(.*?\))?\s*" ?
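For comparison, a regex-free parser might look like the sketch below. It is NOT the extract() function from the thread (that code is elided above); the name extract_fields() and the sample line are hypothetical. It assumes each label ends with a colon, the count is the next whitespace-separated token, and any note is wrapped in round brackets.

```python
def extract_fields(line):
    """Hypothetical regex-free parser: return (label, count) pairs from line.

    Assumes each label ends with ':' and is followed by an integer count,
    optionally followed by a note in round brackets, e.g. '(0.81%)'.
    """
    results = []
    label_words = []          # words collected for the current label
    tokens = line.split()     # split on any run of whitespace
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token.endswith(":"):
            # Last word of the label; the next token is the count.
            label_words.append(token[:-1])
            results.append((" ".join(label_words), int(tokens[i + 1])))
            label_words = []
            i += 2
            # Skip an optional bracketed note, which may span several tokens.
            if i < len(tokens) and tokens[i].startswith("("):
                while i < len(tokens) and not tokens[i].endswith(")"):
                    i += 1
                i += 1  # step past the token holding the closing bracket
        else:
            label_words.append(token)
            i += 1
    return results


sample = ("Input Read Pairs: 2127436 Both Surviving: 1795091 (84.38%) "
          "Forward Only Surviving: 17315 (0.81%) "
          "Reverse Only Surviving: 6413 (0.30%) Dropped: 308617 (14.51%)")
print(extract_fields(sample))
```

Every step is an ordinary loop over tokens, so a beginner can drop print() calls in to see exactly where parsing goes wrong, which is much harder to do inside a regex.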
Oh, and for the record, your solution is roughly 4-5 times faster than
the extract() function on my computer. If I knew the requirements were
not likely to change (that is, the maintenance burden was likely to be
low), I'd be quite happy to use your regex solution in production code,
although I would probably want to write it out in verbose mode just in
case the requirements did change:
r"""(?x) (?# verbose mode)
(.+?): (?# capture one or more characters, followed by a colon)
\s+ (?# one or more whitespace)
(\d+) (?# capture one or more digits)
(?: (?# don't capture ... )
\s+ (?# one or more whitespace)
\(.*?\) (?# anything inside round brackets)
)? (?# ... and optional)
\s* (?# ignore trailing spaces)
"""
That's a hint to people learning regular expressions: start in verbose
mode, then "de-verbose" it if you must.
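Python also accepts the re.VERBOSE flag with "#" comments, an alternative to the inline (?x) and (?#...) spelling used above. The sketch below writes the same pattern both ways and checks that the verbose and "de-verbosed" versions agree on a hypothetical sample line; the constant names are made up for illustration.

```python
import re

# Verbose form: whitespace is ignored and "#" starts a comment (re.VERBOSE).
VERBOSE_PATTERN = re.compile(r"""
    (.+?):          # capture one or more characters (lazily), then a colon
    \s+             # one or more whitespace characters
    (\d+)           # capture one or more digits
    (?:             # non-capturing group ...
        \s+         #   one or more whitespace characters
        \(.*?\)     #   anything inside round brackets
    )?              # ... which is optional
    \s*             # ignore trailing whitespace
""", re.VERBOSE)

# The same pattern "de-verbosed" into a single line.
COMPACT_PATTERN = re.compile(r"(.+?):\s+(\d+)(?:\s+\(.*?\))?\s*")

# Hypothetical sample line, made up for illustration.
SAMPLE = ("Input Read Pairs: 2127436 Both Surviving: 1795091 (84.38%) "
          "Forward Only Surviving: 17315 (0.81%) "
          "Reverse Only Surviving: 6413 (0.30%) Dropped: 308617 (14.51%)")

# Both spellings should find exactly the same (label, count) pairs.
assert VERBOSE_PATTERN.findall(SAMPLE) == COMPACT_PATTERN.findall(SAMPLE)
```

Keeping an assertion like this next to both versions is one cheap way to make sure the compact copy has not drifted from the commented one after an edit.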
--
Steve