[Tutor] Regular expression on python
Albert-Jan Roskam
fomcl at yahoo.com
Wed Apr 15 10:50:04 CEST 2015
--------------------------------------------
On Tue, 4/14/15, Peter Otten <__peter__ at web.de> wrote:
Subject: Re: [Tutor] Regular expression on python
To: tutor at python.org
Date: Tuesday, April 14, 2015, 4:37 PM
Steven D'Aprano wrote:
> On Tue, Apr 14, 2015 at 10:00:47AM +0200, Peter Otten wrote:
>> Steven D'Aprano wrote:
>
>> > I swear that Perl has been a blight on an entire generation of
>> > programmers. All they know is regular expressions, so they turn every
>> > data processing problem into a regular expression. Or at least they
>> > *try* to. As you have learned, regular expressions are hard to read,
>> > hard to write, and hard to get correct.
>> >
>> > Let's write some Python code instead.
> [...]
>
>> The tempter took possession of me and dictated:
>>
>> >>> pprint.pprint(
>> ...     [(k, int(v)) for k, v in
>> ...      re.compile(r"(.+?):\s+(\d+)(?:\s+\(.*?\))?\s*").findall(line)])
>> [('Input Read Pairs', 2127436),
>> ('Both Surviving', 1795091),
>> ('Forward Only Surviving', 17315),
>> ('Reverse Only Surviving', 6413),
>> ('Dropped', 308617)]
>
> Nicely done :-)
>
Yes, nice, but why do you use
    re.compile(regex).findall(line)
and not
    re.findall(regex, line)
?
I know what re.compile is for. I often call it outside a loop and then use the compiled regex inside the loop. I just haven't seen it used the way you use it before.
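
For what it's worth, here is a minimal sketch showing that the two spellings return the same result. The sample `line` is an assumption: it is reconstructed from the output quoted above, with the percentages computed from those counts, and is not the original poster's data.

```python
import re

# Hypothetical input line, reconstructed from the quoted output.
line = ("Input Read Pairs: 2127436 Both Surviving: 1795091 (84.38%) "
        "Forward Only Surviving: 17315 (0.81%) "
        "Reverse Only Surviving: 6413 (0.30%) Dropped: 308617 (14.51%)")

regex = r"(.+?):\s+(\d+)(?:\s+\(.*?\))?\s*"

# Spelling 1: compile first, then call the method on the pattern object.
via_compile = re.compile(regex).findall(line)

# Spelling 2: module-level function; re compiles (and caches) the pattern.
via_module = re.findall(regex, line)

pairs = [(k, int(v)) for k, v in via_module]
print(via_compile == via_module)
print(pairs)
```

As far as I know the re module keeps a cache of recently compiled patterns, so for a one-off call like this the choice is mostly a matter of style.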
> I didn't say that it *couldn't* be done with a regex.
I didn't claim that.
> Only that it is
> harder to read, write, etc. Regexes are good tools, but they aren't the
> only tool and as a beginner, which would you rather debug? The extract()
> function I wrote, or r"(.+?):\s+(\d+)(?:\s+\(.*?\))?\s*" ?
I know a rhetorical question when I see one ;)
> Oh, and for the record, your solution is roughly 4-5 times faster than
> the extract() function on my computer.
I wouldn't be bothered by that. See below if you are.
> If I knew the requirements were
> not likely to change (that is, the maintenance burden was likely to be
> low), I'd be quite happy to use your regex solution in production code,
> although I would probably want to write it out in verbose mode just in
> case the requirements did change:
>
>
> r"""(?x) (?# verbose mode)
Personally, I prefer to be verbose about being verbose, i.e. use the re.VERBOSE flag. But perhaps that's just a matter of taste. Are there any use cases where the inline (?iLmsux) flags are clearly a better choice than the equivalent flag arguments? For me, the mental burden of a regex is big enough already without them.
>     (.+?):  (?# capture one or more character, followed by a colon)
>     \s+     (?# one or more whitespace)
>     (\d+)   (?# capture one or more digits)
>     (?:     (?# don't capture ... )
>       \s+       (?# one or more whitespace)
>       \(.*?\)   (?# anything inside round brackets)
>       )?        (?# ... and optional)
>     \s*     (?# ignore trailing spaces)
>     """
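
To make the comparison concrete, here is a rough sketch of the same pattern written three ways: compact, with the re.VERBOSE flag and ordinary # comments, and with the inline (?x) flag. The names COMPACT, FLAG_STYLE, INLINE_STYLE and the sample `line` are mine and purely illustrative; the sample is reconstructed from the output quoted earlier in the thread.

```python
import re

# Hypothetical input line, reconstructed from the quoted output.
line = ("Input Read Pairs: 2127436 Both Surviving: 1795091 (84.38%) "
        "Forward Only Surviving: 17315 (0.81%) "
        "Reverse Only Surviving: 6413 (0.30%) Dropped: 308617 (14.51%)")

# The one-liner from the thread.
COMPACT = re.compile(r"(.+?):\s+(\d+)(?:\s+\(.*?\))?\s*")

# Verbose mode requested with the flag argument; "#" starts a comment.
FLAG_STYLE = re.compile(r"""
    (.+?):          # capture one or more characters, followed by a colon
    \s+             # one or more whitespace
    (\d+)           # capture one or more digits
    (?:             # don't capture ...
        \s+         # one or more whitespace
        \(.*?\)     # anything inside round brackets
    )?              # ... and optional
    \s*             # ignore trailing spaces
    """, re.VERBOSE)

# Verbose mode requested inside the pattern; the flag sits at the very start.
INLINE_STYLE = re.compile(r"(?x) (.+?): \s+ (\d+) (?: \s+ \(.*?\) )? \s*")

results = [p.findall(line) for p in (COMPACT, FLAG_STYLE, INLINE_STYLE)]
print(results[0] == results[1] == results[2])
```

One situation where the inline form may earn its keep is when only the pattern string can be handed over, for example when it is read from a configuration file and there is no place to pass a flags argument.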
<snip>