[Python-ideas] Experiment: Adding "re" to string objects.
Steven D'Aprano
steve at pearwood.info
Tue Jul 21 01:52:58 CEST 2009
On Tue, 21 Jul 2009 07:47:05 am Sean Reifschneider wrote:
> if s.re.match(r'whatever(.*)'):
>     s.re.group(1)
To me, the above is far less attractive than the standard idiom. That's
an aesthetic judgement which you might disagree with, but I believe
there's a far more critical flaw in the above idiom: it operates by
side-effect in an unsafe way.
Consider:
    s = "yabba dabba doo"
    if s.re.match(r'y.bba'):
        function(s)
        print s.re.group(0)
You might expect the above to print 'yabba', but consider:
    def function(s):
        if s.re.match(r'.*(dabba)'):
            log(s)
And now the snippet above will mysteriously print "yabba dabba" instead.
It's (presumably) easy enough to work around this:
    s = "yabba dabba doo"
    if s.re.match(r'y.bba'):
        # Save a copy of the re.group in case it gets mutated
        group = s.re.group(0)
        function(s)
        print group
but that:
(1) spoils what little convenience your proposal did have;
(2) will lead to confusion when people discover that "Python's regexes
are broken"; and
(3) probably means that you simply cannot use s.re.* in threads at all.
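The clobbering in the yabba/dabba example can be reproduced today. Since str has no .re attribute, the following minimal Python 3 sketch assumes a hypothetical ProposedStr subclass whose .re attribute is a hypothetical ReAccessor object that remembers the most recent match. That is an assumed reading of how the proposal would have to keep its state; log() is a do-nothing placeholder, and function_plain() is a hypothetical variant of function() written with the standard idiom.

```python
import re


class ReAccessor:
    """Hypothetical stand-in for the proposed ``s.re`` attribute.

    It remembers the most recent match object; group() reads that
    shared state back, whoever set it."""

    def __init__(self, text):
        self.text = text
        self._last = None

    def match(self, pattern):
        # Overwrites whatever match an earlier caller left behind.
        self._last = re.match(pattern, self.text)
        return self._last

    def group(self, n=0):
        return self._last.group(n)


class ProposedStr(str):
    """Hypothetical str subclass carrying one shared ``.re`` accessor."""

    def __new__(cls, value):
        obj = super().__new__(cls, value)
        obj.re = ReAccessor(value)
        return obj


def log(s):
    pass  # placeholder: the message leaves log() unspecified


def function(s):
    # A callee that innocently uses s.re for its own purposes.
    if s.re.match(r'.*(dabba)'):
        log(s)


def function_plain(s):
    # The same callee written with the standard idiom.
    m = re.match(r'.*(dabba)', s)
    if m:
        log(s)


def proposed_idiom():
    s = ProposedStr("yabba dabba doo")
    if s.re.match(r'y.bba'):
        function(s)
        return s.re.group(0)  # reads the callee's match, not ours


def standard_idiom():
    s = "yabba dabba doo"
    m = re.match(r'y.bba', s)
    if m:
        function_plain(s)
        return m.group(0)  # m is local to this frame, so it is untouched


print(proposed_idiom())  # yabba dabba
print(standard_idiom())  # yabba
```

Because the caller and the callee share one accessor, the callee's match silently replaces the caller's; with the standard idiom each frame holds its own match object, so nothing can leak between them.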
> I'm having a hard time seeing this as a mutable attribute. Now, this
> would be unprecedented in that the value returned by the re.group()
> type calls would vary depending on the re.match() type calls, nothing
> else has similar sorts of side-effects on string objects. But it
> just doesn't "feel mutable" to me because of this.
s.re.group changes its value according to whether or not you have
called s.re.match() or s.re.search(). Every time you call s.re.match()
with a different argument, s.re.group() potentially changes its value.
Why is this not mutable?
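For contrast, the existing re functions keep no per-string match state visible to the caller: each call returns a fresh match object, so a later call cannot change the result of an earlier one. A short Python 3 sketch using only the standard re module:

```python
import re

s = "yabba dabba doo"

first = re.match(r'y.bba', s)
second = re.match(r'.*(dabba)', s)

# Each call returned its own match object; the second call did not
# alter the first, and neither call touched the string s.
print(first.group(0))   # yabba
print(second.group(0))  # yabba dabba
print(second.group(1))  # dabba
```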
> Christian Heimes wrote:
> >* regular expressions are rarely used in Python. I have just a
> > couple of scripts that use re
>
> Oh really?
I won't speak for Christian, but in my opinion, regexes are overused in
Python, by the sort of people who prefer to write:
    import re
    if re.match(r'.*\.py$', s):
        ...

instead of:

    if s.endswith('.py'):
        ...
Hang around comp.lang.python for long enough, and you too will get a
very jaundiced view of regexes being misused, usually by people who
have come from a Perl background.
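The two snippets above are not even exactly equivalent, which rather supports the point. A small Python 3 comparison (the helper names is_py_regex and is_py_method are hypothetical, introduced only for illustration): "$" also matches just before a trailing newline, while "." does not cross a newline and re.match anchors at the start of the string.

```python
import re


def is_py_regex(s):
    # The regex version quoted above.
    return re.match(r'.*\.py$', s) is not None


def is_py_method(s):
    # The plain string-method version.
    return s.endswith('.py')


for name in ['spam.py', 'spam.pyc', 'spam.py\n', 'eggs\nspam.py']:
    print(repr(name), is_py_regex(name), is_py_method(name))
```

The regex accepts 'spam.py\n' and rejects 'eggs\nspam.py', while endswith() does the opposite. str.endswith() also accepts a tuple of suffixes, e.g. s.endswith(('.py', '.pyw')).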
> Steven D'Aprano wrote:
> >Apologies for the Metoo, but I'm with Nick and Christian on this. It
> >sounds like a terrible idea to me, just to avoid a temporary name in
> >the standard idiom:
> >
> >m = re.match(r'whatever(.*)', s)
> >if m:
> >    m.group(1)
>
> It's not so much about adding a temporary name, it's about the above
> being an ugly construct. Particularly in more complex cases:
>
> m = re.match(r'whatever(.*)', s)
> if m:
>     m.group(1)
> m = re.match(r'something else(.*)', s)
> if m:
>     m.group(1)
>
> instead of:
>
> if s.re.match(r'whatever(.*)') or s.re.match(r'something else(.*)'):
>     s.re.group(1)
Time for a convenience function:
    import re

    # Untested
    def multimatch(s, patterns):
        """Do a re.match on s using each pattern in patterns,
        returning the first one to succeed, or None if they all fail."""
        for pattern in patterns:
            m = re.match(pattern, s)
            if m:
                return m
        return None
Perhaps that, and the obvious multisearch() function, should be added to
the re module.
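A minimal Python 3 sketch of both helpers, with a usage example for the quoted two-pattern case. Neither function exists in the re module; multimatch follows the definition above, and this multisearch body is an assumed "obvious" counterpart that simply swaps in re.search. Patterns are tried in order, so an earlier pattern wins when several would succeed.

```python
import re


def multimatch(s, patterns):
    """Return the first successful re.match of s against patterns, else None."""
    for pattern in patterns:
        m = re.match(pattern, s)
        if m:
            return m
    return None


def multisearch(s, patterns):
    """Hypothetical companion: like multimatch, but uses re.search."""
    for pattern in patterns:
        m = re.search(pattern, s)
        if m:
            return m
    return None


patterns = [r'whatever(.*)', r'something else(.*)']

m = multimatch("something else: spam", patterns)
if m:
    print(m.group(1))  # ': spam'

# re.match anchors at the start, so only the search variant finds this one.
print(multimatch("log: something else here", patterns))  # None
print(multisearch("log: something else here", patterns).group(1))  # ' here'
```

The caller still gets an ordinary match object back, so there is no hidden state on the string and the result can safely be kept around or passed between threads.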
--
Steven D'Aprano