[Python-ideas] Experiment: Adding "re" to string objects.

Steven D'Aprano steve at pearwood.info
Tue Jul 21 01:52:58 CEST 2009


On Tue, 21 Jul 2009 07:47:05 am Sean Reifschneider wrote:

>    if s.re.match(r'whatever(.*)'):
>       s.re.group(1)

To me, the above is far less attractive than the standard idiom. That's 
an aesthetic judgement which you might disagree with, but I believe 
there's a far more critical flaw in the above idiom: it operates by 
side-effect in an unsafe way.

Consider:

s = "yabba dabba doo"
if s.re.match(r'y.bba'):
    function(s)
    print s.re.group(0)

You might expect the above to print 'yabba', but consider:

def function(s):
    if s.re.match(r'.*(dabba)'):
        log(s)

And now the snippet above will mysteriously print "yabba dabba" instead.
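
Since str has no .re attribute, the effect can only be simulated. Here is a minimal sketch, assuming the proposal stores the most recent match on the string itself; the ReStr and _ReProxy classes are hypothetical stand-ins of mine, not part of the proposal or of the re module:

```python
import re

class _ReProxy(object):
    """Hypothetical stand-in for the proposed s.re attribute."""
    def __init__(self, text):
        self._text = text
        self._last = None  # shared, mutable "last match" state

    def match(self, pattern):
        # Every call overwrites the previously stored match.
        self._last = re.match(pattern, self._text)
        return self._last

    def group(self, *args):
        return self._last.group(*args)

class ReStr(str):
    """Hypothetical str subclass; real str objects have no .re."""
    def __init__(self, text):
        self.re = _ReProxy(self)

def function(s):
    # An innocent-looking helper that happens to run its own match.
    s.re.match(r'.*(dabba)')

s = ReStr("yabba dabba doo")
if s.re.match(r'y.bba'):
    before = s.re.group(0)   # 'yabba'
    function(s)
    after = s.re.group(0)    # 'yabba dabba': clobbered by function()
```

Nothing in the caller's code changed between the two group(0) calls; the difference comes entirely from what function() did behind its back.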

It's (presumably) easy enough to work around this:

s = "yabba dabba doo"
if s.re.match(r'y.bba'):
    # Save a copy of the re.group in case it gets mutated
    group = s.re.group(0)
    function(s)
    print group


but that:

(1) spoils what little convenience your proposal did have; 
(2) will lead to confusion when people discover that "Python's regexes 
are broken"; and
(3) probably means that you simply cannot use s.re.* in threads at all.


> I'm having a hard time seeing this as a mutable attribute.  Now, this
> would be unprecedented in that the value returned by the re.group()
> type calls would vary depending on the re.match() type calls, nothing
> else has similar sorts of side-effects on string objects.  But it
> just doesn't "feel mutable" to me because of this.

s.re.group changes its value according to whether or not you have 
called s.re.match() or s.re.search(). Every time you call s.re.match() 
with a different argument, s.re.group() potentially changes its value. 
Why is this not mutable?


> Christian Heimes wrote:
> >* regular expressions are rarely used in Python. I have just a
> > couple of scripts that use re
>
> Oh really?

I won't speak for Christian, but in my opinion, regexes are overused in 
Python, by the sort of people who prefer to write:

import re
if re.match(r'.*\.py$', s):
    ...

instead of:

if s.endswith('.py'):
    ...
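
As an aside, the two tests are not even strictly equivalent. A small sketch of the difference follows; the helper names is_py_regex and is_py_endswith and the sample strings are mine, chosen only for illustration:

```python
import re

def is_py_regex(s):
    # The regex version: '$' also matches just before a trailing newline,
    # and '.' does not match a newline unless re.DOTALL is given.
    return re.match(r'.*\.py$', s) is not None

def is_py_endswith(s):
    # The plain string method: a literal suffix test, nothing more.
    return s.endswith('.py')

names = ['spam.py', 'spam.pyc', 'spam.py\n', 'eggs\nspam.py']
results = [(is_py_regex(n), is_py_endswith(n)) for n in names]
# results == [(True, True), (False, False), (True, False), (False, True)]
```

The two agree on ordinary file names and disagree as soon as a newline is involved, which is exactly the sort of subtlety the string method spares you from thinking about.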


Hang around comp.lang.python for long enough, and you too will get a 
very jaundiced view of regexes being misused, usually by people who 
have come from a Perl background.


> Steven D'Aprano wrote:
> >Apologies for the Metoo, but I'm with Nick and Christian on this. It
> >sounds like a terrible idea to me, just to avoid a temporary name in
> >the standard idiom:
> >
> >m = re.match(r'whatever(.*)', s)
> >if m:
> >    m.group(1)
>
> It's not so much about adding a temporary name, it's about the above
> being an ugly construct.  Particularly in more complex cases:
>
>    m = re.match(r'whatever(.*)', s)
>    if m:
>        m.group(1)
>    m = re.match(r'something else(.*)', s)
>    if m:
>        m.group(1)
>
> instead of:
>
>    if s.re.match(r'whatever(.*)') or s.re.match(r'something else(.*)'):
>        s.re.group(1)

Time for a convenience function:

# Untested
import re

def multimatch(s, patterns):
    """Do a re.match on s using each pattern in patterns, 
    returning the match object for the first pattern to succeed, 
    or None if they all fail."""
    for pattern in patterns:
        m = re.match(pattern, s)
        if m:
            return m
    return None


Perhaps that, and the obvious multisearch() function, should be added to 
the re module.
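
For concreteness, a sketch of how the pair might look and be used. Neither function exists in the re module; multisearch() here is only my guess at the "obvious" counterpart, using re.search in place of re.match:

```python
import re

def multimatch(s, patterns):
    """Return the match object for the first pattern that matches at
    the start of s, or None if none of them do."""
    for pattern in patterns:
        m = re.match(pattern, s)
        if m:
            return m
    return None

def multisearch(s, patterns):
    """Like multimatch, but uses re.search, so a pattern may match
    anywhere in s rather than only at the start."""
    for pattern in patterns:
        m = re.search(pattern, s)
        if m:
            return m
    return None

# The "more complex case" from the quoted message, with a single test:
patterns = [r'whatever(.*)', r'something else(.*)']
m = multimatch("something else entirely", patterns)
if m:
    tail = m.group(1)
```

The match object stays in a local name, so nothing another function does with the same string can change what m.group(1) returns.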


-- 
Steven D'Aprano


