regex (?!..) problem

Carl Banks pavlovevidence at gmail.com
Mon Oct 5 09:20:18 CEST 2009


On Oct 4, 11:17 pm, Wolfgang Rohdewald <wolfg... at rohdewald.de> wrote:
> On Monday 05 October 2009, Carl Banks wrote:
>
> > What you're not realizing is that if a regexp search comes to a
> >  dead end, it won't simply return "no match".  Instead it'll throw
> >  away part of the match, and backtrack to a previously-matched
> >  variable-length subexpression, such as ".*?", and try again with a
> >  different length.
>
> well, that explains it. This is contrary to what the documentation
> says, though. Should I file a bug report? http://docs.python.org/library/re.html

If you're referring to the section where it explains greedy
qualifiers, it is not wrong per se.  re.match does exactly what the
documentation says: it matches as few characters as possible to the
non-greedy pattern.

However, since it's easy to misconstrue that if you don't know about
regexp backtracking, perhaps a little mention of backtracking is
warranted.  IMO it's not a documentation bug, so if you want to file a
bug report I'd recommend filing as a wishlist item.
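
As a rough illustration of the backtracking described above, here is a
minimal sketch.  The pattern and the sample string are assumptions made
up for this example; they are not taken from the original question.

```python
import re

# Hypothetical pattern and input, chosen only to show backtracking.
pattern = r".*?C1(?!.*C1)"
text = "xC1yC1z"

m = re.match(pattern, text)

# The non-greedy ".*?" first stops just before the first "C1", but the
# negative lookahead fails there because another "C1" follows.  Rather
# than giving up, the engine backtracks and lets ".*?" grow until the
# lookahead passes, so the match ends at the last "C1".
print(m.group(0))  # xC1yC1
```

So the non-greedy part still matches as few characters as it can, but
only among the lengths that let the whole pattern succeed.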

I will mention that my followup contained an error (which you didn't
quote).  I said greedy versus non-greedy doesn't affect the substring
matched.  That was wrong: it does affect the substring matched.  What
it doesn't affect is whether a match is found.
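
The corrected statement can be checked with a short sketch.  The sample
string and the patterns here are assumptions for illustration only.

```python
import re

text = "xC1yC1z"  # hypothetical input

greedy = re.search(r".*C1", text)
lazy = re.search(r".*?C1", text)

# Both variants find a match, but the matched substrings differ.
print(greedy.group(0))  # xC1yC1
print(lazy.group(0))    # xC1

# When no match is possible, both variants agree on that too.
greedy_miss = re.search(r".*C2", text)
lazy_miss = re.search(r".*?C2", text)
print(greedy_miss, lazy_miss)  # None None
```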


> Now back to my original problem: Would you have any idea how
> to solve it?
>
> count() is no solution in my case, I need re.search to either
> return None or a match.

Why do you have to use a regexp at all?

In Python we recommend using string operations and methods whenever
reasonable, and avoiding regexps unless you specifically need their
extra power.  String operations can easily do the examples you posted,
so I see no reason to use regexps.

Depending on what you want to do with the result, one of the following
functions should be close to what you need.  (I am using "word" for
the string being matched against, and "token" for the thing you don't
want to appear more than once.)


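def token_appears_once(word, token):
    # True only when token occurs exactly once in word.
    return word.count(token) == 1

def parts(word, token):
    # Split around the first token; reject a missing or repeated token.
    head, sep, tail = word.partition(token)
    if sep == "" or token in tail:
        return None
    return (head, sep, tail)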
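
For reference, the checks in parts() rely on how str.partition reports
a missing separator.  A small sketch, with sample strings that are
assumptions for illustration:

```python
found = "b1 C1 b3".partition("C1")
missing = "b1 C1 b3".partition("C9")
repeated = "C1 x C1".partition("C1")

# partition splits at the first occurrence of the separator only.
print(found)     # ('b1 ', 'C1', ' b3')

# If the separator is absent, sep and tail come back as empty strings.
print(missing)   # ('b1 C1 b3', '', '')

# A second occurrence is left in the tail, which is why the tail is
# checked for the token as well.
print(repeated)  # ('', 'C1', ' x C1')
```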


If you really need a match object, you should do a search, and then
call the .count method on the matched substring to see if there is
more than one occurrence, like this:

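import re

def match_only_if_token_appears_once(pattern, word, token):
    m = re.search(pattern, word)
    # re.search returns None when nothing matches, so check that first.
    if m is not None and m.group(0).count(token) != 1:
        m = None
    return m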
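
A usage sketch for this approach, restated here with a guard for the
no-match case so that the snippet runs on its own.  The patterns and
sample strings are assumptions, not taken from the original question.

```python
import re

def match_only_if_token_appears_once(pattern, word, token):
    m = re.search(pattern, word)
    # Guard against "no match" before touching the match object.
    if m is not None and m.group(0).count(token) != 1:
        m = None
    return m

once = match_only_if_token_appears_once(r".+", "a C1 b", "C1")
twice = match_only_if_token_appears_once(r".+", "a C1 b C1", "C1")
nothing = match_only_if_token_appears_once(r"Z+", "a C1 b", "C1")

print(once.group(0))  # a C1 b
print(twice)          # None
print(nothing)        # None
```

Note that the count is taken on the matched substring, not on the whole
word, so a token outside the matched span is not counted.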


Carl Banks


