[Tutor] regex: not start with FOO

Kent Johnson kent37 at tds.net
Tue Feb 3 18:00:50 CET 2009


On Tue, Feb 3, 2009 at 11:12 AM, Bernard Rankin <berankin99 at yahoo.com> wrote:

> In [3]: re.findall('^(?!FOO)in', 'in in in')
> Out[3]: ['in']
>
> In [4]: re.findall('(?!^FOO)in', 'in in in')
> Out[4]: ['in', 'in', 'in']
>
> In [5]: re.findall('(?!FOO)in', 'in in in')
> Out[5]: ['in', 'in', 'in']
>
> In [6]: re.findall('(?!FOO$)in', 'in in in')
> Out[6]: ['in', 'in', 'in']
>
> In [7]: re.findall('(?!^FOO$)in', 'in in in')
> Out[7]: ['in', 'in', 'in']
>
>
> What is the effective difference between numbers 4 thru 7?
>
> That is, what effect does a string position anchor have within the sub expression?

6 & 7 are meaningless; the assertion is checked at the position where
'in' has to match, and text that begins with 'in' can never be FOO
followed by end-of-line ($).

> Hmm...
>
> In [30]: re.findall('(?!FOO)in', 'in FOOin in')
> Out[30]: ['in', 'in', 'in']

OK. (?!...) is a look *ahead* assertion - it requires that the text
starting at the current position not match the given expression. At
the start of a regex that is the same text that 'in' then has to
match, so the assertion can never fail and is meaningless there.
In other words, '(?!FOO)in' matches 'in FOOin in' at position 6
(counting from 0) because starting at position 6 there is no FOO.
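
A quick way to see this is to compare match offsets with and without
the assertion. This is only a sketch; the variable names are mine, and
it uses re.finditer instead of re.findall so that the positions show up.

```python
import re

text = 'in FOOin in'

# The lookahead is tested at the same offset where 'in' must begin,
# so it never rules anything out.
lookahead_starts = [m.start() for m in re.finditer(r'(?!FOO)in', text)]
plain_starts = [m.start() for m in re.finditer(r'in', text)]

print(lookahead_starts)  # [0, 6, 9]
print(plain_starts)      # [0, 6, 9]
```

Both lists are the same, including the match at offset 6 right after FOO.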

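To skip an 'in' that is preceded by FOO, use a look-behind assertion,
(?<!FOO). To reject text that starts with FOO, just put the ^ outside
the assertion, as in your first example.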

In [2]: re.findall('(?!FOO)in', 'in FOOin in')
Out[2]: ['in', 'in', 'in']

In [3]: re.findall('(?<!FOO)in', 'in FOOin in')
Out[3]: ['in', 'in']

In [4]: re.findall('(?!^FOO)in', 'FOOin FOOin in')
Out[4]: ['in', 'in', 'in']
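
If the goal is the one in the subject line, keeping only strings that
do not start with FOO, something like the following should work. It is
a rough sketch: the sample strings and the names not_foo and kept are
made up for illustration and are not from the thread.

```python
import re

# Hypothetical sample strings, not from the original thread.
lines = ['FOOin', 'in', 'barin FOO', 'FOO']

# ^ anchors at the start of the string; (?!FOO) then rejects strings
# whose first three characters are FOO.
not_foo = re.compile(r'^(?!FOO)')

kept = [s for s in lines if not_foo.search(s)]
print(kept)  # ['in', 'barin FOO']
```

Here the ^ sits outside the assertion, so the lookahead is only ever
tested at the start of the string.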

Kent

