Negation in regular expressions

George Sakkis george.sakkis at gmail.com
Fri Sep 8 10:29:15 EDT 2006


Paddy wrote:

> George Sakkis wrote:
> > It has always struck me as odd that you can express negation of a single
> > character in regexps, but not of any more complex expression. Is there a
> > general way around this shortcoming? Here's an example to illustrate a
> > use case:
> >
> > >>> import re
> > # split with '@' as delimiter
> > >>> [g.group() for g in re.finditer('[^@]+', 'This @ is a @ test ')]
> > ['This ', ' is a ', ' test ']
> >
> > Is it possible to use finditer to split the string if the delimiter were
> > more than one character long (say 'XYZ')? [Yes, I'm aware of re.split, but
> > that's not the point; this is just an example. Besides, re.split returns
> > a list, not an iterator.]
> >
> > George
>
> If you're willing to use groups, then the following will split:
>
> >>> [g.group(1) for g in re.finditer(r'(.+?)(?:@#|$)', 'This @# is a @# test ')]
> ['This ', ' is a ', ' test ']

Nice! This covers the most common case, that is, non-consecutive
delimiters in the middle of the string. There are three edge cases:
consecutive delimiters, delimiter(s) at the beginning and delimiter(s)
at the end.
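
To make the edge cases concrete, here is a small sketch that runs the
pattern quoted above on a string containing a leading delimiter and two
consecutive delimiters, next to re.split on the same input. The test
string is only an illustration. Because '.+?' needs at least one
character per field, a delimiter that would start an empty field is
absorbed into the next field instead.

```python
import re

# Illustrative input: leading delimiter plus two consecutive delimiters.
s = '@# This @# is a @#@# test '

# '.+?' requires at least one character, so empty fields cannot be matched;
# the delimiter that precedes an empty field ends up inside the next field.
fields = [g.group(1) for g in re.finditer(r'(.+?)(?:@#|$)', s)]
print(fields)              # ['@# This ', ' is a ', '@# test ']

# re.split reports the empty fields explicitly.
expected = re.split(r'@#', s)
print(expected)            # ['', ' This ', ' is a ', '', ' test ']
```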

The regexp r'(.*?)(?:@#|$)' would match re.split's behavior if it
weren't for the extra empty string it returns at the end:
>>> s = '@# This @# is a @#@# test '
>>> re.split(r'@#', s)
['', ' This ', ' is a ', '', ' test ']
>>> [g.group(1) for g in re.finditer(r'(.*?)(?:@#|$)', s)]
['', ' This ', ' is a ', '', ' test ', '']

Any ideas?

George
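
One possible direction, offered only as a sketch and not as part of the
exchange above: instead of matching everything that is not the
delimiter, match the delimiter itself with finditer and yield the text
between consecutive matches. The helper name isplit is hypothetical.
The sketch assumes the delimiter pattern never matches the empty string
and contains no capturing groups, two cases that re.split treats
specially.

```python
import re

def isplit(pattern, string):
    """Lazily yield the fields of `string` separated by `pattern`.

    Hypothetical helper: assumes `pattern` never matches the empty
    string and has no capturing groups.
    """
    pos = 0
    for m in re.finditer(pattern, string):
        # Text between the end of the previous delimiter and this one.
        yield string[pos:m.start()]
        pos = m.end()
    # Whatever follows the last delimiter (possibly the empty string).
    yield string[pos:]

s = '@# This @# is a @#@# test '
print(list(isplit(r'@#', s)))   # ['', ' This ', ' is a ', '', ' test ']
```

On this input the generator yields the same fields as re.split(r'@#', s),
without the extra trailing empty string, and without building the whole
list up front.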



