regexp strangeness

Peter Otten __peter__ at web.de
Thu Apr 9 16:05:01 EDT 2009


Dale Amon wrote:

> This finds nothing:
> 
> import re
> import string
> card       = "abcdef"
> DEC029     = re.compile("[^&0-9A-Z/ $*,.\-:#@'=\"[<(+\^!);\\\]%_>?]")
> errs       = DEC029.findall(card.strip("\n\r"))
> print errs
> 
> This works correctly:
> 
> import re
> import string
> card       = "abcdef"
> DEC029     = re.compile("[^&0-9A-Z/ $*,.\-:#@'=\"[<(+\^!)\\;\]%_>?]")
> errs       = DEC029.findall(card.strip("\n\r"))
> print errs
> 
> They differ only in the positioning of the quoted backslash.
> 
> Just in case it is of interest to anyone.

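Each backslash has to be escaped twice: once for the Python string literal
and once for the regular expression. In your first pattern Python turns
"\\\]" into \\], which the regex engine reads as an escaped backslash
followed by a "]" that closes the character class early; the remaining
%_>?] then has to match literally, so nothing is found. Your second pattern
keeps the class intact, but it reaches the engine as \;\], so the backslash
itself is not part of the set. Either double every backslash or use a raw
string, denoted by an r"..." prefix: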

>>> re.findall("[^&0-9A-Z/ $*,.\-:#@'=\"[<(+\^!);\\\]%_>?]", "abc")
[]
>>> re.findall("[^&0-9A-Z/ $*,.\-:#@'=\"[<(+\^!);\\\\\\]%_>?]", "abc")
['a', 'b', 'c']
>>> re.findall(r"[^&0-9A-Z/ $*,.\-:#@'=\"[<(+\^!);\\\]%_>?]", "abc")
['a', 'b', 'c']
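
To see what the regex engine actually receives, it can help to print the
pattern strings. The following is only a sketch on a reduced, hypothetical
pattern (a negated class meant to exclude ";", a backslash, "]" and "%"),
not the full DEC029 set; the variable names are made up for illustration.

```python
import re

# Reduced, hypothetical pattern: a negated class that should exclude
# ";", a backslash, "]" and "%".
raw_pattern = r"[^;\\\]%]"        # raw string: the engine sees it as typed
doubled_pattern = "[^;\\\\\\]%]"  # plain string: every backslash doubled

# Both spellings hand the same characters to the regex engine.
assert raw_pattern == doubled_pattern

# This is what the under-escaped literal "[^;\\\]%]" evaluates to: Python
# collapses "\\" into one backslash and leaves "\]" alone, so one backslash
# is lost. It is spelled with valid escapes only here, to avoid the
# invalid-escape warning newer Python versions emit for "\]".
under_escaped = "[^;\\\\]%]"

print(under_escaped)  # [^;\\]%]   class ends early, then a literal "%]"
print(raw_pattern)    # [^;\\\]%]  class includes the backslash and "]"

print(re.findall(under_escaped, "abc"))  # []
print(re.findall(raw_pattern, "abc"))    # ['a', 'b', 'c']
```

Printing the string before compiling it is usually the quickest way to tell
which of the two escaping layers ate a backslash.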

Peter


