regexp strangeness

MRAB google at mrabarnett.plus.com
Thu Apr 9 23:54:40 CEST 2009


Peter Otten wrote:
> Dale Amon wrote:
> 
>> This finds nothing:
>>
>> import re
>> import string
>> card       = "abcdef"
>> DEC029     = re.compile("[^&0-9A-Z/ $*,.\-:#@'=\"[<(+\^!);\\\]%_>?]")

Once Python has processed the string literal, the regular expression
you're actually providing is:

 >>> print "[^&0-9A-Z/ $*,.\-:#@'=\"[<(+\^!);\\\]%_>?]"
[^&0-9A-Z/ $*,.\-:#@'="[<(+\^!);\\]%_>?]
                                ^^^

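The backslash is escaped (the "\\"), so the "]" after it is unescaped and
the set ends at that first "]". The remaining "%_>?]" is then treated as
pattern text that must follow the set, which is why nothing is found in
"abcdef".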
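
To check that reading, here is a minimal sketch, assuming Python 3 and
the current re module. The pattern is copied from the printed output
above and written as a raw triple-quoted string so that Python leaves
the backslashes alone; the test strings are made up for illustration.

```python
import re

# The pattern as the regex engine receives it (see the printed output
# above). Raw string: Python does not interpret the backslashes.
seen_by_re = r"""[^&0-9A-Z/ $*,.\-:#@'="[<(+\^!);\\]%_>?]"""
pattern = re.compile(seen_by_re)

# The set closes at the first "]", so a match needs one character outside
# the set, then "%_", an optional ">", and a literal "]".
print(pattern.findall("abcdef"))   # []
print(pattern.findall("a%_>]"))    # ['a%_>]']
print(pattern.findall("a%_]"))     # ['a%_]']
```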

>> errs       = DEC029.findall(card.strip("\n\r"))
>> print errs
>>
>> This works correctly:
>>
>> import re
>> import string
>> card       = "abcdef"
>> DEC029     = re.compile("[^&0-9A-Z/ $*,.\-:#@'=\"[<(+\^!)\\;\]%_>?]")

The regular expression you're actually providing is:

 >>> print "[^&0-9A-Z/ $*,.\-:#@'=\"[<(+\^!)\\;\]%_>?]"
[^&0-9A-Z/ $*,.\-:#@'="[<(+\^!)\;\]%_>?]
                                 ^^    ^

The first "]" is escaped (the "\]"), so it is a literal member of the
set, and the set ends at the second "]".
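
A similar sketch for this version, again assuming Python 3 and made-up
test strings. It also suggests a side effect worth checking: the "\;"
reads as just an escaped ";", so the set does not seem to contain a
backslash at all.

```python
import re

# The second pattern as the regex engine receives it (raw string, so
# Python does not interpret the backslashes).
seen_by_re = r"""[^&0-9A-Z/ $*,.\-:#@'="[<(+\^!)\;\]%_>?]"""
pattern = re.compile(seen_by_re)

# Lowercase letters are outside the set, so each one is reported.
print(pattern.findall("abcdef"))   # ['a', 'b', 'c', 'd', 'e', 'f']

# ";" and "]" are both members of the set, so neither is reported.
print(pattern.findall(";]"))       # []

# A backslash is not a member of the set, so it is reported.
print(pattern.findall("\\"))       # ['\\']
```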

>> errs       = DEC029.findall(card.strip("\n\r"))
>> print errs
>>
>> They differ only in the positioning of the quoted backslash.
>>
>> Just in case it is of interest to anyone.
> 
> You have to escape twice; once for Python and once for the regular
> expression. Or use raw strings, denoted by an r"..." prefix:
> 
>>>> re.findall("[^&0-9A-Z/ $*,.\-:#@'=\"[<(+\^!);\\\]%_>?]", "abc")
> []
>>>> re.findall("[^&0-9A-Z/ $*,.\-:#@'=\"[<(+\^!);\\\\\\]%_>?]", "abc")
> ['a', 'b', 'c']
>>>> re.findall(r"[^&0-9A-Z/ $*,.\-:#@'=\"[<(+\^!);\\\]%_>?]", "abc")
> ['a', 'b', 'c']
> 
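
The two spellings Peter describes can be compared directly. This is a
minimal sketch, assuming Python 3, with a shortened set to keep it
readable. The second half is a different route from the raw string he
suggests: it lets re.escape() add the regex-level escapes. ALLOWED_PUNCT
is a hypothetical name, and its contents assume that a literal backslash
and a literal "]" were both meant to be allowed.

```python
import re

# A shortened set for illustration: digits, capitals, backslash and "]".
doubled = "[^0-9A-Z\\\\\\]]"   # escaped once for Python, once for re
raw = r"[^0-9A-Z\\\]]"         # raw string: only the re-level escapes

print(doubled == raw)                      # True
print(re.findall(raw, "A1\\]x"))           # ['x']

# Alternative: write the allowed punctuation literally (only Python-level
# escapes for the quote and the backslash) and let re.escape() do the rest.
ALLOWED_PUNCT = "&/ $*,.-:#@'=\"[<(+^!);\\]%_>?"
DEC029 = re.compile("[^0-9A-Z" + re.escape(ALLOWED_PUNCT) + "]")

print(DEC029.findall("abcdef"))            # ['a', 'b', 'c', 'd', 'e', 'f']
print(DEC029.findall("HELLO, WORLD\\]"))   # []
```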



