Confused by slash/escape in regexp
lie.1296 at gmail.com
Mon Apr 12 02:12:05 CEST 2010
On 04/12/10 08:43, andrew cooke wrote:
> Is the third case here surprising to anyone else? It doesn't make
> sense to me...
> Python 2.6.2 (r262:71600, Oct 24 2009, 03:15:21)
> [GCC 4.4.1 [gcc-4_4-branch revision 150839]] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
>>>> from re import compile
>>>> p1 = compile('a\x62c')
> <_sre.SRE_Match object at 0x7f4e8f93d578>
>>>> p2 = compile('a\\x62c')
> <_sre.SRE_Match object at 0x7f4e8f93d920>
>>>> p3 = compile('a\\\x62c')
It isn't so much about regex as about string literals:
Python 2.6.4 (r264:75706, Mar 18 2010, 01:03:14)
[GCC 4.3.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> print 'a\x62c'
abc
>>> print 'a\\x62c'
a\x62c
>>> print 'a\\\x62c'
a\bc
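A minimal sketch of the same check in Python 3 (the session above is Python 2, where `print` is a statement). It shows what each literal actually contains before `re` ever sees it. The dict name `literals` is only for illustration:

```python
# Inspect the string each literal produces; the regex engine only ever
# sees these resulting characters, never the source-code escapes.
literals = {
    "p1": 'a\x62c',     # Python decodes \x62 -> 'b'            => "abc"
    "p2": 'a\\x62c',    # \\ -> one backslash, "x62" stays text => a\x62c
    "p3": 'a\\\x62c',   # \\ -> one backslash, then \x62 -> 'b' => a\bc
}

for name, s in literals.items():
    # repr() shows backslashes escaped; len() counts real characters
    print(name, repr(s), len(s), list(s))
```

Note that the p2 string is six characters long and the p3 string is four: in both, the backslash survives as a real character for the regex engine to interpret.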
In the first case, *python* will unescape '\x62' in the string literal into
the letter 'b', so the regex engine just receives 'abc'. In the second case,
python will unescape the double backslash '\\' into a single backslash '\',
and the *regex* engine will then interpret the backslash-x62 sequence as 'b'.
In the third case, *python* will unescape the double backslash '\\' into a
single backslash '\' and the escape '\x62' into the letter 'b', so the regex
engine receives 'a\bc', in which '\b' is a special sequence:
\b Matches the empty string, but only at the start or end of a word.
Since 'a' and 'c' are both word characters, there is never a word boundary
between them, so the third pattern can never match anything.
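The same three patterns can be checked against `re` directly. This is a Python 3 sketch, not the original poster's code; `p_raw` is a hypothetical extra pattern illustrating the usual remedy, a raw string literal, which stops Python from processing the backslashes so the regex engine sees exactly what was typed:

```python
import re

# The three patterns from the original post, in Python 3 syntax.
p1 = re.compile('a\x62c')    # regex sees "abc"
p2 = re.compile('a\\x62c')   # regex sees a\x62c and decodes \x62 itself
p3 = re.compile('a\\\x62c')  # regex sees a\bc, where \b is a word boundary

for name, p in [("p1", p1), ("p2", p2), ("p3", p3)]:
    print(name, repr(p.pattern), bool(p.match("abc")))

# \b needs a word/non-word transition. Between 'a' and 'c' there is none,
# so p3 fails on "abc"; on the literal text a\bc the boundary exists,
# but then 'c' has to match the backslash, so it fails there too.
print(p3.match("a\\bc"))  # None

# With a raw string, the pattern text is identical to p2's.
p_raw = re.compile(r'a\x62c')
print(p_raw.pattern == p2.pattern, bool(p_raw.match("abc")))
```

Raw strings are the conventional way to write regex patterns in Python precisely because they remove the first (Python-level) round of escape processing described above.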