Raw string substitution problem
Alan G Isaac
alan.isaac at gmail.com
Thu Dec 17 11:51:26 EST 2009
On 12/17/2009 11:24 AM, Richard Brodie wrote:
> A raw string is not a distinct type from an ordinary string
> in the same way byte strings and Unicode strings are. It
> is a merely a notation for constants, like writing integers
> in hexadecimal.
>
>>>> (r'\n', u'a', 0x16)
> ('\\n', u'a', 22)
Yes, that was a mistake. But the problem remains::
>>> re.sub('abc', r'a\nb\n.c\a','123abcdefg') == re.sub('abc', 'a\\nb\\n.c\\a',' 123abcdefg') == re.sub('abc', 'a\nb\n.c\a','123abcdefg')
True
>>> r'a\nb\n.c\a' == 'a\\nb\\n.c\\a' == 'a\nb\n.c\a'
False
Why are the first two strings being treated as if they are the last one?
That is, why isn't '\\' being processed in the obvious way?
This still seems wrong. Why isn't it?
More simply, consider::
>>> re.sub('abc', '\\', '123abcdefg')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python26\lib\re.py", line 151, in sub
return _compile(pattern, 0).sub(repl, string, count)
File "C:\Python26\lib\re.py", line 273, in _subx
template = _compile_repl(template, pattern)
File "C:\Python26\lib\re.py", line 260, in _compile_repl
raise error, v # invalid expression
sre_constants.error: bogus escape (end of line)
Why is this the proper handling of what one might think would be an
obvious substitution?
Thanks,
Alan Isaac
More information about the Python-list
mailing list