Python 2.2 re bug?

Travis Shirk travis at puddy.lan.kerrgulch.net
Sat Aug 24 17:46:10 EDT 2002


Hi,

I'm running into what looks like a bug in the Python 2.2 re module.
The following examples demonstrate the problem.

Using Python 1.5.2:
import re
data = "\xFF\x00\xE0\xD3\xD3\xE4\x95\xFF\x00\x00\x11\xFF\x00\xF5"
data1 = re.compile(r"\xFF\x00([\xE0-\xFF])").sub(r"\xFF\1", data)
print repr(data1)
'\377\340\323\323\344\225\377\000\000\021\377\365'


This output is exactly what I expect, but now see what happens in 
2.2.1:
import re
data = "\xFF\x00\xE0\xD3\xD3\xE4\x95\xFF\x00\x00\x11\xFF\x00\xF5"
data1 = re.compile(r"\xFF\x00([\xE0-\xFF])").sub(r"\xFF\1", data)
print repr(data1)
'\\xFF\xe0\xd3\xd3\xe4\x95\xff\x00\x00\x11\\xFF\xf5'


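I prefer the hex output of 2.2 to the octal output of 1.5.2, but the
substitution is clearly wrong. Notice each spot containing "\\" in the
last result: where the replacement should have inserted the single byte
\xFF, the result instead contains the four literal characters \, x, F, F.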
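For comparison, here is a minimal sketch of the same substitution in Python 3 (an assumption: it has not been checked against 1.5.2 or 2.2). The data is held as bytes, and the replacement is a non-raw bytes literal, so the Python parser itself turns \xFF into a real 0xFF byte and only \\1 reaches re, as a group reference. This avoids relying on whether re's template parser expands \x escapes.

```python
import re

# Same input as in the examples above, as a Python 3 bytes object.
data = b"\xFF\x00\xE0\xD3\xD3\xE4\x95\xFF\x00\x00\x11\xFF\x00\xF5"

# Raw bytes pattern: re itself decodes the \xFF, \x00, \xE0 escapes.
pattern = re.compile(rb"\xFF\x00([\xE0-\xFF])")

# Non-raw template: the literal already contains the byte 0xFF, and "\\1"
# becomes backslash + "1", which re treats as a reference to group 1.
data1 = pattern.sub(b"\xFF\\1", data)

print(repr(data1))
# prints b'\xff\xe0\xd3\xd3\xe4\x95\xff\x00\x00\x11\xff\xf5'
```

This matches the 1.5.2 result byte for byte; the middle \xFF\x00\x00 is left alone because \x00 is outside [\xE0-\xFF].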

Is this a known bug? Or have the semantics changed with the
Unicode-aware re package introduced in 2.0?
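Another way to sidestep escape handling in the replacement string is to pass a function as the replacement: re.sub calls it with each match object and inserts the returned value verbatim, with no backslash processing. A sketch in Python 3, where the helper name keep_marker is hypothetical:

```python
import re

data = b"\xFF\x00\xE0\xD3\xD3\xE4\x95\xFF\x00\x00\x11\xFF\x00\xF5"
pattern = re.compile(rb"\xFF\x00([\xE0-\xFF])")


def keep_marker(match):
    # Hypothetical helper: drop the \x00 and keep 0xFF plus the captured byte.
    # The returned bytes are inserted as-is; re does not interpret them.
    return b"\xFF" + match.group(1)


data1 = pattern.sub(keep_marker, data)
print(repr(data1))
```

The function form costs a Python call per match, but it makes the replacement bytes explicit instead of depending on template escape rules.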

Travis

-- 
Travis Shirk <travis at pobox dot com>



More information about the Python-list mailing list