Python 2.2 re bug?

Travis Shirk travis at puddy.lan.kerrgulch.net
Sun Aug 25 22:31:51 CEST 2002


> On Sat, 24 Aug 2002 23:46:10 +0200, Travis Shirk wrote:
>> 
>> Using Python 1.5.2:
>> import re
>> data = "\xFF\x00\xE0\xD3\xD3\xE4\x95\xFF\x00\x00\x11\xFF\x00\xF5"
>> data1 = re.compile(r"\xFF\x00([\xE0-\xFF])").sub(r"\xFF\1", data)
>> print data1
>> '\377\340\323\323\344\225\377\000\000\021\377\365'
>> 
>> 
>> This output is exactly what I expect, but now see what happens in 2.2.1:
>> import re
>> data = "\xFF\x00\xE0\xD3\xD3\xE4\x95\xFF\x00\x00\x11\xFF\x00\xF5"
>> data1 = re.compile(r"\xFF\x00([\xE0-\xFF])").sub(r"\xFF\1", data)
>> print data1
>> '\\xFF\xe0\xd3\xd3\xe4\x95\xff\x00\x00\x11\\xFF\xf5'
>> 
>> 

Pedro Rodriguez <pedro_rodriguez at club-internet.fr> wrote:
> I ran into a similar issue, and I wonder whether your problem, like
> mine, comes from the raw string handling. Here is my reasoning, FWIW.

> When you write something like:
>     r"\x00"
> this actually means:
>     ['\\', 'x', '0', '0'] (try list(r"\x00"))
> but
>     "\x00"
> means
>     ['\x00'] (try list("\x00"))

> By using a raw string, you prevent the Python parser from replacing the
> escape with the proper character. And the 're' module isn't supposed to
> do this kind of substitution; it has its own uses for '\'.

> So you should probably fix your expression by carefully replacing:
>     data1 = re.compile(r"...").sub(r"...")
> with
>     data1 = re.compile("...").sub("...")
> in both the 1.5.2 and 2.x versions.
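The raw-versus-normal string distinction described in the quoted text is easy to check interactively. This is a minimal sketch written for a modern Python 3 interpreter rather than 1.5.2 or 2.2; the variable names `raw` and `cooked` are only illustrative.

```python
# A raw string keeps the escape sequence as separate characters...
raw = r"\x00"
# ...while a normal string literal collapses it into one character.
cooked = "\x00"

print(list(raw))     # ['\\', 'x', '0', '0']
print(list(cooked))  # ['\x00']
```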

Okay, to clarify: 1.5.2 works for me as expected.
I need r"" in both the compile and sub arguments because re processes
both of them: the first as a pattern, the second as a replacement
template.  If I make both regular strings, I don't get the doubled \\
characters, but then the \1 in the sub argument becomes the single
character chr(1) and no longer refers to group one of the compiled
regex.  Not that I would expect it to.

The bottom line is that the behavior of 1.5.2 and 2.2.1 is
different, and unless there is a workaround, 2.2.1 seems broken.
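One possible workaround keeps the \xFF out of the replacement template's escape handling entirely. The sketch below was checked only against the sample data from this thread on a modern Python 3, not on 2.2.1 itself. Both variants produce the same string as the 1.5.2 output quoted at the top.

```python
import re

data = "\xFF\x00\xE0\xD3\xD3\xE4\x95\xFF\x00\x00\x11\xFF\x00\xF5"
pattern = re.compile(r"\xFF\x00([\xE0-\xFF])")

# Option 1: let Python turn "\xFF" into a real character via a normal
# string literal, and keep only the group reference raw so that re
# still reads \1 as a backreference.
data1 = pattern.sub("\xFF" + r"\1", data)

# Option 2: pass a function instead of a template; re calls it once per
# match, so no backslash escapes in the replacement are interpreted.
data2 = pattern.sub(lambda m: "\xFF" + m.group(1), data)
```

Option 2 is probably the more robust choice, since a callable replacement bypasses template parsing altogether.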

Travis


-- 
Travis Shirk <travis at pobox dot com>


