[Tim]
I vote for backward compatibility for now, and not only because that will irritate /F the most.
[/F]
backward compatibility with what?
[Tim]
1.5.2.
[/F]
8-bit string literals
[Tim]
At least, because they were in 1.5.2.
[/F]
or unicode string literals?
[Tim]
I'm sorry \x escapes are even allowed in those -- \x notation is a gimmick for making strings hold arbitrary binary data, which we're trying to get away from. To the extent that they make any sense at all in Unicode strings, \u should be used instead.
[/F]
the problem here is that the pattern is compiled once (from either 8-bit or unicode strings), and can then be used on either 8-bit or unicode targets. to be fully backwards compatible, this means that the compiler should use 8 bits, no matter what string type you're using.
[Tim]
Unicode strings weren't in 1.5.2, so there can't possibly be a backwards compatibility issue with them -- at least not in the sense I'm using the phrase here.
[/F]
another solution would be to use the type of the pattern string to choose between 8 and 16 bits. I almost implemented that, before I realized that it broke the following rather nice property:
    sre.compile("some pattern") == sre.compile(u"some pattern")
(well, the pattern type doesn't implement __cmp__, but you get the idea). the current implementation guarantees "==", but I'm planning to change that to "is" (!).
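For comparison, here is a minimal sketch of how the same question plays out in present-day Python 3, where the `re` module stands in for `sre` and `bytes` stands in for 8-bit strings. Both substitutions are assumptions made for illustration, and the helper names `compare_patterns` and `can_search` are hypothetical. It does not reproduce the 2.0-era behaviour under discussion; it only shows that the "one compiled pattern for both string types" property is not kept there.

```python
import re


def compare_patterns(text="some pattern"):
    """Compile one pattern from str and from bytes, then compare the results."""
    from_text = re.compile(text)
    from_bytes = re.compile(text.encode("ascii"))
    return {
        # The two compiled patterns are distinct and compare unequal.
        "equal": from_text == from_bytes,
        # Each pattern still works on a target of its own type.
        "text_on_text": from_text.search("has some pattern") is not None,
        "bytes_on_bytes": from_bytes.search(b"has some pattern") is not None,
    }


def can_search(pattern, target):
    """Return False if the pattern refuses the target's type (TypeError)."""
    try:
        pattern.search(target)
    except TypeError:
        return False
    return True
```

Calling `can_search(re.compile("a"), b"a")` returns False here, because a str pattern cannot be applied to a bytes target, which is the opposite of the compile-once, match-either design described above.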
[Tim]
Do you mean that, e.g., sre.compile("\u0045") == sre.compile(u"\u0045") too? If so, that doesn't make any sense to me (interpreting \u in 8-bit strings is even more confused than interpreting \x in Unicode strings). But if you didn't mean to include this case, then the equality doesn't actually hold now, so there's nothing to preserve <wink>.
[/F]
anyway, I suspect it's too late to change this in 2.0b1. if enough people complain about this, we can always label it a "critical bug", and do something about it in b2.
[Tim]
I think the real problem here was MAL's generalization of \x to 2-byte stuff in Unicode strings. If Unicode strings *have* to support \x, then \x0123456789abcdef in Unicode strings should act like \u00ef in Unicode strings, and SRE should play along with that too. \x was broken to begin with; better to wipe it out than try to generalize it. OTOH, I didn't get much sleep last night <0.8 wink>.
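The \x0123456789abcdef example can be sketched as follows. This assumes the rule implied above: \x swallows every hex digit that follows it and keeps only the low 8 bits of the value. The helper `legacy_x_escape` is hypothetical, written only to illustrate that rule, and is not code from Python or SRE.

```python
import string


def legacy_x_escape(hex_digits):
    """Hypothetical model of the old rule for the x escape.

    Takes the run of hex digits that followed the backslash-x and returns
    the single character it would denote: the whole run is read as one
    number, and only its low 8 bits are kept.
    """
    if not hex_digits or any(c not in string.hexdigits for c in hex_digits):
        raise ValueError("expected one or more hex digits")
    # Assumption: everything above the low byte is silently discarded.
    return chr(int(hex_digits, 16) & 0xFF)
```

Under that assumption, `legacy_x_escape("0123456789abcdef")` gives the same character as "\u00ef", and `legacy_x_escape("45")` gives "E", the character written as \u0045 earlier in the thread.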