[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

Sun Aug 14 19:46:55 CEST 2011

Ezio Melotti <ezio.melotti at gmail.com> added the comment:

> I'm a bit confused on this.  You no longer fix bugs in Python 2?

We do, but it's unlikely that we will introduce major changes in behavior.
Even if we were to get rid of narrow builds and/or fix len(), we would probably do so only in the next 3.x feature release (i.e. 3.3), not in the next bug fix release of 3.2 (i.e. 3.2.2).
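
To make the len() point concrete, here is a rough sketch rather than a description of any actual fix. It assumes an interpreter where len() counts code points (a wide build, for instance), and the helper name utf16_len is hypothetical: it emulates what len() reports on a narrow build by counting UTF-16 code units.

```python
def utf16_len(s):
    """Count UTF-16 code units, i.e. what len() reports on a narrow build."""
    # Each UTF-16 code unit is 2 bytes; "utf-16-le" emits no BOM.
    return len(s.encode("utf-16-le")) // 2


# MUSICAL SYMBOL G CLEF lies outside the BMP, so it needs a surrogate pair.
clef = "\U0001D11E"

code_points = len(clef)       # code points, assuming a wide-style len()
code_units = utf16_len(clef)  # UTF-16 code units, as on a narrow build

print(code_points, code_units)
```

For strings that stay within the BMP the two counts agree, which is why the discrepancy often goes unnoticed.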

> That's why I say that you are out of conformance by having encoders
> and decoders of UTF streams tolerate noncharacters.  You are not
> allowed to call something a UTF and do non-UTF things with it, because
> this is in violation of conformance requirement C2.

This IMHO should be fixed, but it's another issue.
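
As a small illustration of the behavior being discussed (only a sketch, and the helper name is_noncharacter is hypothetical, not an existing API): the noncharacters are U+FDD0..U+FDEF plus the last two code points of every plane, and the utf-8 codec round-trips them without complaint.

```python
def is_noncharacter(cp):
    """Return True if the code point cp is a Unicode noncharacter."""
    # U+FDD0..U+FDEF, plus U+nFFFE and U+nFFFF in each of the 17 planes.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


nonchar = "\uffff"
assert is_noncharacter(ord(nonchar))

# The codec tolerates the noncharacter in both directions.
encoded = nonchar.encode("utf-8")
decoded = encoded.decode("utf-8")
print(repr(encoded), decoded == nonchar)
```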

> If you have not reread its Chapter 3 of late in its entirety, you
> probably want to do so.  There is quite a bit of material there that
> is fundamental to any process that claims to be conformant with
> the Unicode Standard.

I am familiar with Chapter 3, but admittedly I only read the parts that were relevant to the bugs I was fixing.  I never went through it to check that everything in Python matches the described behavior.
Thanks for pointing out the parts where Python doesn't follow the specs.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue12729>
_______________________________________