[Python-Dev] len(chr(i)) = 2?
Victor Stinner
victor.stinner at haypocalc.com
Fri Nov 19 21:23:14 CET 2010
Hi,
On Friday 19 November 2010 17:53:58 Alexander Belopolsky wrote:
> I was recently surprised to learn that chr(i) can produce a string of
> length 2 in python 3.x.
Yes, but only on a narrow build. For example, Debian and Ubuntu compile
Python 3.1 in wide mode (sys.maxunicode == 1114111), where chr(i) always has
length 1.
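
To check which kind of build an interpreter is, something like the following
should work. It is only a minimal sketch: the helper names build_kind() and
chr_length() are made up for illustration, and the only real API used is
sys.maxunicode and chr().

```python
import sys

def build_kind():
    # A narrow build stores text as 16-bit units, so sys.maxunicode is 0xFFFF.
    # A wide build reports 0x10FFFF (1114111).
    return "wide" if sys.maxunicode > 0xFFFF else "narrow"

def chr_length(code_point):
    # Length of the string that chr() returns for this code point.
    return len(chr(code_point))

if __name__ == "__main__":
    print(build_kind(), hex(sys.maxunicode))
    # U+10000 is the first non-BMP code point: length 1 on a wide build,
    # length 2 (a surrogate pair) on a narrow build.
    print(chr_length(0x10000))
```

BMP characters such as chr(0x41) have length 1 on both kinds of build.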
> I suspect that I am not alone finding this behavior non-obvious
> given that a mistake in Python manual stating the contrary survived
> several releases. [1]
It was a documentation bug and you fixed it. Non-BMP characters are rare, so
few people (maybe only you?) noticed the documentation bug. I consider the
behaviour an improvement in the non-BMP support of Python3.
Python is unclear about non-BMP characters: the narrow build was called "ucs2"
for a long time, even though it is UTF-16 (each character is encoded as one or
two UTF-16 words). Python2 accepts non-BMP characters with the \U syntax, but
not with chr(). This is inconsistent and I see it as a bug. But I don't want to
touch Python2 regarding non-BMP characters, and the "bug" is already fixed in
Python3!
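
To make the "one or two UTF-16 words" point concrete, here is a small sketch
of the standard UTF-16 surrogate computation. The function name
to_utf16_units() is hypothetical (it is not a Python API), and the sketch
assumes a valid code point outside the surrogate range; it does not validate
its input.

```python
def to_utf16_units(code_point):
    # BMP code points fit in a single 16-bit word.
    if code_point < 0x10000:
        return [code_point]
    # Non-BMP code points are split into a high and a low surrogate,
    # each carrying 10 bits of (code_point - 0x10000).
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)
    low = 0xDC00 + (offset & 0x3FF)
    return [high, low]

if __name__ == "__main__":
    for cp in (0x41, 0xFFFF, 0x10000, 0x10FFFF):
        print(hex(cp), [hex(u) for u in to_utf16_units(cp)])
```

On a narrow build, the two words returned for a non-BMP code point are what
make len(chr(i)) equal to 2.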
> I do believe, however that a change like
> this [2] and its consequences should be better publicized.
The change was made before the release of Python 3.0. Do you want to patch the
"What's new in Python 3.0?" document?
> I have not
> found any discussion of this change in PEPs or "What's new" documents.
> The closest find was a mentioning of a related issue #3280 in the 3.0
> NEWS file. [3] Since this feature will be first documented in the
> Library Reference in 3.2, I wonder if it will be appropriate to
> mention it in "What's new in 3.2"?
In my opinion, the question is rather why it was not fixed in Python2. I
suppose that the answer is something ugly like "backward compatibility" or
"historical reasons" :-)
Victor