Wrong unichr docstring in 2.7

Thomas Jollans thomas at jollybox.de
Sun Aug 22 12:25:19 CEST 2010

On Sunday 22 August 2010, it occurred to jmfauth to exclaim:
> I think there is a small point here.
> >>> sys.version
> 2.7 (r27:82525, Jul  4 2010, 09:01:59) [MSC v.1500 32 bit (Intel)]
> >>> print unichr.__doc__
> unichr(i) -> Unicode character
> Return a Unicode string of one character with ordinal i; 0 <= i <=
> 0x10ffff.
> >>> # but
> >>> unichr(0x10fff)
> Traceback (most recent call last):
>   File "<psi last command>", line 1, in <module>
> ValueError: unichr() arg not in range(0x10000) (narrow Python
> build)

This is very tricky ground. I consider the behaviour of unichr() to be wrong 
here. The user shouldn't have to care much about UTF-16 and the difference 
between wide and narrow Py_UNICODE builds. In fact, this behaviour has 
changed in Python 3.1: on a narrow Python 3 build, 
chr(0x10fff) == '\ud803\udfff' == '\U00010fff'.
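
To spell out where '\ud803\udfff' comes from, here is a minimal sketch of the
UTF-16 surrogate arithmetic. The function name to_surrogate_pair is made up
for illustration; it is not part of Python or of any library.

```python
def to_surrogate_pair(code_point):
    """Split a supplementary-plane code point into UTF-16 surrogates.

    Hypothetical helper for illustration only.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is outside the supplementary planes")
    offset = code_point - 0x10000        # 20 bits remain after the subtraction
    high = 0xD800 + (offset >> 10)       # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits -> low (trail) surrogate
    return high, low


high, low = to_surrogate_pair(0x10FFF)
print("%04x %04x" % (high, low))  # prints: d803 dfff
```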

Now, the Python 2 behaviour can't be fixed [1] -- it was specified in PEP 261 
[2], which means it was pretty much set in stone. Back then, it was deemed 
more important for unichr() to always return a length-one string than for it 
to work with wide characters. Add to that the pretty half-arsed UTF-16 
support...

The docstring could be changed for narrow Python builds. I myself don't think 
docstrings should change depending on build options like this; instead, it 
could be amended to document both behaviours. Note that the docs [3] already 
include this information.

If you want to, feel free to report a bug at http://bugs.python.org/

> Note:
> I find
> 0x0 <= i <= 0xffff
> more logical than
> 0 <= i <= 0xffff
> (orange-apple comparaison)

Would a zero by any other name not look as small? Honestly, I myself find it 
nonsensical to qualify 0 by specifying a base, unless you go all the way and 
represent the full uint16_t by saying 0x0000 <= i <= 0xffff.

 - Thomas

[1] http://bugs.python.org/issue1057588
[2] http://www.python.org/dev/peps/pep-0261/
[3] http://docs.python.org/library/functions.html#unichr
