Wrong unichr docstring in 2.7
Thomas Jollans
thomas at jollybox.de
Sun Aug 22 06:25:19 EDT 2010
On Sunday 22 August 2010, it occurred to jmfauth to exclaim:
> I think there is a small point here.
>
> >>> sys.version
>
> 2.7 (r27:82525, Jul 4 2010, 09:01:59) [MSC v.1500 32 bit (Intel)]
>
> >>> print unichr.__doc__
>
> unichr(i) -> Unicode character
>
> Return a Unicode string of one character with ordinal i; 0 <= i <=
> 0x10ffff.
>
> >>> # but
> >>> unichr(0x10fff)
>
> Traceback (most recent call last):
> File "<psi last command>", line 1, in <module>
> ValueError: unichr() arg not in range(0x10000) (narrow Python
> build)
This is very tricky ground. I consider the behaviour of unichr() to be wrong
here. The user shouldn't have to care much about UTF-16 and the difference
between wide and narrow Py_UNICODE builds. In fact, in Python 3.1, this
behaviour has changed:
on a narrow Python 3 build, chr(0x10fff) == '\ud803\udfff' == '\U00010fff'.
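For anyone who wants to check that surrogate pair by hand, here is a minimal sketch of the UTF-16 arithmetic. The helper name surrogate_pair is made up for illustration; it is not part of the standard library.

```python
def surrogate_pair(code_point):
    """Return the UTF-16 (high, low) surrogate code units for a
    code point outside the Basic Multilingual Plane.

    Hypothetical helper, for illustration only.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point not in range(0x10000, 0x110000)")
    offset = code_point - 0x10000        # 20-bit offset above the BMP
    high = 0xD800 + (offset >> 10)       # top 10 bits go in the high surrogate
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits go in the low surrogate
    return high, low


high, low = surrogate_pair(0x10FFF)
print(hex(high), hex(low))  # prints: 0xd803 0xdfff
```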
Now, the Python 2 behaviour can't be fixed [1] -- it was specified in PEP 261
[2], which means it was pretty much set in stone. Back then, it was deemed more
important for unichr() to always return a length-one string than for it to
work with wide characters. And then pretty half-arsed UTF-16 support was
added on top...
The docstring could be changed for narrow Python builds. I myself don't think
docstrings should change depending on build options like this -- instead, it
could be amended to document both behaviours. Note that the docs [3]
already include this information.
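If you just need to know which behaviour a given interpreter has, sys.maxunicode tells you. A small sketch, assuming only that attribute (the function name is mine, purely for illustration):

```python
import sys


def max_single_unit_ordinal():
    """Return the largest code point this build stores as one unit.

    Hypothetical helper, for illustration only. sys.maxunicode is
    0xFFFF on a narrow build and 0x10FFFF on a wide build.
    """
    return sys.maxunicode


limit = max_single_unit_ordinal()
print("narrow build" if limit == 0xFFFF else "wide build")
```

On a narrow Python 2 build, this is the upper bound that unichr() actually enforces, whatever the docstring says.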
If you want to, feel free to report a bug at http://bugs.python.org/
> Note:
>
> I find
> 0x0 <= i <= 0xffff
> more logical than
> 0 <= i <= 0xffff
>
> (orange-apple comparison)
Would a zero by any other name not look as small? Honestly, I myself find it
nonsensical to qualify 0 by specifying a base, unless you go all the way and
represent the full uint16_t by saying 0x0000 <= i <= 0xffff
- Thomas
[1] http://bugs.python.org/issue1057588
[2] http://www.python.org/dev/peps/pep-0261/
[3] http://docs.python.org/library/functions.html#unichr