[issue10521] str methods don't accept non-BMP fillchar on a narrow Unicode build
Marc-Andre Lemburg
report at bugs.python.org
Wed Nov 24 21:37:51 CET 2010
Marc-Andre Lemburg <mal at egenix.com> added the comment:
Alexander Belopolsky wrote:
>
> New submission from Alexander Belopolsky <belopolsky at users.sourceforge.net>:
>
>>>> 'xyz'.center(20, '\U00100140')
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> TypeError: The fill character must be exactly one character long
>
> str.ljust and str.rjust are similarly affected.
I don't think we should change that for the formatting methods.
See my reply on python-dev:
str.center(n) centers the string in a padded string that
is composed of n code units. Whether that operation will result
in a text that's centered visually on output is a completely
different story. The original string could contain surrogates,
it could also contain combining code points, so the visual
presentation of the result may very well not be centered at
all; it may not even appear as having the length n to the user.
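To illustrate the point about length versus visual width, here is a small sketch. It assumes a current CPython (3.3 or later), which has no narrow builds, so the helper utf16_units() (a hypothetical name, not part of the stdlib) simulates what len() would have reported on a UCS-2 build by counting UTF-16 code units.

```python
import unicodedata


def utf16_units(s):
    """Count UTF-16 code units, roughly what len() returned on a narrow build."""
    return len(s.encode("utf-16-le")) // 2


# 'e' followed by COMBINING ACUTE ACCENT: two code points, one visible glyph.
text = "e\u0301"
padded = text.center(6, "*")

# The padded string has the requested length ...
padded_len = len(padded)

# ... but fewer characters occupy a column, because the combining mark
# is drawn on top of the preceding 'e'.
visible = sum(1 for ch in padded if not unicodedata.combining(ch))

# A non-BMP code point needs a surrogate pair, i.e. two code units.
fill_units = utf16_units("\U00100140")
```

Here padded_len is the requested width, while visible is one less, so the result is not visually centered even though no non-BMP character is involved.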
Since we're not going to change the semantics of those APIs,
it is OK to not support padding with non-BMP code points on
UCS-2 builds.
Supporting such cases would only cause problems:
* if the methods padded with surrogates, the resulting
  string would no longer have length n, breaking the
  assumption that len(str.center(n)) == n
* if the methods padded with half the number of surrogates
  to make sure that len(str.center(n)) == n, the resulting
  output to e.g. a terminal would be even further off than what
  you already have with surrogates and combining code points
  in the original string.
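The two cases above can be sketched roughly as follows. This is only an illustration, not how CPython implements str.center: the functions center_whole_fill() and center_half_fill() and the helper utf16_units() are hypothetical names, and code-unit lengths are simulated via UTF-16 encoding because current CPython has no narrow builds. The width 21 is chosen so that the padding divides evenly.

```python
def utf16_units(s):
    """Count UTF-16 code units, roughly what len() returned on a narrow build."""
    return len(s.encode("utf-16-le")) // 2


def center_whole_fill(s, n, fill):
    """Case 1: add one whole fill character per missing code unit."""
    missing = max(n - utf16_units(s), 0)
    left = missing // 2
    return fill * left + s + fill * (missing - left)


def center_half_fill(s, n, fill):
    """Case 2: add only as many fill characters as fit into n code units."""
    missing = max(n - utf16_units(s), 0)
    count = missing // utf16_units(fill)
    left = count // 2
    return fill * left + s + fill * (count - left)


fill = "\U00100140"  # non-BMP: two UTF-16 code units per character
a = center_whole_fill("xyz", 21, fill)
b = center_half_fill("xyz", 21, fill)

units_a = utf16_units(a)  # code-unit length under case 1
units_b = utf16_units(b)  # code-unit length under case 2
chars_b = len(b)          # code points in case 2, one per displayed character
```

With these inputs, case 1 gives a string far longer than 21 code units, while case 2 keeps the code-unit length at 21 but contains only 12 characters, so it would take up far fewer than 21 positions on a terminal.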
----------
nosy: +lemburg
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue10521>
_______________________________________