[issue10521] str methods don't accept non-BMP fillchar on a narrow Unicode build

Marc-Andre Lemburg report at bugs.python.org
Wed Nov 24 21:37:51 CET 2010


Marc-Andre Lemburg <mal at egenix.com> added the comment:

Alexander Belopolsky wrote:
> 
> New submission from Alexander Belopolsky <belopolsky at users.sourceforge.net>:
> 
> >>> 'xyz'.center(20, '\U00100140')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: The fill character must be exactly one character long
> 
> str.ljust and str.rjust are similarly affected.

I don't think we should change that for the formatting methods.

See my reply on python-dev:

str.center(n) centers the string in a padded string that
is composed of n code units. Whether that operation will result
in a text that's centered visually on output is a completely
different story. The original string could contain surrogates,
it could also contain combining code points, so the visual
presentation of the result may very well not be centered at
all; it may not even appear as having the length n to the user.
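
A minimal sketch of the combining code point case, assuming a
string made of 'e' followed by U+0301 (COMBINING ACUTE ACCENT).
The variable names and the "visible" count are illustrative only;
counting non-combining code points is just a rough stand-in for
what a terminal would display:

```python
import unicodedata

# 'e' followed by COMBINING ACUTE ACCENT: two code points, one glyph.
s = "e\u0301"
padded = s.center(6, "*")

# str.center() guarantees the length of the result, nothing more.
length = len(padded)

# Rough approximation of displayed columns: skip combining marks.
visible = sum(1 for ch in padded if unicodedata.combining(ch) == 0)

print(length, visible)
```

The result has length 6, yet only five of its code points occupy
a column of their own, so the text does not look centered in a
field of six.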

Since we're not going to change the semantics of those APIs,
it is OK to not support padding with non-BMP code points on
UCS-2 builds.

Supporting such cases would only cause problems:

* if the methods padded with surrogate pairs, the resulting
  string would no longer have length n, breaking the
  assumption that len(str.center(n)) == n

* if the methods padded with half the number of surrogate
  pairs to make sure that len(str.center(n)) == n, the resulting
  output to e.g. a terminal would be further off than what
  you already have with surrogates and combining code points
  in the original string.
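
Both cases can be sketched with some arithmetic on UTF-16 code
units. This is only a simulation: on Python 3.3 and later len()
counts code points, so the hypothetical helper utf16_units()
below stands in for what len() reported on a UCS-2 build. The
padding layout (left/right split) is an assumption made for the
example, not what the real methods do:

```python
def utf16_units(s):
    """Number of UTF-16 code units in s (hypothetical helper that
    mimics len() on a narrow build)."""
    return len(s.encode("utf-16-le")) // 2

text = "xyz"
fill = "\U00100140"   # non-BMP: one surrogate pair, two code units
width = 20

pad = width - utf16_units(text)   # 17 positions to fill

# Case 1: one surrogate pair per padding position.
left = pad // 2
case1 = fill * left + text + fill * (pad - left)
units1 = utf16_units(case1)

# Case 2: only as many pairs as fit into the remaining code units.
pairs = pad // 2
left = pairs // 2
case2 = fill * left + text + fill * (pairs - left)
units2 = utf16_units(case2)
chars2 = len(case2)   # code points, i.e. roughly what gets displayed

print(units1, units2, chars2)
```

In the first case the result is 37 code units long instead of 20.
In the second case it comes out at 19 code units, because 17
cannot be filled with two-unit pairs, and it shows only 11
characters on output.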

----------
nosy: +lemburg

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue10521>
_______________________________________

