[Python-Dev] Re: [Python-checkins] python/dist/src/Lib textwrap.py,1.18,1.19

Martin v. Löwis martin@v.loewis.de
11 Dec 2002 17:53:56 +0100


Greg Ward <gward@python.net> writes:

> My attitude is that textwrap should work on European languages, whether
> they are encoded in 8-bit "ASCII" or Unicode.  

Please, don't assume any specific encoding. Why is Latin-1 better than
KOI8-R? The only encoding that is truly better than all others is
ASCII, since virtually all other encodings have ASCII as a subset
(except for the EBCDIC ones, and, with limitations, the ISO-2022
ones).
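A minimal sketch of the subset point, assuming a modern Python 3 interpreter with its standard codecs (the sample strings and variable names are illustrative only): pure-ASCII bytes decode to the same text under every ASCII-compatible codec, while a single high byte means something different in Latin-1 and in KOI8-R.

```python
# ASCII bytes are interpreted identically by every ASCII-compatible codec.
text = "wrap me"
ascii_bytes = text.encode("ascii")
decoded = {
    enc: ascii_bytes.decode(enc)
    for enc in ("ascii", "latin-1", "koi8-r", "utf-8")
}

# A byte above 0x7F has no encoding-independent meaning.
high_byte = b"\xe4"
as_latin1 = high_byte.decode("latin-1")  # a Latin letter with diaeresis
as_koi8r = high_byte.decode("koi8-r")    # a Cyrillic letter instead

print(decoded)
print(repr(as_latin1), repr(as_koi8r))
```

Code that only gives special treatment to bytes below 0x80 therefore behaves the same way regardless of which of these encodings the caller used.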

Also, you'll find more and more European languages encoded in UTF-8,
so Latin-1-specific support would be useless there and would give wrong
results.

[If you meant to suggest no specific processing for &nbsp;, disregard
this comment.]
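To illustrate the UTF-8 concern, here is a small sketch (Python 3 assumed; the sample word is arbitrary) of what happens when UTF-8 bytes are treated as if they were Latin-1: character counts no longer match, and the no-break space turns into two unrelated characters.

```python
# "café" is four characters, but five bytes in UTF-8.
word = "caf\u00e9"
utf8_bytes = word.encode("utf-8")

# Misreading those bytes as Latin-1 yields five characters, so any
# width calculation based on the result is off by one.
misread = utf8_bytes.decode("latin-1")

# NO-BREAK SPACE is the single byte 0xA0 in Latin-1, but the two bytes
# 0xC2 0xA0 in UTF-8. Byte-level code that special-cases 0xA0 would
# act on the second half of a multi-byte sequence.
nbsp_utf8 = "\u00a0".encode("utf-8")
nbsp_misread = nbsp_utf8.decode("latin-1")

print(len(word), len(utf8_bytes), len(misread))
print(nbsp_utf8, repr(nbsp_misread))
```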

> I suspect that passing an arbitrary Unicode string to it is
> meaningless -- what the heck does it even mean to wrap a string of
> Chinese or Hebrew or Devanagari characters?  Beats me, and I think
> they're out of scope for textwrap.

Actually, the Unicode database has "line-breaking properties". Those
are not yet incorporated into unicodedata, but they could be used to
meaningfully extend the module to Unicode.
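As a rough sketch of how line-breaking properties could drive wrapping: the table below is hand-written and covers only a handful of classes from the Unicode line-breaking annex (UAX #14), and the function names are hypothetical, not part of unicodedata or textwrap. A real implementation would load the full property data and apply the complete rule set.

```python
# Deliberately tiny, hand-written subset of UAX #14 line-break classes.
# Assumption: everything not listed is treated as ordinary alphabetic (AL).
SP, ZW, GL, ID, AL = "SP", "ZW", "GL", "ID", "AL"

def line_break_class(ch):
    """Return a simplified line-break class for one character."""
    if ch == " ":
        return SP
    if ch == "\u200b":              # ZERO WIDTH SPACE
        return ZW
    if ch == "\u00a0":              # NO-BREAK SPACE acts as "glue"
        return GL
    if "\u4e00" <= ch <= "\u9fff":  # CJK Unified Ideographs block
        return ID
    return AL

def break_opportunities(text):
    """Return the indexes i where a line may break before text[i]."""
    result = []
    for i in range(1, len(text)):
        before = line_break_class(text[i - 1])
        after = line_break_class(text[i])
        if after in (SP, ZW):
            continue                # never break before a space
        if before in (SP, ZW):
            result.append(i)        # break allowed after a space
        elif GL in (before, after):
            continue                # glue keeps its neighbours together
        elif ID in (before, after):
            result.append(i)        # ideographs can break on either side
    return result

print(break_opportunities("a b"))
print(break_opportunities("a\u00a0b"))
print(break_opportunities("\u4e2d\u6587"))
```

Under these simplified rules an ordinary space offers a break, a no-break space does not, and two adjacent ideographs can be split even though no whitespace separates them.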

> So: do I even need to worry about the cornucopia of Unicode whitespace
> characters at all?  Or can I sweep that can of worms under the rug?
> (Pardon the horribly mixed metaphor.)

Sweep away.

Regards,
Martin