[Python-Dev] textwrap and unicode
M.-A. Lemburg
mal@lemburg.com
Tue, 22 Oct 2002 22:01:32 +0200
Greg Ward wrote:
> Well, my ignorance of Unicode has finally bitten me -- someone filed a
> bug (#622831) against textwrap.py because it crashes when it attempts to
> wrap a Unicode string.
>
> Here are the problems that I am aware of:
>
> * textwrap assumes "whitespace" means "the characters in
> string.whitespace"
It should use u.isspace() for this.
You might also want to consider u.splitlines() for line breaking,
since Unicode defines many more line-breaking characters than
ASCII does, and u.splitlines() knows about all of them.
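A rough sketch of both suggestions, assuming a current Python 3 where every str is a Unicode string. The helper name normalize_whitespace is made up for illustration and is not part of textwrap:

```python
def normalize_whitespace(text):
    """Return text with every Unicode whitespace character replaced by ' '.

    Hypothetical helper: relies only on str.isspace(), so it is not
    limited to the ASCII characters listed in string.whitespace.
    """
    return "".join(" " if ch.isspace() else ch for ch in text)


# U+2003 (EM SPACE) is whitespace to isspace() but is not in string.whitespace.
flattened = normalize_whitespace("one\u2003two\tthree")

# U+2028 (LINE SEPARATOR) is a line break to splitlines(), just like "\n".
lines = "first\u2028second\nthird".splitlines()
```

Testing each character keeps the replacement one-for-one, so string length and offsets are preserved.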
> * textwrap assumes "lowercase letter" means "the characters in
> string.lowercase" (heck, this only works in English)
u.lower() will do the right thing for Unicode.
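For the "word ends with a lowercase letter" test, one possible sketch (Python 3 assumed) uses islower(), the predicate counterpart of lower(). The function name ends_with_lowercase is hypothetical, not something in textwrap:

```python
def ends_with_lowercase(word):
    """Return True if the last character of word is a lowercase letter.

    Hypothetical helper: str.islower() consults the Unicode character
    database, so it also recognises non-English letters such as 'é'.
    """
    last = word[-1:]        # empty string if word is empty
    return last.islower()   # False for '', digits, punctuation, uppercase


checks = [ends_with_lowercase(w) for w in ("café", "CAFÉ", "word.", "")]
```

Slicing with word[-1:] rather than indexing with word[-1] avoids an IndexError on an empty string.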
> Can someone tell me what the proper way to do this is? Or just point me
> at the relevant documentation? I've scoured the online docs and *Python
> Essential Reference*, and I know more about the codecs and unicodedata
> modules than I did before. But I still don't know how to replace all
> whitespace with space, or detect words that end with a lowercase letter.
Hope that helps,
--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting: http://www.egenix.com/
Python Software: http://www.egenix.com/files/python/