[Python-Dev] textwrap and unicode

M.-A. Lemburg mal@lemburg.com
Tue, 22 Oct 2002 22:01:32 +0200


Greg Ward wrote:
> Well, my ignorance of Unicode has finally bitten me -- someone filed a
> bug (#622831) against textwrap.py because it crashes when it attempts to
> wrap a Unicode string.
> 
> Here are the problems that I am aware of:
> 
>   * textwrap assumes "whitespace" means "the characters in
>     string.whitespace"

It should use u.isspace() for this.
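
As a minimal sketch of the "replace all whitespace with space" step, assuming a modern Python where every str is Unicode (the helper name normalize_whitespace is hypothetical, not part of textwrap):

```python
def normalize_whitespace(text):
    """Replace each character for which isspace() is true with a plain space.

    Hypothetical helper for illustration; not textwrap's actual code.
    """
    return "".join(" " if ch.isspace() else ch for ch in text)


# EM SPACE (U+2003) and TAB both count as whitespace for isspace().
sample = "a\u2003b\tc"
print(normalize_whitespace(sample))
```

Note that isspace() is also true for NO-BREAK SPACE (U+00A0), so a wrapper that wants to keep non-breaking spaces intact would need to exclude it explicitly.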

You might also want to consider u.splitlines() for line breaking,
since Unicode defines many more line-breaking characters than
ASCII, and u.splitlines() knows about them.
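
A short illustration of that point, assuming a modern Python str (the sample text is made up):

```python
# LINE SEPARATOR (U+2028) and PARAGRAPH SEPARATOR (U+2029) are treated
# as line boundaries by splitlines(), just like "\n".
text = "first\u2028second\u2029third\nfourth"
lines = text.splitlines()
print(lines)
```

This only splits at line boundaries already present in the text; choosing where to wrap a long line is a separate problem.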

>   * textwrap assumes "lowercase letter" means "the characters in
>     string.lowercase" (heck, this only works in English)

u.lower() will do the right thing for Unicode.
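
For the "word ends with a lowercase letter" test, one possible sketch is below. It uses islower() on the last character, which is my substitution for a direct comparison against u.lower(); the helper name ends_with_lowercase is hypothetical.

```python
def ends_with_lowercase(word):
    """Return True if the last character of word is a lowercase letter.

    Hypothetical helper for illustration; relies on the Unicode-aware
    islower() rather than membership in string.lowercase.
    """
    return bool(word) and word[-1].islower()


print(ends_with_lowercase("caf\u00e9"))   # last char is a lowercase e-acute
print(ends_with_lowercase("CAF\u00c9"))   # last char is an uppercase E-acute
```

Punctuation and digits have no case, so islower() is false for them and a word such as "end." is not reported as ending in a lowercase letter.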

> Can someone tell me what the proper way to do this is?  Or just point me
> at the relevant documentation?  I've scoured the online docs and *Python
> Essential Reference*, and I know more about the codecs and unicodedata
> modules than I did before.  But I still don't know how to replace all
> whitespace with space, or detect words that end with a lowercase letter.

Hope that helps,
-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/