Re: [Python-Dev] textwrap and unicode

22 Oct 2002


Greg Ward wrote:
...
> Well, my ignorance of Unicode has finally bitten me -- someone filed a
> bug (#622831) against textwrap.py because it crashes when it attempts to
> wrap a Unicode string.
> Here are the problems that I am aware of:
> * textwrap assumes "whitespace" means "the characters in
>     string.whitespace"
It should use u.isspace() for this.
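
A minimal sketch of the "replace all whitespace with space" step built on
isspace(). The helper name normalize_whitespace is made up for illustration
and is not part of textwrap; the code assumes a current Python, where the
u"" prefix is optional and is kept only to make the Unicode strings explicit:

```python
def normalize_whitespace(text):
    """Return text with every Unicode whitespace character replaced by u' '.

    Hypothetical helper, not textwrap's actual code. It relies only on
    isspace(), so it also catches characters such as U+2003 (EM SPACE)
    that a fixed ASCII whitespace list does not contain.
    """
    return u"".join([u" " if ch.isspace() else ch for ch in text])


# Tab, EM SPACE and newline are all turned into a plain space.
sample = u"a\tb\u2003c\nd"
normalized = normalize_whitespace(sample)
```

Note that this maps each whitespace character to one space; it does not
collapse runs of whitespace.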

You might also want to consider u.splitlines() for line breaking,
since Unicode has a lot more line breaking characters than
ASCII (which u.splitlines() knows about).
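
To illustrate the difference, here is a small sketch (again assuming a
current Python; the sample string is invented) comparing splitlines() with a
plain split on "\n":

```python
# U+2028 (LINE SEPARATOR), U+2029 (PARAGRAPH SEPARATOR) and U+0085 (NEXT LINE)
# are line boundaries for splitlines(), alongside "\r\n".
text = u"one\u2028two\u2029three\r\nfour\x85five"

# splitlines() breaks at every one of those boundaries.
unicode_lines = text.splitlines()

# Splitting on "\n" alone only sees the single "\n" in the sample.
newline_only = text.split(u"\n")
```

Here unicode_lines holds five separate lines, while newline_only holds just
two pieces that still contain the other separators.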
...
> * textwrap assumes "lowercase letter" means "the characters in
>     string.lowercase" (heck, this only works in English)
u.lower() will do the right thing for Unicode.
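
For the "word ends with a lowercase letter" check, one possible sketch uses
islower(), a close relative of lower(), on the last character. The function
name ends_with_lowercase is hypothetical and this is not how textwrap is
actually written; it assumes a current Python:

```python
def ends_with_lowercase(word):
    """Return True if the last character of word is a lowercase letter.

    Hypothetical helper. word[-1:] is an empty string for empty input,
    and islower() is False for the empty string and for uncased
    characters such as punctuation, so no special-casing is needed.
    """
    return word[-1:].islower()


# Non-English letters are handled without listing them explicitly.
french = ends_with_lowercase(u"caf\u00e9")      # ends in e-acute
shouting = ends_with_lowercase(u"CAF\u00c9")    # ends in capital E-acute
sentence_end = ends_with_lowercase(u"abc.")     # ends in punctuation
```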
...
> Can someone tell me what the proper way to do this is?  Or just point me
> at the relevant documentation?  I've scoured the online docs and *Python
> Essential Reference*, and I know more about the codecs and unicodedata
> modules than I did before.  But I still don't know how to replace all
> whitespace with space, or detect words that end with a lowercase letter.
Hope that helps,
-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/
