Greg Ward wrote:
Well, my ignorance of Unicode has finally bitten me -- someone filed a bug (#622831) against textwrap.py because it crashes when it attempts to wrap a Unicode string.
Here are the problems that I am aware of:
* textwrap assumes "whitespace" means "the characters in string.whitespace"
It should use u.isspace() for this. You might also want to consider u.splitlines() for line breaking: Unicode defines many more line-breaking characters than ASCII does, and u.splitlines() knows about them.
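To make that concrete, here is a minimal sketch of both ideas. The helper name normalize_whitespace and the sample strings are made up for illustration; they are not part of textwrap. Only the isspace() and splitlines() methods are doing the real work.

```python
def normalize_whitespace(text):
    """Replace every character that isspace() reports as whitespace
    with a plain ASCII space (hypothetical helper, not textwrap code)."""
    return u"".join(u" " if ch.isspace() else ch for ch in text)

# NO-BREAK SPACE, EM SPACE and TAB are all whitespace to isspace().
sample = u"one\u00a0two\u2003three\tfour"
normalized = normalize_whitespace(sample)

# splitlines() also breaks on U+2028 LINE SEPARATOR, not just "\n".
lines = u"first\u2028second\nthird".splitlines()
```

Note that isspace() is also true for the no-break space, so whether that character should be turned into an ordinary, breakable space when wrapping is a separate decision.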
* textwrap assumes "lowercase letter" means "the characters in string.lowercase" (heck, this only works in English)
u.lower() will do the right thing for Unicode.
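If the goal is to test for a lowercase letter rather than to convert to one, the companion method u.islower() may be the more direct tool. A small sketch, assuming the check only needs to look at the last character of a word (ends_with_lowercase is a hypothetical name, not something in textwrap):

```python
def ends_with_lowercase(word):
    """Return True if the last character of word is a lowercase letter.

    Hypothetical helper for illustration; islower() is False for
    punctuation, digits and uppercase letters, and the empty string
    is handled explicitly.
    """
    return bool(word) and word[-1].islower()

# Works for non-English letters as well as ASCII ones.
accented = ends_with_lowercase(u"caf\u00e9")    # ends in LATIN SMALL LETTER E WITH ACUTE
uppercase = ends_with_lowercase(u"CAF\u00c9")   # ends in the capital form
```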
Can someone tell me what the proper way to do this is? Or just point me at the relevant documentation? I've scoured the online docs and *Python Essential Reference*, and I know more about the codecs and unicodedata modules than I did before. But I still don't know how to replace all whitespace with space, or detect words that end with a lowercase letter.
Hope that helps,
--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting: http://www.egenix.com/
Python Software: http://www.egenix.com/files/python/