[ python-Bugs-989185 ] unicode.width broken for combining characters

Mon Jul 12 06:46:44 CEST 2004

Bugs item #989185, was opened at 2004-07-12 12:59
Message generated for change (Comment added) made by perky
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=989185&group_id=5470

Category: Unicode
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: Matthew Mueller (donut)
Assigned to: M.-A. Lemburg (lemburg)
Summary: unicode.width broken for combining characters

Initial Comment:
Python 2.4a1+ (#38, Jul 11 2004, 20:36:10) 
[GCC 3.3.4 (Debian 1:3.3.4-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> u'\u3060'.width()
2
>>> u'\u305f\u3099'.width()
4

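The width should be 2 in both cases: the second string is
the canonically decomposed form (U+305F followed by the
combining mark U+3099) of the first.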
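
The equivalence of the two strings can be checked with the
unicodedata module. This is only an illustrative sketch, not
part of the original report; the variable names are made up,
and it assumes an interpreter whose unicodedata module provides
normalize().

```python
import unicodedata

# U+305F HIRAGANA LETTER TA + U+3099 COMBINING KATAKANA-HIRAGANA
# VOICED SOUND MARK (two code points)
decomposed = u'\u305f\u3099'

# U+3060 HIRAGANA LETTER DA (one code point)
precomposed = u'\u3060'

# NFC composes the base letter and the combining mark into U+3060.
composed = unicodedata.normalize('NFC', decomposed)

# True when both spellings are the same character after NFC.
is_equivalent = (composed == precomposed)
```

Since both spellings render as the same single wide character,
a per-code-point width sum counts the combining mark separately
and overstates the width of the decomposed form.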

----------------------------------------------------------------------

>Comment By: Hye-Shik Chang (perky)
Date: 2004-07-12 13:46

Message:

It sounds like we need to normalize the string to NFC before
evaluating unicode.width().
So I think we need to decide how the width() method gets
access to the normalization database.

1. Export the normalization C API functions from the
unicodedata module, as is done for ucnhash_CAPI, and have
unicodeobject load them the first time width() is called.

2. Move unicode.width() into the unicodedata module, where it
can use the normalization functions statically.

I would prefer 2. ;)
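
A rough pure-Python sketch of what option 2 might compute is
given here. It is not the proposed C implementation: the
function name width() and the rule that combining marks left
over after NFC take no column are assumptions made for
illustration only.

```python
import unicodedata

def width(text):
    """Hypothetical helper: column width of text after NFC."""
    # Compose base letters with their combining marks first, so
    # u'\u305f\u3099' becomes the single character u'\u3060'.
    composed = unicodedata.normalize('NFC', text)
    total = 0
    for ch in composed:
        # Assumption: a combining mark that survives NFC is drawn
        # over the previous character and adds no column.
        if unicodedata.combining(ch):
            continue
        # Wide (W) and fullwidth (F) characters take two columns;
        # everything else is counted as one.
        if unicodedata.east_asian_width(ch) in ('W', 'F'):
            total += 2
        else:
            total += 1
    return total
```

With this sketch, width(u'\u3060') and width(u'\u305f\u3099')
return the same value, which is the behaviour the report asks
for.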

----------------------------------------------------------------------
