[Python-Dev] Re: [Python-checkins] python/dist/src/Include unicodeobject.h, 2.42, 2.43

"Martin v. Löwis" martin at v.loewis.de
Wed Jun 2 14:11:15 EDT 2004


Skip Montanaro wrote:
>     Hye-Shik> 1) regard all characters as non-wide.
>     Hye-Shik> 2) decode the string to unicode with the system default encoding
>     Hye-Shik>    and call its methods.
> 
>     ...
> 
>     Hye-Shik> I didn't make my mind between these two yet.  What do you think?
> 
> #1 sounds like the most reasonable to me.  

That violates the rule

In the face of ambiguity, refuse the temptation to guess.

For "character width" to be a meaningful concept for a byte string, the
byte string must use a multi-byte encoding. Then .iswide would not be
implementable, because it applies not to a single byte but to a sequence
of bytes. .width is unimplementable because you need to know the
encoding.
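
To illustrate, here is a small sketch in present-day Python, not the API
proposed in this thread. The helper display_width is hypothetical: it
approximates column width with unicodedata.east_asian_width, counting
Wide and Fullwidth characters as two columns and everything else as one.
The same bytes give different answers depending on the encoding assumed.

```python
import unicodedata

def display_width(text):
    """Hypothetical helper: 2 columns for East Asian Wide/Fullwidth, else 1."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in text)

# One Hangul syllable, encoded as three bytes in UTF-8.
raw = "\uac00".encode("utf-8")

as_utf8 = raw.decode("utf-8")      # one wide character
as_latin1 = raw.decode("latin-1")  # three unrelated narrow characters

# The byte string alone cannot say which of these widths is right.
print(len(raw), display_width(as_utf8), display_width(as_latin1))
```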

So I propose that the methods aren't added to string objects.

> You can't rely on strings coming
> into your program with proper encoding information, and they might come from
> an environment different than sys.defaultencoding (think WWW), so #2 seems
> like it would create as many problems as it solves.  All I'm interested in
> is avoiding needless occurrences of these constructs in code:
> 
>     if isinstance(s, unicode):
>         width = s.width()
>     else:
>         ...

If you have code that cares about character width, you need to convert
all incoming strings to Unicode. Then, you can just write

   width = s.width()

If you find you are writing code like the one above, you have found a
bug in your code.
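
A minimal sketch of that approach in present-day Python, assuming each
input arrives together with its known encoding. The names incoming and
columns are hypothetical, and columns only approximates width via
unicodedata.east_asian_width. Decoding happens once at the boundary, so
the width code never needs a type check.

```python
import unicodedata

def columns(text):
    """Hypothetical helper: 2 columns for East Asian Wide/Fullwidth, else 1."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in text)

# Assumed input: (bytes, encoding) pairs, e.g. from a request header.
incoming = [
    ("\ud55c\uae00".encode("euc_kr"), "euc_kr"),
    ("abc".encode("ascii"), "ascii"),
]

# Decode everything at the boundary; only text is handled from here on.
texts = [raw.decode(encoding) for raw, encoding in incoming]
widths = [columns(t) for t in texts]
print(widths)
```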

Regards,
Martin



