[Python-Dev] Re: [Python-checkins] python/dist/src/Include unicodeobject.h, 2.42, 2.43
"Martin v. Löwis"
martin at v.loewis.de
Wed Jun 2 14:11:15 EDT 2004
Skip Montanaro wrote:
> Hye-Shik> 1) regard all characters as non-wide.
> Hye-Shik> 2) decode the string to unicode with the system default encoding
> Hye-Shik> and call its methods.
>
> ...
>
> Hye-Shik> I didn't make my mind between these two yet. What do you think?
>
> #1 sounds like the most reasonable to me.
That violates the rule from the Zen of Python:

    In the face of ambiguity, refuse the temptation to guess.
For "character width" to be a meaningful concept for a byte string, the
byte string must use a multi-byte encoding. Then .iswide would not be
implementable, because wideness applies not to a single byte but to a
sequence of bytes, and .width is unimplementable because you need to
know the encoding.
So I propose that the methods aren't added to string objects.
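To make the ambiguity concrete, here is a minimal sketch in modern Python 3 (the .width()/.iswide() methods discussed in this thread are not methods of Python 3's str). It assumes a hypothetical display_width helper that counts characters whose unicodedata.east_asian_width is "W" or "F" as two columns and everything else as one. The same three bytes give a different character count and a different width depending on which encoding is assumed:

```python
import unicodedata

def display_width(text: str) -> int:
    # Hypothetical helper (an assumption, not a stdlib function):
    # East Asian Wide ("W") and Fullwidth ("F") characters count as two
    # columns; every other character counts as one.
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in text)

raw = b"\xea\xb0\x80"  # UTF-8 encoding of U+AC00 HANGUL SYLLABLE GA

for enc in ("utf-8", "latin-1"):
    text = raw.decode(enc)
    # utf-8:   1 character, width 2
    # latin-1: 3 characters, width 3
    print(enc, len(text), display_width(text))
```

Option 1 from Hye-Shik's list, treating every byte as a narrow character, would report a width of 3 here whatever the real encoding is; only the caller knows which answer is right.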
> You can't rely on strings coming
> into your program with proper encoding information, and they might come from
> an environment different than sys.defaultencoding (think WWW), so #2 seems
> like it would create as many problems as it solves. All I'm interested in
> is avoiding needless occurrences of these constructs in code:
>
> if isinstance(s, unicode):
>     width = s.width()
> else:
>     ...
If you have code that cares about character width, you need to convert
all incoming strings to Unicode. Then, you can just write
    width = s.width()
If you find you are writing code like the one above, you have found a
bug in your code.
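A minimal Python 3 sketch of that advice, under stated assumptions: to_text and display_width are hypothetical helpers, not standard-library functions, and the width rule (East Asian Wide and Fullwidth characters count as two columns, everything else as one) is a simplification of Unicode's East Asian Width property. (In Python 3 the unicode type is simply str.) Bytes are decoded once, with an encoding the caller has to supply, where they enter the program; downstream code only ever sees text:

```python
import unicodedata

def display_width(text: str) -> int:
    # Hypothetical helper: "W" and "F" characters take two columns,
    # all others are counted as one.
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in text)

def to_text(data, encoding: str) -> str:
    # Hypothetical boundary helper: the caller must know the encoding
    # (from a protocol header, a file format, a config setting, ...).
    # Values that are already text pass through unchanged.
    if isinstance(data, bytes):
        return data.decode(encoding)
    return data

# Convert once, where the data enters the program...
s = to_text(b"\xea\xb0\x80\xea\xb0\x81", "utf-8")  # U+AC00 U+AC01
# ...so code that cares about width never needs an isinstance() branch.
width = display_width(s)
```

The type check still exists, but only in one place at the input boundary, instead of being repeated wherever a width is needed.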
Regards,
Martin