[Python-Dev] Re: Python 1.6a2 Unicode bug (was Re: comparing strings and ints)

François Pinard pinard at iro.umontreal.ca
Wed Apr 26 21:17:56 EDT 2000


"Fredrik Lundh" <effbot at telia.com> writes:

> M.-A. Lemburg <mal at lemburg.com> wrote:

> does this mean that the 8-bit string type is deprecated ???

I did not follow the implementation discussions, but from the outside, it
looks like 8-bit strings are kept and will be speedy.  People should either:

1) use them for ASCII without bothering or thinking much,

2) write Python source files in UTF-8, relying on the default encoding,

3) explicitly "cast" the encoding while converting strings to Unicode.
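
The three options can be sketched in present-day Python 3, assuming `bytes` stands in for the 8-bit string type and `str` for Unicode. This is an illustration under that assumption, not Python 1.6 code; there, option 3 would have been spelled with the `unicode(string, encoding)` builtin. The function names below are hypothetical, chosen only to mirror the numbered list.

```python
# Python 3 sketch: bytes play the role of 8-bit strings, str the role of Unicode.
# option_1/option_2/option_3 are hypothetical names mirroring the list above.

raw_ascii = b"hello"
raw_latin1 = "Fran\u00e7ois".encode("latin-1")  # b'Fran\xe7ois'
raw_utf8 = "Fran\u00e7ois".encode("utf-8")      # b'Fran\xc3\xa7ois'


def option_1(data: bytes) -> str:
    # 1) ASCII only: any byte above 0x7F raises UnicodeDecodeError.
    return data.decode("ascii")


def option_2(data: bytes) -> str:
    # 2) Rely on the default: bytes.decode() with no argument uses UTF-8.
    return data.decode()


def option_3(data: bytes, encoding: str) -> str:
    # 3) State the encoding explicitly at the point of conversion.
    return data.decode(encoding)


ascii_text = option_1(raw_ascii)
utf8_text = option_2(raw_utf8)
latin1_text = option_3(raw_latin1, "latin-1")
```

Option 3 keeps working when the bytes are not UTF-8, because the encoding is stated by the caller rather than assumed by the language.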

My guess is that 2) is not such good style, even if we know it will likely
be used in programs.  It might be convenient for rapid prototyping, or for
interactive use, say.  Python already has these two levels of language,
comparable to spoken versus written English, the former being more
relaxed and the latter more formal.  There are many things I allow myself
to write in interactive Python that I conscientiously avoid in saved scripts.
That should probably be the case for 2).

Better would be to always use 3) for non-ASCII strings that are not
already Unicode.  Following that style would almost turn invalid UTF-8
in 8-bit strings into an academic problem, which is one more reason to
avoid 2).
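
The invalid-UTF-8 concern can be illustrated with a small Python 3 sketch, again assuming `bytes` stands in for 8-bit strings. `is_valid_utf8` is a hypothetical helper, not a standard library function. Latin-1 bytes are usually not valid UTF-8, so relying on a UTF-8 default fails (or, with `errors="replace"`, silently loses data), while an explicitly named encoding decodes them as intended.

```python
# Python 3 sketch; is_valid_utf8 is a hypothetical helper for illustration.

def is_valid_utf8(data: bytes) -> bool:
    # Strict UTF-8 decoding raises UnicodeDecodeError on malformed sequences.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


latin1_bytes = "Fran\u00e7ois".encode("latin-1")  # b'Fran\xe7ois'

# Assuming UTF-8 (the implicit route): 0xE7 starts a 3-byte sequence,
# but 'o' is not a continuation byte, so the data is rejected as invalid.
utf8_ok = is_valid_utf8(latin1_bytes)

# errors="replace" hides the failure by substituting U+FFFD for the bad byte.
lossy = latin1_bytes.decode("utf-8", errors="replace")

# Naming the real encoding explicitly (the explicit route) recovers the text.
correct = latin1_bytes.decode("latin-1")
```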

-- 
François Pinard   http://www.iro.umontreal.ca/~pinard





More information about the Python-list mailing list