[Python-Dev] default encoding for 8-bit string literals (was Unicode and comparisons)
M.-A. Lemburg
mal@lemburg.com
Fri, 07 Apr 2000 12:55:30 +0200
Fredrik Lundh wrote:
>
> M.-A. Lemburg wrote:
> > The UTF-8 assumption had to be made in order to get the two
> > worlds to interoperate. We could have just as well chosen
> > Latin-1, but then people currently using say a Russian
> > encoding would get upset for the same reason.
> >
> > One way or another somebody is not going to like whatever
> > we choose, I'm afraid... the simplest solution is to use
> > Unicode for all strings which contain non-ASCII characters
> > and then call .encode() as necessary.
>
> just a brief heads-up:
>
> I've been playing with this a bit, and my current view is that
> the current unicode design is horridly broken when it comes
> to mixing 8-bit and 16-bit strings.
Why "horribly" ? String and Unicode mix pretty well, IMHO.
The magic auto-conversion of Unicode to UTF-8 in C APIs
using "s" or "s#" does not always do what the user expects,
but it's still better than not having Unicode objects work
with these APIs at all.
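A minimal sketch of the coercion in question, assuming the Python 2-era
string model and UTF-8 as the default encoding (as it was at the time of
this thread); str() stands in here for the same conversion a C API
declared with "s" or "s#" would perform:

    # u'äbc' -- three characters, one of them non-ASCII
    u = u"\u00e4bc"

    # implicit conversion via the default encoding, analogous to what a
    # C-level "s"/"s#" argument sees; with UTF-8 this yields four bytes
    s = str(u)

    print len(u)                   # 3 characters
    print len(s)                   # 4 bytes -- '\xc3\xa4bc'

    # the explicit spelling recommended above: same bytes, but the
    # choice of encoding is visible in the source
    print len(u.encode("utf-8"))   # 4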
> basically, if you pass a unicode string to a function slicing and
> dicing 8-bit strings, it will probably not work. and you will
> probably not understand why.
>
> I'm working on a proposal that I think will make things simpler
> and less magic, and far easier to understand. to appear on
> sunday.
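A minimal sketch of the kind of breakage described in the quote above,
again assuming Python 2-era semantics; first_three() is an illustrative
helper, not anything from the thread:

    def first_three(s):
        # written with one-byte-per-character 8-bit strings in mind
        return s[:3]

    ascii_data = "abcdef"
    utf8_data = u"\u00e4\u00f6".encode("utf-8")   # u'äö' -> '\xc3\xa4\xc3\xb6'

    print first_three(ascii_data)        # 'abc' -- works as intended
    print repr(first_three(utf8_data))   # '\xc3\xa4\xc3' -- the byte slice
                                         # cuts the second character in half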
Looking forward to it,
--
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/