[I18n-sig] Re: [Python-Dev] Unicode debate

Guido van Rossum guido@python.org
Mon, 01 May 2000 14:02:32 -0400


[Guido]
> > And this is exactly why encodings will remain important: entities
> > encoded in ISO-2022-JP have no compelling reason to be recoded
> > permanently into ISO10646, and there are lots of forces that make it
> > convenient to keep it encoded in ISO-2022-JP (like existing tools).

[Paul]
> You cannot recode an ISO-2022-JP document into ISO10646 because 10646 is
> a character *set* and not an encoding. ISO-2022-JP says how you should
> represent characters in terms of bits and bytes. ISO10646 defines a
> mapping from integers to characters.

OK.  I really meant recoding into UTF-8 -- I maintain that there are
lots of forces that prevent most ISO-2022-JP documents from being
recoded into UTF-8.
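To make the character-set/encoding distinction concrete, here is a minimal sketch in modern Python 3 (which postdates this thread). It assumes the standard library's `iso2022_jp` and `utf-8` codecs; the sample text and variable names are illustrative only. The two characters keep the same ISO 10646 code points, while their byte representations differ by encoding, so "recoding" means decoding to code points and then encoding again.

```python
# Two Japanese characters as ISO 10646 / Unicode code points (the "character set" view).
text = "\u65e5\u672c"  # the characters for "Japan"
code_points = [hex(ord(c)) for c in text]  # ['0x65e5', '0x672c']

# The same characters under two different encodings (the "bits and bytes" view).
jis_bytes = text.encode("iso2022_jp")  # uses ESC sequences to switch character sets
utf8_bytes = text.encode("utf-8")

# Recoding an ISO-2022-JP document into UTF-8: decode to code points, then re-encode.
# The code points survive unchanged; only the byte representation differs.
recoded = jis_bytes.decode("iso2022_jp").encode("utf-8")
assert recoded == utf8_bytes
assert jis_bytes != utf8_bytes
```

Nothing in this round trip is automatic: both the source and target encodings have to be named explicitly.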

> They are both important, but separate. I think that this automagical
> re-encoding conflates them.

Who is proposing any automagical re-encoding?

Are you sure you understand what we are arguing about?

*I* am not even sure what we are arguing about.

I am simply saying that 8-bit strings (literals or otherwise) in
Python have always been able to contain encoded strings.

Earlier, you quoted some reference documentation that defines 8-bit
strings as containing characters.  That's taken out of context -- that
text was written at a time when there was (for most people, anyway) no
difference between characters and bytes, and I really meant bytes.
There's plenty of use of 8-bit Python strings for non-character uses
so your "proof" that 8-bit strings should contain "characters"
according to your definition is invalid.
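As an illustration of 8-bit strings holding bytes rather than characters, here is a minimal sketch in modern Python 3, where `bytes` plays roughly the role that the 8-bit `str` type played at the time of this thread. The sample values and the record layout passed to the standard `struct` module are hypothetical, chosen only for illustration. The same byte type carries text in two different encodings and a packed binary record; it only becomes characters once an encoding is named.

```python
import struct

# Encoded text: the byte string itself does not record which encoding produced it.
greeting = "caf\u00e9"
latin1 = greeting.encode("latin-1")  # b'caf\xe9'
utf8 = greeting.encode("utf-8")      # b'caf\xc3\xa9'

# Non-character data: a hypothetical binary record
# (little-endian unsigned 16-bit int followed by unsigned 32-bit int, no padding).
record = struct.pack("<HI", 1, 2000)

# Interpreting bytes as characters requires choosing an encoding explicitly.
assert latin1.decode("latin-1") == utf8.decode("utf-8") == greeting

# Guessing wrong fails: a lone 0xE9 byte is not valid UTF-8.
try:
    latin1.decode("utf-8")
    wrong_guess_failed = False
except UnicodeDecodeError:
    wrong_guess_failed = True
```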

--Guido van Rossum (home page: http://www.python.org/~guido/)