[Python-Dev] Unicode Implementation in JPython
Guido van Rossum
guido@python.org
Mon, 21 Feb 2000 15:13:19 -0500
> My feeling on the unicode proposal and its implementation is that most
> of the changes can be integrated directly into JPython without breaking
> any existing JPython code. One thing concerns me though:
>
> open("out", "wb").write(u"hello")
(Note that the file is opened in *binary* mode; in text mode, this
would write the 5 bytes of "hello".)
> This writes 10 bytes to the file "out".
>
> I have two problems with that:
>
> 1. In Java, files are always byte-based. To move from unicode chars to
> bytes some kind of encoder must always be applied. It is also strange to
> see the actual byte layout of the data, which in my "out" file seems to
> be platform dependent. Is that the case? If it is, then the
> write(u"..") strikes me as somewhat random (unknown).
>
> 2. To get this behavior under JPython, it is necessary to introduce a
> new string type which in all other aspects is equal to the existing
> string type. Only when passed to file.write should the new string type
> return a faked representation of its memory. When a normal string is
> passed to .write, some byte representation of the string is written to
> the file. I would prefer that in JPython a unicode string is the same as
> a normal string (type("") == type(u"")).
>
> Perhaps the real reason for my dislike of this feature of the unicode
> implementation is based on my (from Java) assumption that a unicode
> character is an atomic data type.
Hm, I agree that it's not a great feature. On the other hand it's
hard to decide what to do instead without breaking other corners of
the Unicode design. Could we leave this implementation-dependent?
--Guido van Rossum (home page: http://www.python.org/~guido/)
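
A minimal sketch of the explicit-encoder approach that the Java model
implies, written in present-day Python rather than the implementation
under discussion in this thread. The helper name write_text and the
choice of encodings are assumptions made for illustration only. The
point is that the number and order of bytes written depend entirely on
the encoder chosen, so naming it removes the platform dependence:

```python
import io

text = "hello"

# Choosing the encoder explicitly makes the byte layout deliberate.
ascii_bytes = text.encode("ascii")      # one byte per character
utf16_le = text.encode("utf-16-le")     # two bytes per character, little-endian
utf16_be = text.encode("utf-16-be")     # two bytes per character, big-endian


def write_text(stream, s, encoding):
    """Hypothetical helper: encode s and write the bytes to a binary stream.

    Returns the number of bytes written.
    """
    data = s.encode(encoding)
    stream.write(data)
    return len(data)


# An in-memory binary stream stands in for open("out", "wb").
out = io.BytesIO()
written = write_text(out, text, "utf-16-le")

print(len(ascii_bytes), len(utf16_le), len(utf16_be))
print(utf16_le == utf16_be)
print(written, out.getvalue())
```

The two UTF-16 variants have the same length but differ in byte order,
which is the kind of difference a raw memory dump would expose from one
platform to another.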