Unicode Implementation in JPython
Hi,

My feeling on the Unicode proposal and its implementation is that most of the changes can be integrated directly into JPython without breaking any existing JPython code. One thing concerns me, though:

    open("out", "wb").write(u"hello")

This writes 10 bytes to the file "out". I have two problems with that:

1. In Java, files are always byte-based. To move from Unicode characters to bytes, some kind of encoder must always be applied. It is also strange to see the actual byte layout of the data, which in my "out" file seems to be platform dependent. Is that the case? If it is, then write(u"..") strikes me as somewhat random (its output is not known in advance).

2. To get this behavior under JPython, it would be necessary to introduce a new string type that is equal to the existing string type in all other respects. Only when passed to file.write should the new string type return a faked representation of its memory. When a normal string is passed to .write, some byte representation of the string is written to the file.

I would prefer that in JPython a Unicode string be the same as a normal string (type("") == type(u"")).

Perhaps the real reason for my dislike of this feature of the Unicode implementation is my assumption (from Java) that a Unicode character is an atomic data type.

regards,
finn
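A small sketch of the two points above, written for Python 3 (where str is already Unicode) so that it runs today; it is not JPython code. It assumes the 10 bytes in "out" are a raw two-bytes-per-character dump of the string in the machine's native byte order, and contrasts that with the Java-style approach of always passing characters through a named encoder. The helper `write_encoded` is hypothetical, introduced only for illustration.

```python
import io
import sys


def write_encoded(stream, text, encoding):
    """Hypothetical helper: encode text explicitly, then write the bytes.

    Mirrors the Java model: characters only become bytes through a named
    encoder, so the byte layout is never implicit.
    """
    data = text.encode(encoding)
    stream.write(data)
    return len(data)


text = "hello"

# Assumption: the 10-byte file is a two-bytes-per-character dump in native
# byte order, so its layout differs between little- and big-endian machines.
native = text.encode("utf-16-le" if sys.byteorder == "little" else "utf-16-be")

# With an explicit encoder, the layout is fixed regardless of platform.
buf = io.BytesIO()
n_le = write_encoded(buf, text, "utf-16-le")
n_utf8 = write_encoded(io.BytesIO(), text, "utf-8")

print(len(native))         # 10
print(n_le, n_utf8)        # 10 5
print(buf.getvalue()[:4])  # b'h\x00e\x00'

# In Python 3 the u prefix yields the ordinary string type, i.e. the
# type("") == type(u"") identity preferred for JPython in the post.
print(type("") == type(u""))  # True
```

The byte counts follow directly from the encodings: UTF-16 uses two bytes for each of these ASCII characters, UTF-8 uses one.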
Participants (2): bckfnn@worldonline.dk, Guido van Rossum