[Python-Dev] Heads up: unicode file I/O in JPython.
Sat, 20 May 2000 15:19:09 GMT
I have recently released errata-07, which improves JPython's ability
to handle unicode characters as well as binary data read from and
written to Python files.
The conversions can be described as:
- I/O to a file opened in binary mode will read/write the low 8 bits
of each char. Writing Unicode chars >0xFF will cause silent
overflow [*].
- I/O to a file opened in text mode will push the character
through the default encoding for the platform (in addition to
handling CR/LF issues).
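The two rules above can be sketched in present-day Python 3. This is only an illustration of the described conversions, not JPython's actual implementation: the helper names binary_mode_bytes and text_mode_bytes are hypothetical, CR/LF translation is left out, and cp1252 is assumed as a typical Western Windows default encoding.

```python
import locale


def binary_mode_bytes(text):
    """Binary-mode rule: keep only the low 8 bits of each character."""
    return bytes(ord(ch) & 0xFF for ch in text)


def text_mode_bytes(text, encoding=None):
    """Text-mode rule: push each character through an encoding.

    When no encoding is given, fall back to the platform's preferred one.
    """
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    return text.encode(encoding)


euro = "\u20ac"  # EURO SIGN, code point 0x20AC

# Binary mode: 0x20AC is silently truncated to its low byte.
print(binary_mode_bytes(euro))

# Text mode with cp1252 (an assumed platform default).
print(text_mode_bytes(euro, "cp1252"))
```

Note that Python's str.encode raises UnicodeEncodeError for characters the encoding cannot represent; how JPython treats such characters is not covered here.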
This breaks completely with python1.6a2, but I believe that it is close
to the expectations of Java users. (The current JPython-1.1 behavior is
completely useless for both characters and binary data. It only barely
manages to handle 7-bit ASCII.)
In JPython (with the errata) we can do:
f = open("test207.out", "w")
f.write("\x20ac") # On my w2k platform this writes 0x80 to the file.
f.close() # Flush the data before reopening the file.
f = open("test207.out", "r")
f = open("test207.out", "wb")
f.write("\x20ac") # On all platforms this writes 0xAC to the file.
f.close() # Flush the data before reopening the file.
f = open("test207.out", "rb")
With the output of:
I do not expect anything like this in CPython. I just hope that all
unicode advice given on c.l.py comes with the caveat that JPython
might do it differently.
[*] Silent overflow is bad, but it is at least twice as fast as having
to check each char for overflow.
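For comparison, here is a minimal Python 3 sketch of the two alternatives the footnote contrasts. Both function names are hypothetical and neither is taken from the JPython sources. The sketch shows only the difference in behavior and makes no claim about relative speed.

```python
def low_byte_unchecked(text):
    """Silent overflow: mask every character down to its low 8 bits."""
    return bytes(ord(ch) & 0xFF for ch in text)


def low_byte_checked(text):
    """Checked variant: refuse any character that does not fit in a byte."""
    out = bytearray()
    for ch in text:
        code = ord(ch)
        if code > 0xFF:
            raise ValueError("character U+%04X does not fit in one byte" % code)
        out.append(code)
    return bytes(out)


print(low_byte_unchecked("\u20ac"))  # the high byte is dropped without warning
try:
    low_byte_checked("\u20ac")
except ValueError as exc:
    print("rejected:", exc)
```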