[Python-Dev] Heads up: unicode file I/O in JPython.

Finn Bock bckfnn@worldonline.dk
Sat, 20 May 2000 15:19:09 GMT


I have recently released errata-07, which improves JPython's ability
to handle Unicode characters as well as binary data read from and
written to Python files.

The conversions can be described as follows:

- I/O to a file opened in binary mode will read/write the low 8 bits
  of each char. Writing Unicode chars >0xFF will cause silent
  truncation [*].

- I/O to a file opened in text mode will push the character 
  through the default encoding for the platform (in addition to 
  handling CR/LF issues).
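
The two rules above can be sketched roughly as follows. This is only a
model written for present-day CPython 3, not JPython's actual
implementation. The function names are hypothetical, and cp1252 with
CR/LF line endings is an assumption standing in for "the default
encoding for the platform" on a Windows machine.

```python
def binary_mode_encode(text):
    """Model of a binary-mode write: keep only the low 8 bits of each char."""
    return bytes(ord(ch) & 0xFF for ch in text)


def binary_mode_decode(data):
    """Model of a binary-mode read: each byte becomes the char with that value."""
    return "".join(map(chr, data))


def text_mode_encode(text, encoding="cp1252", newline="\r\n"):
    """Model of a text-mode write: translate newlines, then apply the encoding.

    The encoding and newline defaults are assumptions for a Windows platform.
    """
    return text.replace("\n", newline).encode(encoding)


def text_mode_decode(data, encoding="cp1252", newline="\r\n"):
    """Model of a text-mode read: decode, then translate newlines back."""
    return data.decode(encoding).replace(newline, "\n")


euro = "\u20ac"                    # a single char with value 0x20AC
print(binary_mode_encode(euro))    # b'\xac'  (high byte silently dropped)
print(text_mode_encode(euro))      # b'\x80'  (cp1252 byte for U+20AC)
```

Under these assumptions a char >0xFF survives a text-mode round trip
whenever the platform encoding can represent it, while binary mode
always loses the high bits.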

This is a complete break from python1.6a2, but I believe that it is
close to the expectations of Java users. (The current JPython-1.1
behavior is completely useless for both characters and binary data. It
only barely manages to handle 7-bit ASCII.)

In JPython (with the errata) we can do:

  f = open("test207.out", "w")
  f.write("\x20ac") # On my w2k platform this writes 0x80 to the file.
  f.close()

  f = open("test207.out", "r")
  print hex(ord(f.read()))
  f.close()

  f = open("test207.out", "wb")
  f.write("\x20ac") # On all platforms this writes 0xAC to the file.
  f.close()

  f = open("test207.out", "rb")
  print hex(ord(f.read()))
  f.close()

With the output of:

  0x20ac
  0xac

I do not expect anything like this in CPython. I just hope that all
Unicode advice given on c.l.py comes with the caveat that JPython
might do it differently.

regards,
finn

    http://sourceforge.net/project/filelist.php?group_id=1842

[*] Silent overflow is bad, but it is at least twice as fast as having
to check each char for overflow.
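
For comparison, a checked variant of the binary-mode write might look
like the sketch below. It is written for present-day CPython 3, the
function name is hypothetical, and it is not what the errata does. It
only shows the per-char test that the silent version skips.

```python
def binary_mode_encode_checked(text):
    """Like a binary-mode write, but refuse chars that do not fit in 8 bits."""
    out = bytearray()
    for index, ch in enumerate(text):
        code = ord(ch)
        if code > 0xFF:
            # This is the extra comparison per char that the footnote refers to.
            raise ValueError(
                "char %#x at index %d does not fit in one byte" % (code, index)
            )
        out.append(code)
    return bytes(out)


print(binary_mode_encode_checked("abc\xff"))   # b'abc\xff'
```

In CPython 3, text.encode("latin-1") performs an equivalent check and
raises UnicodeEncodeError for any char >0xFF.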