[Python-Dev] Python 3.0.1 (io-in-c)

"Martin v. Löwis" martin at v.loewis.de
Wed Jan 28 19:29:07 CET 2009


> Thanks for the explanation. It might be clearer to document this a
> little more explicitly in the docs for open() (on the basis that
> people using open() are the most likely to be naive about encodings).
> I'll see if I can come up with an appropriate doc patch.

Note that the way the specific encoding is determined is fairly
elaborate:
- if IO is to a terminal, Python tries to determine the encoding of
  the terminal. This is mostly relevant for Windows (which uses,
  by default, the "OEM code page" in the terminal).
- if IO is to a file, Python tries to guess the "common" encoding
  for the system. On Unix, it queries the locale, and falls back
  to "ascii" if no locale is set. On Windows, it uses the "ANSI
  code page". On OS X, it uses the "system encoding".
- if IO is binary, (clearly) no encoding is used. Network IO is
  always binary.
- for file names, different algorithms apply yet again. On Windows,
  Python uses the Unicode API, so no encoding is needed. On Unix, it
  (again) uses the locale encoding. On OS X, it uses UTF-8.
  (Just to be clear: this applies to the first argument of open(),
   not to the resulting file object.)
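
A rough sketch of how one might inspect these choices from Python 3.
The temporary file and the variable names are only for illustration,
and the printed values depend on the platform and locale, so no
particular output should be expected:

```python
import locale
import os
import sys
import tempfile

# Terminal IO: the encoding Python picked for stdout
# (may differ when stdout is redirected to a file or pipe).
print("stdout encoding:    ", sys.stdout.encoding)

# File IO: the locale-derived default used for text files
# when open() is called without an explicit encoding.
print("preferred encoding: ", locale.getpreferredencoding(False))

# File names: the encoding applied to the first argument of open().
print("filesystem encoding:", sys.getfilesystemencoding())

# Hypothetical scratch file, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# Text mode without an encoding: Python guesses one for us.
with open(path, "w") as f:
    guessed_encoding = f.encoding
    f.write("hello\n")

# Text mode with an explicit encoding: no guessing involved.
with open(path, "w", encoding="utf-8") as f:
    explicit_encoding = f.encoding
    f.write("hello\n")

# Binary mode: no encoding is used, reads return bytes.
with open(path, "rb") as f:
    raw = f.read()

print("guessed: ", guessed_encoding)
print("explicit:", explicit_encoding)
print("raw type:", type(raw).__name__)
```

Passing encoding= explicitly, as in the second open() call, sidesteps
the guessing for file contents; it does not affect how the file name
itself is encoded.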

Regards,
Martin

