[Python-3000] encoding='guess' ?

Antoine Pitrou solipsis at pitrou.net
Sun Sep 10 15:21:14 CEST 2006


Let me add that 'guess' should probably be forbidden as an encoding
parameter (instead, a separate function argument should be used, as in my [...]).

Here is a schematic example showing why:

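def append_text(filename, encoding):
    # textfile() is the constructor from Paul's proposal, not a builtin
    src = textfile(filename, "r", encoding)
    my_text = src.read()
    # "a+" rather than "r+", so that the write goes to the end of the file
    dst = textfile("textlist.txt", "a+", encoding)
    dst.write(my_text + "\n")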

With Paul's current proposal, three cases can arise:
 - "encoding" is a real encoding name like iso-8859-1 or utf-8. There
should be no problems, since we assume this encoding has been configured
once and for all in the application.
 - "encoding" is either "site" or "locale". This should result in the
same value run after run, since we assume the site or locale encoding
value has been configured once and for all.
 - "encoding" is "guess". In this case anything can happen. For
instance, the first file might be detected as utf-8 (or Shift-JIS, or
whatever) and the second file as iso-8859-1. This will lead to a crash
in the likely case that some characters in the source file can't be
represented in the encoding auto-detected for the destination file.

Yet the append_text() function does look correct, doesn't it?
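The third case can be reproduced in current Python 3. This is a minimal sketch under stated assumptions: it uses the builtin open() instead of the proposed textfile(), and naive_guess() is a hypothetical, deliberately simple detector (utf-8 if the bytes decode cleanly, else iso-8859-1). It is not the detection algorithm from Paul's proposal; it only shows what happens when each file gets its own guessed encoding.

```python
import os
import tempfile


def naive_guess(path):
    """Hypothetical detector: utf-8 if the bytes decode cleanly, else iso-8859-1."""
    with open(path, "rb") as f:
        data = f.read()
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "iso-8859-1"


def append_text_guessing(src_path, dst_path):
    # Each file gets its own guessed encoding, as encoding="guess" would imply.
    with open(src_path, "r", encoding=naive_guess(src_path)) as src:
        my_text = src.read()
    with open(dst_path, "a", encoding=naive_guess(dst_path)) as dst:
        dst.write(my_text + "\n")


with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "source.txt")
    dst_path = os.path.join(tmp, "textlist.txt")
    with open(src_path, "w", encoding="utf-8") as f:
        f.write("日本語\n")  # not representable in iso-8859-1
    with open(dst_path, "w", encoding="iso-8859-1") as f:
        f.write("café\n")  # byte 0xE9 is invalid utf-8 here, so iso-8859-1 is guessed
    guessed = (naive_guess(src_path), naive_guess(dst_path))
    try:
        append_text_guessing(src_path, dst_path)
        outcome = "ok"
    except UnicodeEncodeError:
        outcome = "UnicodeEncodeError"

print(guessed, outcome)
```

The function body is as innocent-looking as append_text(), yet the source is read as utf-8 and the destination is written as iso-8859-1, so the write raises UnicodeEncodeError.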

We shouldn't hide a contextual encoding-detection algorithm under an
encoding name. It leads to semantic uncertainty.
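One way to keep the encoding parameter free of detection semantics, sketched in current Python 3 with open() rather than the proposed textfile(): accept only registered codec names via codecs.lookup(), which raises LookupError for "guess", and resolve the name once so both files share it. resolve_encoding() and the three-argument append_text() are hypothetical names for illustration; this does not reproduce the separate function argument mentioned above, whose details are not given here.

```python
import codecs
import os
import tempfile


def resolve_encoding(encoding):
    """Hypothetical helper: accept only registered codec names.

    codecs.lookup() raises LookupError for unknown names, so a
    pseudo-encoding such as "guess" is rejected here.
    """
    return codecs.lookup(encoding).name


def append_text(filename, dst_filename, encoding):
    # Resolve the name once and use it for both files, so the text
    # is decoded and re-encoded with the same codec.
    enc = resolve_encoding(encoding)
    with open(filename, "r", encoding=enc) as src:
        my_text = src.read()
    with open(dst_filename, "a", encoding=enc) as dst:
        dst.write(my_text + "\n")


with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "source.txt")
    dst_path = os.path.join(tmp, "textlist.txt")
    with open(src_path, "w", encoding="utf-8") as f:
        f.write("日本語")
    append_text(src_path, dst_path, "utf-8")
    with open(dst_path, encoding="utf-8") as f:
        result = f.read()
    try:
        append_text(src_path, dst_path, "guess")
        rejected = False
    except LookupError:
        rejected = True  # "guess" never reaches open()
```

A caller who really wants detection would have to run it as an explicit, separate step and then pass the resulting real encoding name, which makes the per-file nature of the guess visible at the call site.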
