[Python-3000] encoding='guess' ?
Nick Coghlan
ncoghlan at gmail.com
Sun Sep 10 15:44:16 CEST 2006
Antoine Pitrou wrote:
> Hi,
>
> Let me add that 'guess' should probably be forbidden as an encoding
> parameter (instead, a separate function argument should be used as in my
> proposal).
>
> Here is a schematic example to show why :
>
> def append_text(filename, encoding):
> src = textfile(filename, "r", encoding)
> my_text = src.read()
> src.close()
> dst = textfile("textlist.txt", "r+", encoding)
> dst.seek_end(0)
> dst.write(my_text + "\n")
> dst.close()
>
> With Paul's current proposal three cases can arise :
> - "encoding" is a real encoding name like iso-8859-1 or utf-8. There
> should be no problems, since we assume this encoding has been configured
> once and for all in the application.
> - "encoding" is either "site" or "locale". This should result in the
> same value run after run, since we assume the site or locale encoding
> value has been configured once and for all.
> - "encoding" is "guess". In this case anything can happen. A possible
> occurence is that for the first file, it will result in utf-8 being
> detected (or Shift-JIS, or whatever), and for the second file it will be
> iso-8859-1. This will lead to a crash in the likely case that some
> characters in the source file can't be represented using the character
> encoding auto-detected for the destination file.
>
> Yet the append_text() function does look correct, doesn't it?
>
> We shouldn't hide a contextual encoding-detection algorithm under an
> encoding name. It leads to semantic uncertainty.
Interesting. This goes back more towards the model of "no default encoding,
but provide the right tools to make it easy for a program to choose one in the
absence of any metadata".
So perhaps there should just be an explicit function "guessencoding()" that
accepts a filename and returns a codec name. So if you want to guess, you
would do something like:
f = open(fname, 'r', string.guessencoding(fname))
The PEP's other suggestions would then be spelled something like:
f = open(fname, 'r', string.getlocaleencoding())
f = open(fname, 'r', string.getsiteencoding())
Cheers,
Nick.
--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
---------------------------------------------------------------
http://www.boredomandlaziness.org
More information about the Python-3000
mailing list