[Python-3000] detect file encoding or always use the default, UTF-8?

Kumar McMillan kumar.mcmillan at gmail.com
Tue Feb 19 20:35:38 CET 2008


Hi.
I was mainly confused about the auto-detection of the encoding.  I
thought I had read earlier that Python 3 would not try to guess
encodings, and that opening files would always use the sys encoding
(defaulting to UTF-8) instead of guessing.  I can't find where I read
that, so I might have made it up :D  Now at least I know it will be
guessing encodings.
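
For what it's worth, here is a minimal sketch of how I'd sidestep the
question in my own code: name the encoding when opening the file
instead of relying on whatever default gets picked.  The helper name
and the sample text are made up for illustration; I'm assuming the
encoding argument to open() and locale.getpreferredencoding() behave
the way the docs describe.

```python
import locale
import os
import tempfile


def write_and_read(text, encoding="utf-8"):
    """Round-trip text through a temporary file with an explicit encoding.

    Hypothetical helper for illustration only.
    """
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        # Passing encoding= means the platform default is never consulted.
        with open(path, "w", encoding=encoding) as f:
            f.write(text)
        with open(path, "r", encoding=encoding) as f:
            return f.read()
    finally:
        os.remove(path)


# What open() would fall back to if no encoding were given (varies by system).
default_encoding = locale.getpreferredencoding(False)

sample = "caf\u00e9"  # made-up sample containing one non-ASCII character
result = write_and_read(sample)
```

With the encoding spelled out, the result should not depend on the
platform or on how the default ends up being chosen.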

On Feb 19, 2008 12:33 PM, Guido van Rossum <guido at python.org> wrote:
> Well, we're basically hoping that the folks who actually use Python
> to read and write text files containing non-ASCII characters on OSX
> tell us what they want. At least that's where I am. Since I personally
> still live in a nearly-ASCII world (and probably always will), my own
> experience just doesn't give me any guidance as to what would be the
> most useful.

I too work almost entirely in ASCII for day-to-day stuff, so it's
hard to say.

>
> How does Apple's TextEdit do the guessing (assuming it guesses at all)?
>
> I just typed a few non-ASCII characters into a text file with TextEdit
> on OSX 10.5, and it seems to have written the file in MacRoman. This
> with the preferences for open and save set to automatic.

Were you saving in RTF format or plain text?  TextEdit is strange in
that it makes it hard to save a plain text file.  If you were working
with RTF files, I'd think you'd have to open them from Python in
binary mode anyway, no?

But for plain text, TextEdit doesn't guess.  I typed some non-ASCII
characters and went to Format -> Make Plain Text.  Then the save
dialog defaulted to UTF-16, and the only other options (without
making custom modifications) were UTF-8 and Simplified Chinese.
However, this was on 10.4; someone checked 10.5 for me and said the
default there is UTF-8 (sounds like they fixed something!).
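
If some detection is wanted for files like these, one cheap approach
is to look for a byte order mark and only fall back to a default when
there isn't one.  This is a rough sketch, not a proposal for what
open() should do: the function names and the UTF-8 fallback are my
own assumptions, and I haven't verified that TextEdit actually writes
a BOM.

```python
import codecs

# Longer BOMs come first: the UTF-32 LE mark begins with the UTF-16 LE mark.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(data, fallback="utf-8"):
    """Return a codec name based on a leading BOM, else the fallback.

    Hypothetical helper; the fallback default is an assumption.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return fallback


def decode_text(data, fallback="utf-8"):
    """Decode bytes with the sniffed codec; the codec strips the BOM itself."""
    return data.decode(sniff_bom(data, fallback))
```

A file without a BOM (MacRoman, say) still can't be identified this
way, so the caller has to supply a sensible fallback.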


thanks, Kumar

