[Python-3000] detect file encoding or always use the default, UTF-8?

Guido van Rossum guido at python.org
Tue Feb 19 19:33:03 CET 2008


Well, we're basically hoping that the folks who actually use Python
to read and write text files containing non-ASCII characters on OS X
tell us what they want. At least that's where I am. Since I personally
still live in a nearly-ASCII world (and probably always will), my own
experience just doesn't give me any guidance as to what would be the
most useful.

How does Apple's TextEdit do the guessing (assuming it guesses at all)?

I just typed a few non-ASCII characters into a text file with TextEdit
on OS X 10.5, and it seems to have written the file in MacRoman. This
was with the preferences for both opening and saving set to automatic.
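
One way to check which of the two encodings an editor actually wrote is
to try a strict UTF-8 decode first and fall back otherwise. This is only
a minimal sketch: the helper name guess_encoding is hypothetical, and the
fallback is an assumption rather than a detection, since MacRoman accepts
essentially any byte sequence and so can never be ruled out by decoding
alone.

```python
def guess_encoding(data, fallback="mac_roman"):
    """Return 'utf-8' if data decodes strictly as UTF-8, else the fallback.

    Hypothetical helper: the fallback is assumed, not detected.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return fallback
    return "utf-8"


sample = "é"
# MacRoman stores this character as a single byte that is not valid UTF-8.
print(guess_encoding(sample.encode("mac_roman")))
# UTF-8 stores it as a two-byte sequence that decodes cleanly.
print(guess_encoding(sample.encode("utf-8")))
```

Pure-ASCII data is reported as UTF-8 here, which is harmless because
ASCII bytes decode identically under both encodings.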

--Guido

On Feb 18, 2008 4:32 PM, Kumar McMillan <kumar.mcmillan at gmail.com> wrote:
> Hello.
>
> In Python 3, when opening a file without passing the encoding
> keyword, is Python going to guess the encoding of the file or just use
> the default?  That is, will it assume the file is UTF-8?
>
> What led me to wonder this is that on Mac OS X 10.4.11, opening a file
> containing UTF-8 encoded text creates a unicode object under the
> assumption the text was "mac roman" encoding.  This seems like a bug.
> It works if I set the encoding to UTF-8 explicitly.  I can post the
> code but it sounds like there are some Mac encoding issues in flux so
> I thought to ask first.
>
> thanks, Kumar
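
The behaviour described in the quoted message can be reproduced roughly
as follows. This is a sketch under stated assumptions: the sample text
and the temporary file are made up for illustration, and "mac_roman" is
passed explicitly to stand in for whatever default the platform would
pick when no encoding is given.

```python
import os
import tempfile

text = "café"  # illustrative sample, not from the original report

# Create a scratch file and write the text as UTF-8.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # Reading with the wrong encoding does not fail; it silently
    # turns each UTF-8 byte into a separate MacRoman character.
    with open(path, encoding="mac_roman") as f:
        as_mac_roman = f.read()

    # Passing the encoding explicitly round-trips the text.
    with open(path, encoding="utf-8") as f:
        as_utf8 = f.read()
finally:
    os.remove(path)

print(repr(as_mac_roman))  # 'caf√©'
print(repr(as_utf8))       # 'café'
```

Naming the encoding in every open() call avoids depending on the
platform default at all, whichever default is eventually chosen.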



-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)