[Python-Dev] open(): set the default encoding to 'utf-8' in Python 3.3?

Michael Foord fuzzyman at voidspace.org.uk
Tue Jun 28 19:22:38 CEST 2011


On 28/06/2011 18:06, Terry Reedy wrote:
> On 6/28/2011 10:46 AM, Paul Moore wrote:
>
>> I use Windows, and come from the UK, so 99% of my text files are
>> ASCII. So the majority of my code will be unaffected. But in the
>> occasional situation where I use a £ sign, I'll get encoding errors,
>
> I do not understand this. With utf-8 you would never get a string 
> encoding error.
>

I assumed he meant that files written out as utf-8 by Python would then 
be read in using the platform encoding (i.e. not utf-8 on Windows) by 
the other applications he is inter-operating with. The error would not 
be in Python but in those applications.
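
To make that concrete, here is a rough sketch of the mismatch. It assumes 
the other application decodes with cp1252 (a common Windows "ANSI" code 
page, chosen here only for illustration), and the sample text is made up:

```python
# Hypothetical sample text; any non-ASCII character shows the same effect.
text = "Price: £5"

# Python writes the text as utf-8 ...
utf8_bytes = text.encode("utf-8")

# ... and another application reads the same bytes as cp1252 (assumed).
# No exception is raised, but the pound sign comes out garbled.
misread = utf8_bytes.decode("cp1252")
print(repr(misread))

# The reverse direction fails outright: cp1252 bytes are not valid utf-8.
legacy_bytes = text.encode("cp1252")
try:
    legacy_bytes.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
print(decode_failed)

# Pure ASCII text produces identical bytes in both encodings, which is why
# the problem only shows up for some data.
ascii_only = "Price: 5"
same_bytes = ascii_only.encode("utf-8") == ascii_only.encode("cp1252")
print(same_bytes)
```

So ASCII-only files survive either choice, and the trouble starts with the 
first non-ASCII character.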

>> where currently things will "just work".
>
> As long as you only use the machine-dependent restricted character set.
>

Which is the situation he is describing. You do go into those details 
below, and which choice is "correct" depends on which trade-off you want 
to make.

For the sake of backwards compatibility we are probably stuck with the 
current trade-off, however, unless we deprecate calling open(...) without 
an explicit encoding.
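
For anyone who wants to sidestep the default today, passing the encoding 
explicitly already works. A minimal sketch, where write_text and read_text 
are hypothetical helper names and not anything in the stdlib:

```python
import locale
import os
import tempfile


def write_text(path, text, encoding="utf-8"):
    """Write text with an explicit encoding instead of the platform default."""
    with open(path, "w", encoding=encoding) as f:
        f.write(text)


def read_text(path, encoding="utf-8"):
    """Read text back with the same explicit encoding."""
    with open(path, "r", encoding=encoding) as f:
        return f.read()


# What open() would use if no encoding were given; this varies by machine.
print(locale.getpreferredencoding(False))

# Round trip through a temporary file (the file name is illustrative only).
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.txt")
    write_text(sample_path, "£5")
    round_tripped = read_text(sample_path)

print(round_tripped == "£5")
```

Code written this way behaves the same on every platform, whatever the 
default turns out to be.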

All the best,

Michael

>> And the failures will be data dependent, and hence intermittent
>> (the worst type of problem).
>
> That is the situation now, with platform/machine dependencies added in.
> Some people share code with other machines, even locally.
>
>> So, in effect, you propose making the default favour writing
>> multiplatform portable code at the expense of quick and dirty scripts?
>
> Let us frame it another way. Should Python installations be compatible 
> with other Python installations, or with the other apps on the same 
> machine? Part of the purpose of Python is to cover up platform 
> differences, to the extent possible (and perhaps sensible -- there is 
> the argument). This was part of the purpose of writing our own io 
> module instead of using the compiler stdlib. The evolution of floating 
> point math has gone in the same direction. For instance, float now 
> expects uniform platform-independent Python-dependent names for 
> infinity and nan instead of compiler-dependent names.
>
> As for practicality: Notepad++ on Windows offers ANSI, utf-8 (with or
> without BOM), utf-16 (big/little endian). I believe that ODF documents are 
> utf-8 encoded xml (compressed or not). My original claim for this 
> proposal was/is that even Windows apps are moving to utf-8 and that 
> someday making that the default for Python everywhere will be the 
> obvious and sensible thing.
>
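
On the float point quoted above, a small illustration of the uniform 
names, as far as I know them. The variable names are just for the example:

```python
import math

# These spellings are accepted by float() regardless of the C library
# underneath; matching is case-insensitive.
pos_inf = float("inf")
neg_inf = float("-inf")
also_inf = float("Infinity")
not_a_number = float("nan")

print(math.isinf(pos_inf), math.isinf(neg_inf), math.isnan(not_a_number))

# repr() likewise produces the same names on every platform.
names = (repr(pos_inf), repr(neg_inf), repr(not_a_number))
print(names)
```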


-- 
http://www.voidspace.org.uk/

May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing http://www.sqlite.org/different.html


