[Python-Dev] open(): set the default encoding to 'utf-8' in Python 3.3?
fuzzyman at voidspace.org.uk
Tue Jun 28 19:22:38 CEST 2011
On 28/06/2011 18:06, Terry Reedy wrote:
> On 6/28/2011 10:46 AM, Paul Moore wrote:
>> I use Windows, and come from the UK, so 99% of my text files are
>> ASCII. So the majority of my code will be unaffected. But in the
>> occasional situation where I use a £ sign, I'll get encoding errors,
> I do not understand this. With utf-8 you would never get a string
> encoding error.
I assumed he meant that files written out as utf-8 by Python would then
be read in using the platform encoding (i.e. not utf-8 on Windows) by
the other applications he is inter-operating with. The error would not
be in Python but in those applications.
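To make that failure mode concrete, here is a minimal sketch. It assumes the
other application decodes with cp1252, a common Windows "ANSI" code page; that
choice is only an assumption for illustration. Note that in this direction
nothing raises, the text just comes out wrong, whereas the reverse direction
(platform-encoded bytes read as utf-8) does raise.

```python
# Sketch only: cp1252 is an assumed stand-in for "the platform encoding".
text = "Cost: \u00a35"  # "\u00a3" is the pound sign

# What Python would write if open() defaulted to utf-8.
utf8_bytes = text.encode("utf-8")

# What a cp1252-based application would show for those bytes:
# no exception, but a stray character appears before the pound sign.
misread = utf8_bytes.decode("cp1252")

# The reverse direction: bytes written by a cp1252 application,
# read back as utf-8, fail outright.
cp1252_bytes = text.encode("cp1252")
try:
    cp1252_bytes.decode("utf-8")
    raised = False
except UnicodeDecodeError:
    raised = True
```

Here `misread` ends up as "Cost: Â£5", which is the kind of data-dependent
breakage that only shows up once a non-ASCII character is involved.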
>> where currently things will "just work".
> As long as you only use the machine-dependent restricted character set.
Which is the situation he is describing. You do go into those details
below, and which choice is "correct" depends on which trade-off you want
to make.
For the sake of backwards compatibility we are probably stuck with the
current trade-off, however, unless we deprecate using open(...) without
an explicit encoding.
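For what it is worth, passing the encoding explicitly already sidesteps the
question today. A rough sketch follows; the scratch file name is hypothetical,
and locale.getpreferredencoding(False) is shown only to inspect what open()
would otherwise fall back to on the current machine.

```python
import locale
import os
import tempfile

# The encoding open() falls back to when none is given (platform dependent).
default_encoding = locale.getpreferredencoding(False)

# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "price.txt")

# Explicit encoding on both the write and the read:
# the result no longer depends on the machine's locale.
with open(path, "w", encoding="utf-8") as f:
    f.write("Cost: \u00a35\n")

with open(path, "r", encoding="utf-8") as f:
    round_tripped = f.read()
```

The round trip is the same on every platform because neither call consults
the locale; only code that omits encoding= is affected by the default.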
All the best,
>> And the failures will be data dependent, and hence intermittent
>> (the worst type of problem).
> That is the situation now, with platform/machine dependencies added in.
> Some people share code with other machines, even locally.
>> So, in effect, you propose making the default favour writing
>> multiplatform portable code at the expense of quick and dirty scripts?
> Let us frame it another way. Should Python installations be compatible
> with other Python installations, or with the other apps on the same
> machine? Part of the purpose of Python is to cover up platform
> differences, to the extent possible (and perhaps sensible -- there is
> the argument). This was part of the purpose of writing our own io
> module instead of using the compiler stdlib. The evolution of floating
> point math has gone in the same direction. For instance, float now
> expects uniform platform-independent Python-dependent names for
> infinity and nan instead of compiler-dependent names.
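As a small illustration of the float point, and only as a sketch of the
behaviour as I understand it, the same spellings are accepted and produced
regardless of the underlying C library:

```python
import math

# The same names parse on every platform.
pos_inf = float("inf")
neg_inf = float("-inf")
not_a_number = float("nan")

# repr() gives back the same platform-independent names.
names = (repr(pos_inf), repr(neg_inf), repr(not_a_number))

# The math module offers portable checks for both special values.
checks = (math.isinf(pos_inf), math.isinf(neg_inf), math.isnan(not_a_number))
```

Here `names` is ('inf', '-inf', 'nan') and every entry of `checks` is True.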
> As for practicality. Notepad++ on Windows offers ANSI, utf-8 (w,w/o
> BOM), utf-16 (big/little endian). I believe that ODF documents are
> utf-8 encoded xml (compressed or not). My original claim for this
> proposal was/is that even Windows apps are moving to utf-8 and that
> someday making that the default for Python everywhere will be the
> obvious and sensible thing.
May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing http://www.sqlite.org/different.html