[Python-Dev] open(): set the default encoding to 'utf-8' in Python 3.3?

Paul Moore p.f.moore at gmail.com
Tue Jun 28 22:11:50 CEST 2011

On 28 June 2011 18:22, Michael Foord <fuzzyman at voidspace.org.uk> wrote:
> On 28/06/2011 18:06, Terry Reedy wrote:
>> On 6/28/2011 10:46 AM, Paul Moore wrote:
>>> I use Windows, and come from the UK, so 99% of my text files are
>>> ASCII. So the majority of my code will be unaffected. But in the
>>> occasional situation where I use a £ sign, I'll get encoding errors,
>> I do not understand this. With utf-8 you would never get a string encoding
>> error.
> I assumed he meant that files written out as utf-8 by python would then be
> read in using the platform encoding (i.e. not utf-8 on Windows) by the other
> applications he is inter-operating with. The error would not be in Python
> but in those applications.

That is correct. Or files written out (in the platform encoding) by
other applications will later be read in as UTF-8 by Python, and be
seen as incorrect characters or, worse, raise decoding errors. (Sorry,
in my original post I said "encoding" where I meant "decoding"...)
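
To make the two failure modes concrete, here is a minimal sketch. It
assumes the Windows platform encoding is cp1252 (typical for a UK
machine, but an assumption on my part) and works on bytes directly so
it behaves the same on any OS:

```python
# Assumption: "platform encoding" here means cp1252, the usual
# Windows code page for Western European locales.
POUND = "\u00a3"  # the pound sign

# Case 1: another application writes cp1252, Python reads as UTF-8.
cp1252_bytes = POUND.encode("cp1252")      # a single byte, 0xA3
try:
    cp1252_bytes.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    # 0xA3 on its own is not a valid UTF-8 sequence.
    decode_failed = True

# Case 2: Python writes UTF-8, another application reads as cp1252.
utf8_bytes = POUND.encode("utf-8")         # two bytes, 0xC2 0xA3
misread = utf8_bytes.decode("cp1252")      # no error, just wrong text

print(decode_failed)   # True
print(repr(misread))   # 'Â£'
```

Pure ASCII text round-trips identically under both encodings, which
is why the problem only shows up occasionally.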

I'm not interested in allocating "blame" for the "error". I'm not
convinced that it *is* an error, merely two programs with incompatible
assumptions. What I'm saying is that compatibility between various
programs on a single machine can, in some circumstances, be more
important than compatibility between (the same, or different) programs
running on different machines or OSes. And that I, personally, am in
that situation.

>>> where currently things will "just work".
>> As long as you only use the machine-dependent restricted character set.
> Which is the situation he is describing. You do go into those details below,
> and which choice is "correct" depends on which trade-off you want to make.
> For the sake of backwards compatibility we are probably stuck with the
> current trade-off however - unless we deprecate using open(...) without an
> explicit encoding.

Backward compatibility is another relevant point. But other than that,
it's a design trade-off, agreed. All I'm saying is that I see the
current situation (which favours quick script use and beginner
friendliness over conceptual correctness and forcing users to think
about their choices) as preferable. It is arguably also more
"Pythonic", in that I see it as a case of "practicality beats purity",
although it's easy to argue that "in the face of ambiguity..." also
applies here :-)
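
For reference, the explicit-encoding alternative mentioned above can
be sketched as follows. The helper name roundtrip is hypothetical,
and cp1252 again stands in for "the platform encoding"; on a real
machine the default that open() falls back on can be inspected with
locale.getpreferredencoding(False):

```python
import locale
import os
import tempfile

def roundtrip(text, write_encoding, read_encoding):
    """Write text to a temporary file with one encoding and read it
    back with another. Hypothetical helper, for illustration only."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path, "w", encoding=write_encoding) as f:
            f.write(text)
        with open(path, "r", encoding=read_encoding) as f:
            return f.read()
    finally:
        os.remove(path)

# What open() uses when no encoding argument is given (machine-dependent).
platform_default = locale.getpreferredencoding(False)

text = "cost: \u00a35"  # "cost: £5"

# Both sides agree on the encoding: the text survives intact.
same_utf8 = roundtrip(text, "utf-8", "utf-8")
same_cp1252 = roundtrip(text, "cp1252", "cp1252")

# The sides disagree: the text is silently corrupted.
mismatched = roundtrip(text, "utf-8", "cp1252")
```

Whichever encoding is chosen, the result is only right when writer
and reader agree, which is the compatibility point made above.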

