[Python-Dev] EuroPython Language Summit report

Victor Stinner victor.stinner at haypocalc.com
Fri Jun 24 23:08:18 CEST 2011


On Friday 24 June 2011 at 16:30 -0400, Terry Reedy wrote:
> > I see two options to improve the situation.
> 
> The third is to make utf-8 the default. I believe this *is* the proper 
> long term solution and both options are contrary to this.

Oh yes, I also prefer this option, but I suspect that some people would
prefer not to break backward compatibility.

Or should we consider this bad design choice a bug?

The UTF-8 encoder of Python 3 raises an error if the text contains a
surrogate character. The surrogatepass error handler should be used to
encode surrogates.
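A small illustration of the difference, using only the standard str/bytes codec API (the sample string is arbitrary):

```python
# A string containing the lone surrogate U+DC80
text = "abc\udc80"

try:
    text.encode("utf-8")  # default "strict" error handler
except UnicodeEncodeError as exc:
    print("strict UTF-8 refuses it:", exc.reason)

# surrogatepass encodes the surrogate as its 3-byte (non-standard) UTF-8 form
data = text.encode("utf-8", "surrogatepass")
print(data)  # b'abc\xed\xb2\x80'

# The same handler is needed to decode those bytes back
assert data.decode("utf-8", "surrogatepass") == text
```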

The surrogateescape error handler can be used to encode undecodable bytes
back (e.g. a filename decoded by Python using surrogateescape), but it is
not a good idea to write illegal byte sequences into a text file (most
programs will fail to open the file).
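A rough sketch of that round trip; the Latin-1 filename bytes below are just an assumed example of data that is not valid UTF-8:

```python
# Bytes that are not valid UTF-8, e.g. a Latin-1 filename on a UTF-8 system
raw = b"caf\xe9.txt"

# surrogateescape maps each undecodable byte 0xXX to the lone surrogate U+DCXX
name = raw.decode("utf-8", "surrogateescape")
print(ascii(name))  # 'caf\udce9.txt'

# Encoding with the same handler restores the original bytes exactly
assert name.encode("utf-8", "surrogateescape") == raw

# Strict UTF-8 (what you want for an ordinary text file) refuses the string,
# so the escaped form should not leak into files other programs will read
try:
    name.encode("utf-8")
except UnicodeEncodeError:
    print("cannot write the escaped name as valid UTF-8")
```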

> I believe that this is what I want for myself even on Windows.

Can you open UTF-8 files in every Windows program (with the text
displayed correctly)? I remember that notepad.exe writes an evil UTF-8
BOM, but I don't know whether it requires this BOM to detect the UTF-8
encoding.
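On the Python side, the stdlib "utf-8-sig" codec is the one that writes and strips such a BOM; plain "utf-8" leaves it in the text as U+FEFF. A minimal comparison (how Notepad itself behaves is not tested here):

```python
import codecs

text = "hello"

# "utf-8-sig" prepends the UTF-8 BOM (EF BB BF) when encoding
with_bom = text.encode("utf-8-sig")
assert with_bom == codecs.BOM_UTF8 + b"hello"

# Plain "utf-8" keeps the BOM as a U+FEFF character at the start of the text
print(ascii(with_bom.decode("utf-8")))  # '\ufeffhello'

# "utf-8-sig" strips a leading BOM if present, and accepts data without one
assert with_bom.decode("utf-8-sig") == "hello"
assert b"hello".decode("utf-8-sig") == "hello"
```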

Or do some programs expect text files encoded in the ANSI code page?

If you want to write files in the ANSI code page, you can use
encoding="mbcs" (or use an explicit code page, like encoding="cp1252").
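For example, writing with an explicit code page gives the same bytes on every platform, while "mbcs" only exists on Windows and follows whatever the current ANSI code page is. A sketch, assuming cp1252 is the code page you actually want:

```python
import os
import sys
import tempfile

text = "café €5"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ansi.txt")
    # Explicit code page: the output does not depend on the system locale
    with open(path, "w", encoding="cp1252") as f:
        f.write(text)
    with open(path, "rb") as f:
        raw = f.read()

print(raw)  # b'caf\xe9 \x805'

if sys.platform == "win32":
    # "mbcs" is Windows-only and uses the current ANSI code page,
    # which is not necessarily cp1252 on every machine
    print(text.encode("mbcs", "replace"))
```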

> (3) In 3.3, if the default is used and it is not utf-8, add a warning 
> that the default will become utf-8 always in 3.4. Actually, I would like 
> a PendingDeprecationWarning in 3.2.1 if possible.

I'm not sure that the "and it is not utf-8" condition is a good idea. If
you develop on Linux but your users are on Windows, you will not get the
warning (even with -Werror), and neither will your users (users don't
want to see warnings)... Or maybe a user who uses both Windows and Linux
will notice the bug even without the warning :-)
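To illustrate the point, here is a hypothetical wrapper (open_text is not a real stdlib function) that warns whenever the default encoding is used, whatever the locale encoding is, so the developer on a UTF-8 Linux box sees the same warning as the Windows user:

```python
import warnings


def open_text(path, mode="r", encoding=None, **kwargs):
    """Hypothetical open() wrapper sketching an unconditional warning.

    The warning is emitted whenever the caller relies on the default
    (locale) encoding, even when that encoding happens to be UTF-8.
    """
    if encoding is None and "b" not in mode:
        warnings.warn(
            "no encoding given; the default encoding will become UTF-8",
            PendingDeprecationWarning,
            stacklevel=2,  # point at the caller, not at this wrapper
        )
    return open(path, mode, encoding=encoding, **kwargs)
```

Run with `python -Werror`, a call like `open_text("data.txt")` then fails on every platform instead of only where the locale encoding is not UTF-8.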

That doesn't mean it is impossible to check your program: you can
change your locale encoding (e.g. use LANG=C).

At least it will be possible to check test_distutils and test_packaging
using LANG=C and -Werror :-)
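A sketch of running a child interpreter that way. Note the assumption about newer interpreters: since Python 3.7 the C locale is coerced to UTF-8 (PEP 538) and UTF-8 mode may be enabled (PEP 540), so both are switched off here via environment variables; on Windows LANG is ignored and the ANSI code page is used anyway:

```python
import os
import subprocess
import sys

# Child environment using the C locale, with the 3.7+ UTF-8 coercion disabled
env = dict(os.environ, LANG="C", LC_ALL="C",
           PYTHONCOERCECLOCALE="0", PYTHONUTF8="0")

# Replace this snippet with your own test entry point
code = "import locale; print(locale.getpreferredencoding(False))"

result = subprocess.run(
    [sys.executable, "-Werror", "-c", code],
    env=env, capture_output=True, text=True, check=True,
)
print("locale encoding in the child:", result.stdout.strip())
```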

--

A fourth option is to use ASCII by default! Your program will work and
be portable until you write the first non-ASCII character... Oh wait, that
reminds me of the Unicode nightmare of Python 2!
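What "works until the first non-ASCII character" looks like, with a hypothetical helper standing in for an ASCII-by-default open():

```python
def encode_ascii_default(text):
    """Hypothetical stand-in for writing text with an ASCII default encoding."""
    try:
        return text.encode("ascii")
    except UnicodeEncodeError as exc:
        return f"error at position {exc.start}: {exc.reason}"


print(encode_ascii_default("hello"))  # b'hello'
print(encode_ascii_default("héllo"))  # error at position 1: ordinal not in range(128)
```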

Victor


