[Python-ideas] Python 3 open() text files: make encoding parameter optional for cross-platform scripts

Stephen J. Turnbull stephen at xemacs.org
Mon Jun 10 14:40:22 CEST 2013


Yuval Greenfield writes:

 > Living in Israel - Hebrew compatibility has been the nuisance and
 > these are the encodings I had to fight: utf-8, ucs-2, utf-16, ucs-4,
 > ISO-8859-8, ISO-8859-8-I, Windows-1255.  It's plagued websites,
 > browsers, email clients, adobe photoshop and premiere, excel, word,
 > and powerpoint.

You have my sympathies.  Russia is just as bad, and Japan, well, Japan
*invented* mojibake.  When I first got here in 1990, I was *triple*
booting to deal with the charset insanity.  At least for Hebrew, the
encodings you're likely to encounter in plain text divide into two
groups (UTF-8 and ISO Hebrew-like), and the latter can probably mostly
be read with cat(1) (or DOS "type").

 > Perhaps you guys are used to more os-encoding-abiding applications
 > and value that quality.

I can't speak for others, but I live in the home country of charset
self-abuse, and have been dealing with it for more than 20 years.
Even today, *most* users here are in environments where everybody they
share files with has the same default encoding and it is *not* UTF-8
(mostly Shift JIS, aka cp932).  There's another big group (Mac users)
who do use UTF-8, plus the odd Linux/*BSD/whatever users, who mostly
default to UTF-8.  The charset issues[1] have put a fair amount of
pressure on the Mac users.
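
To make the mismatch concrete, here is a minimal sketch (the sample
string and variable names are mine, purely for illustration).  Python 3's
open() in text mode falls back to locale.getpreferredencoding(False) when
no encoding is given, so bytes written under a cp932 default generally
won't decode under a UTF-8 default:

```python
import locale

# What open() uses for text files when no encoding argument is passed.
default_encoding = locale.getpreferredencoding(False)

# Illustrative sample: text as a cp932 (Shift JIS) system would store it.
text = "日本語"
sjis_bytes = text.encode("cp932")

# Reading those bytes back with the matching codec works.
same_codec = sjis_bytes.decode("cp932")

# Reading them as UTF-8 fails, because the bytes are not valid UTF-8.
try:
    sjis_bytes.decode("utf-8")
    decoded_as_utf8 = True
except UnicodeDecodeError:
    decoded_as_utf8 = False

print("locale default:", default_encoding)
print("decodes as UTF-8:", decoded_as_utf8)
```

The first line printed varies by platform and locale settings, which is
exactly the cross-system inconsistency under discussion.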

Problems are frequent; I just don't think it's a good idea for Python
to default to UTF-8 yet.

 > I just wish we can get rid of these problems for good, and
 > promoting utf-8 everywhere is one way to go about it.

I believe that attempting to promote it by making it Python's default
will have a much bigger (negative) effect on Python's popularity than
it will have a (positive) effect on UTF-8 usage.

UTF-8 *is* the future.  Only Microsoft disagrees, and that doesn't
really matter because Microsoft's plan for world domination involves
proprietary binary file formats rather than text files in a standard
encoding.  So if you use it wherever possible in your programs,
explain to your correspondents why you do that, and help them in the
(rarer and rarer) cases where it gives them a problem, you will be
doing a great service to promote UTF-8.
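
As a rough sketch of what "use it wherever possible in your programs"
could look like in Python 3: pass the encoding explicitly rather than
relying on the platform default.  The helper names and the temporary
file below are hypothetical, chosen only for this example.

```python
import os
import tempfile

def write_text_utf8(path, text):
    # Name the encoding explicitly instead of relying on the locale default.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

def read_text_utf8(path):
    # Use the same explicit encoding when reading the file back.
    with open(path, "r", encoding="utf-8") as f:
        return f.read()

# Illustrative sample mixing Hebrew, Japanese, and Russian.
sample = "שלום, 日本語, Привет"
path = os.path.join(tempfile.mkdtemp(), "sample.txt")

write_text_utf8(path, sample)
roundtrip = read_text_utf8(path)
print(roundtrip == sample)
```

A script written this way behaves the same regardless of the machine's
default encoding, at the cost of one extra argument per open() call.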

The problem with Python doing the same thing is that it's going to
embarrass programmers with deadlines to meet in front of their bosses,
and they won't have you, me, or Guido to hold their hands and explain
to the bosses.  Neither the programmers nor the bosses are going to
*raise* their evaluations of Python in such cases.  On the other hand,
problems with conflicting defaults across systems are "business as
usual", and nobody's going to blame Python for that.


Footnotes: 
[1]  And font issues; MS Office files tend to look and print poorly on
the Mac due to differences in font rendering.


