On 9/4/06, <b class="gmail_sendername">Guido van Rossum</b> <<a href="mailto:firstname.lastname@example.org">email@example.com</a>> wrote:<div><span class="gmail_quote"></span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
In this particular case I don't care what's simpler to implement, but<br>what's most likely to do what the user expects. If on a particular box<br>most files are encoded in encoding X, and the user did whatever is<br>necessary to tell the tools that that's their preferred encoding, I
<br>want Python to honor that encoding when opening text files, unless the<br>program makes other arrangements explicitly (such as specifying an<br>explicit encoding as a parameter to open()).</blockquote><div><br>It does not strike me as accurate that on a modern computer system, a Swedish person's computer is full of ISO/Swedish encoded files and a Chinese person's computer is full of a speciifc Chinese encoding etc. Maybe that was true before the notion of variant encodings became so popular.
<br><br>But now Europeans are just as likely to use UTF-8 as a national encoding and Asians each have MANY different encodings to select from (some defined by Unicode, some national). I doubt you'll frequently guess correctly except in specialized apps where a user has very explicit control over their file encodings and doesn't depend on applications to choose. The direction over the lifetype of Python 3000 will be AWAY from national, local, locale-predictable encodings and TOWARDS global, standard encodings. Once we get to a place where Unicode encodings are dominant, a local-encodings feature will be useless. In the transition period, it will be actually harmful.
<br><br>Also, only a portion of the text data on a computer is in "documents" where the end-user has control over the encoding. There are also many, many configuration files, emails, saved web pages, chat logs etc. where the encoding was selected by someone else with a potentially different nationality.
<br><br>I would guess that "most" text files on "most" computers in any particular locale are in ASCII/utf-8. Japanese people also have hosts files and .htaccess files and INI files and log files and ... Python can't know whether it is dealing with one of these files or an end-user document.
<br><br>Of the subset of documents that actually have their encoding controlled by the local user's preferences, an increasing portion with be XML and XML documents describe their encoding explicitly. It would be wrong to use the locale to override that.
<br><br>Beyond all of that: It just seems wrong to me that I could send someone a bunch of files and a Python program and their results processing them would be different from mine, despite the fact that we run the same version of Python on the same operating system.
<br><br> Paul Prescod<br><br></div></div>