[Python-ideas] Python 3 open() text files: make encoding parameter optional for cross-platform scripts

Steven D'Aprano steve at pearwood.info
Sun Jun 9 03:40:53 CEST 2013


On 09/06/13 05:43, MRAB wrote:
> On 08/06/2013 19:02, Stephen J. Turnbull wrote:
>> MRAB writes:
>>
>>   > 'open' defaults to universal newline support when opening for
>>   > reading (though that's not possible when opening for writing!), and
>>   > it would be nice if it also defaulted to a 'universal' encoding,
>>   > i.e. UTF-8.
>>
>> There's no such thing as a universal encoding.  Unicode is a
>> universal character set in the sense that it can encode all
>> characters, but there is no universal encoding that can be used to
>> decode all texts.
>>
> I didn't say "universal encoding", I said "'universal' encoding". :-)
>
> What I meant was that I'd prefer it to default to an encoding that was
> the same on all platforms, not whatever encoding _this_ machine happens
> to be using, which might be different from whatever encoding _that_
> machine happens to be using.
>
> Or, in summary, I think that portability is more important.

Oh, I *dream* of the day when everyone everywhere standardizes on UTF-8 for storage of text data. As a Linux user, I'm partly there -- the locale on most Linux systems defaults to UTF-8, and most apps honour that. But so long as Windows machines normally default to some legacy encoding, as I believe they do, Python cannot afford to force the issue.
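
To make the platform dependence concrete, here is a minimal sketch. It assumes that open() in text mode, when no encoding is given, falls back to whatever locale.getpreferredencoding(False) reports, as the open() documentation describes. The file name and sample text are made up for illustration.

```python
import locale
import os
import tempfile

# Assumption: this is the encoding open() picks in text mode when the
# encoding argument is omitted. It differs from machine to machine.
default_encoding = locale.getpreferredencoding(False)
print("platform default:", default_encoding)

# Hypothetical file, for illustration only.
path = os.path.join(tempfile.mkdtemp(), "example.txt")
text = "καλημέρα"

# Passing encoding explicitly produces the same bytes on every platform.
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Reading back with the same explicit encoding round-trips the text,
# regardless of what the platform default happens to be.
with open(path, encoding="utf-8") as f:
    round_tripped = f.read()
```

A script that always passes encoding= is portable today; the question in this thread is only what should happen when it is left out.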

It's unfortunate when Python cannot trivially[1] read text files created on a Windows 8 box using (say) ISO-8859-7 (Greek) and then poorly transferred to another machine using (say) UTF-8. But it would be unacceptable if Python could not trivially read files *on the same machine* if they happened to have been created by some other app.


>> If the OS's default encoding is not UTF-8, then you can and should bet
>> that most files on that system will not be in UTF-8.  That's still
>> true today.  Few users will be made happy by a Python that forces them
>> to do something special to read files in the default encoding.
>>
> It would be the default encoding only for the machine on which it was
> created. If I moved the file to another machine, however, I could get
> mojibake.

How is Python supposed to know which files were created on the same machine, and which came from somewhere else?
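
A small sketch of why the bytes alone cannot answer that. The sample word and the list of candidate encodings are my own choices for illustration; the point is only that several legacy encodings decode the same bytes without complaint, and nothing in the file says which one is right.

```python
# Sample text as it might be written on a Greek-locale machine.
greek = "Ελληνικά"
raw = greek.encode("iso8859_7")

# Candidate encodings another machine might assume (illustrative choices).
candidates = ["iso8859_7", "latin-1", "cp1251", "utf-8"]

decoded = {}
for enc in candidates:
    try:
        decoded[enc] = raw.decode(enc)
    except UnicodeDecodeError:
        # These particular bytes are not valid UTF-8, so that guess
        # at least fails loudly.
        decoded[enc] = None

# ascii() keeps the output printable on any terminal encoding.
for enc, result in decoded.items():
    print(enc, ascii(result))
```

Only the iso8859_7 result matches the original text. The latin-1 and cp1251 guesses also succeed, but silently produce mojibake, which is worse than an error.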




[1] By trivially, I mean without having to worry about encoding issues.



-- 
Steven

