[Python-3000] locale-aware strings ?

David Hopwood david.nospam.hopwood at blueyonder.co.uk
Thu Sep 7 02:46:11 CEST 2006


Jim Jewett wrote:
> On 9/4/06, David Hopwood <david.nospam.hopwood at blueyonder.co.uk> wrote:
> 
>> The issue is not simplicity of implementation; it is what will provide
>> the simplest usage model in the long term. If new files are encoded in X
>> just because most of a user's existing files are encoded in X, then
>> how is the user supposed to migrate to a different encoding? ...
> 
>> In practice, the system charset is often set to the charset that should
>> be used as a fallback *for applications that do not support Unicode*.
> 
> Are you assuming that most uses of open will be for new files,

No, I'm refusing to make the assumption that all uses will be for old
files.

My position is that there should be no default encoding (not ASCII either,
although I may differ with Paul Prescod on that point). Note that Py3K is
the only opportunity to remove the idea of a default encoding -- Python
2.5 falls back to US-ASCII whenever byte strings read from a file are
implicitly converted to Unicode, so removing the default would be an
incompatible API change.

If a programmer explicitly chooses to open files with the system encoding
(by adding an "encoding=sys.get_file_content_encoding()" argument to a
file open call), that's absolutely fine. In that case they must have
considered encoding issues for at least a few seconds. That is the best
we can do.
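
As a rough illustration of the "no default" idea, here is a minimal sketch
in which the encoding is a required keyword argument. The names open_text
and system_file_encoding are hypothetical, and locale.getpreferredencoding
is used only as a stand-in for the sys.get_file_content_encoding() call
mentioned above, which does not exist in any released Python.

```python
import io
import locale


def system_file_encoding():
    # Stand-in (assumption) for the proposed sys.get_file_content_encoding().
    # locale.getpreferredencoding() is a real stdlib call; False means
    # "do not call setlocale() as a side effect".
    return locale.getpreferredencoding(False)


def open_text(path, mode="r", *, encoding):
    # 'encoding' is keyword-only and has no default, so omitting it
    # raises TypeError instead of silently picking a charset.
    if "b" in mode:
        raise ValueError("open_text is for text mode only")
    return io.open(path, mode, encoding=encoding)
```

A caller who really wants the system charset then has to say so, e.g.
open_text(path, encoding=system_file_encoding()), which is the few
seconds of thought I am asking for.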

APIs that open files should also be designed to allow auto-detection of
the encoding based on content. This requires that the detected encoding
be returned from the file open call, so that if the file needs to be
rewritten, that can be done in the same encoding that was detected (which
is the behaviour least likely to break existing applications that may read
the same file).
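
One possible shape for such an API, sketched under the assumption that
detection is limited to byte order marks with a caller-supplied fallback
(real content sniffing would be more involved). The function names
read_detect and rewrite are made up for this example.

```python
import codecs

# Order matters: the UTF-32 LE BOM begins with the UTF-16 LE BOM bytes,
# so the UTF-32 entries must be tried first.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def read_detect(path, fallback):
    """Return (text, encoding, bom) so the caller knows what was detected."""
    with open(path, "rb") as f:
        data = f.read()
    for bom, enc in _BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(enc), enc, bom
    # No BOM found: use the encoding the caller explicitly supplied.
    return data.decode(fallback), fallback, b""


def rewrite(path, text, encoding, bom):
    # Write back with the same BOM and encoding that were detected.
    with open(path, "wb") as f:
        f.write(bom + text.encode(encoding))
```

Returning the BOM alongside the encoding name lets the rewrite reproduce
the original byte layout, rather than whatever the platform would pick.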

> *and* that these files will not also be read by such unicode-ignorant
> applications?

I'm not making that assumption either.

-- 
David Hopwood <david.nospam.hopwood at blueyonder.co.uk>



