[Python-Dev] PEP 263 -- Python Source Code Encoding

M.-A. Lemburg mal@lemburg.com
Tue, 26 Feb 2002 21:15:40 +0100

Guido van Rossum wrote:
> > """
> > Python will default to Latin-1 as standard encoding if no other
> > encoding hints are given.
> > """
> I missed this.  Why not default to ASCII like any decent programming
> language does in the absence of an explicit encoding?

Jack had the same question. The simple answer is: we need this
in order to maintain backward compatibility when we move to
phase two of the implementation.

Here's the longer one:

ASCII is the standard encoding for Python keywords and identifiers. 
There is no standard source code encoding for string literals. 
Unicode literals are interpreted using 'unicode-escape' which 
is an enhanced Latin-1 with escape semantics.
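
A small sketch of that claim, written in present-day Python 3 syntax
(an assumption; the interpreter under discussion here predates it).
The sample bytes are hypothetical. It only shows that the
'unicode_escape' codec maps raw bytes to code points exactly as
Latin-1 does, and additionally expands backslash escapes:

```python
# Hypothetical sample: a raw 0xE9 byte followed by a literal \u20ac escape.
data = b"caf\xe9 \\u20ac"

# 'unicode_escape' treats non-escape bytes as Latin-1 code points
# and expands \uXXXX escapes on top of that.
via_escape = data.decode("unicode_escape")

# Plain Latin-1 gives the same result for the raw byte,
# but leaves the backslash sequence untouched.
via_latin1 = data.decode("latin-1")

print(repr(via_escape))
print(repr(via_latin1))
```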

This makes Latin-1 the right choice:

* Unicode literals already use it today

* As soon as we get to phase two of the implementation,
  8-bit string literals will have to make the round trip
  raw binary -> Unicode -> raw binary and this only works
  if you make Latin-1 the default.
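
The round-trip argument can be checked with a minimal sketch, again
assuming Python 3 syntax; the helper name survives_roundtrip is
hypothetical and not part of the PEP. Latin-1 maps byte value N to
code point N for all 256 values, so any byte string survives
decoding and re-encoding unchanged, whereas ASCII and UTF-8 reject
some byte sequences:

```python
# Every possible 8-bit value, standing in for an arbitrary binary string literal.
raw = bytes(range(256))


def survives_roundtrip(data, encoding):
    """Return True if data -> Unicode -> data is lossless under `encoding`."""
    try:
        return data.decode(encoding).encode(encoding) == data
    except UnicodeError:
        # Decoding failed: some byte has no meaning in this encoding.
        return False


# Latin-1 decodes byte N to code point N, so nothing is lost.
text = raw.decode("latin-1")
codepoints_match = [ord(ch) for ch in text] == list(range(256))

for name in ("latin-1", "ascii", "utf-8"):
    print(name, survives_roundtrip(raw, name))
```

Only the Latin-1 line reports a lossless round trip for this input;
ASCII fails on bytes above 0x7F and UTF-8 fails on byte sequences
that are not valid UTF-8.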

Marc-Andre Lemburg
CEO eGenix.com Software GmbH
Company & Consulting:                           http://www.egenix.com/
Python Software:                   http://www.egenix.com/files/python/