Is there really a default source encoding?

"Martin v. Löwis" martin at v.loewis.de
Thu Jan 23 03:15:55 EST 2003


Magnus Lie Hetland wrote:
> The 2.3a1 "what's new" document, section 3 says that iso8859-1 is the
> default source encoding, and that the encoding only affects Unicode
> string literals... But if I put non-ASCII iso8859-1 letters in my
> string literals without declaring an encoding, I get a deprecation
> warning. 

Right. The default encoding is Latin-1, for compatibility with previous 
releases. However, reliance on the default encoding is deprecated, and 
the default encoding will change to ASCII in 2.4. According to the 
guidelines for language evolution (PEP 5), this means a warning must be 
emitted in 2.3.

> What's the correct behaviour, and where is it documented?

Whose correct behaviour? I believe Python 2.3a1 behaves correctly: 
Latin-1 is the deprecated default encoding, see

http://www.python.org/dev/doc/devel/ref/lexical.html

Notice the "favours Latin-1". Strictly speaking, the default encoding 
is relevant mostly to Unicode literals. With the default behaviour, 
byte strings appear at runtime as they appear on disk in the source 
file, so they could be in nearly any encoding. So you can also use 
koi8-r (say) in your source code, and, unless you use Unicode literals, 
nothing would break.
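
To illustrate the point about byte strings, here is a minimal sketch. It is 
written for a current Python 3 interpreter rather than 2.3, so it only 
simulates the situation: the names text, on_disk and byte_literal are 
made up for the example, and the explicit decode() calls stand in for 
what the compiler would do with a Unicode literal.

```python
# Bytes as they might sit on disk in a source file saved as KOI8-R.
text = "привет"
on_disk = text.encode("koi8-r")

# A byte string keeps those bytes unchanged, whatever the default encoding is.
byte_literal = on_disk

# Decoding is the step where the encoding matters (as for a Unicode literal).
as_koi8r = byte_literal.decode("koi8-r")    # decoded with the real encoding
as_latin1 = byte_literal.decode("latin-1")  # same bytes, read as Latin-1

print(as_koi8r == text)
print(as_latin1 == text)
```

The bytes never change; only the interpretation applied when they are 
turned into Unicode characters does.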

Your correct behaviour is to properly declare the encoding of your 
source files; this is documented at the same place.
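
A declaration is a comment on the first or second line of the file. The 
sketch below is an assumption-laden illustration, not taken from the 
2.3 documentation: it runs on a current Python 3 interpreter, builds a 
small Latin-1 encoded source in memory (the variable name "name" and 
the "<example>" filename are invented), and uses the standard tokenize 
module to show that the declaration is what gets picked up.

```python
import io
import tokenize

# A two-line source file, stored as Latin-1 bytes, with an encoding
# declaration on its first line.
source = (
    "# -*- coding: iso-8859-1 -*-\n"
    "name = 'L\u00f6wis'\n"
).encode("iso-8859-1")

# Ask the tokenizer which encoding it detects from the declaration.
encoding, _ = tokenize.detect_encoding(io.BytesIO(source).readline)
print(encoding)

# Compile the raw bytes; the declaration tells the compiler how to decode them.
namespace = {}
exec(compile(source, "<example>", "exec"), namespace)
print(namespace["name"] == "L\u00f6wis")
```

Without the first line, the same bytes would not be valid input for a 
compiler that assumes a different default encoding.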

Regards,
Martin