diferences between 22 and python 23
Mike C. Fletcher
mcfletch at rogers.com
Wed Dec 3 20:02:42 CET 2003
Fredrik Lundh wrote:
>>running a script that works fine in python 22 in python 23 i find something
>>unicodedecodeerror: "ascii" codec can't decode byte 0xed in position
>>37:ordinal not in range (128)
>>Usually major versions of python were courteous with the previous versions...
>0xED has never been a valid 7-bit ASCII character.
Sure, but Python used to accept 8-bit characters in the platform's
default encoding as part of string characters...
Most likely Enrique has a \xED somewhere in a string literal in his code
that is intended to be an i-accent-acute. That would have worked fine in
all versions of Python before 2.3, but started failing in 2.3 due to the
decision that all string literals would be converted to unicode and back
and that the default encoding for such conversions would be ASCII
(whereas previously it would most closely have been approximated by
"platform's local 256-char encoding").
PythonWin 2.2.3 (#42, May 30 2003, 18:12:08) [MSC 32 bit (Intel)] on win32.
Portions Copyright 1994-2001 Mark Hammond (mhammond at skippinet.com.au) -
see 'Help/About PythonWin' for further copyright information.
>>> print '23\xED'
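For anyone who wants to poke at the failure itself, here is a minimal sketch. It assumes a current Python 3 interpreter rather than 2.2 or 2.3, so a bytes object stands in for the old 8-bit string, and try_decode is just a helper name made up for the illustration. It shows the same byte being rejected by the ASCII codec and accepted by the two 8-bit encodings discussed in this thread.

```python
# Assumption: Python 3, where bytes play the role of a Python 2 8-bit str.
raw = b"23\xed"  # the 0xED byte is i-acute in both ISO-8859-1 and cp1252


def try_decode(data, encoding):
    """Return the decoded text, or the error message if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return "error: %s" % exc


for name in ("ascii", "iso-8859-1", "cp1252"):
    # ascii reports the "ordinal not in range(128)" error; the others decode
    print(name, "->", try_decode(raw, name))
```

The ASCII attempt reports the same "ordinal not in range(128)" complaint as in the original question, only with a different byte position.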
So, Enrique, what you're probably looking for is this:
# -*- coding: ISO-8859-1 -*-
for latin-1, or
# -*- coding: cp1252 -*-
for Windows code-page.
You add these "magic" comments to the top of your Python source files to
tell the interpreter that you're using a particular encoding for your
Python string literals. Even if you're just using string literals to
store binary data, you'll still need to use a dummy encoding, such as
Yes it's a bit of a pain, but the decision was made, so we have to deal
with it :) . I'm assuming that somewhere in the "new in 2.3" pages is a
huge warning to the effect that this breaks lots of old code, but
Enrique can be forgiven for missing it, as I think I managed to miss it
too; all I found was this:
*Encoding declarations* - you can put a comment of the form "# -*-
coding: <encodingname> -*-" in the first or second line of a Python
source file to indicate the encoding (e.g. utf-8). (PEP 263
<http://www.python.org/peps/pep-0263.html> phase 1)
Which doesn't actually mention the breakage of code that results. True,
theoretically the code was never valid, but *lots* of people used 8-bit
encodings quite happily with earlier versions and do find their code
breaking in 2.3 because of this.
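To see a coding declaration doing its job, here is a rough sketch. It again assumes a current Python 3 interpreter, not 2.3, and the file name "<demo>" and the variable name text are made up for the example. The source is built as raw bytes, the way a latin-1 file would sit on disk, and handed to compile(), which honours the PEP 263 comment when given bytes.

```python
# Assumption: Python 3; compile() reads the PEP 263 comment from bytes input.
# The source below is what a latin-1 encoded file containing a raw 0xED byte
# inside a string literal would look like on disk.
source = (
    b"# -*- coding: ISO-8859-1 -*-\n"
    b"text = '23\xed'\n"
)

# Compile and run the source in a scratch namespace.
code = compile(source, "<demo>", "exec")
namespace = {}
exec(code, namespace)

# The raw byte has been decoded according to the declared encoding.
print(repr(namespace["text"]))
```

Swapping ISO-8859-1 for cp1252 in the first line should give the same string here, since both encodings map 0xED to the same character.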
Mike C. Fletcher
Designer, VR Plumber, Coder