[Tutor] IDLE has problem with Umlaute

dman dman@dman.ddts.net
Mon, 6 May 2002 13:23:00 -0500



On Mon, May 06, 2002 at 06:37:43PM +0200, Gregor Lingl wrote:

| We use Python2.2.
| When trying to save the following fancy program
|
| print "Größtes Problem: Umlaute"
|
| from an IDLE editor window, I got the error
| messages:
...
|   File "[DOESN'T MATTER]", line 154, in writefile
|     f.write(chars)
| UnicodeError: ASCII encoding error: ordinal not in range(128)
|
| So IDLE seems to prohibit the use of "ÄÖÜäöüß".

It's not IDLE that has the problem, but computer systems in general.
The problem is historical: characters were one byte each, and US-ASCII
was the de facto standard charset (EBCDIC never really took off outside
IBM mainframes and terminals).

Since US-ASCII is really a 7-bit encoding, ISO took advantage of the
remaining 128 values available in one byte and defined the iso-8859-*
family of charsets.  The advantage was that programs needed no changes
to handle "wide" characters, yet umlauts, etc., became representable.
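
To make the "remaining 128 values" point concrete, here is a small
sketch.  It assumes a current Python 3 interpreter rather than the
2.2 discussed in this thread, and the variable names are made up for
illustration.  The same single byte decodes to different characters
depending on which iso-8859-* charset you pick, and plain US-ASCII
rejects it outright.

```python
# One byte above 0x7F means different things in different charsets.
raw = b'\xf6'

as_latin1 = raw.decode('iso-8859-1')    # o with umlaut, U+00F6
as_cyrillic = raw.decode('iso-8859-5')  # a Cyrillic letter instead

# US-ASCII only defines 0x00-0x7F, so the same byte cannot be decoded.
try:
    raw.decode('ascii')
    ascii_ok = True
except UnicodeError:
    ascii_ok = False

print(repr(as_latin1), repr(as_cyrillic), ascii_ok)
```

The byte itself carries no hint about which charset was meant; that
information has to come from somewhere else.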

The latest solution is Unicode: one character set, originally built
around 16-bit characters, meant to represent (almost) all languages'
alphabets simultaneously.  The problem here is that the characters
don't fit into a byte.  Thus several encodings that transform the
characters into byte streams (for files and sockets) have been
developed, with a variety of tradeoffs.
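
As a rough illustration of those tradeoffs (again assuming Python 3,
with names chosen only for this example), the same one-character
string turns into different byte sequences under different encodings:

```python
text = u'\xf6'  # LATIN SMALL LETTER O WITH DIAERESIS

encodings = ['latin-1', 'utf-8', 'utf-16-le', 'utf-16-be']
encoded = {name: text.encode(name) for name in encodings}

# Show the raw bytes each encoding produces for the same character.
for name in encodings:
    print(name, repr(encoded[name]))
```

latin-1 needs one byte for this character, while utf-8 and utf-16
need two, and the two utf-16 variants differ only in byte order.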

Python supports unicode, but has a problem that can't be solved with
any sane default.  The problem is how to serialize (or encode, if you
prefer) the unicode characters when a string is written to a file.
Some people want latin1, others utf-8, utf-7, ucs-2, utf-16, or
something else.  The lowest common denominator is US-ASCII, so that is
the only encoding that is used *by default*.  To demonstrate this, fire
up the interpreter on any system (windows, unix, etc) and try printing
a unicode string that is not in the us-ascii subset.

>>> print u'\xf6'

Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: ASCII encoding error: ordinal not in range(128)
>>> print u'\xf6'.encode( 'latin1' )
ö
>>> print u'\xf6'.encode( 'utf-8' )
Ã¶
>>>

Here you see the beauty (=p) of the terminal I'm using right now.  It
understands latin1 (iso8859-1) just fine, but not utf-8.  Python has
no way of knowing what I want to use, so it doesn't try to guess.
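
The same behaviour can be sketched outside the interactive prompt.
This assumes Python 3, and encode_or_none is a hypothetical helper
written only for this example.  Encoding to ASCII fails for the
string from the original report, while an explicitly chosen encoding
succeeds:

```python
text = u'Gr\xf6\xdftes Problem: Umlaute'

def encode_or_none(s, encoding):
    """Return s encoded as bytes, or None if the encoding can't represent it."""
    try:
        return s.encode(encoding)
    except UnicodeError:
        return None

ascii_bytes = encode_or_none(text, 'ascii')     # fails: two non-ASCII characters
latin1_bytes = encode_or_none(text, 'latin-1')  # one byte per character
utf8_bytes = encode_or_none(text, 'utf-8')      # two bytes for each non-ASCII character

print(ascii_bytes, len(latin1_bytes), len(utf8_bytes))
```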

| As far as I remember we had previous Python versions, which
| did NOT show this 'feature'.

Right, because they didn't support other languages ;-).

| =================================================
| !!!  Is there a patch to repair this problem? !!!
| =================================================

Sure, you can modify all file reads and writes to decode/encode the
stream according to your encoding preference.
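
A minimal sketch of that approach follows.  It assumes a Python where
io.open exists (2.6 or later, including Python 3); on 2.2 the
codecs.open function played a similar role.  ENCODING, write_text and
read_text are names invented for this example, not anything IDLE
provides.

```python
import io
import os
import tempfile

ENCODING = 'latin-1'  # assumption: whatever your editor and terminal expect

def write_text(path, text, encoding=ENCODING):
    # Encode explicitly on the way out instead of relying on the default.
    with io.open(path, 'w', encoding=encoding) as f:
        f.write(text)

def read_text(path, encoding=ENCODING):
    # Decode with the same encoding on the way back in.
    with io.open(path, 'r', encoding=encoding) as f:
        return f.read()

source = u'print("Gr\xf6\xdftes Problem: Umlaute")\n'
fd, path = tempfile.mkstemp(suffix='.py')
os.close(fd)
try:
    write_text(path, source)
    round_trip = read_text(path)
    with io.open(path, 'rb') as f:
        raw = f.read()  # the bytes that actually landed on disk
finally:
    os.remove(path)
```

The important part is that the writer and the reader agree on the
encoding; which one you choose matters less.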

Possibly adjusting the locale settings in your OS might have an effect
on this.
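
If you want to see what the locale settings suggest, something like
the following may help.  It assumes Python 2.3 or later, where
locale.getpreferredencoding() is available, and the result depends
entirely on how your system is configured, so no output is shown.

```python
import locale

# Adopt the locale configured in the environment (LANG, LC_ALL, ...).
try:
    locale.setlocale(locale.LC_ALL, '')
except locale.Error:
    pass  # unsupported locale: stay with the default "C" locale

preferred = locale.getpreferredencoding()

# Encode with the locale's encoding; characters it can't represent become '?'.
encoded = u'\xf6'.encode(preferred, 'replace')
print(preferred, repr(encoded))
```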

Or you can use sys.setdefaultencoding() to specify which encoding you
want to be the default one.  For example:

>>> import sys
>>> sys.setdefaultencoding( 'latin1' )
>>> print u'\xf6'
ö
>>>

There are two problems with this, though.  First, it might break other
software that doesn't expect a different default encoding and can't
handle it properly.  Second, there is your site.py.  On debian the
default site.py removes the name sys.setdefaultencoding (I believe the
stock site.py does the same) so that we users can't play with fire.
You could modify your site's site.py to set the encoding to your
preferred value and then hope nothing else blows up :-).

HTH,
-D

-- 

Consider what God has done:
    Who can straighten what He has made crooked?
        Ecclesiastes 7:13

GnuPG key : http://dman.ddts.net/~dman/public_key.gpg

