[Idle-dev] Known bug? Saving fails in IDLE if accents used in characters

Harri Pasanen harri@nerim.net
Sun, 23 Mar 2003 17:08:59 +0100


On Sunday 23 March 2003 14:08, Guido van Rossum wrote:
> Hi Harri!
>
Hi Guido,  as you can see I'm busy raising a second generation of 
Python programmers ;-).

The rest mainly for the IDLEfork people:

I gave IDLEfork 0.9a2 a go, and it sort-of tries to deal with the 
problem.

Saving my Python file, still consisting of the single line
# déjà

it spits out a message box with the title "I/O Error":
Non-ASCII found, yet no encoding declared.  Add a line like 
#-*- coding: iso-8859-15-*-
to your file [OK]

After clicking OK, it seems to save my file without problems, but it 
bugs me with the message at each save.

The message box is modal, and I cannot cut and paste the line from it.
When I type it in manually, my next I/O Error is:

Unknown encoding iso-8859-15-.
Saving as UTF-8
[OK]
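
A possible explanation for the trailing hyphen, as a minimal sketch.
It assumes the declaration is located with the pattern suggested in
PEP 263; whether IDLEfork uses exactly that pattern is an assumption,
and declared_encoding is a hypothetical helper name:

```python
import re

# Assumption: the pattern suggested in PEP 263 for finding the declaration.
# Whether IDLEfork uses exactly this pattern is not confirmed here.
CODING_RE = re.compile(r"coding[:=]\s*([-\w.]+)")


def declared_encoding(line):
    """Return the encoding name declared on a source line, or None."""
    match = CODING_RE.search(line)
    return match.group(1) if match else None


# The line exactly as the message box suggests it: there is no space before
# the closing "-*-", so its first hyphen is taken as part of the name.
as_suggested = declared_encoding("#-*- coding: iso-8859-15-*-")

# With a space before the closing "-*-" the name comes out clean.
with_space = declared_encoding("# -*- coding: iso-8859-15 -*-")

print(as_suggested)
print(with_space)
```

If that is indeed what happens, the text in the message box is what
needs fixing: a space before the closing -*- would be enough.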

Now this is the greatest, as trying to run the resulting file in 
Python gives:

[harri@kapu harri]$ python t.py
  File "t.py", line 1
    #-*- coding: iso-8859-15-*-
    ^
SyntaxError: invalid syntax

Hmm... Perl seems to have a pragma to enable UTF-8 in source code, 
but I was not aware that Python supports UTF-8 source.  Does 
it?

Now how about doing what 99% of other editors do, and supporting 
iso-8859-15 (basically latin-1) by default?  That would make a couple 
of hundred million Europeans happy right there.  I remember having to 
put up with 7-bit-only ASCII sometime in the 1970s -- a loong time ago.
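
To illustrate how close the two Latin charsets are for a file like
mine, here is a small sketch using Python's standard codecs.  The
variable names are made up for the example, and it only shows the
byte-level differences, not what IDLE does internally:

```python
line = "# déjà"

latin1 = line.encode("iso-8859-1")
latin9 = line.encode("iso-8859-15")
utf8 = line.encode("utf-8")

# For these accented letters the two Latin charsets produce identical bytes,
# one byte per character; UTF-8 needs two bytes for each accented letter.
same_in_both_latin = latin1 == latin9
lengths = (len(latin1), len(utf8))

# The euro sign is one of the few places where the two charsets differ:
# iso-8859-15 has it, iso-8859-1 does not.
euro_latin9 = "€".encode("iso-8859-15")
try:
    "€".encode("iso-8859-1")
    euro_in_latin1 = True
except UnicodeEncodeError:
    euro_in_latin1 = False

print(same_in_both_latin, lengths, euro_latin9, euro_in_latin1)
```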

Extending the support beyond Latin alphabets is an honorable goal, 
but clearly, whatever the encoding is, it should not munge the 
Python source code unless Python itself has support for it.
The following link has some info on how Java deals with this issue: 
http://www.jorendorff.com/articles/unicode/java.html (basically, the 
Java compiler does support multiple charsets).

All my sympathy to the Japanese/Chinese/... out there.  How do you 
program in Python/C/C++?  I would assume externalizing the strings 
would be the easiest, or are there specialized editors that gracefully 
handle non-ASCII, non-Latin comments and strings inside ASCII 
source code?  I'm curious to know.

Sorry for this long rant, just trying to be a good father... ;-)

Harri