Harri Pasanen: Re: [Idle-dev] Known bug? Saving fails in IDLE if accents used in characters
Martin von Loewis
loewis@informatik.hu-berlin.de
Sun, 23 Mar 2003 23:33:51 +0100 (CET)
> Saving my python file, still consisting of the single line
> # déjà
>
> it spits out a message box with a title "I/O Error"
> Non-ASCII found, yet no encoding declared. Add a line like
> #-*- coding: iso-8859-15-*-
> to your file [OK]
>
> After clicking OK, it seems to save my file without problems, but it
> bugs me with the message at each save.
Hi Harri,
There is not much we can do about this; IDLE *could* try to edit
the file for you, but I consider this too intrusive, hence the error
message.
> The message box is modal, and I cannot cut and paste the line from it.
Patches to correct this are welcome. I'm unsure what Tk widget to use
that allows copyable-but-uneditable text.
> Typing it in manually, my next I/O Error is:
>
> Unknown encoding iso-8859-15-.
> Saving as UTF-8
> [OK]
Are you sure there was no space between the 5 and the - in the message?
Adding a space should help.
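
As a rough illustration of why the space matters, the sketch below applies the pattern that PEP 263 gives for recognising the declaration. The helper name declared_encoding is made up for this example, and this is not IDLE's actual code; whether the extracted name is then accepted depends on the codec lookup of the Python version in use.

```python
import re

# Pattern given in PEP 263 for recognising an encoding declaration.
CODING_RE = re.compile(r"coding[:=]\s*([-\w.]+)")

def declared_encoding(line):
    """Return the encoding name declared on a source line, or None.

    Hypothetical helper for illustration only.
    """
    match = CODING_RE.search(line)
    return match.group(1) if match else None

# Without the space, the trailing "-" is taken as part of the name.
print(declared_encoding("#-*- coding: iso-8859-15-*-"))   # iso-8859-15-
# With the space, the name ends where it should.
print(declared_encoding("#-*- coding: iso-8859-15 -*-"))  # iso-8859-15
```

The character class in the pattern includes "-", so the name only ends at the first character outside it, such as a space or "*".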
> Now this is the greatest, as trying to run the resulting file in
> python gives:
>
> [harri@kapu harri]$ python t.py
> File "t.py", line 1
> ï»¿#-*- coding: iso-8859-15-*-
> ^
> SyntaxError: invalid syntax
Yes; only from Python 2.3 on will this not be a syntax error. Once you
correct the encoding declaration (by adding the missing space), the
problem will go away.
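
For readers wondering about the three odd bytes at the start of line 1 in the traceback: they are the UTF-8 signature (byte order mark) written when the file was saved as UTF-8. A minimal sketch, using a current Python 3 and only the standard codecs module, with the file contents assumed from the traceback above:

```python
import codecs

# The UTF-8 signature (BOM) is a fixed three-byte sequence.
assert codecs.BOM_UTF8 == b"\xef\xbb\xbf"

# Assumed file contents: signature followed by the declaration line.
data = codecs.BOM_UTF8 + b"#-*- coding: iso-8859-15-*-\n"

# Decoded as Latin-1, the signature shows up as three stray characters.
as_latin1 = data.decode("iso-8859-1")
# The "utf-8-sig" codec recognises the signature and drops it.
as_utf8 = data.decode("utf-8-sig")

print(ascii(as_latin1[:4]))
print(ascii(as_utf8[:4]))
```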
> Hmm... Perl seems to have a pragma to enable UTF-8 in source code,
> but I was not aware Python would have support for UTF-8 source. Does
> it?
Indeed; this is the result of PEP 263. Unlike Perl, Python supports
multiple different source encodings, hence the need for an explicit
declaration.
> Now how about doing what 99% of other editors do, and supporting by
> default iso-8859-15 (basically latin-1), that will make a couple
> of hundred million Europeans happy right there?
Supporting this in IDLE would be acceptable, I guess. However, in the
long run, Python itself will refuse source code that lacks a proper
encoding declaration, so I felt that IDLE should teach users how to
do that early on.
> Extending the support outside of latin alphabets is an honorable goal,
> but clearly whatever the encoding is should not munge with the
> python source code, unless python itself has support for it.
But Python does have support for it.
> The following link has some info on how Java deals with this issue:
> http://www.jorendorff.com/articles/unicode/java.html (basically the
> java compiler does support multiple charsets).
So does Python. However, the Java method is fundamentally flawed:
Whoever invokes javac needs to know what the source encoding is, and
it needs to be the same for all source code files. I find it
unacceptable that users of a library have to know what encoding the
library uses.
In any case, I recommend reading PEP 263.
> All my sympathy for the Japanese/Chinese/... there, how do you
> program in python/C/C++? I would assume externalizing the strings
> would be the easiest, or are there specialized editors that handle
> gracefully non-ASCII, non-Latin comments and strings inside ASCII
> source code?
You would think that people do that, but they don't:
a) they want to put comments into source code, in their native language,
using the encoding that their system uses.
b) they want to use non-ASCII in identifiers (not supported in Python 2.3,
but may be supported in the future)
c) they do put non-ASCII into string literals and Unicode literals and
expect this to work. In particular for Unicode literals, this cannot
work without an encoding declaration. If they target a single language
only, putting the text into the source is a natural thing to do; the
overhead of an external message catalogue is unacceptable.
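
To make point c) concrete, here is a small sketch (current Python 3, standard codecs only) of why the bytes of a literal cannot be interpreted without knowing the encoding. The byte value 0xA4 is chosen for the example because it is one of the positions where iso-8859-1 and iso-8859-15 differ:

```python
# One byte, several readings: the result depends on the encoding assumed.
raw = b"\xa4"

as_latin1 = raw.decode("iso-8859-1")    # U+00A4, generic currency sign
as_latin9 = raw.decode("iso-8859-15")   # U+20AC, euro sign

print(ascii(as_latin1))
print(ascii(as_latin9))

# On its own, the same byte is not valid UTF-8 at all.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
print("valid UTF-8:", valid_utf8)
```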
Regards,
Martin