[ python-Bugs-898757 ] Python 2.3 encoding parsing bug
SourceForge.net
noreply at sourceforge.net
Tue Feb 17 17:59:31 EST 2004
Bugs item #898757, was opened at 2004-02-17 14:36
Message generated for change (Comment added) made by edream
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=898757&group_id=5470
Category: Parser/Compiler
Group: Not a Bug
Status: Closed
Resolution: Invalid
Priority: 5
Submitted By: Edward K. Ream (edream)
Assigned to: Nobody/Anonymous (nobody)
Summary: Python 2.3 encoding parsing bug
Initial Comment:
The documentation for encoding lines at
C:\Python23\Doc\Python-Docs-2.3.1\whatsnew\section-encodings.html
states:
"Encodings are declared by including a specially
formatted comment in the first or second line of the
source file."
In fact, contrary to the implication, the Python 2.3
parser does not look for lines of the form:
# -*- coding: <encoding> -*-
For example, Python improperly scans the following line
for an encoding:
#@+leo-ver=4-encoding=iso-8859-1.
and reports that iso-8859-1. (note the trailing dot) is an
invalid encoding!
The workaround for my app is to precede this line with
the following line:
# -*- coding: iso-8859-1 -*-
This makes Python 2.3 happy.
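The effect of the workaround can be sketched in a few lines. This is only an illustration, not the actual tokenizer code: it uses the pattern coding[=:]\s*([-\w.]+) published in the language reference, and it assumes that the first matching comment among the first two lines wins. The name declared_encoding is hypothetical.

```python
import re

# Pattern published in the Python language reference for encoding declarations.
CODING_RE = re.compile(r"coding[=:]\s*([-\w.]+)")


def declared_encoding(source_lines):
    """Return the encoding name found in the first two lines, or None.

    Assumption for this sketch: only comment lines are examined and the
    first match wins, so a declaration on line 1 hides anything on line 2.
    """
    for line in source_lines[:2]:
        if line.lstrip().startswith("#"):
            match = CODING_RE.search(line)
            if match:
                return match.group(1)
    return None


leo_line = "#@+leo-ver=4-encoding=iso-8859-1."
emacs_line = "# -*- coding: iso-8859-1 -*-"

# Leo's sentinel alone: the trailing dot becomes part of the encoding name.
without_workaround = declared_encoding([leo_line])

# With the explicit declaration first, the Leo line is never consulted.
with_workaround = declared_encoding([emacs_line, leo_line])

print(without_workaround)
print(with_workaround)
```

The pattern matches "coding=" inside "encoding=", and the character class [-\w.] accepts dots, which is why the sentinel line yields a name ending in a dot.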
To make myself perfectly clear: Python has absolutely
no right to complain about comment lines that do not
have the form:
# -*- coding: <encoding> -*-
Python 2.3.1
Windows XP
Edward K. Ream
edreamleo at charter.net
----------------------------------------------------------------------
>Comment By: Edward K. Ream (edream)
Date: 2004-02-17 22:59
Message:
Logged In: YES
user_id=14056
> Does leo need the trailing dot in the comment?
In general, Leo needs to know where the encoding
specification ends and a possible end-block-comment delimiter
begins. In specific languages, and in particular Python, Leo
would not have needed the trailing dot. Alas, this is a moot
point. The only options available to Leo now are:
1. Have the user insert encoding comments by hand or
2. Change the format of files created by Leo.
In other words, no previous 4.x version of Leo (including 4.1
final, due tomorrow) can ever work with Python 2.3 without
the user inserting a workaround.
I am most upset that the PEP said one thing in English and
something almost completely different in the regexp. Furthermore,
what the regexp implies is a very bad idea: having a _restricted_
kind of special-purpose comment is one thing; having a
way-too-general kind of special-purpose comment is wrong, wrong,
wrong. It needlessly invalidates comments that _should_
have been none of Python's business. Yes, I know there was
a reason for this bad idea; there always is.
Edward
----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis)
Date: 2004-02-17 21:47
Message:
Logged In: YES
user_id=21627
Actually, what Python should (and does) really do is to
follow the language specification (the PEP becomes
irrelevant once implemented):
http://www.python.org/doc/current/ref/encodings.html
This gives the precise regexp that is used.
Differences between the language spec and the implementation
would be considered as a bug. Closing this report as not-a-bug.
----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg)
Date: 2004-02-17 21:14
Message:
Logged In: YES
user_id=38388
Python is behaving correctly and according to the PEP.
The encoding declaration parser will look for
"coding[:=][ \t]*<encoding>" to make it play nice with the
various editor encoding comments in use today. The format you
are quoting is Emacs-style, but there are also vi-style and
various other formats. Most of them use the "coding[:=]"
declaration, which is why this parsing method was chosen.
Does Leo need the trailing dot in the comment?
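
As a rough illustration of why a loose pattern covers several editor conventions, the sketch below applies "coding[:=][ \t]*" followed by a capture group to a few sample comments. The capture group ([-\w.]+) stands in for <encoding> and is taken from the language reference; the sample lines and the helper name extract are made up for this example.

```python
import re

# "coding[:=][ \t]*<encoding>", with <encoding> written as a capture group.
CODING_RE = re.compile(r"coding[:=][ \t]*([-\w.]+)")


def extract(line):
    """Return the encoding name the pattern captures from line, or None."""
    match = CODING_RE.search(line)
    return match.group(1) if match else None


samples = {
    "emacs": "# -*- coding: latin-1 -*-",
    "vim": "# vim: set fileencoding=utf-8 :",
    "leo": "#@+leo-ver=4-encoding=iso-8859-1.",
    "plain": "# a remark about encoding in general",
}

# Map each sample's label to whatever the pattern extracts from it.
found = {label: extract(line) for label, line in samples.items()}

for label, name in found.items():
    print(label, name)
```

Because the pattern only requires "coding" followed by ":" or "=", it accepts the Emacs and vi forms, and for the same reason it also picks up Leo's sentinel line.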
----------------------------------------------------------------------