[ python-Bugs-995206 ] Deprecation Warning lies when it cannot
parse your encoding
SourceForge.net
noreply at sourceforge.net
Wed Jul 21 19:28:08 CEST 2004
Bugs item #995206, was opened at 2004-07-21 14:48
Message generated for change (Comment added) made by lcreighton
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=995206&group_id=5470
Category: None
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Laura Creighton (lcreighton)
Assigned to: Nobody/Anonymous (nobody)
Summary: Deprecation Warning lies when it cannot parse your encoding
Initial Comment:
The first line of my python file was:
# -*- coding: 'iso-8859-1' -*-
This is wrong, I need to drop the quotes. But Python
2.3.4 (#2, Jun 19 2004, 18:15:30) is wrong, also, when it
complains:
sys:1: DeprecationWarning: Non-ASCII character '\xf6'
in file test_stockdict.py on line 14, but no encoding
declared; see http://www.python.org/peps/pep-0263
----------------------------------------------------------------------
>Comment By: Laura Creighton (lcreighton)
Date: 2004-07-21 19:28
Message:
Logged In: YES
user_id=376262
Hi Tim.
Yes, this is part of the reason why I was opposed to
encoding directives as comments. But be that as it may,
'Neither matches the syntax for an encoding directive'
is overstated.
I simply used Swedish as text in a file,
and I got the deprecation warning.
So I went and read the PEP 263.
In particular I read:
'''
1. Allow non-ASCII in string literals and comments, by internally
treating a missing encoding declaration as a declaration of
"iso-8859-1". This will cause arbitrary byte strings to
correctly round-trip between step 2 and step 5 of the
processing, and provide compatibility with Python 2.2 for
Unicode literals that contain non-ASCII bytes.
'''
Ok, fine. I had no clue my encoding was "iso-8859-1"
so I put that in.
This is wrong, and I am not blaming the PEP for not removing
the quotes there. I think the PEP here is fine.
But, given that I made an error, I would expect to fail
as a different part of the PEP promised me:
To define a source code encoding, a magic comment must
be placed into the source files either as first or second
line in the file:
#!/usr/bin/python
# -*- coding: <encoding name> -*-
More precisely, the first or second line must match the regular
expression "coding[:=]\s*([\w-_.]+)". The first group of this
expression is then interpreted as encoding name. If the
encoding is unknown to Python, an error is raised during
compilation.
-----------
The encoding was unknown, but no error was raised. No warning
either.
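
A minimal sketch of how the pattern quoted from the PEP treats the two
spellings of the comment. It assumes Python's re module; the character
class is written with the hyphen first ([-\w.]), an equivalent form that
re accepts, and the helper name declared_encoding is hypothetical.

```python
import re

# Pattern quoted from PEP 263, with the hyphen moved to the front of the
# character class so that Python's re module accepts it.
CODING_RE = re.compile(r"coding[:=]\s*([-\w.]+)")

quoted = "# -*- coding: 'iso-8859-1' -*-"
unquoted = "# -*- coding: iso-8859-1 -*-"


def declared_encoding(line):
    """Return the encoding name the pattern extracts from line, or None."""
    match = CODING_RE.search(line)
    return match.group(1) if match else None


print(declared_encoding(quoted))    # the quote stops the match: None
print(declared_encoding(unquoted))  # iso-8859-1
```

With the quotes in place the pattern never matches, so no encoding name
is extracted and there is nothing for the "unknown encoding" check to
reject.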
----------------------------------------------------------------------
Comment By: Thomas Heller (theller)
Date: 2004-07-21 18:50
Message:
Logged In: YES
user_id=11105
It is even worse - pass the string contents of this file to
compile(), and you get a MemoryError. See SF # 979739.
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2004-07-21 18:02
Message:
Logged In: YES
user_id=31435
Laura, you don't *have* an encoding directive here. You
thought you were writing one, but it's no more an encoding
directive than, e.g.,
# I want this to be treated as some form of Finnish.
would have been. Neither matches the syntax for an
encoding directive, so both are treated as comments.
You want Python to guess that your line, which matches the
syntax of a Python comment but not the syntax of an
encoding directive (and so is treated as a comment), was
*intended* to be an encoding directive. That would be
possible, but first needs a rigorous definition of
what "intended to look like an encoding directive" means.
Right now it's a binary "yes or no" decision, based on whether
one of the first two lines matches (or not) the regexp given in
the PEP.
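
The binary decision described above can be sketched as follows. This is
an illustration only, not the parser's actual code: the function name
find_encoding_directive is hypothetical, and the pattern is the one
quoted from the PEP with the hyphen moved first so that Python's re
module accepts the character class.

```python
import re

# PEP 263 pattern as quoted in this thread (hyphen placed first).
CODING_RE = re.compile(r"coding[:=]\s*([-\w.]+)")


def find_encoding_directive(source):
    """Return the encoding named on line 1 or 2 of source, or None.

    Only the first two lines are inspected; anything that does not
    match the pattern is treated as an ordinary comment or code line.
    """
    for line in source.splitlines()[:2]:
        match = CODING_RE.search(line)
        if match:
            return match.group(1)
    return None


good = "#!/usr/bin/python\n# -*- coding: iso-8859-1 -*-\nx = 1\n"
bad = "# -*- coding: 'iso-8859-1' -*-\nx = 1\n"

print(find_encoding_directive(good))  # iso-8859-1
print(find_encoding_directive(bad))   # None
```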
----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg)
Date: 2004-07-21 17:44
Message:
Logged In: YES
user_id=38388
Laura, the problem is that the parser only checks for
encoding comments of a certain style. If it doesn't
find that particular style (defined in the PEP), then
it simply goes on with the processing as if there were
no encoding comment (because there isn't ;-).
Quotes are not allowed according to the PEP, so
using them will render the comment useless.
I don't see how the parser could give you a note in
the sense of "well, this may be an encoding comment,
but it is not what I expected, please check".
Of course, patches are welcome if they don't make
the implementation more complicated :-)
----------------------------------------------------------------------
Comment By: Laura Creighton (lcreighton)
Date: 2004-07-21 17:37
Message:
Logged In: YES
user_id=376262
We'll see if this gets me what I want, a comment to
Marc-André's comment.
That the parser, which cannot parse my encoding comment,
ignores it and tries to continue work, may or may not be
admirable behaviour on the part of the parser. I'd have
preferred a LookupError or a ValueError, or maybe a
RuntimeWarning complaining that it couldn't make sense of my
encoding.
But when I get a Deprecation Warning, saying 'no encoding
declared' when there is one, and it's unusable, then that is
the wrong Warning to raise.
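
One way the requested behaviour could look, as a hedged sketch only.
Nothing here is taken from the Python parser: the looser pattern
LOOSE_RE, the function name check_directive, and the choice of
RuntimeWarning are all assumptions made for illustration. The strict
pattern is the one from the PEP, with the hyphen moved first so that
Python's re module accepts it.

```python
import re
import warnings

# Strict pattern from PEP 263 (hyphen placed first in the class).
STRICT_RE = re.compile(r"coding[:=]\s*([-\w.]+)")
# Hypothetical "near miss" pattern: the word "coding" followed by : or =.
LOOSE_RE = re.compile(r"coding[:=]")


def check_directive(source):
    """Return the declared encoding, warning about look-alike comments."""
    for lineno, line in enumerate(source.splitlines()[:2], start=1):
        match = STRICT_RE.search(line)
        if match:
            return match.group(1)
        # A comment that mentions "coding:" but fails the strict pattern
        # was probably meant as a directive, so say so.
        if line.lstrip().startswith("#") and LOOSE_RE.search(line):
            warnings.warn(
                "line %d looks like an encoding declaration "
                "but could not be parsed" % lineno,
                RuntimeWarning,
            )
    return None


check_directive("# -*- coding: 'iso-8859-1' -*-\nx = 1\n")  # warns
```

A plain comment that never mentions "coding:" produces no warning under
this heuristic, so only comments that resemble a directive are flagged.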
----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg)
Date: 2004-07-21 15:20
Message:
Logged In: YES
user_id=38388
I don't see why this should be wrong: if the parser cannot
parse the encoding comment, it simply ignores it and then
continues to work as if no encoding comment were given.
----------------------------------------------------------------------