[ python-Bugs-995206 ] Deprecation Warning lies when it cannot parse your encoding

Wed Jul 21 19:28:08 CEST 2004

Bugs item #995206, was opened at 2004-07-21 14:48
Message generated for change (Comment added) made by lcreighton
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=995206&group_id=5470

Category: None
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Laura Creighton (lcreighton)
Assigned to: Nobody/Anonymous (nobody)
Summary: Deprecation Warning lies when it cannot parse your encoding

Initial Comment:
The first line of my python file was:
# -*- coding: 'iso-8859-1' -*-

This is wrong, I need to drop the quotes.  But Python
2.3.4 (#2, Jun 19 2004, 18:15:30) is wrong, also, when it
complains:
sys:1: DeprecationWarning: Non-ASCII character '\xf6'
in file test_stockdict.py on line 14, but no encoding
declared; see http://www.python.org/peps/pep-0263

----------------------------------------------------------------------

>Comment By: Laura Creighton (lcreighton)
Date: 2004-07-21 19:28

Message:
Logged In: YES 
user_id=376262

Hi Tim.

Yes, this is part of the reason why I was opposed to
encoding directives as comments.  But be that as it may,
'Neither matches the syntax for an encoding directive'
is overstated.

I simply used Swedish as text in a file,
and I got the deprecation warning.

So I went and read the PEP 263.

In particular I read:
'''
    1. Allow non-ASCII in string literals and comments, by internally
       treating a missing encoding declaration as a declaration of
       "iso-8859-1". This will cause arbitrary byte strings to
       correctly round-trip between step 2 and step 5 of the
       processing, and provide compatibility with Python 2.2 for
       Unicode literals that contain non-ASCII bytes.
'''

Ok, fine.  I had no clue my encoding was "iso-8859-1"
so I put that in.

This is wrong, and I am not blaming the PEP for not removing
the quotes there.  I think the PEP here is fine.

But, given that I made an error, I would expect to fail
as a different part of the PEP promised me:

    To define a source code encoding, a magic comment must
    be placed into the source files either as first or second
    line in the file:    

          #!/usr/bin/python
          # -*- coding: <encoding name> -*-

    More precisely, the first or second line must match the regular
    expression "coding[:=]\s*([\w-_.]+)". The first group of this
    expression is then interpreted as encoding name. If the
    encoding is unknown to Python, an error is raised during
    compilation.

-----------

The encoding was unknown, but no error was raised.  No warning
either.
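
For comparison, here is a minimal sketch using tokenize.detect_encoding
from the standard library of current Python 3 releases (not the 2.3.4
interpreter this report is about, so treat it as an illustration only).
It suggests the difference between a well-formed declaration naming an
unknown codec, which is rejected, and a quoted declaration, which is
not recognised as a declaration at all.  The helper name detect() and
the codec name "no-such-codec" are made up for the example.

```python
import io
import tokenize


def detect(source_bytes):
    """Return the encoding tokenize detects, or the SyntaxError text."""
    try:
        encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
        return encoding
    except SyntaxError as exc:
        return "SyntaxError: %s" % exc


well_formed = b"# -*- coding: iso-8859-1 -*-\nx = 1\n"
quoted = b"# -*- coding: 'iso-8859-1' -*-\nx = 1\n"
# "no-such-codec" is a hypothetical name that no codec is registered under.
unknown = b"# -*- coding: no-such-codec -*-\nx = 1\n"

print(detect(well_formed))  # the declared encoding is picked up
print(detect(quoted))       # falls back to the default, no complaint
print(detect(unknown))      # reported as an error
```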

----------------------------------------------------------------------

Comment By: Thomas Heller (theller)
Date: 2004-07-21 18:50

Message:
Logged In: YES 
user_id=11105

It is even worse - pass the string contents of this file to
compile(), and you get a MemoryError.  See SF # 979739.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2004-07-21 18:02

Message:
Logged In: YES 
user_id=31435

Laura, you don't *have* an encoding directive here.  You 
thought you were writing one, but it's no more an encoding 
directive than, e.g.,

# I want this to be treated as some form of Finnish.

would have been.  Neither matches the syntax for an 
encoding directive, so both are treated as comments.
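
A minimal sketch of that check, assuming the expression quoted from the
PEP.  The hyphen is moved to the front of the character class here so
that Python's re module reads it as a literal character; the function
name declared_encoding() is made up for the example.

```python
import re

# Variant of the PEP's "coding[:=]\s*([\w-_.]+)" with the hyphen listed
# first in the character class.
CODING_RE = re.compile(r"coding[:=]\s*([-\w.]+)")


def declared_encoding(line):
    """Return the encoding name the line declares, or None."""
    match = CODING_RE.search(line)
    return match.group(1) if match else None


lines = [
    "# -*- coding: iso-8859-1 -*-",
    "# -*- coding: 'iso-8859-1' -*-",
    "# I want this to be treated as some form of Finnish.",
]
for line in lines:
    print(repr(line), "->", declared_encoding(line))
```

Only the first line yields a name; the quote after "coding: " stops the
match in the second, and the third never mentions "coding" at all.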

You want Python to guess that your line, which matches the 
syntax of a Python comment but not the syntax of an 
encoding directive (and so is treated as a comment), was 
*intended* to be an encoding directive.  That would be 
possible, but first needs a rigorous definition of 
what "intended to look like an encoding directive" means.  
Right now it's a binary "yes or no" decision, based on whether 
one of the first two lines matches (or not) the regexp given in 
the PEP.
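
To make the question concrete, here is one possible definition,
sketched in Python.  It is purely hypothetical and not what the parser
does: a line "looks like" an attempted directive if it is among the
first two lines, is a comment containing "coding" followed by ":" or
"=", and still fails the strict expression.  The names STRICT_RE,
LOOSE_RE and classify_header() are invented for the sketch.

```python
import re

# Strict form: the PEP's expression, hyphen moved first for the re module.
STRICT_RE = re.compile(r"coding[:=]\s*([-\w.]+)")
# Hypothetical loose form: a comment that mentions "coding" followed by
# ":" or "=".
LOOSE_RE = re.compile(r"#.*coding\s*[:=]")


def classify_header(source):
    """Classify the first two lines as declared, malformed or absent."""
    head = source.splitlines()[:2]
    for line in head:
        match = STRICT_RE.search(line)
        if match:
            return ("declared", match.group(1))
    for line in head:
        if LOOSE_RE.search(line):
            return ("malformed", line.strip())
    return ("absent", None)


print(classify_header("# -*- coding: 'iso-8859-1' -*-\nx = 1\n"))
print(classify_header("# I want this to be treated as some form of Finnish.\n"))
print(classify_header("#!/usr/bin/python\n# -*- coding: iso-8859-1 -*-\n"))
```

With a third state like "malformed", a different warning could be
issued for the quoted line than for a file with no declaration at all.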

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2004-07-21 17:44

Message:
Logged In: YES 
user_id=38388

Laura, the problem is that the parser only checks for
encoding comments of a certain style. If it doesn't
find that particular style (defined in the PEP), then
it simply goes on with the processing as if there were
no encoding comment (because there isn't ;-).

Quotes are not allowed according to the PEP, so
using them will render the comment useless.

I don't see how the parser could give you a note in
the sense of "well, this may be an encoding comment,
but it is not what I expected, please check".

Of course, patches are welcome if they don't make
the implementation more complicated :-)

----------------------------------------------------------------------

Comment By: Laura Creighton (lcreighton)
Date: 2004-07-21 17:37

Message:
Logged In: YES 
user_id=376262

We'll see if this gets me what I want, a comment to
Marc-André's comment.

That the parser, which cannot parse my encoding comment,
ignores it and tries to continue work, may or may not be
admirable behaviour on the part of the parser.  I'd have
preferred a LookupError or a ValueError, or maybe a
RuntimeWarning complaining that it couldn't make sense of my
encoding.

But when I get a Deprecation Warning, saying 'no encoding
declared' when there is one, and it's unusable, then that is
the wrong Warning to raise.

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2004-07-21 15:20

Message:
Logged In: YES 
user_id=38388

I don't see why this should be wrong: if the parser cannot
parse the encoding comment, it simply ignores it and then
continues to work as if no encoding comment were given.

----------------------------------------------------------------------
