[Python-Dev] Re: PEP: Defining Unicode Literal Encodings (revision 1.1)

M.-A. Lemburg mal@lemburg.com
Sun, 15 Jul 2001 20:07:50 +0200

Guido van Rossum wrote:
> > > Explain again why a directive is better than a specially marked
> > > comment, when your main goal seems to be to make it easy for
> > > non-parsing tools like editors to find it?
> > >...
> >
> > Parsing tools do need it. The directive changes the file's semantics.
> > Both parsing and non-parsing tools need it.
> I understand that.
> > I could live with a comment but I think that that is actually harder to
> > implement so I don't understand the benefit...I'm still trying to
> > understand what tools we are protecting. compiler.py can be easily
> > fixed. The real parser/compiler can be easily fixed. The other tools
> > mostly take their cue from one of these two modules, right?
> I disagree with the first sentence -- I believe a comment is easier to
> implement.  The directive statement is still problematic.  Martin's
> hack falls short of doing the right thing in all cases: you can't have
> the first statement of your program be "directive = ..." or
> "directive(...)".
> Another argument for a comment: I expect there could be situations
> where you want to declare an encoding that doesn't affect the Python
> parser, but that does affect the editor (e.g. when you use the
> encoding only in comments and/or 8-bit strings).  A comment would
> back-port to older Python versions; a directive statement wouldn't.  I
> don't know how important this is though.

Even though putting the information into a comment would
indeed be easier to implement, I think that from a design point
of view it is a hack rather than a clean solution.

Note that a programmer can always place the encoding information,
in whatever format the editor needs, into an additional comment
in front of the doc-string if that's needed (the comment format
will be editor-specific!).
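
To illustrate how a non-parsing tool might pick up such a comment,
here is a rough sketch. It assumes a comment of the form
# -*- encoding='utf-8' -*- near the top of the file; the pattern,
the function name and the two-line limit are all made up for
illustration, and a real editor would define its own format.

```python
import re

# Hypothetical pattern: matches comments such as  # -*- encoding='utf-8' -*-
# The comment format is an assumption; each editor defines its own.
ENCODING_COMMENT = re.compile(
    r"^\s*#.*?encoding\s*[:=]\s*['\"]?([-\w.]+)['\"]?"
)


def find_encoding_comment(source_lines, max_lines=2):
    """Return the encoding named in a leading comment, or None.

    Only the first max_lines lines are inspected (an arbitrary choice
    for this sketch), and only lines that start with a comment marker.
    """
    for line in source_lines[:max_lines]:
        match = ENCODING_COMMENT.match(line)
        if match:
            return match.group(1)
    return None


example = [
    "# -*- encoding='utf-8' -*-",
    '""" Binary doc-string using UTF-8 """',
]
found = find_encoding_comment(example)
```

Note that a tool like this never has to parse Python; it only looks
at the text of the first few lines.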

I think that, apart from adding a new keyword to the language,
the argument about breaking doc-string tools is not a valid
one. Non-Unicode doc-strings will continue to work like they
always have:

# -*- encoding='utf-8' -*-
""" Binary doc-string using UTF-8 """
directive unicodeencoding = 'utf-8'
print u"Unicode encoded as UTF-8 rather than unicode-escape"

Or am I missing something?
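
For reference, the difference between the two interpretations of
a Unicode literal's bytes can be sketched as follows. This does not
use the proposed directive; it merely assumes the codec names
"utf-8" and "unicode-escape" and uses present-day bytes/str
decoding to show what each reading of the same bytes would yield.

```python
# The same two bytes, as they might sit in a UTF-8 encoded source file.
raw = b"\xc3\xa4"

# Read as UTF-8, the two bytes form one character, U+00E4.
as_utf8 = raw.decode("utf-8")

# Read as unicode-escape, bytes that are not part of an escape sequence
# are taken as Latin-1, giving two characters: U+00C3 and U+00A4.
as_escape = raw.decode("unicode-escape")

# With unicode-escape, the character has to be spelled out as an escape.
spelled_out = b"\\u00e4".decode("unicode-escape")
```

So the directive would only change how the bytes inside u"..."
literals are decoded; the 8-bit doc-string is left alone.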

Marc-Andre Lemburg
CEO eGenix.com Software GmbH
Consulting & Company:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/