[Python-Dev] RE: Defining Unicode Literal Encodings

John W. Baxter jwbaxter at spamcop.com
Sat Jul 14 00:01:09 EDT 2001


In article <mailman.995061384.26877.python-list at python.org>, M.-A.
Lemburg <mal at lemburg.com> wrote:

> I think that allowing one directive per file is the way to go,
> but I'm not sure about the exact position. Basically, I think it
> should go "near" the top, but not necessarily before any doc-string
> in the file.
>  
> > [Guido]
> > > Hm, then the directive would syntactically have to *precede* the
> > > docstring.  That currently doesn't work -- the docstring may only be
> > > preceded by blank lines and comments.  Lots of tools for processing
> > > docstrings already have this built into them.  Is it worth breaking
> > > them so that editors can remain stupid?
> > 
> > No.
> 
> Agreed.
> 
> Note that the PEP doesn't require the directive to be placed before the
> doc-string. That point is still open. Technically, the compiler
> will only need to know about the encoding before the first
> Unicode literal in the source file.

Is there a strong possibility that *other* directives will come along
which also "want" to be before the docstring?  [I find it unlikely, at
the moment.]

If so, and if directives happen, it would seem necessary to adjust the
docstring rules (and implementation <drat!> and tools <drat!>) to allow
directive-before-docstring.  But...

Can Unicode be used for docstrings?  A simpleminded test suggests "yes":
[localhost:/tmp] john% python
Python 2.1 (#1, 04/22/01, 11:06:25) 
[GCC Apple DevKit-based CPP 6.0alpha] on darwin1
Type "copyright", "credits" or "license" for more information.
>>> import t
>>> t.__doc__
u'I am a unicode docstring'
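
The same check can be reproduced without a file on disk. This is only a sketch: the contents of t.py are not shown above, so the one-line source below is an assumption, and on a current Python every str is already Unicode, so the repr no longer carries the u prefix.

```python
import types

# Assumed stand-in for t.py; the original file is not shown in the session.
SOURCE = 'u"I am a unicode docstring"\n'

# Compile the source as a module body and run it in a fresh module namespace.
module = types.ModuleType("t")
exec(compile(SOURCE, "t.py", "exec"), module.__dict__)

# The first statement is a u"" literal, and it is still picked up as __doc__.
doc = module.__doc__
print(repr(doc))
```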

So it might be annoying for the file-level docstring's Unicode flavor
to differ from that of all the other Unicode strings in the file.

Horrible thought:  can the compiler revisit the Unicode docstring if it
turns out that there is a directive after it which sets the Unicode
flavor?  That is, always defer compiling the Unicode docstring until it
is no longer possible to write a Unicode-flavor directive (first other
Unicode string, end of file...just end of file?).

Is there a need for something ugly like 
   u-big5"I am a big 5 Unicode string"
(ugh!)?  (Such a literal could appear anywhere in the file.)

Of those two, the latter is the more powerful, but is the power needed?
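
To show what the extra power would buy, here is a small sketch in which each literal names its own encoding, so one file can mix several. The helper u() is hypothetical and merely stands in for the u-big5"..." spelling; it is not a proposed API.

```python
def u(encoding, raw):
    """Hypothetical stand-in for a per-literal prefix such as u-big5"..."."""
    return raw.decode(encoding)


# Raw bytes as they might sit in a source file, produced here by encoding
# known text so that the example does not depend on the file's own encoding.
big5_raw = "\u4e2d\u6587".encode("big5")
latin_raw = "caf\xe9".encode("latin-1")

# Two literals, two encodings, one file: a single file-level directive
# could not describe both.
mixed = [u("big5", big5_raw), u("latin-1", latin_raw)]
print(mixed)
```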

   --John (full of questions; lacking answers; unhappy with own ideas)


