PEP: Defining Unicode Literal Encodings (revision 1.1)
M.-A. Lemburg
mal at lemburg.com
Sat Jul 14 07:32:10 EDT 2001
Skip Montanaro wrote:
>
> mal> Here's an updated version which clarifies some issues...
> ...
> mal> I propose to make the Unicode literal encodings (both standard
> mal> and raw) a per-source file option which can be set using the
> mal> "directive" statement proposed in PEP 244 in a slightly
> mal> extended form (by adding the '=' between the directive name and
> mal> its value).
>
> I think you need to motivate the need for a different syntax than is defined
> in PEP 244. I didn't see any obvious reason why the '=' is required.
I'm not picky about the '='; if people don't want it, I'll
happily drop it from the PEP. The only reason I think it may be
worthwhile adding it is because it simply looks right:
directive unicodeencoding = 'latin-1'
rather than
directive unicodeencoding 'latin-1'
(Note that internally this will set a flag to a value, so the
assignment character '=' seems to fit in nicely.)
> Also, how do you propose to address /F's objections, particularly that the
> directive can't syntactically appear before the module's docstring (where it
> makes sense that the module author would logically want to use a non-default
> encoding)?
Guido hinted at the problem of breaking code; Tim objected
to requiring this.
I don't see the need to use Unicode literals
as module doc-strings, so I think the problem is not a real one
(8-bit strings can be written using any encoding just like you can
now).
Still, if people would like to use Unicode literals for module
doc-strings, then they should place the directive *before* the
doc-string, accepting that this could break some tools (the PEP currently
does not restrict the placement of the directive). Alternatively,
we could allow placing the directive into a comment, e.g.
#!/usr/local/python
#directive unicodeencoding = 'utf-8'
u"""
This is a Unicode doc-string
"""
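To make the comment variant a bit more concrete, here is a rough sketch of how a tool might pick the directive out of the leading lines of a source file. The function name find_unicode_encoding, the regular expression, and the two-line limit (shebang plus directive) are assumptions made up for this illustration; nothing here is defined by the PEP.

```python
import re

# Hypothetical pattern for the comment form shown above. The '=' is
# optional so that both spellings under discussion are accepted.
_DIRECTIVE_RE = re.compile(
    r"^#\s*directive\s+unicodeencoding\s*=?\s*(['\"])([\w.-]+)\1\s*$"
)

def find_unicode_encoding(source, default=None, max_lines=2):
    """Return the encoding named in a leading comment directive.

    Only the first max_lines lines are inspected (an assumption: a
    shebang line followed by the directive). Returns default if no
    directive is found.
    """
    for line in source.splitlines()[:max_lines]:
        match = _DIRECTIVE_RE.match(line.strip())
        if match:
            return match.group(2)
    return default

example = (
    "#!/usr/local/python\n"
    "#directive unicodeencoding = 'utf-8'\n"
    'u"""\nThis is a Unicode doc-string\n"""\n'
)
encoding = find_unicode_encoding(example)
```

Because the directive lives in a comment, tools that know nothing about it would simply skip the line.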
About Fredrik's idea that the source code should only use one
encoding:
Well, that's possible with the proposed directive, since
Unicode literals are the only data for which Python is encoding-aware
and all other parts are under the programmer's control, e.g.
#!/usr/local/python
""" Module Docs...
"""
directive unicodeencoding = 'latin-1'
...
u = u"Héllô Wörld !"
...
will give you pretty much what Fredrik asked for.
Note that since Python does not assign encoding information to
8-bit strings, comments etc. the only parts in a Python program
for which the programmer must explicitly tell Python which
encoding to assume are the Unicode literals.
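A small illustration of why only the Unicode literals need this information, written in present-day Python and assuming the file was saved as Latin-1 as in the example above. The helper decode_literal is hypothetical and only stands in for whatever the compiler would do with the bytes of a literal.

```python
# The bytes an editor would write to disk for the literal, assuming
# the source file is saved as Latin-1 (an assumption for this sketch).
raw = "Héllô Wörld !".encode("latin-1")

def decode_literal(data, encoding):
    """Decode the raw bytes of a literal using the declared encoding."""
    return data.decode(encoding)

# With the matching encoding the intended characters come back.
text = decode_literal(raw, "latin-1")

# An 8-bit string needs no such step: the bytes are kept untouched.
eight_bit = raw

# With the wrong encoding the same bytes cannot be interpreted.
try:
    decode_literal(raw, "utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
```

The 8-bit string survives whatever encoding the editor used, while the Unicode literal only has a well-defined value once the encoding is stated.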
--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Consulting & Company: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/