[Python-Dev] Divorcing str and unicode (no more implicit conversions).
M.-A. Lemburg
mal at egenix.com
Tue Oct 25 13:31:50 CEST 2005
Fredrik Lundh wrote:
> M.-A. Lemburg wrote:
>
>
>>I don't follow you here. The source code encoding
>>is only applied to Unicode literals (you are using string
>>literals in your example). String literals are passed
>>through as-is.
>
>
> however, for Python 3000, it would be nice if the source-code encoding applied
> to the *entire* file (XML-style), rather than just unicode string literals and
> (hopefully) comments and docstrings.
Actually, the encoding is applied to the complete source file:
the file is transcoded into UTF-8 and then parsed by the
Python parser.
Unicode literals are then decoded from UTF-8 into Unicode.
String literals are transcoded back into the source code encoding,
which makes for a rather long round trip (due to technical constraints):
source code encoding -> Unicode -> UTF-8 -> Unicode -> source code encoding.
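The round trip can be simulated with Python 3 bytes/str objects standing in for the 2.x str/unicode types. This is a rough sketch under assumptions, not the parser's actual code: the helper name `simulate_roundtrip` is hypothetical, and Latin-1 is an arbitrary choice of source encoding.

```python
def simulate_roundtrip(literal_bytes: bytes, source_encoding: str = "latin-1"):
    """Hypothetical model of the transcoding steps described above."""
    # Step 1: decode the raw source bytes using the declared source encoding.
    text = literal_bytes.decode(source_encoding)
    # Step 2: re-encode the decoded text as UTF-8, which the parser consumes.
    utf8 = text.encode("utf-8")
    # Step 3a: a Unicode literal (u"...") is decoded from UTF-8 into Unicode.
    unicode_value = utf8.decode("utf-8")
    # Step 3b: a plain 8-bit string literal ("...") is transcoded back into
    # the source encoding, so its bytes match what was written in the file.
    str_value = unicode_value.encode(source_encoding)
    return unicode_value, str_value, utf8


raw = "caf\u00e9".encode("latin-1")  # b'caf\xe9', as stored in a Latin-1 file
u, s, utf8 = simulate_roundtrip(raw)
print(repr(s))     # b'caf\xe9'  (string literal: original bytes restored)
print(repr(utf8))  # b'caf\xc3\xa9'  (intermediate UTF-8 form)
```

The 8-bit literal comes back byte-for-byte identical to the file contents, but only after two extra conversions through Unicode and UTF-8.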
Python 3k should have a fully Unicode-based parser to avoid this
extra transcoding overhead.
Since Py3k will only have Unicode literals, the problems with
string literals will go away all by themselves :-)
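For comparison, this is how a Latin-1 encoded module behaves under Python 3 as eventually released, where plain string literals are Unicode `str` objects. The sketch shows only the observable result, not how the parser works internally. `compile()` accepts raw bytes and honours the PEP 263 coding cookie. The module text and the name `greeting` are made up for illustration.

```python
# Raw bytes of a hypothetical Latin-1 module; b'\xe9' is "é" in Latin-1.
source = b"# -*- coding: latin-1 -*-\ngreeting = 'caf\xe9'\n"

namespace = {}
exec(compile(source, "<latin1-module>", "exec"), namespace)

# The plain literal is decoded once, straight into a Unicode str.
print(type(namespace["greeting"]).__name__)   # str
print(namespace["greeting"] == "caf\u00e9")   # True
```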
--
Marc-Andre Lemburg
eGenix.com
Professional Python Services directly from the Source (#1, Oct 25 2005)