PEP 263 status check

"Martin v. Löwis" martin at
Fri Aug 6 10:14:01 CEST 2004

Vincent Wehren wrote:
> Here's another thought: the company I work for uses (embedded) Python as
> scripting language
> for their report writer (among other things). Users can add little scripts
> to their document templates which are used for printing database data. This
> means, there are literally hundreds of little Python scripts embedded
> within the document templates, which themselves are stored in whatever
> database is used as the backend. In such a case, "scan and update" when
> upgrading gets a little more complicated ;)

At the same time, it might also get simpler. If the user interface
for editing these scripts is encoding-aware, and/or the database that
stores them is encoding-aware, an automated tool would not need to guess
what the source encoding is.
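
As a rough illustration of such a tool, here is a minimal sketch that
prepends a PEP 263 declaration to a script whose encoding is already
known (say, from the database or the editor). The helper name
add_coding_line is hypothetical, and the pattern follows the regular
expression form described in PEP 263; treat it as a starting point
rather than a complete upgrade tool.

```python
import re

# Pattern for an encoding declaration, following the form described in PEP 263.
CODING_RE = re.compile(r"coding[:=]\s*([-\w.]+)")


def add_coding_line(source, encoding):
    """Return `source` with a PEP 263 declaration for `encoding`.

    Hypothetical helper: `encoding` is assumed to be known already,
    so nothing is guessed from the script itself.
    """
    lines = source.splitlines(True)

    # Only the first two lines may carry the declaration.
    for line in lines[:2]:
        if line.lstrip().startswith("#") and CODING_RE.search(line):
            return source  # already declared; leave the script untouched

    declaration = "# -*- coding: %s -*-\n" % encoding

    # A shebang line has to stay first, so the declaration goes on line 2.
    if lines and lines[0].startswith("#!"):
        first = lines[0] if lines[0].endswith("\n") else lines[0] + "\n"
        return first + declaration + "".join(lines[1:])

    return declaration + source
```

Scripts that already carry a declaration are returned unchanged, so the
helper can be run over the whole set of stored scripts more than once.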

> | My specific question there was how the code handles the
> | combination of UTF-8 as the encoding and a non-ascii
> | character in an 8-bit string literal. Is this an error? The
> | PEP does not say so. If it isn't, what encoding will
> | it use to translate from unicode back to an 8-bit
> | encoding?
> Isn't this covered by:
>        "Embedding of differently encoded data is not allowed and will
>        result in a decoding error during compilation of the Python
>        source code."

No. It is perfectly legal to have non-ASCII data in 8-bit string
literals (aka byte string literals, aka <type 'str'>). Of course,
these non-ASCII data also need to be encoded in UTF-8. Whether UTF-8
is an 8-bit encoding, I don't know - it is more precisely described
as a multibyte encoding. At execution time, the byte string literals
then have the source encoding again, i.e. UTF-8.
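
To make the byte-level consequence concrete, here is a small sketch.
It does not put a non-ASCII character directly into a byte string
literal; instead it encodes a Unicode string explicitly, which is
assumed to yield the same bytes such a literal would hold under each
source encoding. The variable names are illustrative only.

```python
# -*- coding: utf-8 -*-

# The name "Löwis", written with an escape so the example does not
# depend on how this file itself is stored.
text = u"L\u00f6wis"

# Bytes a byte string literal would hold in a UTF-8 source file:
utf8_bytes = text.encode("utf-8")

# Bytes the same literal would hold in a Latin-1 source file:
latin1_bytes = text.encode("latin-1")

# UTF-8 needs two bytes for the o-umlaut, Latin-1 only one.
assert len(utf8_bytes) == 6
assert len(latin1_bytes) == 5

# Decoding with the matching encoding recovers the original text.
assert utf8_bytes.decode("utf-8") == text
assert latin1_bytes.decode("latin-1") == text
```

So len() of such a byte string counts bytes, not characters, and the
result depends on the encoding the source file was written in.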

