Use of Unicode in Python 2.5 source code literals

Steven D'Aprano steve at REMOVE-THIS-cybersource.com.au
Sun May 3 07:32:39 EDT 2009


On Sun, 03 May 2009 03:43:27 -0700, Uncle Bruce wrote:

> Based on some experimenting I've done, I suspect that the support for
> Unicode literals in ANY encoding isn't really accurate.  What seems to
> happen is that there must be an 8-bit mapping between the set of Unicode
> literals and what can be used as literals.
> 
> Even when I set Options / General / Default Source Encoding to UTF-8,
> IDLE won't allow Unicode literals (e.g. characters copied and pasted
> from the Windows Character Map program) to be used, even in a quoted
> string, if they represent an ord value greater than 255.

When you say it "won't allow", what do you mean? That you can't paste 
them into the document? Does it give an error? An exception at compile 
time or runtime?

I assume you have included the coding line at the top of the file. Make 
sure it says utf-8 and not latin-1.
 
# -*- coding: utf-8 -*-

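This is especially important if you use a Windows text editor that puts a 
Unicode BOM at the start of the file: Python then treats the file as 
UTF-8, and a coding line that names any other encoding is a syntax error.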
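
If you want to check what the file on disk actually starts with, something 
like the following rough sketch may help. The helper names 
(inspect_source_header, encoding_is_known) are my own invention, not part 
of Python, and the pattern is only modelled on the one described in PEP 
263, so treat the result as a hint rather than what the compiler sees.

```python
import codecs
import re

# Modelled on the coding-declaration pattern described in PEP 263.
# This is an approximation, not the interpreter's own check.
CODING_RE = re.compile(r"coding[:=]\s*([-\w.]+)")


def inspect_source_header(raw):
    """Return (has_bom, declared) for the raw bytes of a source file.

    has_bom  -- True if the data starts with the UTF-8 byte order mark
    declared -- encoding named on line 1 or 2, or None if there is none
    """
    has_bom = raw.startswith(codecs.BOM_UTF8)
    if has_bom:
        raw = raw[len(codecs.BOM_UTF8):]
    # latin-1 maps every byte to a character, so this decode cannot fail.
    head = raw.decode("latin-1").split("\n")[:2]
    declared = None
    for line in head:
        match = CODING_RE.search(line)
        if match:
            declared = match.group(1)
            break
    return has_bom, declared


def encoding_is_known(name):
    """Return True if Python has a codec registered under this name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False
```

Read the file in binary mode, e.g. open('nonascii.py', 'rb').read(), and 
pass the result to inspect_source_header. A misspelled name such as 
"uft-8" will come back as the declared encoding, and encoding_is_known 
will return False for it.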

What happens if you use a different editor to insert the characters in 
the file, and then open it in IDLE?

How are you writing the literals? As byte strings or unicode strings? E.g.

# filename = nonascii.py
# -*- coding: utf-8 -*-
theta = 'θ'  # byte string, probably will lead to problems
sigma = u'Σ'  # unicode, this is the Right Way
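
To see why the byte string is the risky one, here is a small sketch that 
spells the characters with \u escapes so it does not depend on how the 
file is saved. It assumes the source file is UTF-8, in which case a byte 
string literal holds the encoded bytes rather than the character; the 
variable names are just for illustration.

```python
# -*- coding: utf-8 -*-
theta = u'\u03b8'   # GREEK SMALL LETTER THETA, as a unicode string
sigma = u'\u03a3'   # GREEK CAPITAL LETTER SIGMA, as a unicode string

# In a UTF-8 source file, a byte-string literal containing theta holds
# these encoded bytes, not a single character.
theta_bytes = theta.encode('utf-8')

print(len(theta))        # 1 character
print(len(theta_bytes))  # 2 bytes
print(ord(sigma))        # 931, well above 255
print(theta_bytes.decode('utf-8') == theta)  # True
```

So the unicode literal is one character with an ord value over 255, while 
the byte string is two separate bytes that only look right when whatever 
displays them happens to decode them as UTF-8.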



> Is there a way to use more than 255 Unicode characters in source code
> literals in Python 2.5.4?

It works for me in Python 2.4 and 2.5, although I'm not using IDLE.

>>> import nonascii
>>> nonascii.sigma
u'\u03a3'
>>> print nonascii.sigma
Σ
>>> print nonascii.theta
θ

Perhaps it is a problem with IDLE?



-- 
Steven


