What encoding does u'...' syntax use?
"Martin v. Löwis"
martin at v.loewis.de
Sat Feb 21 00:19:43 CET 2009
>>>>> print u'\xb5'
> Unicode literals are *in the source file*, which can only have one
> encoding (for a given source file).
>> (That last character shows up as a micron sign despite the fact that
>> my default encoding is ascii, so it seems to me that that unicode
>> string must somehow have picked up a latin-1 encoding.)
> I think latin-1 was the default without a coding cookie line. (May be
> utf-8 in 3.0).
It is, but that's irrelevant for the example. In the source
all characters are ASCII (i.e. all of "letter u", "single
quote", "backslash", "letter x", "letter b", "digit 5").
As a consequence, this source text has the same meaning in all
supported source encodings (as source encodings must be ASCII
supersets).
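
As a rough check of that point (a sketch only; the names source_bytes and decoded are made up for illustration and do not come from the thread), the bytes of the literal's source text can be decoded under several ASCII-compatible encodings and compared:

```python
# The source text of the literal as raw bytes: u ' \ x b 5 '
source_bytes = b"u'\\xb5'"

# Every byte is below 0x80, i.e. plain ASCII.
assert all(byte < 0x80 for byte in bytearray(source_bytes))

# Decoding under different ASCII-superset encodings yields identical text,
# so the declared source encoding cannot change what the literal means.
decoded = [source_bytes.decode(enc) for enc in ("ascii", "latin-1", "utf-8")]
assert decoded[0] == decoded[1] == decoded[2]
```

The encodings listed are just three common examples; any other supported source encoding should behave the same way for this text.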
The Unicode literal shown here does not get its interpretation
from Latin-1. Instead, it directly gets its interpretation from
the Unicode coded character set. The string is a short-hand
for u'\u00b5', and this denotes character U+00B5 (just as u'\u20ac'
denotes U+20AC; the same holds for any other u'\uXXXX').
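
This can be verified interactively. The following is a minimal sketch (the variable name micro is illustrative); it should run unchanged on Python 2 and on Python 3.3 or later, where the u'' prefix is accepted again:

```python
import unicodedata

micro = u'\xb5'

# In a Unicode literal, \xb5 names code point U+00B5 directly.
assert micro == u'\u00b5'
assert ord(micro) == 0xB5
print(unicodedata.name(micro))  # MICRO SIGN

# An encoding only comes into play when the character is turned into bytes.
print(repr(micro.encode('latin-1')))  # one byte, 0xB5
print(repr(micro.encode('utf-8')))    # two bytes, 0xC2 0xB5
```

That the Latin-1 byte value happens to equal the code point is a property of Latin-1 (it maps bytes 0x00 to 0xFF onto U+0000 to U+00FF), not evidence that the literal was interpreted as Latin-1.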