What encoding does u'...' syntax use?
Ron Garret
rNOSPAMon at flownet.com
Fri Feb 20 19:10:15 EST 2009
In article <499F3A8F.9010200 at v.loewis.de>,
"Martin v. Löwis" <martin at v.loewis.de> wrote:
> >> >>> u'\xb5'
> >> u'\xb5'
> >> >>> print u'\xb5'
> >> µ
> >
> > Unicode literals are *in the source file*, which can only have one
> > encoding (for a given source file).
> >
> >> (That last character shows up as a micron sign despite the fact that
> >> my default encoding is ascii, so it seems to me that that unicode
> >> string must somehow have picked up a latin-1 encoding.)
> >
> > I think latin-1 was the default without a coding cookie line. (May be
> > utf-8 in 3.0).
>
> It is, but that's irrelevant for the example. In the source
>
> u'\xb5'
>
> all characters are ASCII (i.e. all of "letter u", "single
> quote", "backslash", "letter x", "letter b", "digit 5").
> As a consequence, this source text has the same meaning in all
> supported source encodings (as source encodings must be ASCII
> supersets).
>
> The Unicode literal shown here does not get its interpretation
> from Latin-1. Instead, it directly gets its interpretation from
> the Unicode coded character set. The string is a short-hand
> for
>
> u'\u00b5'
>
> and this denotes character U+00B5 (just as u'\u20ac' denotes
> U+20AC; the same holds for any other u'\uXXXX').
>
> HTH,
> Martin
Ah, that makes sense. Thanks!
rg
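
For anyone finding this thread later, here is a minimal sketch of how Martin's two points could be checked at the interpreter. The variable names are only illustrative, and the three encodings are just a sample of the ASCII-superset encodings he mentions, not an exhaustive list.

```python
import unicodedata

# The source characters of the literal are all plain ASCII, so the same bytes
# decode to the same text under any ASCII-superset source encoding.
source_bytes = b"u'\\xb5'"
decoded = {enc: source_bytes.decode(enc) for enc in ("ascii", "latin-1", "utf-8")}
same_in_all_encodings = len(set(decoded.values())) == 1

# The \xb5 escape names a Unicode code point, not a byte in some encoding.
micro = u'\xb5'
same_as_u_escape = (micro == u'\u00b5')
code_point = ord(micro)          # 0xB5
name = unicodedata.name(micro)   # official Unicode character name

# Bytes only appear once the string is encoded, and they depend on the codec.
latin1_bytes = micro.encode('latin-1')
utf8_bytes = micro.encode('utf-8')
```

The last two lines are the part that seems to answer the original question: the string itself carries no encoding, and Latin-1 only looks special because its byte values happen to match the first 256 code points.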