What encoding does u'...' syntax use?
rNOSPAMon at flownet.com
Fri Feb 20 16:15:51 EST 2009
In article <499f18bd$0$31879$9b4e6d93 at newsspool3.arcor-online.net>,
Stefan Behnel <stefan_ml at behnel.de> wrote:
> Ron Garret wrote:
> > I would have thought that the answer would be: the default encoding
> > (duh!) But empirically this appears not to be the case:
> > >>> unicode('\xb5')
> > Traceback (most recent call last):
> > File "<stdin>", line 1, in <module>
> > UnicodeDecodeError: 'ascii' codec can't decode byte 0xb5 in position 0:
> > ordinal not in range(128)
> > >>> u'\xb5'
> > u'\xb5'
> > >>> print u'\xb5'
> > µ
> > (That last character shows up as a micro sign despite the fact that my
> > default encoding is ascii, so it seems to me that that unicode string
> > must somehow have picked up a latin-1 encoding.)
> You are mixing up console output and internal data representation. What you
> see in the last line is what the Python interpreter makes of your unicode
> string when passing it into stdout, which in your case seems to use a
> latin-1 encoding (check your environment settings for that).
> BTW, Unicode is not an encoding. Wikipedia will tell you more.
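(As an aside: the console encoding mentioned above can be inspected
directly. A minimal sketch, written in Python 3 spelling; the assumption
is that Python 2 exposes the same attributes, with sys.getdefaultencoding()
returning 'ascii' there:)

```python
import sys

# The codec Python uses when writing text to the terminal. On Unix it is
# derived from the locale environment (LANG / LC_CTYPE), not from the
# interpreter-wide default encoding, so the two can legitimately differ.
print(sys.stdout.encoding)

# The interpreter-wide default codec -- the one unicode('\xb5') consulted.
print(sys.getdefaultencoding())
```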
Yes, I know that. But every concrete representation of a unicode string
has to have an encoding associated with it, including unicode strings
produced by the Python parser when it parses the ascii string "u'\xb5'".
My question is: what is that encoding? It can't be ascii. So what is it?
Put this another way: I would have thought that when the Python parser
parses "u'\xb5'" it would produce the same result as calling
unicode('\xb5'), but it doesn't. Instead it seems to produce the same
result as calling unicode('\xb5', 'latin-1'). But my default encoding
is not latin-1, it's ascii. So where is the Python parser getting its
encoding from? Why does parsing "u'\xb5'" not produce the same error as
calling unicode('\xb5') does?
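For what it's worth, the behavior is consistent with \x escapes in a
u''-literal naming code points directly, with no decode step at all;
latin-1 just happens to map bytes 0x00-0xFF one-to-one onto code points
U+0000-U+00FF. A minimal sketch of the distinction, in Python 3 spelling
(the assumption being that Python 2's u''-literals handle escapes the
same way):

```python
# In a unicode literal, \xb5 names code point U+00B5 directly; no codec
# and no default encoding is consulted.
s = u'\xb5'
assert ord(s) == 0xB5          # the micro sign, U+00B5

# Decoding only enters the picture when you start from *bytes*, which is
# what unicode('\xb5') did: decode one byte with the default (ascii) codec.
try:
    b'\xb5'.decode('ascii')
except UnicodeDecodeError:
    print("ascii can't decode byte 0xb5")

# Latin-1 maps bytes one-to-one onto the first 256 code points, which is
# why the literal's result merely *looks* like a latin-1 decode.
assert b'\xb5'.decode('latin-1') == s
```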