What encoding does u'...' syntax use?

Thorsten Kampe thorsten at thorstenkampe.de
Sat Feb 21 13:24:23 EST 2009


* "Martin v. Löwis" (Sat, 21 Feb 2009 00:15:08 +0100)
> > Yes, I know that. But every concrete representation of a unicode
> > string has to have an encoding associated with it, including unicode
> > strings produced by the Python parser when it parses the ascii
> > string "u'\xb5'"
> > 
> > My question is: what is that encoding?
> 
> The internal representation is either UTF-16, or UTF-32; which one is
> a compile-time choice (i.e. when the Python interpreter is built).

I'm pretty sure it is UCS-2 or UCS-4. (Yes, I know these differ only
slightly from UTF-16/UTF-32.)
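
For anyone who wants to check their own interpreter, here is a rough sketch. The names describe_build, micro and clef are mine, purely for illustration. It assumes sys.maxunicode reflects the build choice, which holds for the narrow/wide builds discussed here; Python 3.3 and later always report 0x10FFFF. The second half shows where UCS-2 and UTF-16 part ways: a character outside the BMP needs a surrogate pair in UTF-16, and UCS-2 has no way to represent it at all.

```python
import sys


def describe_build():
    # Hypothetical helper: classify the interpreter by its largest code point.
    # 0xFFFF means a narrow build (16-bit code units), anything larger
    # means the full code point range is available.
    if sys.maxunicode == 0xFFFF:
        return "narrow"
    return "wide"


micro = u'\xb5'        # U+00B5 MICRO SIGN, inside the BMP
clef = u'\U0001D11E'   # U+1D11E MUSICAL SYMBOL G CLEF, outside the BMP

# Inside the BMP, one 16-bit unit or one 32-bit unit is enough.
utf16_micro = micro.encode('utf-16-le')   # bytes B5 00
utf32_micro = micro.encode('utf-32-le')   # bytes B5 00 00 00

# Outside the BMP, UTF-16 needs two 16-bit units (a surrogate pair).
utf16_clef = clef.encode('utf-16-le')     # bytes 34 D8 1E DD
utf32_clef = clef.encode('utf-32-le')     # bytes 1E D1 01 00

print(describe_build())
print(len(utf16_micro), len(utf32_micro), len(utf16_clef), len(utf32_clef))
```

The explicit -le codecs are used so that no byte order mark is prepended, which keeps the byte counts easy to compare.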

Thorsten


