[Python-3000] Unicode and OS strings

Jim Jewett jimjjewett at gmail.com
Tue Sep 18 23:19:46 CEST 2007


On 9/18/07, Stephen J. Turnbull <stephen at xemacs.org> wrote:

> There's no UTF-8 in Python's internal string encoding.  What are you
> talking about?

(At least as of a few days ago)

In Python 3 there is; strings are Unicode.  A PyUnicodeObject carries
two encoded forms that you can grab directly from a pointer (which
means they have to be there already; you don't get a chance to generate
them on demand, as you would behind a function pointer).

One of these (str) is the "internal encoding", whose width (UCS-2 or
UCS-4) is chosen at compile time, and the other (defenc) is now
hard-coded to UTF-8.
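
As a rough illustration (a Python-level toy, not the actual C struct),
the two buffers can be pictured as the same text held once in a
fixed-width encoding and once in UTF-8.  Using UTF-32 for the internal
form assumes a wide (UCS-4) build, the little-endian byte order is
picked arbitrarily, and the helper name two_views is made up for this
sketch.

```python
def two_views(text):
    """Return (internal, defenc) byte views of text, as a toy model.

    internal: fixed-width code units standing in for the `str` buffer
              (assumption: UCS-4 build, little-endian).
    defenc:   UTF-8 bytes standing in for the `defenc` buffer.
    """
    internal = text.encode("utf-32-le")
    defenc = text.encode("utf-8")
    return internal, defenc


internal, defenc = two_views("na\u00efve")
print(len(internal))  # 4 bytes per code point
print(len(defenc))    # 1 byte per ASCII character, 2 for U+00EF
```

Both views decode back to the same text; only the byte layout differs.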

Hashing is also based on the UTF-8 bytestring.
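
One consequence, sketched in Python on the assumption that the hash is
computed purely from the UTF-8 bytes: the result cannot depend on the
width of the internal encoding.  toy_hash is a hypothetical stand-in
(32-bit FNV-1a), not the hash function CPython actually uses, and
string_hash is likewise a made-up name.

```python
def toy_hash(data):
    """32-bit FNV-1a over a bytes object (stand-in, not CPython's hash)."""
    h = 0x811C9DC5  # FNV offset basis
    for byte in data:
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF  # FNV prime, truncated to 32 bits
    return h


def string_hash(text):
    # Hash the UTF-8 form, so the width of the internal buffer is irrelevant.
    return toy_hash(text.encode("utf-8"))
```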

-jJ

