[Python-Dev] Allowing u.encode() to return non-strings
tim.peters at gmail.com
Mon Jun 28 20:53:32 EDT 2004
> Tim, do I understand then that Unicode strings have an implicit
> character encoding, but non-Unicode strings do not?
An 8-bit string is a sequence of 8-bit bytes. If those bytes are to
"mean something", you have to supply the meaning, or use them in a
context that supplies a specific meaning for you. This seems nearly
impossible for an American to understand, but non-Americans appear to
know it at birth (if not earlier).
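A small sketch of the point, written against present-day Python 3, where `bytes` plays the role of the 8-bit string (that mapping is an assumption of this example, not something from the original exchange). The same single byte decodes to three different characters under three different codecs, and is not valid at all under a fourth:

```python
# Illustration only: Python 3 `bytes` stands in for the 8-bit string.
raw = b"\xe9"  # one byte with value 233; no meaning attached yet

# The meaning comes entirely from the codec you choose to apply.
as_latin1 = raw.decode("latin-1")  # U+00E9 LATIN SMALL LETTER E WITH ACUTE
as_cp437 = raw.decode("cp437")     # U+0398 GREEK CAPITAL LETTER THETA
as_koi8r = raw.decode("koi8-r")    # U+0418 CYRILLIC CAPITAL LETTER I

# Under UTF-8 this lone byte is an incomplete sequence, so decoding fails.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
```

Nothing in `raw` itself says which of these readings is the intended one.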
A Unicode string is, at least in theory, a sequence of Unicode
characters, the latter defined in excruciating detail by the Unicode
Consortium. There's no conventional sense in which a Unicode string
is an encoding of something other than exactly itself, but you could
certainly make one up.
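By way of illustration, again assuming Python 3, where `str` is the Unicode string type: the string below is one character, identified by its code point alone. Byte representations appear only when you ask for an encoding, and each one decodes back to the same string:

```python
# Illustration only: Python 3 `str` stands in for the Unicode string.
text = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

# The string is a sequence of characters, each identified by a code point.
code_points = [ord(ch) for ch in text]  # [233]

# Bytes only come into existence once an encoding is named.
utf8 = text.encode("utf-8")        # b'\xc3\xa9'
latin1 = text.encode("latin-1")    # b'\xe9'
utf16le = text.encode("utf-16-le") # b'\xe9\x00'

# Each byte form maps back to the same one-character string.
round_trips = (
    utf8.decode("utf-8") == text
    and latin1.decode("latin-1") == text
    and utf16le.decode("utf-16-le") == text
)
```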