Dr. Dobb's Python-URL! - weekly Python news and links (Dec 30)

Carl Banks invalidemail at aerojockey.com
Tue Jan 4 21:38:57 EST 2005


Skip Montanaro wrote:
> I started to answer, then got confused when I read the docstrings for
> unicode.encode and unicode.decode:
[snip]


It certainly is confusing.  When I first started Unicoding, I pretty
much stuck to Aahz's rule of thumb without understanding the details,
and I still follow it.  But now I do understand it.

Although encodings are bijective (i.e., invertible one-to-one
mappings), they are not apolar.  One side of the encoding is
arbitrarily labeled the encoded form; the other is arbitrarily labeled
the decoded form.  (This is not a relativistic system, here.)  The
encode method maps from the decoded to the encoded set.  The decode
method does the inverse.

That's it.  The only real technical difference between encode and
decode is the direction they map in.

By convention, the decoded form is a Python unicode string, and the
encoded form is the byte string.
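
To make the direction concrete, here is a minimal round-trip sketch.  It
assumes Python 3, where str plays the role of the unicode string and
bytes is the byte string; the sample text and variable names are only
for illustration.

```python
# Decoded form: a unicode string (str in Python 3).
text = "caf\u00e9"

# encode maps decoded -> encoded (unicode string -> byte string).
encoded = text.encode("utf-8")

# decode maps encoded -> decoded (byte string -> unicode string).
decoded = encoded.decode("utf-8")

print(repr(encoded))
print(decoded == text)
```

Calling the methods the other way round (decoding a unicode string or
encoding a byte string) is exactly the mix-up the rule of thumb guards
against.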

I believe it's technically possible (but very rude) to write an
"inverse encoding", where the "encoded" form is a unicode string, and
the decoded form is a UTF-8 byte string.
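
As a rough sketch of what such an "inverse encoding" might look like,
assuming Python 3 and the codecs registry: the codec name "inverseutf8"
and the helper functions below are hypothetical, invented here purely
for illustration.

```python
import codecs

def _inverse_encode(data, errors="strict"):
    # "Encoded" form is a unicode string: turn UTF-8 bytes into text.
    text = bytes(data).decode("utf-8", errors)
    return text, len(data)

def _inverse_decode(text, errors="strict"):
    # "Decoded" form is a UTF-8 byte string: turn text into bytes.
    return text.encode("utf-8", errors), len(text)

def _search(name):
    # Hypothetical codec name; any other name is left to other codecs.
    if name == "inverseutf8":
        return codecs.CodecInfo(
            encode=_inverse_encode,
            decode=_inverse_decode,
            name="inverseutf8",
        )
    return None

codecs.register(_search)

as_text = codecs.encode(b"caf\xc3\xa9", "inverseutf8")
as_bytes = codecs.decode(as_text, "inverseutf8")
print(repr(as_text), repr(as_bytes))
```

The sketch goes through codecs.encode and codecs.decode because the
str and bytes methods enforce the conventional types, which is a fair
hint about how rude this is.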

Also, note that there are some encodings unrelated to Unicode.  For
example, try this:

>>> "abcd".encode("base64")

This is an encoding between two byte strings.
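
The same byte-to-byte mapping can be sketched for Python 3, assuming
the "base64" codec is reached through the codecs module (there,
"abcd".encode("base64") on a str is rejected because base64 is not a
text encoding).  The variable names are illustrative.

```python
import codecs

raw = b"abcd"

# Both sides of this encoding are byte strings.
encoded = codecs.encode(raw, "base64")
decoded = codecs.decode(encoded, "base64")

print(repr(encoded))
print(decoded == raw)
```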


-- 
CARL BANKS



