[Python-Dev] Why can't I encode/decode base64 without importing a module?

Lennart Regebro regebro at gmail.com
Thu Apr 25 04:19:36 CEST 2013


On Thu, Apr 25, 2013 at 3:54 AM, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> RFC 4648 repeatedly refers to *characters*, without specifying an
> encoding for them.  In fact, if you copy accurately, you can write
> BASE64 on a napkin and that napkin will accurately transmit the data
> (assuming it doesn't run into sleet or gloom of night).

Or Mrs Cake.

> What else is that but "text in the sense of Py3k"?

Text in the sense of Py3k is Unicode. That an 8-bit character stream
(or in this case 6-bit) fits in the 31-bit character space of Unicode
doesn't make it Unicode, and hence it is not text. (Napkins of course
have even higher bit density than 31 bits per character, unless you
write very small). From the viewpoint of Py3k, bytes data is not text.

This is a very useful way to deal with Unicode. See also
http://regebro.wordpress.com/2011/03/23/unconfusing-unicode-what-is-unicode/

> My point is not that Python's base64 codec *should* be bytes-to-str
> and back.

Base64 does not convert between a Unicode character stream and an
8-bit byte stream. It converts between an 8-bit byte stream and an
8-bit byte stream. It therefore should be bytes to bytes. To fit
Unicode text into Base64 you have to first use an encoding on that
Unicode text to convert it to bytes.
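
A minimal sketch of that two-step process, assuming Python 3 and the
standard library base64 module. The sample string and the choice of
UTF-8 are only for illustration:

```python
import base64

text = "Caf\u00e9"                    # Unicode text (str); sample value is an assumption
raw = text.encode("utf-8")            # step 1: choose an encoding, str -> bytes
encoded = base64.b64encode(raw)       # step 2: Base64, bytes -> bytes

# Going back is the same two steps in reverse order.
decoded = base64.b64decode(encoded).decode("utf-8")

# Handing str straight to b64encode is rejected in Python 3.
try:
    base64.b64encode(text)
    str_accepted = True
except TypeError:
    str_accepted = False
```

Both the input and the output of b64encode are bytes; the only place
Unicode enters is the explicit .encode() and .decode() calls.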

> What I'm groping toward is an idea of a "variable method", so that we
> could use .encode and .decode where they are TOOWTDI for people even
> though a purely formal interpretation of duck-typing would say "but
> why is that blue whale quacking, waddling, and flying?"  In other
> words (although I have no idea how best to implement it), I would like
> "somestring.encode('base64')" to fail with "I don't know how to do
> that" (an attribute lookup error?), the same way that
> "somebytes.encode('utf-8')" does in Python 3 today.

There are only two options there. Either you get a "LookupError: unknown
encoding: base64", which is what you get now, or you get a
UnicodeEncodeError if the text is not ASCII. We don't want the
latter, because it means that code that looks fine for the developer
breaks in real life because the developer was American and didn't
think of this, but his client happens to have an accent in the name.
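
A rough sketch of that failure mode, assuming Python 3. The helper
naive_b64 is hypothetical and only stands in for what a
str-accepting Base64 codec would have to do implicitly; the names are
made up:

```python
import base64

def naive_b64(text):
    # Hypothetical: silently assume the text is ASCII before Base64-encoding it.
    return base64.b64encode(text.encode("ascii"))

ok = naive_b64("Smith")               # fine on the developer's machine

try:
    naive_b64("Ren\u00e9")            # a client with an accent in the name
    failed = False
except UnicodeEncodeError:
    failed = True                     # only shows up with real-world data
```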

Base64 is an encoding that transforms between 8-bit streams. Let it be
that. Don't try to shoehorn it into a completely different kind of
encoding.

//Lennart

