[Python-Dev] Why can't I encode/decode base64 without importing a module?

Stephen J. Turnbull stephen at xemacs.org
Thu Apr 25 03:54:06 CEST 2013


Tres Seaver writes:

 > On 04/23/2013 09:29 AM, Stephen J. Turnbull wrote:
 > > By RFC specification, BASE64 is a *textual* representation of
 > > arbitrary binary data.
 > 
 > It isn't "text" in the sense Py3k means:

RFC 4648 repeatedly refers to *characters*, without specifying an
encoding for them.  In fact, if you copy accurately, you can write
BASE64 on a napkin and that napkin will accurately transmit the data
(assuming it doesn't run into sleet or gloom of night).  What else is
that but "text in the sense of Py3k"?
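To put the "napkin" point in Python 3 terms, here is a minimal sketch using only the stdlib base64 module. BASE64 output consists solely of ASCII characters from the RFC 4648 alphabet, so it can be held in an ordinary str and decoded back to the original bytes. The sample bytes are arbitrary.

```python
import base64

# Arbitrary binary data, including bytes that are not valid UTF-8.
data = bytes([0x00, 0xFF, 0x10, 0x80, 0x7F])

# b64encode returns bytes, but every byte is an ASCII character from the
# RFC 4648 alphabet, so decoding as ASCII always succeeds.
as_text = base64.b64encode(data).decode("ascii")
print(as_text)  # a plain str, copyable onto a napkin

# Going back: b64decode accepts an ASCII-only str as well as bytes.
assert base64.b64decode(as_text) == data
```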

My point is not that Python's base64 codec *should* be bytes-to-str
and back.  My point is that, both in the formal spec and in historical
evolution, that is a plausible interpretation of ".encode('base64')"
which happens to be the reverse of the normal codec convention, where
".encode(codec)" is a *string* method, and ".decode(codec)" is a
*bytes* method.
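For comparison, a sketch of what Python 3 actually does (assuming Python 3.4 or later for the "base64" codec alias): character codecs follow the str.encode / bytes.decode convention, while the stdlib BASE64 functions and the "base64" codec are bytes-to-bytes, i.e. the web developer usage, not the bytes-to-str reading argued for above.

```python
import base64
import codecs

# Normal character-codec convention: str.encode -> bytes, bytes.decode -> str.
encoded = "naïve".encode("utf-8")
assert isinstance(encoded, bytes)
assert encoded.decode("utf-8") == "naïve"

# The base64 module works bytes-to-bytes: input and output are both bytes.
payload = b"\x00\x01binary"
b64 = base64.b64encode(payload)
assert isinstance(b64, bytes)

# codecs.encode reaches the same bytes-to-bytes "base64" codec; it is built
# on base64.encodebytes, which ends each output line with a newline.
via_codec = codecs.encode(payload, "base64")
assert via_codec == b64 + b"\n"
```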

This is not harder for people to learn (for BASE64 encoding or for
coded character sets), because in each case there's a natural sense of
direction for *en*coding vs. *de*coding.  But it does break duck-
typing, as does the web developer bytes-to-bytes usage of BASE64.

What I'm groping toward is an idea of a "variable method", so that we
could use .encode and .decode where they are TOOWTDI for people even
though a purely formal interpretation of duck-typing would say "but
why is that blue whale quacking, waddling, and flying?"  In other
words (although I have no idea how best to implement it), I would like
"somestring.encode('base64')" to fail with "I don't know how to do
that" (an attribute lookup error?), the same way that
"somebytes.encode('utf-8')" does in Python 3 today.
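One rough way to approximate the "variable method" idea with plain functions, offered only as a hypothetical sketch: the names encode, decode and _CODECS are invented here and are not a CPython API. Each codec records which type it may be encoded from and decoded from, and a call on the wrong type raises AttributeError, mimicking the missing bytes.encode method. BASE64 is registered in the bytes-to-str direction discussed above.

```python
import base64

# Hypothetical registry: codec -> (type .encode is valid on, encoder,
#                                  type .decode is valid on, decoder).
# UTF-8 runs str -> bytes; BASE64 (read as text) runs bytes -> str.
_CODECS = {
    "utf-8": (str, lambda s: s.encode("utf-8"),
              bytes, lambda b: b.decode("utf-8")),
    "base64": (bytes, lambda b: base64.b64encode(b).decode("ascii"),
               str, base64.b64decode),
}


def encode(obj, codec):
    """Hypothetical dispatcher: wrong input type fails like a missing method."""
    in_type, encoder, _, _ = _CODECS[codec]
    if not isinstance(obj, in_type):
        raise AttributeError(
            f"{type(obj).__name__!r} object cannot .encode({codec!r})")
    return encoder(obj)


def decode(obj, codec):
    """Hypothetical dispatcher for the reverse direction."""
    _, _, in_type, decoder = _CODECS[codec]
    if not isinstance(obj, in_type):
        raise AttributeError(
            f"{type(obj).__name__!r} object cannot .decode({codec!r})")
    return decoder(obj)
```

With this, encode(b"\x00\xff", "base64") returns a str, while encode("some text", "base64") raises AttributeError, much as b"...".encode("utf-8") does in real Python 3. True per-codec methods on str and bytes would need interpreter support; this only imitates the failure mode.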


