[Python-Dev] Why can't I encode/decode base64 without importing a module?

Stephen J. Turnbull stephen at xemacs.org
Tue Apr 23 15:29:33 CEST 2013


R. David Murray writes:

 > You transform *into* the encoding, and untransform *out* of the
 > encoding.  Do you have an example where that would be ambiguous?

In the bytes-to-bytes case, any pair of character encodings (e.g., UTF-8
and ISO-8859-15) would do.  Or how about in text, ReST to HTML?
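
To make the bytes-to-bytes case concrete, here is a minimal sketch.
The transcode() helper is hypothetical (it is not a stdlib API); it
only shows that both directions take bytes and return bytes, so
neither direction is obviously "into" the encoding.

```python
# Hypothetical helper, not part of the standard library: a
# bytes-to-bytes conversion between two character encodings.
def transcode(data: bytes, source: str, target: str) -> bytes:
    return data.decode(source).encode(target)

utf8 = "\u20ac5".encode("utf-8")            # EURO SIGN followed by "5"
latin9 = transcode(utf8, "utf-8", "iso-8859-15")
back = transcode(latin9, "iso-8859-15", "utf-8")

# Both directions have the same signature, so the types cannot tell
# us which call is the "transform" and which is the "untransform".
assert isinstance(latin9, bytes) and isinstance(back, bytes)
assert back == utf8

# Reading the result with the wrong charset raises no error either;
# it just silently produces a different character.
wrong = latin9.decode("iso-8859-1")
assert wrong != "\u20ac5"
```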

BASE64 itself is ambiguous.  By RFC specification, BASE64 is a
*textual* representation of arbitrary binary data.  (Cf. URIs.)  The
natural interpretation of .encode('base64') in that context would be
as a bytes-to-text encoder.  However, this has several problems.  In
practice, we invariably use an ASCII octet stream to carry BASE64-
encoded data.  So web developers would almost certainly expect a
bytes-to-bytes encoder.  Such a bytes-to-bytes encoder can't be
duck-typed.  Double-encoding bugs wouldn't be detected until the
stream arrives at the user.  And the RFC-based signature of
.encode('base64') as bytes-to-text is precisely opposite to that of
.encode('utf-8') (text-to-bytes).
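
For illustration, this is how the existing base64 module in Python 3
behaves.  It is a sketch of the double-encoding problem, not a
proposal; the payload value is an arbitrary example.

```python
import base64

payload = b"\x00\xffhello"             # arbitrary binary data

once = base64.b64encode(payload)       # bytes in, bytes out
twice = base64.b64encode(once)         # accepted without complaint

# Nothing in the types distinguishes raw data from encoded data.
assert isinstance(once, bytes)
assert isinstance(twice, bytes)

# The mistake only shows up when someone decodes once and gets
# BASE64 text back instead of the original payload.
assert base64.b64decode(twice) == once
assert base64.b64decode(once) == payload

# The RFC-style "textual" view needs a separate, explicit step.
text_form = once.decode("ascii")
assert isinstance(text_form, str)
```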

It is certainly true that there are many unambiguous cases.  In the
case of a true text processing facility (eg, Emacs buffers or Python 3
str) where there is an unambiguous text type with a constant and
opaque internal representation, it makes a lot of sense to treat the
text type as special/central, and use the terminology "encode [from
text]" and "decode [to text]".  It's easy to remember, it's obvious
which type is the special one, and the difference in input and output
types means that mistaken use of the API will be detected by duck-typing.
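
A small sketch of that detection with Python 3 str and bytes.  The
try_call() helper is hypothetical and exists only to catch the
error for the demonstration.

```python
text = "na\u00efve"                    # a str containing a non-ASCII character
data = text.encode("utf-8")            # text -> bytes
assert data.decode("utf-8") == text    # bytes -> text

# Hypothetical helper: call obj.<method>("utf-8") and report whether
# the attribute lookup fails.
def try_call(obj, method):
    try:
        getattr(obj, method)("utf-8")
    except AttributeError:
        return "AttributeError"
    return "ok"

# Using the API in the wrong direction fails immediately, because
# bytes has no .encode() and str has no .decode() in Python 3.
assert try_call(data, "encode") == "AttributeError"
assert try_call(text, "decode") == "AttributeError"
```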

However, in the case of bytes-bytes or text-text transformations, it's
not the presence of unambiguous cases that should drive API design
IMO.  It's the presence of the ambiguous cases that we should cater
to.  I don't see easy solutions to this issue.

Steve

