[Python-Dev] Why can't I encode/decode base64 without importing a module?
Stephen J. Turnbull
stephen at xemacs.org
Tue Apr 23 15:29:33 CEST 2013
R. David Murray writes:
> You transform *into* the encoding, and untransform *out* of the
> encoding. Do you have an example where that would be ambiguous?
In the bytes-to-bytes case, any pair of character encodings (eg, UTF-8
and ISO-8859-15) would do. Or how about in text, ReST to HTML?
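
To make the character-encoding case concrete, here is a minimal sketch (the sample string and variable names are my own, purely illustrative). Both conversions are bytes-to-bytes, and nothing in the types says which side is "the encoded one":

```python
# Illustrative sample data (an assumption, not from the thread).
utf8_bytes = "€uro".encode("utf-8")

# Routing through str makes each step's direction explicit:
# decode (bytes -> text), then encode (text -> bytes).
latin9_bytes = utf8_bytes.decode("utf-8").encode("iso-8859-15")

# The reverse conversion is equally bytes-to-bytes; neither direction
# is obviously "into" or "out of" an encoding.
round_trip = latin9_bytes.decode("iso-8859-15").encode("utf-8")
```

Seen as a single bytes-to-bytes step, either conversion could reasonably be called a "transform" or an "untransform".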
BASE64 itself is ambiguous. By RFC specification, BASE64 is a
*textual* representation of arbitrary binary data. (Cf. URIs.) The
natural interpretation of .encode('base64') in that context would be
as a bytes-to-text encoder. However, this has several problems. In
practice, we invariably use an ASCII octet stream to carry BASE64-
encoded data. So web developers would almost certainly expect a
bytes-to-bytes encoder. Such a bytes-to-bytes encoder can't be
duck-typed. Double-encoding bugs wouldn't be detected until the
stream arrives at the user. And the RFC-based signature of
.encode('base64') as bytes-to-text is precisely opposite to that of
.encode() for character encodings, which is text-to-bytes.
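
As a rough illustration of the double-encoding problem, here is a sketch using the stdlib base64 module in Python 3 (the payload and variable names are made up for the example). Because input and output are both bytes, a second accidental pass raises no error:

```python
import base64

# Arbitrary binary payload (illustrative only).
payload = b"\x00\xffbinary"

once = base64.b64encode(payload)   # bytes in, bytes out (ASCII octets)
twice = base64.b64encode(once)     # accidental second pass: no error raised

# The RFC's "textual" reading needs an explicit extra step.
as_text = once.decode("ascii")

# A single decode of the double-encoded data yields BASE64, not the payload.
partially_decoded = base64.b64decode(twice)
fully_decoded = base64.b64decode(partially_decoded)
```

Nothing in the types distinguishes `once` from `twice`, so the mistake only shows up when someone looks at the data.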
It is certainly true that there are many unambiguous cases. In the
case of a true text processing facility (eg, Emacs buffers or Python 3
str) where there is an unambiguous text type with a constant and
opaque internal representation, it makes a lot of sense to treat the
text type as special/central, and use the terminology "encode [from
text]" and "decode [to text]". It's easy to remember, it's obvious
which one is special, and the difference in input and output types
means that mistaken use of the API will be detected by duck-typing.
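
For contrast, a small sketch of how the text/bytes split catches misuse in Python 3 (sample data and variable names are illustrative). Since bytes has no .encode method, encoding twice fails immediately:

```python
text = "naïve"
data = text.encode("utf-8")    # str -> bytes: the unambiguous direction

try:
    data.encode("utf-8")       # bytes has no .encode method in Python 3
    error = None
except AttributeError as exc:
    error = type(exc).__name__

restored = data.decode("utf-8")  # bytes -> str restores the text
```

The failure happens at the call site, not when the stream reaches the user.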
However, in the case of bytes-bytes or text-text transformations, it's
not the presence of unambiguous cases that should drive API design
IMO. It's the presence of the ambiguous cases that we should cater
to. I don't see easy solutions to this issue.