[Python-Dev] Why can't I encode/decode base64 without importing a module?

Stephen J. Turnbull stephen at xemacs.org
Thu Apr 25 19:31:25 CEST 2013


Lennart Regebro writes:
 > On Thu, Apr 25, 2013 at 4:22 PM, MRAB <python at mrabarnett.plus.com> wrote:
 > > The JSON specification says that it's text. Its string literals can
 > > contain Unicode codepoints. It needs to be encoded to bytes for
 > > transmission and storage, but JSON itself is not a bytestring format.
 > 
 > OK, fair enough.
 > 
 > > base64 is a way of encoding binary data as text.
 > 
 > It's a way of encoding binary data using ASCII. There is a subtle but
 > important difference.

Yes, there is a difference, but I think you're wrong.  RFC 4648
explicitly states that Base-n encodings are intended for "human
handling" and even makes reference to character glyphs (the rationale
for excluding confusable digits from the Base32 alphabet).  That's
text.  Even if it is a rather restricted subset of text, those
restrictions are much stronger than a mere restriction to ASCII, and
they are based on aspects of text that go well beyond an encoding with
a small code unit.
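
To make the alphabet point concrete, here is a minimal sketch using the
stdlib base64 module.  The helper name base32_symbols and the sample
input are mine, purely for illustration; the alphabet itself is the
one given in RFC 4648 (A-Z and 2-7).

```python
import base64

# RFC 4648 Base32 alphabet: A-Z and 2-7.  The digits 0, 1, 8 and 9
# are simply not part of it.
BASE32_ALPHABET = set(b"ABCDEFGHIJKLMNOPQRSTUVWXYZ234567")


def base32_symbols(data: bytes) -> set:
    """Return the non-padding byte values used in the Base32 encoding of data."""
    return set(base64.b32encode(data).rstrip(b"="))


# Hypothetical sample input: every possible byte value once.
used = base32_symbols(bytes(range(256)))

assert used <= BASE32_ALPHABET      # only alphabet symbols appear
assert not (used & set(b"0189"))    # the excluded digits never appear
```

The restriction is to a small set of characters chosen with human
readers in mind, not to "whatever fits in ASCII".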

 > > In Python 3 we're trying to stop mixing binary data (bytestrings) with
 > > text (Unicode strings).
 > 
 > Yup. And that's why a base64 encoding shouldn't return Unicode strings.

That's inaccurate.  Antoine has presented several examples of why
*some* base64 encoders might return Unicode strings, precisely because
their output will be embedded in Unicode streams.  Debugging the MIME
composition functions in the email module is another.

An accurate statement is that these use cases are relatively unusual.
The common use case is feeding a binary stream directly into a wire
protocol.  Supporting that use case demands a base64 encoder with a
bytes-to-bytes signature in the stdlib, both for convenience and, to
some extent, for efficiency.

I don't really care if the stdlib supports the specialized use cases
with a separate base64 encoder (Antoine suggested the binascii
module), or if it leaves that up to the user (it's just an occasional
use of ".decode('ascii')", after all).
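
For concreteness, a small sketch of both cases with the Python 3
base64 module (the variable names and the sample payload are mine,
chosen only for illustration):

```python
import base64

# Hypothetical binary payload, e.g. a fragment of a file or a key.
payload = bytes([0x00, 0xFF, 0x10, 0x80])

# Common case: bytes in, bytes out, ready to write to a wire protocol.
wire = base64.b64encode(payload)
assert isinstance(wire, bytes)

# Specialized case: the result is to be embedded in a Unicode stream,
# so convert it explicitly.  Base64 output is always pure ASCII.
text = wire.decode("ascii")
assert isinstance(text, str)

# Decoding round-trips from either form (b64decode accepts ASCII-only
# str as well as bytes in Python 3.3 and later).
assert base64.b64decode(wire) == payload
assert base64.b64decode(text) == payload
```

The explicit decode is one extra call at the point where the data
crosses from the binary side to the text side.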

