[Python-Dev] Why does base64 return bytes?

Wed Jun 15 02:22:28 EDT 2016

On Tue, Jun 14, 2016 at 8:42 PM, Terry Reedy <tjreedy at udel.edu> wrote:
> Thank you for finding that.  I reread it and still believe that bytes was
> the right choice.  Base64 is an generic edge encoding for binary data.  It
> fits in with the the standard paradigm as a edge encoding.

I'd like to me-too Terry's sentiment, but also expand on it a bit.

Base64 encoding is used to convert bytes into a limited set of symbols
for inclusion in a stream of data. Whether bytes or unicode characters
are appropriate depends on whether the stream being constructed is a
byte stream or a unicode character stream.

Many people do deal with byte streams in Python and we have large
sub-communities for who this use case is important (e.g. Twisted,
Asyncio, anyone using the socket module).

It is also no longer 1980 though, and there are many protocols layered
on top of unicode character streams rather than bytes.

Ideally I'd like us to support both options (like we've been
increasingly doing for reading from other external sources such as
file systems or environment variables).

If we only support one, I would prefer it to be bytes since (bytes ->
bytes -> unicode) seems like less overhead and slightly conceptually
clearer than (bytes -> unicode -> bytes), but I consider this a
personal preference rather than any sort of one-true-way.

Schiavo
Simon