Steven D'Aprano steve at
Sun Jun 12 22:22:40 EDT 2016

On Mon, 13 Jun 2016 04:56 am, Marcin Rak wrote:

> Hi to everyone.
> Let's say I have some binary data, be it whatever, in the 'data' variable.
>  After calling the following line
> b64_encoded_data = base64.b64encode(data)
> my b64_encoded_data variable holds, would you believe it, a string as
> bytes!

That's because base64 is a bytes-to-bytes transformation. It has nothing to
do with Unicode text encodings.
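A minimal sketch of the bytes-in, bytes-out behaviour, using only the standard base64 module (the sample data is arbitrary):

```python
import base64

data = b"\x00\xff"                  # arbitrary binary data, not text
encoded = base64.b64encode(data)    # bytes in, bytes out

print(type(encoded))  # <class 'bytes'>
print(encoded)        # b'AP8='

# Passing a str is rejected: b64encode wants a bytes-like object.
try:
    base64.b64encode("not bytes")
except TypeError as exc:
    print("TypeError:", exc)
```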

> That is, the b64_encoded_data variable is of type 'bytes' and when you
> peek inside it's a string (made up of what seems to be only characters
> that exist in Base 64).  

If you print or otherwise display bytes, for the convenience of human
beings, those bytes are displayed as if they were ASCII: bytes in the
printable ASCII range appear as characters, and the rest as \x escapes.
E.g. the byte 0x61 is displayed as 'a'. Good idea? Bad idea? I can see
arguments either way, but that's how it is.
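To illustrate, here is a small sketch showing that the ASCII-looking display is only the repr; the object still holds integers:

```python
raw = bytes([0x61, 0x62, 0x63, 0x00, 0xFF])

print(raw)        # b'abc\x00\xff'  (printable ASCII shown as characters)
print(raw[0])     # 97  (indexing bytes gives an int, not a one-character str)
print(list(raw))  # [97, 98, 99, 0, 255]
```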

Naturally, after base64 encoding some bytes, you will be left with only
bytes from the base64 alphabet (A-Z, a-z, 0-9, '+' and '/', with '=' as
padding). That's the whole point of it.
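A quick way to convince yourself of this is to encode random bytes and check every output byte against the alphabet. The helper name uses_only_b64_alphabet is hypothetical, written just for this sketch:

```python
import base64
import os
import string

# Standard base64 alphabet: A-Z, a-z, 0-9, '+', '/', plus '=' for padding.
B64_ALPHABET = frozenset((string.ascii_letters + string.digits + "+/=").encode("ascii"))

def uses_only_b64_alphabet(encoded: bytes) -> bool:
    """Return True if every byte in `encoded` is in the standard base64 alphabet."""
    return set(encoded) <= B64_ALPHABET

encoded = base64.b64encode(os.urandom(1000))  # random binary input
print(uses_only_b64_alphabet(encoded))        # True
```

Note that base64.urlsafe_b64encode uses '-' and '_' in place of '+' and '/', so this check applies only to the standard variant.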

> Why isn't it a string yet?

*shrug* For backwards compatibility, probably, or for historical reasons, or
because the person who wrote the base64 module thought that this was the
most useful behaviour.

I can promise you that had he chosen the opposite behaviour, returning a
str instead of bytes, there would be people complaining "why do I have to
use encode('ascii') to get bytes?".

> In fact, I now have to apply the decode('utf-8') method to that
> variable to get a string object holding the exact same sequence of
> characters as was held by the b64_encoded_data bytes variable.

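You could also use decode('ascii'), which is probably more "correct", as the
base64 data shouldn't include anything which isn't ASCII. If something else
has crept in, decode('ascii') raises an error, whereas decode('utf-8') would
accept any valid UTF-8 sequence.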
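A minimal sketch of getting a str via decode('ascii'), and of where the two codecs differ. The helper name b64_text is hypothetical, chosen only for illustration:

```python
import base64

def b64_text(data: bytes) -> str:
    """Base64-encode binary data and return the result as a str."""
    # Base64 output is pure ASCII, so the 'ascii' codec always succeeds here.
    return base64.b64encode(data).decode("ascii")

text = b64_text(b"\x00\xff")
print(repr(text))                             # 'AP8='
print(base64.b64decode(text) == b"\x00\xff")  # True: b64decode also accepts an ASCII str

# Where the codecs differ: non-ASCII bytes that happen to be valid UTF-8.
suspicious = b"\xc3\xa9"
print(suspicious.decode("utf-8"))             # é
try:
    suspicious.decode("ascii")
except UnicodeDecodeError:
    print("not ASCII, so not valid base64 output")
```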


More information about the Python-list mailing list