[Python-Dev] Why can't I encode/decode base64 without importing a module?

Lennart Regebro regebro at gmail.com
Thu Apr 25 15:34:45 CEST 2013


On Thu, Apr 25, 2013 at 2:57 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> I can think of many usecases where I want to *embed* base64-encoded
> data in a larger text *before* encoding that text and transmitting
> it over a 8-bit channel.

That still doesn't mean that this should be the default behavior. Just
because you *can* represent base64 as Unicode text doesn't mean that
it should be.

> (GPG signatures, binary data embedded in JSON objects, etc.)

Is the GPG signature calculated on the *Unicode* data? How is that
done? Isn't it done on the encoded message? As I understand it a GPG
signature is done on any sort of document. Either me or you have
completely misunderstood how GPG works, I think. :-)

In the case of JSON objects, they are intended for data exchange, and
hence in the end need to be byte strings. So if you have a byte string
you want to base64 encode before transmitting it with json, you would
just end up transforming it to a unicode string and then back. That
doesn't seem useful.

One use case where you clearly *do* want the base64 encoded data to be
unicode strings is because you want to embed it in a text discussing
base64 strings, for a blog or a book or something. That doesn't seem
to be a very common usecase.

For the most part you base64 encode things because it's going to be
transmitted, and hence the natural result of a base64 encoding should
be data that is ready to be transmitted, hence byte strings, and not
Unicode strings.

> Python 3 doesn't *view* text as unicode, it *represents* it as unicode.

I don't agree that there is a significant difference between those
wordings in this context. The end result is the same: Things intended
to be handled/seen as textual should be unicode strings, things
intended for data exchange should be byte strings. Something that is
base64 encoded is primarily intended for data exchange. A base64
encoding should therefore return byte strings, especially since most
API's that perform this transmission will take byte strings as input.
If you want to include this in textual data, for whatever reason, like
printing it in a book, then the conversion is trivial, but that is
clearly the less common use case, and should therefore not be the
default behavior.

//Lennart


More information about the Python-Dev mailing list