[Python-Dev] Why can't I encode/decode base64 without importing a module?

Isaac Morland ijmorlan at uwaterloo.ca
Thu Apr 25 19:29:32 CEST 2013


On Thu, 25 Apr 2013, Lennart Regebro wrote:

> On Thu, Apr 25, 2013 at 4:22 PM, MRAB <python at mrabarnett.plus.com> wrote:
>> The JSON specification says that it's text. Its string literals can
>> contain Unicode codepoints. It needs to be encoded to bytes for
>> transmission and storage, but JSON itself is not a bytestring format.
>
> OK, fair enough.
>
>> base64 is a way of encoding binary data as text.
>
> It's a way of encoding binary data using ASCII. There is a subtle but
> important difference.

It is a way of encoding arrays of 8-bit bytes as arrays of characters that 
are part of the printable, non-whitespace subset of the ASCII repertoire. 
Since the ASCII repertoire is now simply the first 128 code points in the 
Unicode repertoire, it is equally correct to say that base64 is a way of 
encoding binary data as Unicode text.
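
As a rough illustration of the point above, the following sketch checks that standard base64 output stays inside the printable, non-whitespace ASCII subset, so it reads equally well as bytes or as Unicode text. It assumes Python 3 and the standard-library base64 module; the names raw, encoded, alphabet and text are chosen here for illustration only.

```python
import base64
import string

raw = bytes(range(256))          # every possible 8-bit byte value
encoded = base64.b64encode(raw)  # in Python 3 this returns bytes, not str

# The standard base64 alphabet: A-Z, a-z, 0-9, '+', '/', with '=' as padding.
alphabet = set(string.ascii_letters + string.digits + "+/=")

# Decoding as ASCII is lossless because every output byte is an ASCII code.
text = encoded.decode("ascii")

assert set(text) <= alphabet               # only printable, non-whitespace characters
assert all(ord(ch) < 128 for ch in text)   # all within the first 128 code points
assert text.encode("utf-8") == encoded     # ASCII and UTF-8 agree on these characters
```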

>> In Python 3 we're trying to stop mixing binary data (bytestrings) with
>> text (Unicode strings).
>
> Yup. And that's why a base64 encoding shouldn't return Unicode strings.

That is exactly why it should return Unicode strings.  What bytes should 
get sent if base64 is used to send a byte array over an EBCDIC link? [*]
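
One way to make the EBCDIC question concrete is the sketch below. It assumes Python 3 and uses cp037, one of the EBCDIC codecs shipped with CPython, as a stand-in for "an EBCDIC link"; the payload and variable names are hypothetical. If the base64 result is treated as text, the bytes on the wire depend on the link's character encoding, while the decoded payload does not.

```python
import base64

payload = b"\x00\xffhello"                        # arbitrary binary data
text = base64.b64encode(payload).decode("ascii")  # base64 result viewed as text

ascii_wire = text.encode("ascii")    # bytes sent over an ASCII link
ebcdic_wire = text.encode("cp037")   # bytes sent over an EBCDIC link (assumed codec)

# Same text, different bytes on the wire.
assert ascii_wire != ebcdic_wire

# The EBCDIC receiver decodes wire bytes to text first, then undoes base64.
received = base64.b64decode(ebcdic_wire.decode("cp037"))
assert received == payload
```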

Having said that, there may be other reasons for base64 encoding to return 
bytes - I can conceive of arguments involving efficiency, or practicality, 
or the most common use cases.  So I can't say for sure what base64 
encoding actually ought to return in Python.  But the purist stance should 
be that base64 encoding should return text, i.e. a string, i.e. unicode.

[*] I apologize to anybody who just ate.

Isaac Morland			CSCF Web Guru
DC 2554C, x36650		WWW Software Specialist

