[Python-Dev] Why can't I encode/decode base64 without importing a module?

Thu Apr 25 18:53:53 CEST 2013

On 25/04/2013 15:22, MRAB wrote:
> On 25/04/2013 14:34, Lennart Regebro wrote:
>> On Thu, Apr 25, 2013 at 2:57 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>>> I can think of many use cases where I want to *embed* base64-encoded
>>> data in a larger text *before* encoding that text and transmitting
>>> it over an 8-bit channel.
>>
>> That still doesn't mean that this should be the default behavior. Just
>> because you *can* represent base64 as Unicode text doesn't mean that
>> it should be.
>>
[snip]
>> One use case where you clearly *do* want the base64 encoded data to be
>> Unicode strings is when you want to embed it in a text discussing
>> base64 strings, for a blog or a book or something. That doesn't seem
>> to be a very common use case.
>>
>> For the most part you base64 encode things because it's going to be
>> transmitted, and hence the natural result of a base64 encoding should
>> be data that is ready to be transmitted, hence byte strings, and not
>> Unicode strings.
>>
>>> Python 3 doesn't *view* text as unicode, it *represents* it as unicode.
>>
>> I don't agree that there is a significant difference between those
>> wordings in this context. The end result is the same: Things intended
>> to be handled/seen as textual should be Unicode strings, things
>> intended for data exchange should be byte strings. Something that is
>> base64 encoded is primarily intended for data exchange. A base64
>> encoding should therefore return byte strings, especially since most
>> APIs that perform this transmission will take byte strings as input.
>> If you want to include this in textual data, for whatever reason, like
>> printing it in a book, then the conversion is trivial, but that is
>> clearly the less common use case, and should therefore not be the
>> default behavior.
>>
> base64 is a way of encoding binary data as text. The problem is that
> traditionally text has been encoded with one byte per character, except
> in those locales where there were too many characters in the character
> set for that to be possible.
>
> In Python 3 we're trying to stop mixing binary data (bytestrings) with
> text (Unicode strings).
>
RFC 4648 says """Base encoding of data is used in many situations to 
store or transfer data in environments that, perhaps for legacy reasons, 
are restricted to US-ASCII [1] data.""".

To me, "US-ASCII" is an encoding, so it appears to be talking about
encoding binary data (bytestrings) to ASCII-encoded text (bytestrings).
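For reference, here is a minimal sketch of the Python 3 behaviour under discussion. It uses only the standard-library `base64` module. The payload and the `"attachment: "` wrapper are made-up illustrations. `b64encode()` takes bytes and returns bytes, and every output byte falls in the US-ASCII range. Getting a `str` to embed in a larger text is therefore a lossless `.decode("ascii")`.

```python
import base64

# Arbitrary binary data, including byte values that are not valid ASCII.
payload = bytes(range(256))

# Python 3: base64.b64encode() takes bytes and returns bytes.
encoded = base64.b64encode(payload)
assert isinstance(encoded, bytes)

# The output only uses the RFC 4648 alphabet (A-Z, a-z, 0-9, '+', '/')
# plus '=' padding, so every byte value is below 128, i.e. US-ASCII.
assert all(b < 128 for b in encoded)

# Converting to text is therefore a lossless ASCII decode...
text = encoded.decode("ascii")

# ...which is what the "embed it in a larger text first" case needs:
# build the str, then encode the whole message for transmission.
# The "attachment: " prefix is purely illustrative.
message = "attachment: " + text + "\n"
wire = message.encode("utf-8")

# Decoding goes the other way; b64decode() accepts the bytes form
# and also an ASCII-only str.
assert base64.b64decode(encoded) == payload
assert base64.b64decode(text) == payload
```

Whether the `bytes` or the `str` form should be the default return type is what this thread is arguing about. The sketch only shows that moving between the two costs a single ASCII decode or encode.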