[Python-Dev] Why does base64 return bytes?

Wed Jun 15 06:21:25 EDT 2016

On Wed, 15 Jun 2016, Greg Ewing wrote:

> Simon Cross wrote:
>> If we only support one, I would prefer it to be bytes since (bytes ->
>> bytes -> unicode) seems like less overhead and slightly conceptually
>> clearer than (bytes -> unicode -> bytes),
>
> Whereas bytes -> unicode, followed if needed by unicode -> bytes,
> seems conceptually clearer to me. IOW, base64 is conceptually a
> bytes-to-text transformation, and the usual way to represent
> text in Python 3 is unicode.

And in CPython, do I understand correctly that the output text would be
represented using one byte per character?  If so, would there be a way of
encoding that into UTF-8 that re-used the raw memory that backs the
Unicode object, and therefore avoided almost all the inefficiency of
going via Unicode?  If so, this would be a win: proper use of Unicode to
represent a text string, combined with instantaneous conversion into a
bytes object for the purpose of writing to the OS.
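
To make the question concrete, here is a rough sketch of the round trip I
have in mind. It assumes CPython 3.3 or later (the PEP 393 string
representation); the names raw, encoded, text and per_char are only for
illustration, and the size measurement is a CPython implementation detail
rather than anything the language guarantees.

```python
import base64
import sys

raw = b"\x00\x01\x02 hello"

# Today: base64.b64encode takes bytes and returns bytes.
encoded = base64.b64encode(raw)

# The "bytes -> unicode" view: the same data as a str. Base64 output is
# pure ASCII, so decoding as ASCII cannot fail.
text = encoded.decode("ascii")

# Going back to bytes for writing to the OS. For ASCII-only text the
# UTF-8 bytes are identical to the original base64 bytes.
assert text.encode("utf-8") == encoded

# Rough check of the per-character storage cost of an ASCII-only str:
# compare the sizes of two strings that differ by one character.
# On CPython with PEP 393 this difference should be one byte.
per_char = sys.getsizeof("a" * 101) - sys.getsizeof("a" * 100)
print("bytes per ASCII character:", per_char)
```

As far as I can tell, str.encode() still builds a new bytes object, so
the last step copies the data; the sketch only shows that the str is
already stored one byte per character, which is what would make sharing
the memory conceivable.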

Isaac Morland           CSCF Web Guru
DC 2619, x36650         WWW Software Specialist