[Python-Dev] Why does base64 return bytes?

Tue Jun 14 14:17:16 EDT 2016

Hello,

On Tue, 14 Jun 2016 14:02:02 -0400
Random832 <random832 at fastmail.com> wrote:

> On Tue, Jun 14, 2016, at 13:19, Paul Sokolovsky wrote:
> > Well, it's easy to remember the conclusion - it was decided to
> > return bytes. The reason also wouldn't be hard to imagine -
> > regardless of the fact that base64 uses ASCII codes for digits and
> > letters, it's still essentially a binary data. 
> 
> Only in the sense that all text is binary data. There's nothing in the
> definition of base64 specifying ASCII codes. It specifies *characters*
> that all happen to be in ASCII's character repertoire.
> 
> >And the most natural step for it is to send
> > it down the socket (socket.send() accepts bytes), etc.
> 
> How is that more natural than to send it to a text buffer that is

It's more natural because it's more efficient. It's more natural in the
same sense that the most natural way to get from point A to point B is
a straight line.

> ultimately encoded (maybe not even in an ASCII-compatible encoding...
> though probably) and sent down a socket or written to a file by a
> layer that is outside your control? Yes, everything eventually ends
> up as bytes. That doesn't mean that we should obsessively convert
> things to bytes as early as possible.

It's vice-versa - there's no need to obsessively convert simple,
primary type of bytes (everything in computers are bytes) to more
complex things like Unicode strings.

> I mean if we were gonna do that why bother even having a unicode
> string type at all?

You're trying to raise the topic which is a subject of gigantic flame
wars on python-list for years. Here's my summary: not using unicode
string type *at all* is better than not using bytes type at all. So,
feel free to use unicode string *only* when it's needed, which is
*only* when you accept input from or produce output for *human* (like
real human, walking down a street to do grocery shopping). In all
other cases, data should stay bytes (mind - stay, as it's bytes in the
beginning, and it requires extra effort to convert it to a strings).

> > I'd find it a bit more surprising that binascii.hexlify() returns
> > bytes, but I personally got used to it, and consider it a
> > consistency thing on binascii module.
> > 
> > Generally, with Python3 by default using (inefficient) Unicode for
> > strings, 
> 
> Why is it inefficient?

Because bytes is the most efficient basic representation of data.
Everything which tries to convert it to something is less efficient in
general. Less efficient == inefficient.

-- 
Best regards,
 Paul                          mailto:pmiscml at gmail.com