[Python-Dev] Why does base64 return bytes?
R. David Murray
rdmurray at bitdance.com
Thu Jun 16 07:08:59 EDT 2016
On Wed, 15 Jun 2016 11:51:05 +1200, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> R. David Murray wrote:
> > The fundamental purpose of the base64 encoding is to take a series
> > of arbitrary bytes and reversibly turn them into another series of
> > bytes in which the eighth bit is not significant.
>
> No, it's not. If that were its only purpose, it would be
> called base128, and the RFC would describe it purely in
> terms of bit patterns and not mention characters or
> character sets at all.
Sorry, you are correct. IMO its purpose is to encode the data into a
representation that consists of a limited subset of printable characters
(ones that make marks on paper or screen, which is an imprecise term);
i.e., data that will not be interpreted as carrying control information
by most programs processing the data stream, whether they treat it as
human-readable text or as raw bytes.
The rest of the argument still applies, specifically the part about
wire encoding to seven bit bytes being the currently-most-used[*] and
backward-compatible use case. And I say this despite the fact that the
email package currently handles everything as surrogate-escaped text
and so does in fact decode the output of base64.encode to ASCII and
only later re-encodes it. That's a design issue in the email package
deriving from the fact that bytes and strings used to be the same thing
in Python 2. It might be corrected some day, but probably won't be, and
it is a legacy of *not* making the distinction between bytes and strings.
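The decode-then-re-encode pattern described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the email package's actual code. The helpers body_as_text and text_to_wire are hypothetical, and base64.encodebytes stands in for whatever encoder a text-oriented design would call. Because base64 output is pure ASCII, the round trip through str is lossless, but it is also an extra step that a bytes-oriented design would not need.

```python
import base64

def body_as_text(payload: bytes) -> str:
    # Hypothetical helper: base64-encode, then decode the ASCII-only
    # result to str so it can be handled as text (a text-oriented design).
    return base64.encodebytes(payload).decode("ascii")

def text_to_wire(body: str) -> bytes:
    # Hypothetical helper: re-encode the text for the wire. ASCII is enough
    # because base64 output never contains non-ASCII characters.
    return body.encode("ascii")

payload = b"\x00\xffbinary attachment\r\n" * 10
text_body = body_as_text(payload)
wire = text_to_wire(text_body)

print(wire == base64.encodebytes(payload))   # True: same bytes as encoding directly
print(base64.decodebytes(wire) == payload)   # True: original data recovered
```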
--David
[*] Yes this is changing, I already said that :)