[Python-Dev] Why does base64 return bytes?
greg.ewing at canterbury.ac.nz
Wed Jun 15 03:02:57 EDT 2016
Stephen J. Turnbull wrote:
> The RFC is unclear on this point, but I read it as specifying the
> ASCII coded character set, not the ASCII repertoire of (abstract)
> characters.
Well, I think you've misread it. Or at least there is a
more general reading possible that is entirely consistent
with the stated purpose and doesn't assume any particular
encoding.
> It's more subtle than that. *RFCs do not deal with text.*
That may be true of most RFCs, but I think this particular
one really *is* talking about text, even if the authors
didn't realise it at the time.
> It is also desirable that it be likely to pass unscathed through channels
> that ... *inadvertently* treat it as text. Both requirements are
> conveniently fulfilled by using appropriate ASCII subsets, and encoding on
> the wire using the usual bit patterns.
But only if the part that is (deliberately or inadvertently)
treating it as text is using ASCII as its encoding. So, by
your reading of the RFC, base64 is *only* intended for
channels that use ASCII encoding.
Whereas if you drop the assumption of ASCII and use whatever
encoding the channel uses for text, then it works for all
RFC 4648 doesn't mention it, but an earlier RFC on base64
explicitly said that characters were chosen that also exist
in EBCDIC, so it seems they were intending that base64
should work on EBCDIC-based systems as well as ASCII-based ones.
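
To make the point concrete, here is a minimal sketch in Python 3. It assumes that Python's "cp037" codec is an acceptable stand-in for "an EBCDIC channel" (it is only one of several EBCDIC code pages), and the variable names are purely illustrative. The base64 output is treated as text and then carried in two different channel encodings:

```python
import base64

payload = bytes(range(256))  # arbitrary binary data

# Treat the base64 output as text (abstract characters), not as ASCII bytes.
text = base64.b64encode(payload).decode("ascii")

# Carry that text over two channels with different text encodings.
ascii_wire = text.encode("ascii")
ebcdic_wire = text.encode("cp037")  # assumption: cp037 stands in for EBCDIC

# The bit patterns on the wire differ between the two channels...
assert ascii_wire != ebcdic_wire

# ...but each channel recovers the same text, and hence the same payload.
assert base64.b64decode(ascii_wire.decode("ascii")) == payload
assert base64.b64decode(ebcdic_wire.decode("cp037")) == payload
```

The round trip works on both channels only because the base64 alphabet is present in both encodings, which is the property the earlier RFC was relying on.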
> It's purely a matter of our convenience
> (as programmer *in* Python) whether we return str or bytes.
Yes, and it seems to me the decision has been made by people
with their noses stuck in low-level protocol implementations.
Whenever *I've* needed to base64 encode something, I've wanted
the output as text, because that's what I needed to feed into
the next stage of the process.
Maybe there should be two versions of the base64 codec, one
producing bytes and one producing text?
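
As a rough sketch of what two such versions might look like: the function names `b64encode_bytes` and `b64encode_text` below are hypothetical and are not part of the standard `base64` module, which only provides the bytes-returning `b64encode`.

```python
import base64


def b64encode_bytes(data: bytes) -> bytes:
    """Hypothetical name: the existing behaviour, base64 as bytes."""
    return base64.b64encode(data)


def b64encode_text(data: bytes) -> str:
    """Hypothetical name: the same encoding, returned as str."""
    # The base64 alphabet is a subset of ASCII, so this decode cannot fail.
    return base64.b64encode(data).decode("ascii")


print(b64encode_bytes(b"hello"))  # b'aGVsbG8='
print(b64encode_text(b"hello"))   # aGVsbG8=
```

The text version can be fed straight into string formatting, JSON, or anything else that expects str, without the caller having to remember the extra decode step.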