[Python-Dev] Why can't I encode/decode base64 without importing a module?
Stephen J. Turnbull
stephen at xemacs.org
Thu Apr 25 20:44:58 CEST 2013
> RFC 4648 says """Base encoding of data is used in many situations to
> store or transfer data in environments that, perhaps for legacy reasons,
> are restricted to US-ASCII data.""".
> To me, "US-ASCII" is an encoding, so it appears to be talking about
> encoding binary data (bytestrings) to ASCII-encoded text (bytestrings).
I think that's a misreading, inconsistent with the rest of the RFC.
The references to US-ASCII are not clearly normative, as the value-
character mappings are given in tables, and are self-contained. (The
one you quote is clearly informative, since it describes a use-case.)
The term "subset of US-ASCII" suggests repertoire, not encoding, as
does the use of "alphabet" to refer to these subsets.
*Every* (other?) normative statement is very careful to say that the
input of a Base-n encoder is "octets" (with two uses of "bytes" in the
definition of Base32), and that the output is "characters". There are no
exceptions, and there are *no* references to encoding of characters or
the corresponding character codes (except the possible implicit
reference via "US-ASCII").
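
To make the "octets in, characters out" reading concrete, here is a
minimal sketch in Python. It is not taken from the RFC or from the
stdlib; the name b64_chars is hypothetical, and the only thing it relies
on is the value-character table for the standard Base 64 alphabet. The
input is bytes, the output is str, and no character encoding is involved
at any point.

```python
import base64

# The standard Base 64 alphabet as characters (a str), indexed by 6-bit value.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def b64_chars(octets: bytes) -> str:
    """Hypothetical helper: map octets to Base 64 characters via the table."""
    out = []
    for i in range(0, len(octets), 3):
        group = octets[i:i + 3]
        pad = 3 - len(group)
        # Pack up to three octets into 24 bits, zero-filling on the right.
        n = int.from_bytes(group + b"\x00" * pad, "big")
        # Split into four 6-bit values and look each one up in the table.
        quad = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # Positions that hold only fill bits become the pad character.
        if pad:
            quad[-pad:] = "=" * pad
        out.extend(quad)
    return "".join(out)


# Cross-check against the stdlib, which returns ASCII-encoded bytes.
sample = b"Man"
assert b64_chars(sample) == base64.b64encode(sample).decode("ascii")
```

The stdlib's base64.b64encode returns bytes, which is one defensible
choice; the sketch only shows that a str result is an equally direct
transcription of the tables.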
I can make no sense of those facts if the intent of the RFC is to
restrict the output of a Base-n encoder to characters encoded in
(8-bit) US-ASCII. Why not just say so, and use "octets" and their
ASCII codes throughout, with the corresponding characters used as
informative commentary? I think it much more likely that "subset of
the character repertoire of US-ASCII" was intended, but abbreviated to
"subset of US-ASCII". This kind of abbreviation is very common in
informal discussion of coded character sets.
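
The difference between repertoire and encoding can be sketched in
Python as follows. This is only an illustration under my reading of the
RFC: the codec choices (utf-16-le, and cp500 as an EBCDIC example) are
mine, not the RFC's. The same four Base 64 characters have different
octet representations under each codec, yet they remain the same
characters and carry the same payload.

```python
import base64

text = "TWFu"  # Base 64 characters for the octets b"Man"

# Three different octet-level encodings of the same characters.
codecs = ("ascii", "utf-16-le", "cp500")
encoded = {codec: text.encode(codec) for codec in codecs}

# The octet sequences all differ from one another...
distinct_octet_forms = set(encoded.values())

# ...but each decodes back to the identical sequence of characters.
round_tripped = {raw.decode(codec) for codec, raw in encoded.items()}

# Decoding the characters recovers the original octets.
# (base64.b64decode accepts an ASCII-only str as well as bytes.)
payload = base64.b64decode(text)

print(len(distinct_octet_forms), round_tripped, payload)
```

If "US-ASCII" names a repertoire, all three forms are legitimate ways
to carry the same Base 64 text; if it names an encoding, only the first
one is.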
I admit it's a little surprising that the author would be so
incautious in his use of "US-ASCII", but if he really meant US-ASCII-
the-encoding, I find the style of the rest of the RFC astonishing!