[Python-Dev] Why can't I encode/decode base64 without importing a module?

Stephen J. Turnbull stephen at xemacs.org
Thu Apr 25 20:44:58 CEST 2013


MRAB writes:

 > RFC 4648 says """Base encoding of data is used in many situations to 
 > store or transfer data in environments that, perhaps for legacy reasons, 
 > are restricted to US-ASCII [1] data.""".
 > 
 > To me, "US-ASCII" is an encoding, so it appears to be talking about
 > encoding binary data (bytestrings) to ASCII-encoded text (bytestrings).

I think that's a misreading, inconsistent with the rest of the RFC.

The references to US-ASCII are not clearly normative, as the value-
character mappings are given in tables, and are self-contained.  (The
one you quote is clearly informative, since it describes a use-case.)
The term "subset of US-ASCII" suggests repertoire, not encoding, as
does the use of "alphabet" to refer to these subsets.

*Every* (other?) normative statement is very careful to say that input
of a Base-n encoder is "octets" (with two uses of "bytes" in the
definition of Base32), and the output is "characters".  There are no
exceptions, and there are *no* references to encoding of characters or
the corresponding character codes (except the possible implicit
reference via "US-ASCII").
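To make the "octets in, characters out" reading concrete, here is a minimal sketch (not from the thread) of a table-driven Base64 encoder written directly against the RFC 4648 alphabet table. The helper name `b64_chars` is hypothetical. It returns a Python `str` of characters, whereas the standard library's `base64.b64encode` returns `bytes` that hold the ASCII codes of those same characters.

```python
import base64

# Hypothetical helper, for illustration only: RFC 4648 Table 1 maps each
# 6-bit value (0..63) to a character of the Base64 alphabet.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
PAD = "="


def b64_chars(data: bytes) -> str:
    """Encode octets as Base64 characters, returned as a str (not bytes)."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # Pack up to 3 octets into one 24-bit group, zero-filled on the right.
        group = int.from_bytes(chunk.ljust(3, b"\x00"), "big")
        # A final group of k octets (k = 1, 2, 3) yields k + 1 data characters.
        n_chars = len(chunk) + 1
        for j in range(n_chars):
            out.append(ALPHABET[(group >> (18 - 6 * j)) & 0x3F])
        # Pad the 4-character quantum with "=".
        out.append(PAD * (4 - n_chars))
    return "".join(out)


sample = b"foobar"
print(b64_chars(sample))         # Zm9vYmFy   (a str of characters)
print(base64.b64encode(sample))  # b'Zm9vYmFy' (bytes: the characters' ASCII codes)
```

Under this reading, the stdlib result is just one serialization of the encoder's output: `base64.b64encode(data).decode("ascii")` yields the same `str` as the sketch.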

I can make no sense of those facts if the intent of the RFC is to
restrict the output of a Base-n encoder to characters encoded in
(8-bit) US-ASCII.  Why not just say so, and use "octets" and their
ASCII codes throughout, with the corresponding characters used as
informative commentary?  I think it much more likely that "subset of
the character repertoire of US-ASCII" was intended, but abbreviated to
"subset of US-ASCII".  This kind of abbreviation is very common in
informal discussion of coded character sets.
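The repertoire/encoding distinction can be illustrated in a few lines (an illustration only, not part of the original message). The same Base64 characters are serialized below under three codecs that all cover the Base64 alphabet. ASCII, UTF-16-LE, and `cp500` (an EBCDIC code page) are arbitrary example choices; they assign different octets to the same characters, and any of them round-trips back to a string the decoder accepts.

```python
import base64

# Base64 output for some octets, held as characters (a str).
text = base64.b64encode(b"foobar").decode("ascii")  # "Zm9vYmFy"

# The same characters under different coded character sets: the repertoire
# is identical, but the octets each codec assigns to it differ.
for codec in ("ascii", "utf-16-le", "cp500"):
    octets = text.encode(codec)
    print(codec, octets.hex())
    # Decoding with the same codec recovers the identical characters,
    # which the Base64 decoder then turns back into the original octets.
    assert octets.decode(codec) == text
    assert base64.b64decode(octets.decode(codec)) == b"foobar"
```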

I admit it's a little surprising that the author would be so
incautious in his use of "US-ASCII", but if he really meant US-ASCII-
the-encoding, I find the style of the rest of the RFC astonishing!
