"Martin v. Löwis"
martin at v.loewis.de
Sun Feb 19 20:14:25 CET 2006
Stephen J. Turnbull wrote:
> Bengt> The characters in b could be encoded in plain ascii, or
> Bengt> utf16le, you have to know.
> Which base64 are you thinking about? Both RFC 3548 and RFC 2045
> (MIME) specify subsets of US-ASCII explicitly.
Unfortunately, it is ambiguous as to whether they refer to US-ASCII,
the character set, or US-ASCII, the encoding. It appears that
RFC 3548 talks about the character set only:
- section 2.4 talks about "choosing an alphabet", and how it should
be possible for humans to handle such data.
- section 2.3 talks about non-alphabet characters
So it appears that RFC 3548 defines a conversion bytes->text.
To transmit this, you then also need an encoding. MIME appears
to also use the US-ASCII *encoding* ("charset", in IETF speak),
for the "base64" Content-Transfer-Encoding.
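
The two steps can be sketched in Python as follows. This is only an illustration, not something either RFC prescribes; the sample `payload` is made up. Note that Python's `base64.b64encode` returns bytes that are already ASCII-encoded, so the sketch decodes them first to get at the abstract text.

```python
import base64

# Hypothetical sample payload; any bytes would do.
payload = bytes([0x00, 0xFF, 0x10, 0x80])

# Step 1 (RFC 3548 view): bytes -> text over the base64 alphabet.
# b64encode hands back ASCII-encoded bytes, so decode them to obtain
# the sequence of abstract characters.
text = base64.b64encode(payload).decode("ascii")

# Step 2 (MIME view): text -> bytes on the wire, using the US-ASCII
# encoding ("charset").
wire = text.encode("ascii")

# The receiver reverses both steps.
assert base64.b64decode(wire.decode("ascii")) == payload
```

Keeping the two steps apart makes it explicit that the choice of charset in step 2 is separate from the base64 conversion itself.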
For an example where base64 is *not* necessarily ASCII-encoded,
see the "base64Binary" data type in XML Schema. There, base64 is embedded
into an XML document, and uses the encoding of the entire XML
document. As a result, you may get base64 data in utf16le.
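
A small Python sketch of that situation, assuming the enclosing document happens to be UTF-16LE. The helper `decode_base64_text` and the sample `payload` are hypothetical names used only for illustration; a real XML parser would do the charset decoding itself.

```python
import base64

# Hypothetical sample payload.
payload = b"hi"

# The same base64 text, serialized with two different encodings.
text = base64.b64encode(payload).decode("ascii")
ascii_wire = text.encode("ascii")      # one byte per base64 character
utf16_wire = text.encode("utf-16-le")  # two bytes per base64 character


def decode_base64_text(raw: bytes, charset: str) -> bytes:
    """Undo the charset encoding first, then the base64 conversion."""
    return base64.b64decode(raw.decode(charset))


# Both byte strings carry the same payload, but only if the reader
# knows which encoding was used.
assert decode_base64_text(ascii_wire, "ascii") == payload
assert decode_base64_text(utf16_wire, "utf-16-le") == payload
```

The two wire forms differ byte for byte, so the reader has to know the document's encoding before applying base64 decoding.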