[Python-Dev] bytes.from_hex()

"Martin v. Löwis" martin at v.loewis.de
Sun Feb 19 20:14:25 CET 2006


Stephen J. Turnbull wrote:
>     Bengt> The characters in b could be encoded in plain ascii, or
>     Bengt> utf16le, you have to know.
> 
> Which base64 are you thinking about?  Both RFC 3548 and RFC 2045
> (MIME) specify subsets of US-ASCII explicitly.

Unfortunately, it is ambiguous as to whether they refer to US-ASCII,
the character set, or US-ASCII, the encoding. It appears that
RFC 3548 talks about the character set only:

- section 2.4 talks about "choosing an alphabet", and how it should
  be possible for humans to handle such data.
- section 2.3 talks about non-alphabet characters

So it appears that RFC 3548 defines a conversion bytes->text.
To transmit this, you then also need an encoding. MIME appears
to also use the US-ASCII *encoding* ("charset", in IETF speak),
for the "base64" Content-Transfer-Encoding.
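
To make the two steps concrete, here is a rough sketch in present-day
Python 3 (where bytes and str are distinct types). The helper names
base64_text and mime_wire_form are made up for illustration; only
base64.b64encode is a real library call.

```python
import base64

def base64_text(data: bytes) -> str:
    """Step 1 (RFC 3548 view): bytes -> text over the base64 alphabet."""
    # b64encode returns bytes in Python 3; decoding them as ASCII
    # recovers the abstract characters of the alphabet.
    return base64.b64encode(data).decode("ascii")

def mime_wire_form(data: bytes) -> bytes:
    """Step 2 (MIME view): text -> bytes, using the US-ASCII charset."""
    return base64_text(data).encode("ascii")

text = base64_text(b"hi")      # a str of alphabet characters
wire = mime_wire_form(b"hi")   # the octets actually transmitted
```

The point of splitting it this way is only that the second step is a
separate choice; MIME happens to fix it to US-ASCII.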

For an example where base64 is *not* necessarily ASCII-encoded,
see the "base64Binary" data type in XML Schema. There, base64 is
embedded into an XML document, and uses the encoding of the entire
XML document. As a result, you may get base64 data in utf16le.
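
As an illustration (a sketch in Python 3, with an arbitrary payload
chosen for the example), the same base64 text produces different
octets depending on the encoding of the enclosing document, and the
receiver has to know which one was used before it can decode:

```python
import base64

payload = b"\x00\xffhello"     # arbitrary example data

# The abstract base64 text: characters from the alphabet only.
text = base64.b64encode(payload).decode("ascii")

ascii_wire = text.encode("ascii")      # as in a MIME body part
utf16_wire = text.encode("utf-16-le")  # as inside a UTF-16LE XML document

# Same characters, different bytes: UTF-16LE uses two octets per character.
assert ascii_wire != utf16_wire
assert len(utf16_wire) == 2 * len(ascii_wire)

# Undo the document encoding first, then the base64 step.
recovered = base64.b64decode(utf16_wire.decode("utf-16-le"))
assert recovered == payload
```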

Regards,
Martin

