[Python-Dev] bytes.from_hex()
Greg Ewing
greg.ewing at canterbury.ac.nz
Mon Feb 27 01:47:25 CET 2006
Stephen J. Turnbull wrote:
> I gave you one, MIME processing in email
If implementing a MIME packer is really the only use case
for base64, then it might as well be removed from the
standard library, since 99.99999% of all programmers will
never touch it. Those who do will need to have boned up
on the subject of encoding until it's coming out their
ears, so they'll know what they're doing in any case. And
they'll be quite competent to write their own base64
encoder that works however they want it to.
I don't have any real-life use cases for base64 that a
non-MIME-implementer might come across, so all I can do
is imagine what shape such a use case might have.
When I do that, I come up with what I've already described.
The programmer wants to send arbitrary data over a channel
that only accepts text. He doesn't know, and doesn't want
to have to know, how the channel encodes that text --
it might be ASCII or EBCDIC or Morse code, it shouldn't
matter. If his Python base64 encoder produces a Python
character string, and his Python channel interface accepts
a Python character string, he doesn't have to know.
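To make the shape of that use case concrete, here's a
minimal sketch in Py3k-style Python (channel is a
hypothetical object whose write() accepts a character
string; the helper name is mine):

    import base64

    def send_over_text_channel(channel, payload):
        # b64encode returns bytes; decoding as ASCII yields the
        # character string the text-only channel expects. Under
        # the scheme argued for here, the codec would return a
        # character string directly and this decode would vanish.
        text = base64.b64encode(payload).decode('ascii')
        channel.write(text)   # write() takes str, never bytes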
> I think it's your turn. Give me a use case where it matters
> practically that the output of the base64 codec be Python unicode
> characters rather than 8-bit ASCII characters.
I'd be perfectly happy with ASCII characters, but in Py3k,
the most natural place to keep ASCII characters will be in
character strings, not byte arrays.
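A quick illustration, assuming the planned bytes/str split:

    s = 'SGVsbG8='     # character string: s[0] == 'S'
    b = b'SGVsbG8='    # byte array: b[0] == 83, an integer

ASCII text is at home in the former, not the latter.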
> Everything you have written so far is based on
> defending your maintained assumption that because Python implements
> text processing via the unicode type, everything that is described as
> a "character" must be coerced to that type.
I'm not just blindly assuming that because the RFC happens
to use the word "character". I'm also looking at how it uses
that word in an effort to understand what it means. It
*doesn't* specify what bit patterns are to be used to
represent the characters. It *does* mention two "character
sets", namely ASCII and EBCDIC, with the implication that
the characters it is talking about could be taken as being
members of either of those sets. Since the Unicode character
set is a superset of the ASCII character set, it doesn't
seem unreasonable that they could also be thought of as
Unicode characters.
> I don't really see a downside, except for the occasional double
> conversion ASCII -> unicode -> UTF-16, as is allowed (but not
> mandated) in XML's use of base64. What downside do you see?
It appears that everything you see as an upside, I see
as a downside, and vice versa. We appear to be mutually
upside-down. :-)
XML is another example. Inside a Python program, the most
natural way to represent an XML document is as a character
string. Your way, embedding base64 in it would require
converting the bytes produced by the base64 encoder into a
character string in some way, taking into account the
assumed ASCII encoding of said bytes. My way, you just use
the result directly, with no coding step involved at all.
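Here's a small sketch of the contrast, again in Py3k-style
Python; b64encode_to_str is a hypothetical helper standing
in for a codec that returns a character string:

    import base64

    payload = b'\x00\x01\x02\xff'

    # The bytes-returning codec: embedding the result in an
    # XML character string needs an explicit decode, leaning
    # on the assumed ASCII encoding of the base64 output.
    xml = '<data>%s</data>' % base64.b64encode(payload).decode('ascii')

    # A string-returning codec (hypothetical): the result drops
    # straight into the document, with no coding step at all.
    def b64encode_to_str(data):
        return base64.b64encode(data).decode('ascii')

    xml = '<data>%s</data>' % b64encode_to_str(payload)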
--
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury, | Carpe post meridiam! |
Christchurch, New Zealand | (I'm not a morning person.) |
greg.ewing at canterbury.ac.nz +--------------------------------------+