[Python-Dev] bytes.from_hex()

Stephen J. Turnbull stephen at xemacs.org
Mon Feb 27 06:59:44 CET 2006

>>>>> "Greg" == Greg Ewing <greg.ewing at canterbury.ac.nz> writes:

    Greg> Stephen J. Turnbull wrote:

    >> I gave you one, MIME processing in email

    Greg> If implementing a mime packer is really the only use case
    Greg> for base64, then it might as well be removed from the
    Greg> standard library, since 99.99999% of all programmers will
    Greg> never touch it.  I don't have any real-life use cases for
    Greg> base64 that a non-mime-implementer might come across, so all
    Greg> I can do is imagine what shape such a use case might have.

I guess we don't have much to talk about, then.

    >> Give me a use case where it matters practically that the output
    >> of the base64 codec be Python unicode characters rather than
    >> 8-bit ASCII characters.

    Greg> I'd be perfectly happy with ascii characters, but in Py3k,
    Greg> the most natural place to keep ascii characters will be in
    Greg> character strings, not byte arrays.

Natural != practical.

Anyway, I disagree, and I've lived with the problems that come with an
environment that mixes objects with various underlying semantics into
a single "text stream" for a decade and a half.

That doesn't make me authoritative, but as we agree to disagree, I
hope you'll keep in mind that someone with real-world experience that
is somewhat relevant[1] to the issue doesn't find that natural at all.

    Greg> Since the Unicode character set is a superset of the ASCII
    Greg> character set, it doesn't seem unreasonable that they could
    Greg> also be thought of as Unicode characters.

I agree.  However, as soon as I go past that intuition to thinking
about what that implies for _operations_ on the base64 string, it
begins to seem unreasonable, unnatural, and downright dangerous.  The
base64 string is a representation of an object that doesn't have text
semantics.  Nor do base64 strings have text semantics: they can't even
be concatenated as text (the pad character '=' is typically a syntax
error in a profile of base64, except as terminal padding).  So if you
wish to concatenate the underlying objects, the base64 strings must be
decoded, concatenated, and re-encoded in the general case.  IMO it's
not worth preserving the very superficial coincidence of "character
representation" in the face of such semantics.
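
A minimal sketch of the concatenation point above, assuming the
present-day Python 3 base64 module (which postdates this message); the
sample values and variable names are illustrative only:

```python
import base64

a = b"hello"    # 5 bytes: the encoded form ends with '=' padding
b = b"world!"   # 6 bytes: the encoded form needs no padding

enc_a = base64.b64encode(a)
enc_b = base64.b64encode(b)

# Concatenating the encoded forms as if they were text leaves a pad
# character in the middle of the result.
naive = enc_a + enc_b

# Concatenating the underlying objects: decode, concatenate, re-encode.
correct = base64.b64encode(base64.b64decode(enc_a) + base64.b64decode(enc_b))

print(naive)
print(correct)
print(naive == correct)
```

The two results differ, and only the second is a well-formed encoding
of a + b.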

I think the fact that favoring the coincidence of representation
leads you to also deprecate the very natural use of the codec API to
implement and understand base64 is indicative of a deep problem with
the idea of implementing base64 as bytes->unicode.
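
For illustration, base64 used through the codec API might look like
the following sketch.  It assumes the "base64" bytes-to-bytes codec
shipped with current Python 3 releases, not the interface that existed
when this was written:

```python
import codecs

raw = b"hello"

# Encoding takes bytes to bytes; the result is 8-bit ASCII data, not text.
encoded = codecs.encode(raw, "base64")

# Decoding goes through the same codec API in the other direction.
decoded = codecs.decode(encoded, "base64")

print(type(encoded).__name__, encoded)
print(decoded == raw)
```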

[1]  That "somewhat" is intended literally; my specialty is working
with codecs for humans in Emacs, but I've also worked with more
abstract codecs such as base64 in contexts like email, in both LISP
and Python.

School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.
