[Python-Dev] bytes.from_hex()
Stephen J. Turnbull
stephen at xemacs.org
Mon Feb 27 06:59:44 CET 2006
>>>>> "Greg" == Greg Ewing <greg.ewing at canterbury.ac.nz> writes:
Greg> Stephen J. Turnbull wrote:
>> I gave you one, MIME processing in email
Greg> If implementing a mime packer is really the only use case
Greg> for base64, then it might as well be removed from the
Greg> standard library, since 99.99999% of all programmers will
Greg> never touch it. I don't have any real-life use cases for
Greg> base64 that a non-mime-implementer might come across, so all
Greg> I can do is imagine what shape such a use case might have.
I guess we don't have much to talk about, then.
>> Give me a use case where it matters practically that the output
>> of the base64 codec be Python unicode characters rather than
>> 8-bit ASCII characters.
Greg> I'd be perfectly happy with ascii characters, but in Py3k,
Greg> the most natural place to keep ascii characters will be in
Greg> character strings, not byte arrays.
Natural != practical.
Anyway, I disagree, and I've lived with the problems that come with an
environment that mixes objects with various underlying semantics into
a single "text stream" for a decade and a half.
That doesn't make me authoritative, but as we agree to disagree, I
hope you'll keep in mind that someone with real-world experience that
is somewhat relevant[1] to the issue doesn't find that natural at all.
Greg> Since the Unicode character set is a superset of the ASCII
Greg> character set, it doesn't seem unreasonable that they could
Greg> also be thought of as Unicode characters.
I agree. However, as soon as I go past that intuition to thinking
about what that implies for _operations_ on the base64 string, it
begins to seem unreasonable, unnatural, and downright dangerous. The
base64 string is a representation of an object that doesn't have text
semantics. Nor do base64 strings have text semantics: they can't even
be concatenated as text (the pad character '=' is typically a syntax
error in a profile of base64, except as terminal padding). So if you
wish to concatenate the underlying objects, the base64 strings must be
decoded, concatenated, and re-encoded in the general case. IMO it's
not worth preserving the very superficial coincidence of "character
representation" in the face of such semantics.
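To make the concatenation point concrete, here is a minimal sketch in present-day Python 3 using the standard `base64` module. The sample payloads and the helper name `concat_b64` are made up for illustration; they do not come from this thread.

```python
import base64

# Hypothetical sample payloads, chosen so the first encoding ends in '=' padding.
a = b"hello"
b = b"world!"

enc_a = base64.b64encode(a)
enc_b = base64.b64encode(b)

# Joining the encoded forms "as text" leaves a pad character in the middle,
# so the result is not the base64 encoding of a + b.
naive = enc_a + enc_b


def concat_b64(x: bytes, y: bytes) -> bytes:
    """Concatenate the underlying objects: decode, join, re-encode."""
    return base64.b64encode(base64.b64decode(x) + base64.b64decode(y))


correct = concat_b64(enc_a, enc_b)
```

The naive join differs from the re-encoded result and carries an interior '=', which a strict base64 profile would treat as a syntax error. Only the decode, concatenate, re-encode route is guaranteed to round-trip to `a + b`.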
I think the fact that favoring the coincidence of representation
leads you to also deprecate the very natural use of the codec API to
implement and understand base64 is indicative of a deep problem with
the idea of implementing base64 as bytes->unicode.
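As a rough illustration of what treating base64 as a codec looks like, here is a sketch assuming present-day Python 3, where `codecs.encode` and `codecs.decode` accept the bytes-to-bytes `"base64"` codec. The payload is a made-up example, and nothing here claims to reflect how the API looked at the time of this thread.

```python
import codecs

# Hypothetical payload for illustration.
payload = b"hello"

# Both directions go through the generic codec interface; the encoded
# form is bytes, not a character string.
encoded = codecs.encode(payload, "base64")
decoded = codecs.decode(encoded, "base64")
```

The encoded value stays in the bytes domain, so the encode/decode pair reads as a transformation between two byte-level representations rather than as a conversion to text.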
Footnotes:
[1] That "somewhat" is intended literally; my specialty is working
with codecs for humans in Emacs, but I've also worked with more
abstract codecs such as base64 in contexts like email, in both LISP
and Python.
--
School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Ask not how you can "do" free software business;
ask what your business can "do for" free software.