[Python-Dev] bytes.from_hex()

Sun Feb 26 00:18:54 CET 2006

Stephen J. Turnbull wrote:

> The reason that Python source code is text is that the primary
> producers/consumers of Python source code are human beings, not
> compilers

I disagree with "primary" -- I think human and computer
use of source code are equally important. Because Python
source code must be acceptable to the Python compiler,
a great many transformations that would be harmless to
English text (upper-casing, paragraph wrapping, etc.)
would cause disaster if applied to a Python program.
I don't see how base64 is any different.
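
To make the comparison concrete, here is a minimal sketch of a
transformation that is harmless to English prose but destroys a
base64 payload. It assumes a present-day Python 3 and the standard
base64 module; the payload is made up for illustration.

```python
import base64

# Hypothetical payload, chosen only for illustration.
payload = b"hello world"

# Base64 text is case-sensitive: 'a' and 'A' are different symbols.
encoded = base64.b64encode(payload).decode("ascii")

# Upper-casing would not change the meaning of English text,
# but here it silently produces different binary data.
mangled = encoded.upper()
decoded_mangled = base64.b64decode(mangled)

print(base64.b64decode(encoded) == payload)  # the untouched text round-trips
print(decoded_mangled == payload)            # the upper-cased text does not
```

The mangled text still decodes without an error, which is the
dangerous part: nothing signals that the data has changed.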

> Yes, which implies that you assume he has control of the data all the
> way to the channel that actually requires base64.

Yes. If he doesn't, he can't safely use base64 at all.
That's true regardless of how the base64-encoded data
is represented. It's true of any data of any kind.

> Use case: the Gnus MUA supports the RFC that allows non-ASCII names in
> MIME headers that take file names...

I'm not familiar with all the details you're alluding
to here, but if there's a bug here, I'd say it's due
to somebody not thinking something through properly.
It shouldn't matter if something gets encoded four
times as long as it gets decoded four times at the
other end. If it's not possible to do that, someone
made an assumption about the channel that wasn't
true.
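
As a small sketch of that claim (again assuming Python 3 and the
standard base64 module, with an invented file name as the data),
encoding four times is harmless provided the receiver undoes exactly
four layers:

```python
import base64

# Hypothetical data: a UTF-8 encoded non-ASCII file name.
original = "r\u00e9sum\u00e9.txt".encode("utf-8")
layers = 4

# Sender side: apply the encoding four times.
data = original
for _ in range(layers):
    data = base64.b64encode(data)
wire = data

# Receiver side: undo exactly the same number of layers.
for _ in range(layers):
    data = base64.b64decode(data)
recovered = data

# Stopping one layer short leaves data that is still base64 text.
partial = wire
for _ in range(layers - 1):
    partial = base64.b64decode(partial)

print(recovered == original)
print(partial == base64.b64encode(original))
```

The failure mode is a mismatch in the number of layers, i.e. one
side assuming something about the channel that the other side does
not.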

> It's "what is the Python compiler/interpreter going
> to think?"  AFAICS, it's going to think that base64 is
> a unicode codec.

Only if it's designed that way, and I specifically
think it shouldn't be -- i.e. it should be an error
to attempt the likes of a_unicode_string.encode("base64")
or unicode(something, "base64"). The interface for
doing base64 encoding should be something else.
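
For illustration only, here is roughly what such a separation looks
like in a present-day Python 3, which postdates this thread. As far
as I know, recent versions refuse to treat base64 as a text codec,
and the base64 module works on bytes instead. The sample string is
made up.

```python
import base64

text = "some text"  # hypothetical sample string

# Asking a character string to "encode" itself as base64 is rejected.
# Recent Python 3 raises LookupError here; TypeError is caught as well
# in case an older 3.x release behaves differently (an assumption).
try:
    text.encode("base64")
    rejected = False
except (LookupError, TypeError):
    rejected = True

# The explicit route: choose a character encoding first to get bytes,
# then base64-encode those bytes through a separate interface.
raw = text.encode("utf-8")
encoded = base64.b64encode(raw)

print(rejected)
print(base64.b64decode(encoded).decode("utf-8") == text)
```

The two steps are kept apart: choosing a character encoding is one
decision, and wrapping binary data in base64 is another.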

> I don't believe that "takes a character string as
> input" has any intrinsic meaning.

I'm using that phrase in the context of Python, where
it means "a function that takes a Python character
string as input".

In the particular case of base64, it has the added
restriction that it must preserve the particular
65 characters used.
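
A rough sketch of that restriction, assuming the standard base64
alphabet (64 symbols plus "=" for padding). The helper name
is_base64_text is hypothetical, not an existing API.

```python
import base64
import string

# The 64 data symbols of standard base64, plus '=' used for padding.
alphabet = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
allowed = set(alphabet + "=")

def is_base64_text(s):
    """Hypothetical check: True if s uses only the 65 base64 characters."""
    return all(ch in allowed for ch in s)

# Every possible byte value encodes to text within those 65 characters.
sample = base64.b64encode(bytes(range(256))).decode("ascii")

print(len(allowed))
print(is_base64_text(sample))
print(is_base64_text("aGVsbG8*"))  # '*' is outside the alphabet
```

Any channel that maps those 65 characters one-to-one onto something
else, and back again, leaves the data intact.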

> In practice, I think it's a loaded gun
> aimed at my foot.  And yours.

Whereas it seems quite the opposite to me, i.e.
*failing* to clearly distinguish between text and
binary data here is what will lead to confusion and
foot-shooting.

I think we need some concrete use cases to talk
about if we're to get any further with this. Do
you have any such use cases in mind?

Greg