[Python-Dev] bytes.from_hex()
Stephen J. Turnbull
stephen at xemacs.org
Sat Feb 25 19:05:38 CET 2006
>>>>> "Greg" == Greg Ewing <greg.ewing at canterbury.ac.nz> writes:
Greg> Stephen J. Turnbull wrote:
>> the kind of "text" for which Unicode was designed is normally
>> produced and consumed by people, who wll pt up w/ ll knds f
>> nnsns. Base64 decoders will not put up with the same kinds of
>> nonsense that people will.
Greg> The Python compiler won't put up with that sort of nonsense
Greg> either. Would you consider that makes Python source code
Greg> binary data rather than text, and that it's inappropriate to
Greg> represent it using a unicode string?
The reason that Python source code is text is that the primary
producers/consumers of Python source code are human beings, not
compilers.
There are no such human producers/consumers of base64. Unless you
would prefer that I had expressed that last sentence as "VGhlIHJlYXNvbiB0aG
F0IFB5dGhvbiBzb3VyY2UgY29kZSBpcyB0ZXh0IGlzIGJlY2F1c2UgdGhlIHByaW1
hcnkKcHJvZHVjZXJzL2NvbnN1bWVycyBvZiBQeXRob24gc291cmNlIGNvZGUgYXJl
IGh1bWFuIGJlaW5ncywgbm90CmNvbXBpbGVycy4="?
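Only a machine makes that string readable again, and the same machine has no tolerance for the vowel-dropped text quoted earlier. A minimal sketch using Python's standard `base64` module; `strict_b64decode` is a hypothetical helper name chosen for illustration, not a library function:

```python
import base64
import binascii
from typing import Optional

# The base64 text quoted above, rejoined into a single string.
ENCODED = (
    "VGhlIHJlYXNvbiB0aG"
    "F0IFB5dGhvbiBzb3VyY2UgY29kZSBpcyB0ZXh0IGlzIGJlY2F1c2UgdGhlIHByaW1"
    "hcnkKcHJvZHVjZXJzL2NvbnN1bWVycyBvZiBQeXRob24gc291cmNlIGNvZGUgYXJl"
    "IGh1bWFuIGJlaW5ncywgbm90CmNvbXBpbGVycy4="
)


def strict_b64decode(text: str) -> Optional[bytes]:
    """Decode base64 strictly; return None if text is not valid base64."""
    try:
        # validate=True raises binascii.Error on any non-alphabet character
        return base64.b64decode(text, validate=True)
    except binascii.Error:
        return None


# A machine recovers the sentence only by decoding to bytes, then to text.
decoded = strict_b64decode(ENCODED)
print(decoded.decode("ascii"))

# The vowel-stripped phrase a human reads easily is rejected outright.
print(strict_b64decode("wll pt up w/ ll knds f nnsns"))  # None
```

The decoder produces bytes, not characters; turning them back into something a person can read takes a second, separate decoding step.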
>> You're basically assuming that the person who implements the
>> code that processes a Unicode string is the same person who
>> implemented the code that converts a binary object into base64
>> and inserts it into a string.
Greg> No, I'm assuming the user of base64 knows the
Greg> characteristics of the channel he's using.
Yes, which implies that you assume he has control of the data all the
way to the channel that actually requires base64.
Use case: the Gnus MUA supports the RFC that allows non-ASCII names in
MIME headers that take file names. The interface was written for
message-at-a-time use, which makes sense for composition. Somebody
else added "save and strip part" editing capability, but this only
works one MIME part at a time. So if you have a message with four
MIME parts and you save and strip all of them, the first one gets
encoded four times.
The reason for *this* bug, and scores like it over the years, is that
somebody made it convenient to put wire protocols into a text
document. Shouldn't Python do better than that? Shouldn't Python
text be for humans, rather than whatever was tagged "character" for
the convenience of defining a protocol to communicate data that
humans can't process without mechanical assistance?
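The Gnus re-encoding bug can be modelled in a few lines of Python. This is a hypothetical sketch, not Gnus code: `encode_name`, `encoding_depth`, and `save_and_strip_all` are illustrative names, and the `=?utf-8?b?...?=` wrapper is an encoded-word-style stand-in for whatever encoding the real header code applies. It shows that an encoding step which is not idempotent, run over the whole message once per stripped part, wraps the first part's file name once for every part stripped.

```python
import base64

PREFIX, SUFFIX = "=?utf-8?b?", "?="


def encode_name(name: str) -> str:
    """Wrap a file name in an encoded-word-style base64 wrapper (not idempotent)."""
    payload = base64.b64encode(name.encode("utf-8")).decode("ascii")
    return f"{PREFIX}{payload}{SUFFIX}"


def encoding_depth(value: str) -> int:
    """Count how many times a value was wrapped by encode_name."""
    depth = 0
    while value.startswith(PREFIX) and value.endswith(SUFFIX):
        inner = value[len(PREFIX):-len(SUFFIX)]
        value = base64.b64decode(inner).decode("utf-8")
        depth += 1
    return depth


def save_and_strip_all(names: list[str]) -> list[str]:
    """Hypothetical model: strip parts one at a time, re-encoding the whole message each time."""
    stubs: list[str] = []
    for name in names:
        # Replace the next part with a stub carrying its raw file name.
        stubs.append(name)
        # The message-at-a-time encoder runs over every stub, including
        # the ones it already encoded on earlier passes.
        stubs = [encode_name(s) for s in stubs]
    return stubs


headers = save_and_strip_all(["résumé.pdf", "日本.txt", "naïve.doc", "café.png"])
print([encoding_depth(h) for h in headers])  # [4, 3, 2, 1]
```

Decoding the first stub once yields yet another encoded string, which is what the reader of the saved message is left to puzzle over.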
>> I don't think it's a good idea to gratuitously introduce wire
>> protocols as unicode codecs,
Greg> I am *not* saying that base64 is a unicode codec! If that's
Greg> what you thought I was saying, it's no wonder we're
Greg> confusing each other.
I know you don't think that it's a duck, but it waddles and quacks.
I.e., the question is not what I think you're saying. It's "what is the
Python compiler/interpreter going to think?" AFAICS, it's going to
think that base64 is a unicode codec.
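For comparison, Python 3 as eventually released (3.4 and later) does not treat base64 as a text encoding: `str.encode` refuses it with a `LookupError`, while the `codecs` module still offers it as a bytes-to-bytes transform. A small check; `str_base64_rejected` is a hypothetical helper name:

```python
import codecs


def str_base64_rejected() -> bool:
    """Return True if str.encode refuses 'base64' as a text encoding."""
    try:
        "abc".encode("base64")
    except LookupError:
        # Python 3.4+: base64 is reported as not being a text encoding
        return True
    return False


print(str_base64_rejected())  # True on Python 3.4+

# The codec still exists, but maps bytes to bytes (with a trailing newline).
print(codecs.encode(b"abc", "base64"))  # b'YWJj\n'
print(codecs.decode(b"YWJj\n", "base64"))  # b'abc'
```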
Greg> The only time I need to use something like base64 is when I
Greg> have something that will only accept text. In Py3k, "accepts
Greg> text" is going to mean "takes a character string as input",
Characters are inherently abstract; as a class, they can't be
instantiated as input or output. Only derived (i.e., encoded)
characters can. I don't believe that "takes a character string as
input" has any intrinsic meaning.
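In Python terms, a `str` plays the role of the abstract character string, and only bytes produced by some encoding can cross an I/O boundary. A minimal sketch, assuming `io.BytesIO` as a stand-in for a binary channel:

```python
import io

text = "Tsukuba, 筑波"  # abstract characters: a str

# The same characters yield different bytes under different encodings.
utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")
print(len(text), len(utf8), len(utf16))  # 11 15 22

# A binary channel refuses the abstract string and accepts an encoded form.
channel = io.BytesIO()
try:
    channel.write(text)
except TypeError:
    channel.write(utf8)
print(channel.getvalue().decode("utf-8") == text)  # True
```

Which bytes reach the channel depends entirely on the encoding chosen, which is why "a character string as input" says nothing until an encoding is fixed.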
Greg> Does that make it clearer what I'm getting at?
No.<wink> I already understood what you're getting at. As I said, I'm
sympathetic in principle. In practice, I think it's a loaded gun
aimed at my foot. And yours.
--
School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Ask not how you can "do" free software business;
ask what your business can "do for" free software.