[Python-Dev] Allowing u.encode() to return non-strings

Tim Peters tim.peters at gmail.com
Sat Jun 26 01:02:37 EDT 2004


[Bill Janssen]
> While we're talking about this, Martin, what is the encoding of the
> "string" returned by
>
>      struct.pack("bbb", 0xFF, 0x00, 0x83)

That raises an exception.  Did you intend "BBB" as the format string?
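
For anyone following along, here is a minimal sketch of the difference between the two format strings. It assumes a current Python 3 interpreter rather than the Python 2 of this thread, and the names signed_ok and packed are just illustrative:

```python
import struct

# "b" is a signed char (-128..127), so 0xFF (255) does not fit.
try:
    struct.pack("bbb", 0xFF, 0x00, 0x83)
    signed_ok = True
except struct.error:
    signed_ok = False

# "B" is an unsigned char (0..255), so all three values fit.
packed = struct.pack("BBB", 0xFF, 0x00, 0x83)
```

The exact exception text varies by Python version, so the sketch only checks that struct.error is raised.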

> And what should it be?

Whatever you explicitly say it is, when you feed the string to a
context that requires knowing the intended encoding.  It has no
intrinsic encoding -- encoding is in your head (or your app's
requirements), not in the string itself.  It's like asking which
numeric base the sequence of digits

   1111

is written in.  You can't tell by staring at it.  Maybe that was
decimal, maybe it was binary.  But no -- I actually had base 2004 in
mind <wink>.
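
To make the analogy concrete, here is a small sketch in which the same digit string gives three different numbers depending on the base the reader supplies. The helper digits_value is hypothetical, written only for this illustration (the built-in int() stops at base 36):

```python
def digits_value(digits, base):
    """Interpret a string of decimal digit characters as a numeral in `base`."""
    value = 0
    for ch in digits:
        # Standard positional notation: shift left one place, add the digit.
        value = value * base + int(ch)
    return value

# Same four characters, three different readings.
as_binary = digits_value("1111", 2)
as_decimal = digits_value("1111", 10)
as_base_2004 = digits_value("1111", 2004)
```

Nothing in the string "1111" selects one of these; the base is supplied from outside, just as an encoding is.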

Here are two distinct possibilities for your specific string:

>>> s = struct.pack("BBB", 0xFF, 0x00, 0x83)
>>> unicode(s, 'latin1')
u'\xff\x00\x83'
>>> unicode(s, 'cp1252')
u'\xff\x00\u0192'
>>>

Python can't guess the intent.  Neither could a highly knowledgeable
but non-telepathic human, for that matter.
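
The same point can be sketched in Python 3, where struct.pack returns a bytes object and the encoding has to be named explicitly when decoding. This is an assumed modern equivalent of the session above, not part of it, and the variable names are illustrative:

```python
import struct

s = struct.pack("BBB", 0xFF, 0x00, 0x83)

# Two equally valid readings of the same three bytes.
as_latin1 = s.decode("latin-1")
as_cp1252 = s.decode("cp1252")

# Each reading round-trips to the original bytes under its own encoding,
# so the bytes alone cannot tell you which one was intended.
back_from_latin1 = as_latin1.encode("latin-1")
back_from_cp1252 = as_cp1252.encode("cp1252")
```

The two decoded strings differ only in the last character: U+0083 under latin-1 and U+0192 under cp1252.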


