[Python-Dev] Pre-PEP: The "bytes" object
Ron Adam
rrr at ronadam.com
Sat Feb 25 07:23:59 CET 2006
Neil Schemenauer wrote:
> Ron Adam <rrr at ronadam.com> wrote:
>> Why was it decided that the unicode encoding argument should be ignored
>> if the first argument is a string? Wouldn't an exception be better
>> rather than give the impression it does something when it doesn't?
>
> From the PEP:
>
> There is no sane meaning that the encoding can have in that
> case. str objects *are* byte arrays and they know nothing about
> the encoding of character data they contain. We need to assume
> that the programmer has provided str object that already uses
> the desired encoding.
>
> Raising an exception would be a valid option. However, passing the
> string through unchanged makes the transition from str to bytes
> easier.
>
> Neil
I guess I'm concerned that if the string isn't already in the specified
encoding, it could pass through without complaint and not be encoded as
expected.
    >>> b.bytes(u'abc', 'hex-codec')
    bytes([54, 49, 54, 50, 54, 51])
    >>> b.bytes('abc', 'hex-codec')
    bytes([97, 98, 99])     # not hex
If this were in a function, I would need to do a check of some sort
anyway, or convert to unicode or encode beforehand. That unfortunately
negates the advantage of having the codec argument in bytes.
    def hexabyte(s):
        s = unicode(s)
        return bytes(s, 'hex-codec')

or

    def hexabyte(s):
        s = s.encode('hex-codec')
        return bytes(s)
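For comparison, here is a rough Python 3 sketch of the same helper with the type check made explicit. It assumes str input should be encoded as ASCII before hex-encoding (the `text_encoding` parameter is a hypothetical addition), and it uses `binascii.hexlify` because Python 3 treats 'hex' as a bytes-to-bytes codec rather than a text encoding.

```python
import binascii


def hexabyte(s, text_encoding='ascii'):
    """Return the hex encoding of s as bytes (hypothetical helper).

    str input is first encoded with text_encoding (assumed ASCII here);
    bytes-like input is treated as already-encoded data and used as-is.
    """
    if isinstance(s, str):
        raw = s.encode(text_encoding)
    elif isinstance(s, (bytes, bytearray)):
        raw = bytes(s)
    else:
        raise TypeError('expected str or bytes, got %s' % type(s).__name__)
    return binascii.hexlify(raw)


# Both spellings now give the same hex output instead of silently differing.
print(list(hexabyte('abc')))   # [54, 49, 54, 50, 54, 51]
print(list(hexabyte(b'abc')))  # [54, 49, 54, 50, 54, 51]
```

The explicit `isinstance` branches are exactly the extra checking the codec argument was supposed to make unnecessary.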
It seems to me that if you are specifying a codec for bytes, you will
not be expecting an already encoded string, and if you do get one, it may
not be in the codec you want, since you are probably not specifying the
default codec.
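A minimal sketch of the alternative, assuming Python 3 semantics: a hypothetical `strict_bytes` constructor that applies the codec only to text and raises instead of silently ignoring it when handed data that is already encoded. Python 3's built-in `bytes()` behaves similarly: `bytes(b'abc', 'ascii')` raises TypeError.

```python
def strict_bytes(obj, encoding=None):
    """Hypothetical bytes constructor that refuses a codec it cannot apply.

    Not a real library API; it only illustrates raising instead of
    passing already-encoded data through unchanged.
    """
    if encoding is None:
        # No codec requested: behave like the plain constructor.
        return bytes(obj)
    if not isinstance(obj, str):
        # Already-encoded data: the codec would be ignored, so fail loudly.
        raise TypeError('encoding given for non-text argument of type %s'
                        % type(obj).__name__)
    return obj.encode(encoding)


print(strict_bytes('abc', 'utf-8'))  # b'abc'
try:
    strict_bytes(b'abc', 'utf-8')
except TypeError as exc:
    print('TypeError:', exc)  # TypeError: encoding given for non-text argument of type bytes
```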
Ron