[Python-Dev] Pre-PEP: The "bytes" object

Ron Adam rrr at ronadam.com
Sat Feb 25 07:23:59 CET 2006


Neil Schemenauer wrote:
> Ron Adam <rrr at ronadam.com> wrote:
>> Why was it decided that the unicode encoding argument should be ignored 
>> if the first argument is a string?  Wouldn't an exception be better 
>> rather than give the impression it does something when it doesn't?
> 
> From the PEP:
> 
>     There is no sane meaning that the encoding can have in that
>     case.  str objects *are* byte arrays and they know nothing about
>     the encoding of character data they contain.  We need to assume
>     that the programmer has provided str object that already uses
>     the desired encoding.
> 
> Raising an exception would be a valid option.  However, passing the
> string through unchanged makes the transition from str to bytes
> easier.
> 
>   Neil

I guess I'm concerned that if the string isn't already in the specified
encoding, it could pass through without complaint and not be encoded as
expected.

 >>> b.bytes(u'abc', 'hex-codec')
bytes([54, 49, 54, 50, 54, 51])

 >>> b.bytes('abc', 'hex-codec')
bytes([97, 98, 99])                # not hex

If this were in a function, I would need to do a check of some sort
anyway, or cast to unicode or encode beforehand, which unfortunately
negates the advantage of having the codec argument in bytes.

    def hexabyte(s):
        s = unicode(s)
        return bytes(s, 'hex-codec')
or

    def hexabyte(s):
        s = s.encode('hex-codec')
        return bytes(s)
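
For comparison, here is a minimal sketch of the same hexabyte helper
under Python 3 semantics. This is an assumption for illustration, not
the bytes type proposed in the pre-PEP: in Python 3, "hex_codec" is a
bytes-to-bytes codec reached through codecs.encode(), and the choice of
UTF-8 for text input is also an assumption.

```python
import codecs


def hexabyte(s):
    """Return the hex encoding of s as bytes.

    Accepts text (str) or bytes. Text is first encoded as UTF-8, which is
    an assumption: the original leaves the text encoding implicit.
    """
    if isinstance(s, str):
        s = s.encode("utf-8")
    # "hex_codec" maps bytes to bytes in Python 3, so use codecs.encode()
    # rather than str.encode().
    return codecs.encode(s, "hex_codec")


print(hexabyte("abc"))   # b'616263'
print(hexabyte(b"abc"))  # b'616263'
```

Both calls yield the same bytes, and list(hexabyte("abc")) is
[54, 49, 54, 50, 54, 51], matching the unicode case shown above.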

It seems to me that if you are specifying a codec for bytes, you will
not be expecting an already-encoded string, and if you do get one, it
may not be in the codec you want, since you are probably not specifying
the default codec.
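
A minimal sketch of the stricter alternative Neil mentions (raising an
exception), written in Python 3 terms. The name make_bytes is
hypothetical and not part of the pre-PEP: text requires an encoding,
and an encoding passed alongside already-encoded data is rejected
instead of being silently ignored.

```python
def make_bytes(data, encoding=None):
    """Hypothetical strict constructor: an encoding only applies to text."""
    if isinstance(data, str):
        if encoding is None:
            # Text has no byte representation until an encoding is chosen.
            raise TypeError("text argument requires an encoding")
        return data.encode(encoding)
    if encoding is not None:
        # Already-encoded data: refuse rather than silently ignore the codec.
        raise TypeError("encoding given for non-text argument")
    return bytes(data)


print(make_bytes("abc", "ascii"))  # b'abc'
print(make_bytes(b"abc"))          # b'abc'
```

For what it's worth, Python 3's built-in bytes() rejects the same two
cases with a TypeError.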

Ron
