[Python-ideas] duck typing for io write methods

Steven D'Aprano steve at pearwood.info
Fri Jun 14 02:08:44 CEST 2013


On 14/06/13 00:41, Wolfgang Maier wrote:

>>> It's funny you mention that difference since that was how I came across my
>>> issue. I was looking for a way to get back the Python 2.7 behaviour
>>> bytes('1234')
>>> '1234'
>>
>> You mean other than using the bytes literal b'1234' instead of a
>> string literal? Bytes and text are different things in Python 3,
>> whereas the 2.x "bytes" was just an alias for "str".
>>
>
> Well, I was illustrating the case with a literal integer, but, of course, I
> was thinking of cases with references:
> a=1234
> str(a).encode() # gives b'1234' in Python3, but converting your int to str
> first, just to encode it again to bytes seems weird

On the contrary, it is the most natural way to do it. Converting objects directly to bytes is not conceptually obvious. I can think of at least TWELVE obvious ways in which the int 4 might convert to bytes (shown here entirely as hex escapes, rather than in the more compact but less consistent forms such as b'4'):

# Treat it as an 8-bit, 16-bit, 32-bit or 64-bit integer, big- or little-endian:
b'\x04'
b'\x00\x04'
b'\x04\x00'
b'\x00\x00\x00\x04'
b'\x04\x00\x00\x00'
b'\x00\x00\x00\x00\x00\x00\x00\x04'
b'\x04\x00\x00\x00\x00\x00\x00\x00'

# Convert it to the string '4' first, then encode to bytes
# as UTF-8, UTF-16, or UTF-32:
b'\x34'
b'\x00\x34'
b'\x34\x00'
b'\x34\x00\x00\x00'
b'\x00\x00\x00\x34'
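
The twelve candidates above can be reproduced with the standard int.to_bytes and str.encode methods. This is a minimal sketch; the variable names and the choice of explicit "-be"/"-le" codecs are illustrative choices, not something taken from the message:

```python
n = 4

# Fixed-width integer forms: 8-bit has a single byte order, while the
# 16-, 32- and 64-bit forms each come in big- and little-endian order.
int_forms = [n.to_bytes(1, "big")]
for width in (2, 4, 8):
    for order in ("big", "little"):
        int_forms.append(n.to_bytes(width, order))

# Text forms: encode str(n) == "4". The explicit -be/-le codecs are used
# because plain "utf-16"/"utf-32" would prepend a byte-order mark.
codecs_in_order = ["utf-8", "utf-16-be", "utf-16-le", "utf-32-le", "utf-32-be"]
text_forms = [str(n).encode(codec) for codec in codecs_in_order]

all_forms = int_forms + text_forms
for form in all_forms:
    print(form)
print(len(all_forms), "forms,", len(set(all_forms)), "distinct")
```

Note that Python itself displays b'\x34' as b'4', which is exactly the compact but less consistent form mentioned above.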

The actual behaviour, where bytes(4) => b'\x00\x00\x00\x00', I consider to be neither obvious nor especially useful. If bytes were mutable, then bytes(4) would be a useful way to initialise a block of four bytes for later modification. But they aren't, so I don't really see the point. The obvious way to get four NUL bytes is surely b'\0'*4, so it's also redundant.
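
A small check of the two facts this paragraph relies on: bytes(4) is the same value as b'\0'*4, and the resulting block cannot be modified in place. The variable names are illustrative only:

```python
zeros = bytes(4)            # integer argument: a zero-filled block of that length
assert zeros == b"\0" * 4 == b"\x00\x00\x00\x00"

# Try to modify the block in place; bytes is immutable, so this fails.
try:
    zeros[0] = 0xFF
    mutable = True
except TypeError:
    mutable = False

print("bytes(4) is mutable:", mutable)
```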

That you can't even subclass int and override this behaviour, the way you can override every other dunder method (__str__, __repr__, __add__, __mul__, etc.), strikes me as astonishingly weird and a violation of the Zen:

Special cases aren't special enough to break the rules.
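
The claim can be probed directly with an int subclass that defines __bytes__. This is a minimal sketch; the class name FourAsText is hypothetical, and since what it prints may depend on the Python version in use, no particular output is claimed here:

```python
class FourAsText(int):
    """Hypothetical int subclass that tries to customise bytes()."""

    def __bytes__(self):
        # The representation this subclass would like bytes() to produce.
        return str(int(self)).encode("ascii")


result = bytes(FourAsText(4))
if result == b"\x00\x00\x00\x00":
    print("int special case won: __bytes__ was ignored")
else:
    print("__bytes__ was honoured:", result)
```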

I imagine that the code for the bytes builtin looks something like this in pseudo-code:


if isinstance(arg, int):     # true for int *and* every int subclass
     special case int
elif isinstance(arg, str):   # likewise for str subclasses
     special case str
else:
     call __bytes__ method


I don't think it would affect performance very much, if at all, if it were changed to an exact type check:

if type(arg) is int:         # exact int only; subclasses fall through
     special case int
elif type(arg) is str:       # exact str only
     special case str
else:
     call __bytes__ method   # so subclasses can override __bytes__
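
Both dispatch orders can be sketched as one runnable function. This is an illustration under stated assumptions, not CPython's real implementation: hypothetical_bytes, its exact_types flag and the HexInt subclass are all made-up names, and the int special case is written as b"\x00" * count so the sketch does not depend on how the real bytes() treats subclasses:

```python
import operator


def hypothetical_bytes(arg, encoding=None, exact_types=True):
    """Sketch of the dispatch described above; NOT CPython's actual code.

    exact_types=False mimics the imagined isinstance() checks,
    exact_types=True mimics the proposed type(arg) is ... checks.
    """
    if exact_types:
        is_int, is_str = type(arg) is int, type(arg) is str
    else:
        is_int, is_str = isinstance(arg, int), isinstance(arg, str)

    if is_int:
        # Special case int: a zero-filled block of that length.
        count = operator.index(arg)
        if count < 0:
            raise ValueError("negative count")
        return b"\x00" * count
    if is_str:
        # Special case str: requires an explicit encoding.
        if encoding is None:
            raise TypeError("string argument without an encoding")
        return str.encode(arg, encoding)
    # Everything else, including int/str subclasses when exact_types=True,
    # goes through __bytes__, looked up on the type like other dunder methods.
    method = getattr(type(arg), "__bytes__", None)
    if method is None:
        raise TypeError(f"cannot convert {type(arg).__name__!r} object to bytes")
    return method(arg)


class HexInt(int):
    """Hypothetical int subclass that wants its own bytes representation."""

    def __bytes__(self):
        return format(self, "x").encode("ascii")


print(hypothetical_bytes(HexInt(4), exact_types=False))  # b'\x00\x00\x00\x00'
print(hypothetical_bytes(HexInt(4), exact_types=True))   # b'4'
```

With the isinstance() order the subclass is swallowed by the int special case; with the exact type check it falls through to its own __bytes__, while plain ints and strs take the same fast paths as before.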


int and str would each have to grow a __bytes__ method so that subclasses inherit the current behaviour, but the implementation could be as simple as:

# on int:
def __bytes__(self):
     return bytes(int(self))

# on str (assuming bytes() passes its encoding argument through):
def __bytes__(self, encoding):
     return bytes(str(self), encoding)
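
Since the builtins cannot be patched from Python code, the proposed defaults can only be tried out as stand-alone functions. In this sketch int_default_bytes, str_default_bytes, Count and Label are hypothetical names. It shows why int(self) and str(self) matter: they convert a subclass instance back to the exact built-in type, so the inner bytes() call hits the special case instead of recursing, and a subclass that overrides nothing keeps today's result:

```python
# Hypothetical stand-ins for the proposed int.__bytes__ and str.__bytes__.
def int_default_bytes(self):
    # int(self) yields an exact int, so bytes() takes the int special case.
    return bytes(int(self))


def str_default_bytes(self, encoding):
    # str(self) likewise yields an exact str before encoding.
    return bytes(str(self), encoding)


class Count(int):   # hypothetical subclass that overrides nothing
    pass


class Label(str):   # hypothetical subclass that overrides nothing
    pass


print(int_default_bytes(Count(4)))                # b'\x00\x00\x00\x00'
print(str_default_bytes(Label("4"), "utf-16-le"))  # b'4\x00'
```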


Of course, I may have missed some logic for the current behaviour.


-- 
Steven

