[Python-ideas] Use unbound bytes methods with objects supporting the buffer protocol

Wed Jul 13 17:09:14 EDT 2016

On 7/13/2016 12:40 PM, Serhiy Storchaka wrote:
> Unbound methods can be used as functions in python.

According to my naive understanding, in Python 3, there are not supposed 
to be 'unbound methods'.  Functions accessed as a class attribute are 
supposed to *be* functions, and not just usable as a function.  This is 
true, at least for Python-coded classes.

class C:
    def f(self, other):
        return self + other

print(C.f)
# <function C.f at 0x000001BAFC97D598>
print(C.f(1, 2))
# 3
print(C.f('a', 'b'))
# ab

This works because the Python code is generic and being accessed as a 
class attribute does not impose additional restrictions on inputs beyond 
those inherent in the code itself.

 >>> C.f('a', 2)
Traceback (most recent call last):
   File "<pyshell#29>", line 1, in <module>
     C.f('a', 2)
   File "<pyshell#5>", line 3, in f
     return self + other
TypeError: must be str, not int

 >>> 'a'+2
Traceback (most recent call last):
   File "<pyshell#30>", line 1, in <module>
     'a'+2
TypeError: must be str, not int

If the situation is different for C-coded functions and classes, then it 
would seem impossible to write a drop-in replacement for Python-coded 
classes.
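
To make the contrast concrete, here is a small sketch, assuming a 
current CPython 3 (the variable names are only for illustration).  It 
checks that a Python-coded class attribute is a plain function, while 
bytes.split is a C-level descriptor that rejects a non-bytes first 
argument.

```python
import types

class C:
    def f(self, other):
        return self + other

# A function defined in a Python class is a plain function when
# accessed through the class.
is_plain_function = type(C.f) is types.FunctionType

# On CPython, a C-coded method accessed through the class is a
# descriptor object, not a built-in function.
descriptor_kind = type(bytes.split).__name__

# The descriptor checks the type of the first argument before the
# C code for split is ever reached.
try:
    bytes.split(memoryview(b'a:b'), b':')
    rejected = False
except TypeError:
    rejected = True

print(is_plain_function, descriptor_kind, rejected)
```

The exact descriptor type name is a CPython detail; other 
implementations may report something different.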

> bytes.lower(b) is the same as b.lower() if b is an instance of bytes.
> Many functions and methods that work with bytes accept not just bytes,
> but arbitrary objects that support the buffer protocol. Including bytes
> methods:
>
>     >>> b'a:b'.split(memoryview(b':'))
>     [b'a', b'b']
>
> But the first argument of unbound bytes method can be only a bytes
> instance.
>
>     >>> bytes.split(memoryview(b'a:b'), b':')
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in <module>
>     TypeError: descriptor 'split' requires a 'bytes' object but received a 'memoryview'

My naive expectation was that bytes.split should be a built-in function, 
like open, but it is not:

 >>> open
<built-in function open>
 >>> bytes.split
<method 'split' of 'bytes' objects>

I also expected that the C-coded split function would type-check both 
args for being bytes-like, as it does with the second:

 >>> b'a:b'.split(':')
Traceback (most recent call last):
   File "<pyshell#32>", line 1, in <module>
     b'a:b'.split(':')
TypeError: a bytes-like object is required, not 'str'

Assuming that the descriptor check is not just an unintended holdover 
from 2.x, it seems that for C-coded functions used as methods, 
type-checking the first arg was conceptually factored out and replaced 
by a generic check in the descriptor mechanism.

In this case, the descriptor check is stricter than you would like.  Is 
it stricter than necessary?  If the memoryview were passed to the code 
for bytes.split, would the code successfully run to conclusion?  Is it 
sufficiently generic at the machine bytes level?

> I think it would be helpful to allow using unbound bytes methods with
> arbitrary objects that support the buffer protocol as the first
> argument. This would allow to avoid unneeded copying (the primary
> purpose of the buffer protocol).
>
>     >>> bytes.split(memoryview(b'a:b'), b':')
>     [b'a', b'b']

If the descriptor check cannot be selectively loosened, a possible 
solution might be a common base class for all bytes-like buffer protocol 
classes, holding the method functions that work with any bytes-like 
object.
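
A minimal sketch of what one such shared function could look like, 
written in Python rather than C.  BytesLikeMethods is a hypothetical 
name, the sketch assumes one-dimensional buffers of single bytes, and it 
handles only an explicit non-empty separator.

```python
class BytesLikeMethods:
    """Hypothetical home for functions accepting any bytes-like object."""

    @staticmethod
    def split(data, sep):
        view = memoryview(data)   # no copy of the input buffer
        sep = bytes(sep)
        if not sep:
            raise ValueError("empty separator")
        n = len(sep)
        parts = []
        start = i = 0
        while i <= len(view) - n:
            if view[i:i + n] == sep:
                # Only the resulting pieces are copied.
                parts.append(view[start:i].tobytes())
                i += n
                start = i
            else:
                i += 1
        parts.append(view[start:].tobytes())
        return parts

print(BytesLikeMethods.split(memoryview(b'a:b'), b':'))
```

Because the function takes its data as an ordinary argument, nothing 
restricts it to bytes instances; bytearray and memoryview inputs go 
through the same code path.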

-- 
Terry Jan Reedy