[Python-ideas] Use unbound bytes methods with objects supporting the buffer protocol
Terry Reedy
tjreedy at udel.edu
Wed Jul 13 17:09:14 EDT 2016
On 7/13/2016 12:40 PM, Serhiy Storchaka wrote:
> Unbound methods can be used as functions in python.
According to my naive understanding, in Python 3, there are not supposed
to be 'unbound methods'. Functions accessed as a class attribute are
supposed to *be* functions, and not just usable as a function. This is
true, at least for Python-coded classes.
class C():
    def f(self, other):
        return self + other
print(C.f)
# <function C.f at 0x000001BAFC97D598>
print(C.f(1,2))
# 3
print(C.f('a', 'b'))
# ab
This works because the Python code is generic and being accessed as a
class attribute does not impose additional restrictions on inputs beyond
those inherent in the code itself.
>>> C.f('a', 2)
Traceback (most recent call last):
  File "<pyshell#29>", line 1, in <module>
    C.f('a', 2)
  File "<pyshell#5>", line 3, in f
    return self + other
TypeError: must be str, not int
>>> 'a'+2
Traceback (most recent call last):
  File "<pyshell#30>", line 1, in <module>
    'a'+2
TypeError: must be str, not int
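A minimal sketch of how to confirm this in Python 3: for a Python-coded class, looking the name up on the class returns exactly the function object stored in the class dict, with no wrapper that could check the type of the first argument. Only lookup on an instance produces a bound method.

```python
import types

class C:
    def f(self, other):
        return self + other

# Class attribute lookup returns the raw function stored in the class dict
assert C.f is C.__dict__['f']
assert isinstance(C.f, types.FunctionType)

# Nothing checks that the first argument is an instance of C
print(C.f(1, 2))      # 3
print(C.f('a', 'b'))  # ab

# Only lookup on an instance wraps the function in a bound method
c = C()
assert isinstance(c.f, types.MethodType)
```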
If the situation is different for C-coded functions and classes, then it
would seem impossible to write a drop-in replacement for Python-coded
classes.
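For comparison, a sketch of the C-coded side as seen from CPython (the type names printed are CPython implementation details and may differ on other interpreters): the class dict of bytes holds a method descriptor rather than a plain function, and looking it up on the class returns that same descriptor object.

```python
import types

# A C-coded builtin function versus a C-coded method, seen from Python
print(type(open).__name__)         # builtin_function_or_method
print(type(bytes.split).__name__)  # method_descriptor

# Class lookup returns the descriptor itself, which is not a plain function
assert bytes.__dict__['split'] is bytes.split
assert not isinstance(bytes.split, types.FunctionType)

# Binding to an instance yields a builtin method object
print(type(b'a:b'.split).__name__)  # builtin_function_or_method
```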
> bytes.lower(b) is
> the same as b.lower() if b is an instance of bytes. Many functions and
> methods that work with bytes accept not just bytes, but arbitrary
> objects that support the buffer protocol. Including bytes methods:
>
> >>> b'a:b'.split(memoryview(b':'))
> [b'a', b'b']
>
> But the first argument of unbound bytes method can be only a bytes
> instance.
>
> >>> bytes.split(memoryview(b'a:b'), b':')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: descriptor 'split' requires a 'bytes' object but received a 'memoryview'
My naive expectation was that bytes.split should be a built-in function,
like open, but it is not:
>>> open
<built-in function open>
>>> bytes.split
<method 'split' of 'bytes' objects>
and that the C-coded split function would type-check both args for being
bytes-like, as it does with the second:
>>> b'a:b'.split(':')
Traceback (most recent call last):
  File "<pyshell#32>", line 1, in <module>
    b'a:b'.split(':')
TypeError: a bytes-like object is required, not 'str'
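The asymmetry between the two arguments can be seen side by side in a short sketch (exception messages vary between Python versions, so they are printed rather than asserted): the separator accepts any bytes-like object, while the first argument of the unbound method must be a real bytes instance, and the usual workaround copies the whole buffer.

```python
data = b'a:b'

# The separator may be any bytes-like object...
assert data.split(b':') == [b'a', b'b']
assert data.split(bytearray(b':')) == [b'a', b'b']
assert data.split(memoryview(b':')) == [b'a', b'b']

# ...but not a str
try:
    data.split(':')
except TypeError as exc:
    print('separator:', exc)

# The first argument goes through the descriptor's stricter check
try:
    bytes.split(memoryview(data), b':')
except TypeError as exc:
    print('self:', exc)

# Workaround: copy the whole buffer into a new bytes object first
assert bytes(memoryview(data)).split(b':') == [b'a', b'b']
```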
Assuming that the descriptor check is not just an unintended holdover
from 2.x, it seems that for C-coded functions used as methods,
type-checking the first arg was conceptually factored out and replaced
by a generic check in the descriptor mechanism.
In this case, the descriptor check is stricter than you would like. Is
it stricter than necessary? If the memoryview were passed to the code
for bytes.split, would the code successfully run to conclusion? Is it
sufficiently generic at the machine bytes level?
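To make the question concrete, here is a hypothetical pure-Python imitation in which the first-argument check lives in a descriptor rather than in the function body. The names CheckedMethod and MyBytes are invented for illustration, and this is not how CPython implements method descriptors internally; the sketch only shows that when the body is generic, bypassing the descriptor lets a memoryview through.

```python
import types

class CheckedMethod:
    """Hypothetical descriptor imitating a C method descriptor: calls made
    through the class must pass an instance of the owner class as self."""

    def __init__(self, func):
        self.func = func

    def __set_name__(self, owner, name):
        self.owner, self.name = owner, name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # class access: return the descriptor itself
        return types.MethodType(self.func, instance)  # instance access: bind

    def __call__(self, *args, **kwargs):
        # The generic type check, factored out of the function body
        if not args or not isinstance(args[0], self.owner):
            got = type(args[0]).__name__ if args else 'no arguments'
            raise TypeError(f"descriptor {self.name!r} requires a "
                            f"{self.owner.__name__!r} object but received {got!r}")
        return self.func(*args, **kwargs)


class MyBytes(bytes):
    @CheckedMethod
    def split(self, sep):
        # The body only needs the buffer protocol, not a bytes instance
        return memoryview(self).tobytes().split(sep)


mv = memoryview(b'a:b')
assert MyBytes(b'a:b').split(b':') == [b'a', b'b']           # bound call
assert MyBytes.split(MyBytes(b'a:b'), b':') == [b'a', b'b']  # check passes
try:
    MyBytes.split(mv, b':')  # rejected by the descriptor, not by the body
except TypeError as exc:
    print(exc)
# Calling the underlying function directly skips the check and works
assert MyBytes.__dict__['split'].func(mv, b':') == [b'a', b'b']
```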
> I think it would be helpful to allow using unbound bytes methods with
> arbitrary objects that support the buffer protocol as the first
> argument. This would allow avoiding unneeded copying (the primary
> purpose of the buffer protocol).
>
> >>> bytes.split(memoryview(b'a:b'), b':')
> [b'a', b'b']
If the descriptor check cannot be selectively loosened, a possible
solution might be a base class for all bytes-like buffer protocol
classes that would have all method functions that work with all
bytes-like objects.
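A rough sketch of what such a base class might offer, assuming a hypothetical class named BufferMethods whose methods are ordinary Python functions, so class access applies no type check to self. Only split with an explicit separator is shown. The input is wrapped in a memoryview rather than copied wholesale; the resulting pieces are new bytes objects, as with bytes.split. A real version would of course be written in C.

```python
class BufferMethods:
    """Hypothetical base class: methods accept any C-contiguous object
    supporting the buffer protocol, both as self and as sep."""

    def split(self, sep):
        # Plain Python function, so BufferMethods.split(obj, ...) never
        # checks the type of obj; it only requires the buffer protocol
        view = memoryview(self).cast('B')  # view of the data, no copy
        sep = memoryview(sep).tobytes()    # the separator is usually short
        if not sep:
            raise ValueError('empty separator')
        parts, start, i = [], 0, 0
        n, m = len(view), len(sep)
        while i <= n - m:
            if view[i:i + m] == sep:       # compares buffer contents
                parts.append(view[start:i].tobytes())
                i += m
                start = i
            else:
                i += 1
        parts.append(view[start:].tobytes())
        return parts


assert BufferMethods.split(memoryview(b'a:b'), b':') == [b'a', b'b']
assert BufferMethods.split(bytearray(b'a::b'), memoryview(b':')) == [b'a', b'', b'b']
```

The byte-by-byte loop keeps the sketch short; it mirrors the left-to-right, non-overlapping matching of bytes.split but makes no attempt at its speed.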
--
Terry Jan Reedy