Use unbound bytes methods with objects supporting the buffer protocol
Unbound methods can be used as functions in Python: bytes.lower(b) is the same as b.lower() if b is an instance of bytes.

Many functions and methods that work with bytes accept not just bytes but arbitrary objects that support the buffer protocol, including bytes methods:

>>> b'a:b'.split(memoryview(b':'))
[b'a', b'b']

But the first argument of an unbound bytes method can only be a bytes instance:

>>> bytes.split(memoryview(b'a:b'), b':')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: descriptor 'split' requires a 'bytes' object but received a 'memoryview'

I think it would be helpful to allow using unbound bytes methods with arbitrary objects that support the buffer protocol as the first argument. This would avoid unneeded copying (the primary purpose of the buffer protocol):

>>> bytes.split(memoryview(b'a:b'), b':')
[b'a', b'b']
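For illustration, here is a minimal sketch of the behaviour described above, together with the copying workaround that is needed today. The helper name `split_buffer` is hypothetical and not part of any library; the exact wording of the TypeError differs between Python versions, so the sketch only records that one was raised.

```python
def split_buffer(obj, sep=None, maxsplit=-1):
    """Hypothetical helper: split any buffer-protocol object like bytes.split.

    The data is copied into a bytes object first, which is exactly the
    overhead the proposal would like to avoid.
    """
    return bytes(obj).split(sep, maxsplit)


view = memoryview(b'a:b')

# The bound method already accepts a buffer as the separator argument.
bound_result = b'a:b'.split(memoryview(b':'))

# The unbound method rejects a first argument that is not a bytes instance.
try:
    bytes.split(view, b':')
except TypeError as exc:
    rejected = str(exc)
else:
    rejected = None

# The workaround succeeds, at the cost of one copy of the data.
parts = split_buffer(view, b':')
```

With the proposed change, the call inside the `try` block would return the same list as `split_buffer` without the intermediate copy.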
On 7/13/2016 12:40 PM, Serhiy Storchaka wrote:
Unbound methods can be used as functions in python.
According to my naive understanding, in Python 3, there are not supposed
to be 'unbound methods'. Functions accessed as a class attribute are
supposed to *be* functions, and not just usable as a function. This is
true, at least for Python-coded classes.
class C():
    def f(self, other):
        return self + other

print(C.f)
#
>>> C.f('a', 2)
Traceback (most recent call last):
  File "", line 1, in <module>
    C.f('a', 2)
  File "", line 3, in f
    return self + other
TypeError: must be str, not int

>>> 'a'+2
Traceback (most recent call last):
  File "", line 1, in <module>
    'a'+2
TypeError: must be str, not int
If the situation is different for C-coded functions and classes, then it would seem impossible to write a drop-in replacement for Python-coded classes.
bytes.lower(b) is the same as b.lower() if b is an instance of bytes. Many functions and methods that work with bytes accept not just bytes, but arbitrary objects that support the buffer protocol. Including bytes methods:

>>> b'a:b'.split(memoryview(b':'))
[b'a', b'b']

But the first argument of unbound bytes method can be only a bytes instance.

>>> bytes.split(memoryview(b'a:b'), b':')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: descriptor 'split' requires a 'bytes' object but received a 'memoryview'
My naive expectation was that bytes.split should be a built-in function, like open -- but it is not

>>> open
<built-in function open>
>>> bytes.split
and that the C coded split function would type check both args for being bytes-like, as it does with the second.
>>> b'a:b'.split(':')
Traceback (most recent call last):
  File "", line 1, in <module>
    b'a:b'.split(':')
TypeError: a bytes-like object is required, not 'str'
Assuming that the descriptor check is not just an unintended holdover from 2.x, it seems that for C-coded functions used as methods, type-checking the first arg was conceptually factored out and replaced by a generic check in the descriptor mechanism. In this case, the descriptor check is stricter than you would like. Is it stricter than necessary? If the memoryview were passed to the code for bytes.split, would the code successfully run to conclusion? Is it sufficiently generic at the machine bytes level?
I think it would be helpful to allow using unbound bytes methods with arbitrary objects that support the buffer protocol as the first argument. This would allow to avoid unneeded copying (the primary purpose of the buffer protocol).

>>> bytes.split(memoryview(b'a:b'), b':')
[b'a', b'b']
If the descriptor check cannot be selectively loosened, a possible solution might be a base class for all bytes-like buffer protocol classes that would have all method functions that work with all bytes-like objects.

-- Terry Jan Reedy
On 14 July 2016 at 07:09, Terry Reedy wrote:
Assuming that the descriptor check is not just an unintended holdover from 2.x, it seems that for C-coded functions used as methods, type-checking the first arg was conceptually factored out and replaced by a generic check in the descriptor mechanism.
It's intentional - the default C level descriptors typecheck their first argument, since getting that wrong may cause a segfault in most cases.
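The difference between the two kinds of method can be seen from Python itself. The following is a small sketch, assuming CPython, where C-coded methods are exposed as `method_descriptor` objects with an `__objclass__` attribute; the class `C` and the helper `accepts_first_arg` are made up for the demonstration.

```python
class C:
    def f(self, other):
        return self + other


def accepts_first_arg(unbound, first, *args):
    """Hypothetical helper: report whether unbound(first, *args) passes
    without a TypeError."""
    try:
        unbound(first, *args)
    except TypeError:
        return False
    return True


# A Python-coded method accessed on the class is a plain function,
# so its first argument is duck-typed.
python_result = C.f('a', 'b')

# A C-coded method accessed on the class is a descriptor that remembers
# its owning class and checks the first argument against it.
descriptor_kind = type(bytes.lower).__name__
owner = bytes.lower.__objclass__


class MyBytes(bytes):
    pass


subclass_ok = accepts_first_arg(bytes.lower, MyBytes(b'ABC'))       # instance of bytes
bytearray_ok = accepts_first_arg(bytes.lower, bytearray(b'ABC'))    # not an instance
```

Instances of subclasses pass the check because the C code can rely on their memory layout; a bytearray has a different layout, which is why the generic check refuses it.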
If the descriptor check cannot be selectively loosened, a possible solution might be a base class for all bytes-like buffer protocol classes that would have all method functions that work with all bytes-like objects.
A custom wrapper descriptor that checks for "supports the buffer protocol" rather than "is a bytes-like object" is certainly possible, so I believe Serhiy's question here is more a design question around "Should they?" than it is a technical question around "Can they?".

Given the way this would behave if "bytes" was implemented in Python rather than C (i.e. unbound methods would rely on ducktyping, even for the first argument), +1 from me for making the unbound methods for bytes compatible with arbitrary objects supporting the buffer protocol.

Cheers,
Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
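To make the intended semantics concrete, here is a rough pure-Python emulation of such a wrapper. It is only a sketch: `buffer_method` is a hypothetical name, and unlike a C-level descriptor it copies the buffer with `tobytes()` before delegating, so it shows the behaviour but not the efficiency gain.

```python
def buffer_method(name):
    """Hypothetical: wrap bytes.<name> so that its first argument may be
    any object supporting the buffer protocol."""
    method = getattr(bytes, name)

    def wrapper(obj, *args, **kwargs):
        if not isinstance(obj, bytes):
            try:
                view = memoryview(obj)   # fails for non-buffer objects
            except TypeError:
                raise TypeError(
                    f"{name}() requires a bytes-like object, "
                    f"not {type(obj).__name__!r}"
                ) from None
            obj = view.tobytes()         # copy; a C implementation would not need this
        return method(obj, *args, **kwargs)

    wrapper.__name__ = name
    return wrapper


split = buffer_method('split')

from_view = split(memoryview(b'a:b'), b':')
from_bytearray = split(bytearray(b'a b'))

try:
    split('a:b', ':')
except TypeError:
    str_rejected = True
else:
    str_rejected = False
```

A str is still rejected, because it does not export a buffer.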
On Thu, Jul 14, 2016 at 8:52 AM, Nick Coghlan wrote:
Given the way this would behave if "bytes" was implemented in Python rather than C (i.e. unbound methods would rely on ducktyping, even for the first argument), +1 from me for making the unbound methods for bytes compatible with arbitrary objects supporting the buffer protocol.
The buffer protocol is a bit generic for duck typing. Instead the bytes methods could check for a memoryview with a format that's "B" or "b".

>>> a = np.array([1,2,3,4], dtype='int16')
>>> b = np.array([1,2,3,4], dtype='uint8')
>>> memoryview(a).format
'h'
>>> memoryview(b).format
'B'

It's possible to cast if necessary, e.g. memoryview(a).cast('B'). No copy of the data is made, so it's still reasonably efficient. This preserves raising a TypeError for operations that are generally nonsensical, such as attempting to split() an array of short integers as if it's just bytes.
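The same check can be tried without NumPy, using the standard library's array module. This is a sketch of the idea only; the function `as_byte_view` is a hypothetical name, not an existing API.

```python
from array import array


def as_byte_view(obj):
    """Hypothetical check: return a memoryview of obj only if its items
    are single bytes (format "B" or "b")."""
    view = memoryview(obj)
    if view.format not in ('B', 'b'):
        raise TypeError(
            f"expected a buffer of bytes, got item format {view.format!r}"
        )
    return view


shorts = array('h', [1, 2, 3, 4])   # signed short items
octets = array('B', [1, 2, 3, 4])   # unsigned byte items

octets_view = as_byte_view(octets)

try:
    as_byte_view(shorts)
except TypeError:
    shorts_rejected = True
else:
    shorts_rejected = False

# An explicit cast reinterprets the same memory as bytes without copying.
cast_view = memoryview(shorts).cast('B')
before = bytes(cast_view[:shorts.itemsize])
shorts[0] = 0                       # change the array through the original object
after = bytes(cast_view[:shorts.itemsize])
```

Because `cast_view` shares memory with `shorts`, the change to `shorts[0]` is visible through the view.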
On 14.07.16 19:31, eryk sun wrote:
On Thu, Jul 14, 2016 at 8:52 AM, Nick Coghlan wrote:
Given the way this would behave if "bytes" was implemented in Python rather than C (i.e. unbound methods would rely on ducktyping, even for the first argument), +1 from me for making the unbound methods for bytes compatible with arbitrary objects supporting the buffer protocol.
The buffer protocol is a bit generic for duck typing. Instead the bytes methods could check for a memoryview with a format that's "B" or "b".

>>> a = np.array([1,2,3,4], dtype='int16')
>>> b = np.array([1,2,3,4], dtype='uint8')
>>> memoryview(a).format
'h'
>>> memoryview(b).format
'B'
It's possible to cast if necessary, e.g. memoryview(a).cast('B'). No copy of the data is made, so it's still reasonably efficient. This preserves raising a TypeError for operations that are generally nonsensical, such as attempting to split() an array of short integers as if it's just bytes.
This looks reasonable. But for now bytes methods accept arbitrary buffers.

>>> a = np.array([1,2,3,4], dtype='int16')
>>> b = np.array([1,2,3,4], dtype='uint8')
>>> b'.'.join([a, b])
b'\x01\x00\x02\x00\x03\x00\x04\x00.\x01\x02\x03\x04'
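The same behaviour can be reproduced with only the standard library. The sketch below substitutes `array.array` for the NumPy arrays; the byte order of the 16-bit items depends on the platform, so the expected value is built from `tobytes()` rather than written out literally.

```python
from array import array

shorts = array('h', [1, 2, 3, 4])   # 16-bit items, item format 'h'
octets = array('B', [1, 2, 3, 4])   # 8-bit items, item format 'B'

# join() takes the raw bytes of each buffer, whatever its item format.
joined = b'.'.join([shorts, octets])
expected = shorts.tobytes() + b'.' + octets.tobytes()
```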
participants (4)
- eryk sun
- Nick Coghlan
- Serhiy Storchaka
- Terry Reedy