data:image/s3,"s3://crabby-images/995d7/995d70416bcfda8f101cf55b916416a856d884b1" alt=""
bytearray is used as a read/write buffer in asyncio. When implementing a reading stream, the read() API may be defined as returning "bytes". There are two ways to get the first n bytes of a bytearray:

1. bytes(bytearray[:n])
2. bytes(memoryview(bytearray)[:n])

(1) is the simplest, but it produces a temporary bytearray holding n bytes. (2) is more efficient than (1), but it still uses a temporary memoryview object, and it looks a bit tricky.

I want a simple, readable way to get part of a bytearray as bytes without any temporaries. The API I propose looks like this:

    bytes.frombuffer(byteslike, length=-1, offset=0)

How do you feel about it?

--
INADA Naoki <songofacandy@gmail.com>
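A minimal sketch putting the two existing spellings next to a pure-Python stand-in for the proposal. `bytes.frombuffer` does not exist in Python. The `frombuffer` function below is a hypothetical helper that only emulates the proposed signature `(byteslike, length=-1, offset=0)`. It still creates a memoryview internally, so it shows the intended semantics, not the savings a C implementation might give. It assumes a flat buffer of single bytes, such as bytes or bytearray.

```python
def frombuffer(byteslike, length=-1, offset=0):
    """Hypothetical stand-in for the proposed bytes.frombuffer().

    Assumes a one-dimensional buffer of single bytes, e.g. bytes or
    bytearray. length=-1 means "up to the end of the buffer".
    """
    with memoryview(byteslike) as view:
        stop = len(view) if length < 0 else offset + length
        # Slicing the view does not copy; bytes() makes the only copy.
        return bytes(view[offset:stop])


buf = bytearray(b"hello world")
n = 5
way1 = bytes(buf[:n])              # (1) copy into a temporary bytearray, then into bytes
way2 = bytes(memoryview(buf)[:n])  # (2) temporary memoryview, a single copy
way3 = frombuffer(buf, n)          # hypothetical proposed spelling
assert way1 == way2 == way3 == b"hello"
```

Because the helper releases its view in a `with` block, the bytearray can be resized as soon as the call returns.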
data:image/s3,"s3://crabby-images/0a8b7/0a8b7b503c69a6e5454541863a21a5541eb573c3" alt=""
On Sat, Aug 6, 2016 at 8:45 PM INADA Naoki <songofacandy@gmail.com> wrote:
Does that actually make the difference between unacceptably inefficient performance and acceptably efficient for an application you're working on?
> While (2) is more efficient than (1), it still uses a temporary memoryview object, and it looks a bit tricky.
Using the memoryview is nicely explicit, whereas ``bytes.frombuffer`` could be creating a temporary bytearray as part of its construction.

> The API I propose looks like this:
>
>     bytes.frombuffer(byteslike, length=-1, offset=0)
RawIOBase.read and the other read methods described in the io module use the parameter "size" instead of "length". https://docs.python.org/3/library/io.html#io.RawIOBase
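Michael's question about whether the extra copy actually matters can only be settled by measuring a real workload. As a starting point, here is a rough `timeit` sketch comparing the spellings. The buffer sizes and repeat count are arbitrary assumptions. Results depend on the machine and interpreter, so no particular numbers are claimed here.

```python
import timeit


def compare(total=1 << 20, n=1 << 16, number=1000):
    """Time three ways of copying the first n bytes of a bytearray to bytes.

    The sizes and repeat count are arbitrary; adjust them to the
    workload being measured.
    """
    buf = bytearray(total)

    def via_slice():
        return bytes(buf[:n])                 # copies twice

    def via_memoryview():
        return bytes(memoryview(buf)[:n])     # copies once, temporary view

    def via_with_block():
        with memoryview(buf) as m:            # view released deterministically
            return bytes(m[:n])

    return {
        f.__name__: timeit.timeit(f, number=number)
        for f in (via_slice, via_memoryview, via_with_block)
    }


for name, seconds in compare().items():
    print(f"{name:>15}: {seconds:.4f} s")
```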
data:image/s3,"s3://crabby-images/eac55/eac5591fe952105aa6b0a522d87a8e612b813b5f" alt=""
On 7 August 2016 at 15:08, Michael Selik <michael.selik@gmail.com> wrote:
It could, but it wouldn't (since that would be pointlessly inefficient).

The main question to be answered here would be whether adding a dedicated spelling for "bytes(memoryview(bytearray)[:n])" actually smooths out the learning curve for memoryview in general, where folks would learn:

1. "bytes(mybytearray[:n])" copies the data twice for no good reason
2. "bytes.frombuffer(mybytearray, n)" avoids the double copy
3. "bytes(memoryview(mybytearray)[:n])" generalises to arbitrary slices

With memoryview being a builtin, I'm not sure that argument can be made successfully - the transformation in going from step 1 direct to step 3 is just "wrap the original object with memoryview before slicing to avoid the double copy", and that's no more complicated than using a different constructor method.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
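Step 3 depends on the fact that slicing a memoryview creates a view and does not copy. A small sketch (CPython assumed) shows three things. A slice shares memory with the bytearray. Arbitrary, even strided, slices work. The only copy happens in the final `bytes()` call.

```python
buf = bytearray(b"0123456789")

with memoryview(buf) as view:
    # An arbitrary (here strided) slice: a new view over the same memory, no copy.
    with view[2:8:2] as window:
        buf[2] = ord("X")                  # in-place change, no resize: allowed
        assert window.tobytes() == b"X46"  # the slice sees the change
    # bytes() of a slice is where the single copy happens.
    snapshot = bytes(view[3:6])

assert snapshot == b"345"
del buf[:2]  # fine: every view has been released
assert buf == bytearray(b"X3456789")
```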
data:image/s3,"s3://crabby-images/995d7/995d70416bcfda8f101cf55b916416a856d884b1" alt=""
On Mon, Aug 8, 2016 at 12:47 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Yes. My intention is Tornado and asyncio. Since they are frameworks, we can't assume how large the data is; it may be anywhere from a few bytes to a few gigabytes.
I'm not sure either. A memoryview may or may not be a bytes-like object that os.write or socket.send accepts. But memoryview is the successor of buffer, so we should encourage using it for zero-copy slicing.

Thank you.

--
INADA Naoki <songofacandy@gmail.com>
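As an illustration of zero-copy slicing with socket.send, here is a minimal sketch. `send_all` is a hypothetical helper; the standard `socket.sendall` already behaves this way for blocking sockets. Each call passes a contiguous memoryview slice, so the unsent tail is never copied. Only C-contiguous views count as bytes-like objects, so a strided slice would not be accepted here.

```python
import socket


def send_all(sock, data):
    """Hypothetical helper: send every byte of a bytes-like object.

    Each send() receives a memoryview slice, so the unsent tail is
    never copied. (socket.sendall already does this for blocking sockets.)
    """
    with memoryview(data) as view:
        offset = 0
        while offset < len(view):
            offset += sock.send(view[offset:])


a, b = socket.socketpair()
with a, b:
    payload = bytearray(b"abc" * 1000)
    send_all(a, payload)
    a.shutdown(socket.SHUT_WR)   # signal end of data to the reader
    received = bytearray()
    while True:
        chunk = b.recv(4096)
        if not chunk:
            break
        received += chunk

assert received == payload
```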
data:image/s3,"s3://crabby-images/995d7/995d70416bcfda8f101cf55b916416a856d884b1" alt=""
Today, @socketpair found a bug of mine related to this.
https://github.com/python/asyncio/pull/395#r79136785

For example:

Bad (but works on CPython):

    packet_bytes = bytes(memoryview(buff)[:packet_len])
    del buff[:packet_len]

Good:

    with memoryview(buff) as m:
        packet_bytes = bytes(m[:packet_len])
    del buff[:packet_len]

There are two problems:

1. Avoiding the bad code is difficult. It works without any warning on CPython.
2. The good code has significant overhead.

Slicing bytes from a bytearray buffer usually happens in low-level code, where performance may be important. So I feel a dedicated API for slicing bytes from a bytes-like object is worth it.

Any thoughts about it?
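The "Bad" version is fragile because a bytearray refuses to change size while any buffer export, such as a memoryview, is alive. In that case it raises BufferError. CPython's reference counting frees the temporary memoryview immediately, so the bad code happens to work there. An interpreter that reclaims objects later may still hold the export when `del` runs. The sketch below assumes CPython, and `pop_packet` is a hypothetical name. It forces the failure by deliberately keeping a view alive, then wraps the "Good" pattern in a helper.

```python
def pop_packet(buff, packet_len):
    """Remove and return the first packet_len bytes of a bytearray.

    The with-block releases the view before the bytearray is resized.
    """
    with memoryview(buff) as m:
        packet_bytes = bytes(m[:packet_len])
    del buff[:packet_len]
    return packet_bytes


buff = bytearray(b"HEADERpayload")

# Keep a view alive on purpose: this simulates a temporary view
# that has not been freed yet when the bytearray is shrunk.
lingering = memoryview(buff)
try:
    del buff[:6]
except BufferError as exc:
    print("resize refused:", exc)
finally:
    lingering.release()

assert pop_packet(buff, 6) == b"HEADER"
assert buff == bytearray(b"payload")
```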
data:image/s3,"s3://crabby-images/0a8b7/0a8b7b503c69a6e5454541863a21a5541eb573c3" alt=""
On Sat, Aug 6, 2016 at 8:45 PM INADA Naoki <songofacandy@gmail.com> wrote:
Does that actually make the difference between unacceptably inefficient performance and acceptably efficient for an application you're working on?
While (2) is more efficient than (1), it uses still temporary memoryview object, and it looks bit tricky.
Using the memoryview is nicely explicit whereas ``bytes.frombuffer`` could be creating a temporary bytearray as part of its construction. The API I propose looks like this:
bytes.frombuffer(byteslike, length=-1, offset=0)
RawIOBase.read and the other read methods described in the io module use the parameter "size" instead of "length". https://docs.python.org/3/library/io.html#io.RawIOBase
data:image/s3,"s3://crabby-images/eac55/eac5591fe952105aa6b0a522d87a8e612b813b5f" alt=""
On 7 August 2016 at 15:08, Michael Selik <michael.selik@gmail.com> wrote:
It could, but it wouldn't (since that would be pointlessly inefficient). The main question to be answered here would be whether adding a dedicated spelling for "bytes(memoryview(bytearray)[:n])" actually smooths out the learning curve for memoryview in general, where folks would learn: 1. "bytes(mybytearray[:n])" copies the data twice for no good reason 2. "bytes.frombuffer(mybytearray, n)" avoids the double copy 3. "bytes(memoryview(mybytearray)[:n])" generalises to arbitrary slices With memoryview being a builtin, I'm not sure that argument can be made successfully - the transformation in going from step 1 direct to step 3 is just "wrap the original object with memoryview before slicing to avoid the double copy", and that's no more complicated than using a different constructor method. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
data:image/s3,"s3://crabby-images/995d7/995d70416bcfda8f101cf55b916416a856d884b1" alt=""
On Mon, Aug 8, 2016 at 12:47 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Yes. My intention is Tornado and AsyncIO. Since they are framework, we can't assume how large it is -- it may be few bytes ~ few giga bytes.
I'm not sure, too. memoryview may and may not be bytes-like object which os.write or socket.send accepts. But memoryview is successor of buffer. So we should encourage to use it for zero copy slicing. Thank you. -- INADA Naoki <songofacandy@gmail.com>
data:image/s3,"s3://crabby-images/995d7/995d70416bcfda8f101cf55b916416a856d884b1" alt=""
Today, @socketpair found my bug relating to this. https://github.com/python/asyncio/pull/395#r79136785 For example, Bad (but works on CPython): packet_bytes = bytes(memoryview(buff)[:packet_len]) del buff[:packet_len] Good: with memoryview(buff) as m: packet_bytes = bytes(m[:packet_len]) del buff[:packet_len] There are two problem. 1. Avoiding bad code is difficult. It works without any warning on CPython 2. Good code has significant overhead. Slicing bytes from bytearray buffer is usually in low level code and performance may be important. So I feel dedicated API for slicing bytes from bytes-like is worth enough. Any thought about it?
participants (3)
- INADA Naoki
- Michael Selik
- Nick Coghlan