Hi Antoine,
Thanks for posting this, and sorry for the delayed reply!
I've known for a while that asyncio Protocols could be optimized. While working on the initial version of uvloop, I noticed that `Protocol.data_received()` requires making one extra copy of the received data. Back then my main priority was to make uvloop fully compatible with asyncio, so I wasn't really thinking about improving the asyncio design.
Let me explain the current flaw of `Protocol.data_received()` so that other people on the list can catch up with the discussion:
1. Currently, when a Transport is reading data, it calls `sock.recv()`, which returns a `bytes` object that is then pushed to `Protocol.data_received()`. Every time `sock.recv()` is called, a new bytes object is allocated.
2. Typically, protocols need to accumulate the bytes objects they receive until they have enough buffered data to parse. Usually a `deque` is used for that; less optimized code just concatenates all bytes objects into one.
3. When enough data is gathered and a protocol message can be parsed out of it, usually there's a need to concatenate a few buffers from the `deque` or get a slice of the concatenated buffer. At this point, we've copied the received data two times.
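To make the three steps above concrete, here is a minimal sketch of a `data_received()`-based protocol. The framing (a 4-byte big-endian length followed by the payload) and the class name `LengthPrefixedProtocol` are assumptions made up for illustration; they are not taken from any real protocol or from the benchmark.

```python
import asyncio
import struct
from collections import deque


class LengthPrefixedProtocol(asyncio.Protocol):
    """Hypothetical framing: 4-byte big-endian length, then the payload."""

    def __init__(self):
        self._chunks = deque()  # bytes objects as handed over by the transport
        self._buffered = 0      # total number of buffered bytes
        self.messages = []      # fully parsed payloads

    def data_received(self, data):
        # Step 1: `data` is a fresh bytes object allocated by sock.recv().
        # Step 2: accumulate it until a whole message is available.
        self._chunks.append(data)
        self._buffered += len(data)
        self._parse()

    def _parse(self):
        while self._buffered >= 4:
            # Step 3: joining the chunks copies the received data.
            joined = b"".join(self._chunks)
            (size,) = struct.unpack_from(">I", joined)
            if len(joined) < 4 + size:
                # Incomplete message: keep the joined bytes as a single chunk.
                self._chunks = deque([joined])
                return
            # Slicing the payload out copies those bytes once more.
            self.messages.append(joined[4:4 + size])
            rest = joined[4 + size:]
            self._chunks = deque([rest]) if rest else deque()
            self._buffered = len(rest)
```

Feeding it `b"\x00\x00\x00\x05he"` followed by `b"llo"` yields one message, `b"hello"`, but only after the bytes have passed through the `join` and the slice.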
I propose to add another Protocol base class to asyncio: `BufferedProtocol`. It won't have the `data_received()` method; instead it will have `get_buffer()` and `buffer_updated(nbytes)` methods:
    class asyncio.BufferedProtocol:

        def get_buffer(self) -> memoryview:
            pass

        def buffer_updated(self, nbytes: int):
            pass
When the protocol's transport is ready to receive data, it will call `protocol.get_buffer()`. The latter must return an object that implements the buffer protocol. The transport will request a writable buffer over the returned object and receive data *into* that buffer.
When the `sock.recv_into(buffer)` call is done, the `protocol.buffer_updated(nbytes)` method will be called. The number of bytes received into the buffer will be passed as the first argument.
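As a rough sketch of how a protocol might use these two methods, the example below keeps one preallocated buffer and parses messages in place. It is a plain class standing in for the proposed base class, with the method signatures as written above. The 4-byte length-prefix framing, the names `FramedBufferedProtocol` and `pump()`, and the 64 KiB buffer size are assumptions for illustration only; `pump()` merely imitates what a transport would do on a read event.

```python
import socket
import struct


class FramedBufferedProtocol:
    """Stand-in for a BufferedProtocol subclass (hypothetical framing)."""

    def __init__(self, size=64 * 1024):
        self._buf = bytearray(size)         # allocated once, reused for all reads
        self._view = memoryview(self._buf)
        self._end = 0                       # number of valid bytes in the buffer
        self.messages = []

    def get_buffer(self):
        # Hand out the free tail of the buffer; slicing a memoryview copies nothing.
        # A real implementation would also have to handle a full buffer.
        return self._view[self._end:]

    def buffer_updated(self, nbytes):
        self._end += nbytes
        pos = 0
        while self._end - pos >= 4:
            (size,) = struct.unpack_from(">I", self._buf, pos)
            if self._end - pos < 4 + size:
                break
            # The payload is copied once, when it is handed to the application.
            self.messages.append(bytes(self._view[pos + 4:pos + 4 + size]))
            pos += 4 + size
        remaining = self._end - pos
        if pos and remaining:
            # Move the incomplete tail to the front of the buffer.
            leftover = bytes(self._view[pos:self._end])
            self._view[:remaining] = leftover
        self._end = remaining


def pump(sock, protocol):
    """Imitate one transport read: receive directly into the protocol's buffer."""
    nbytes = sock.recv_into(protocol.get_buffer())
    if nbytes:
        protocol.buffer_updated(nbytes)
    return nbytes
```

With a connected pair from `socket.socketpair()`, calling `pump(receiver, protocol)` after the peer sends a framed message appends the payload to `protocol.messages` without any intermediate `bytes` objects being allocated for the raw reads.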
I've implemented the proposed design in uvloop (branch 'get_buffer', [1]) and adjusted your benchmark [2] to use it. Here are benchmark results from my machine (macOS):
    vanilla asyncio:   120-135 Mb/s
    uvloop:            320-330 Mb/s
    uvloop/get_buffer: 600-650 Mb/s
The benchmark is quite unstable, but it's clear that `BufferedProtocol.get_buffer()` makes it possible to implement framing far more efficiently.
I'm also working on porting the asyncpg library to use `get_buffer()`, as it has a fairly good benchmark suite. So far I'm seeing a 5-15% speed boost on all benchmarks. More importantly, `get_buffer()` makes asyncpg's buffer implementation simpler!
I'm quite happy with these results, and I propose to implement the `get_buffer()` API (or its equivalent) in Python 3.7. I've opened an issue [3] to discuss the implementation details.