[Python-ideas] Add function readbyte to asyncio.StreamReader

Jörn Heissler python-ideas-2018 at tutnicht.de
Fri Jul 27 03:48:51 EDT 2018


On Mon, Jul 23, 2018 at 22:25:14 +0100, Gustavo Carneiro wrote:
> Well, even if it is worth, i.e. your use case is not rare enough,

Reading a single byte is certainly a special case, but I think it's generic
enough to warrant its own function.
I believe there should be many binary protocols out there that would benefit
from such a function.

For example, the first SOCKS5 (RFC 1928) message could be parsed like this:

async def read_methods(reader):
    if (version := await reader.readbyte()) != 5:
        raise Exception(f'Bad version: {version}')
    if (nmethods := await reader.readbyte()) == 0:
        raise Exception('Need at least one method')
    return await reader.readexactly(nmethods)
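
Since StreamReader has no readbyte() today, the example can only be tried with
a stand-in. The following is a minimal sketch, assuming a hypothetical subclass
ByteReader whose readbyte() just wraps readexactly(1); ByteReader, demo and
parse_greeting are illustrative names, and feed_data()/feed_eof() are used only
to simulate bytes arriving from the network:

```python
import asyncio


class ByteReader(asyncio.StreamReader):
    """Hypothetical stand-in: StreamReader plus the proposed readbyte()."""

    async def readbyte(self):
        # Not an existing asyncio API; emulated on top of readexactly().
        return (await self.readexactly(1))[0]


async def read_methods(reader):
    if (version := await reader.readbyte()) != 5:
        raise ValueError(f'Bad version: {version}')
    if (nmethods := await reader.readbyte()) == 0:
        raise ValueError('Need at least one method')
    return await reader.readexactly(nmethods)


async def demo(data):
    reader = ByteReader()   # created while the event loop is running
    reader.feed_data(data)  # simulate bytes arriving from the socket
    reader.feed_eof()
    return await read_methods(reader)


def parse_greeting(data):
    return asyncio.run(demo(data))
```

With this stand-in, parse_greeting(b'\x05\x02\x00\x02') yields the two method
bytes, and a greeting with a different version byte raises ValueError.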

> I would
> suggest at least making it private, readexactly can call this specialised
> function if nbytes==1:
> 
> def _readbyte(self):
>    ....
> 
> def readexactly(self, num):
>    if num == 1:
>       return self._readbyte()
>   ... the rest stays the same..

Maybe I wasn't clear in my intent: readbyte would not return a bytes object but
an integer, i.e. the byte value.

My current approach is this:

value = (await reader.readexactly(1))[0]

I'd like to make it more readable (and faster at the same time):

value = await reader.readbyte()
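
To make the intended return type concrete, here is a small sketch of a helper
with the proposed semantics. The module-level function readbyte() is a
hypothetical emulation built on readexactly(), not an existing asyncio API. It
also shows what I would expect at end of stream, assuming readbyte() inherited
the behaviour of readexactly(), which raises IncompleteReadError:

```python
import asyncio


async def readbyte(reader):
    """Hypothetical helper emulating the proposed StreamReader.readbyte()."""
    data = await reader.readexactly(1)  # bytes object of length 1
    return data[0]                      # indexing bytes yields an int


async def demo():
    reader = asyncio.StreamReader()
    reader.feed_data(b'\x05')           # simulate one byte from the network
    reader.feed_eof()
    first = await readbyte(reader)
    try:
        await readbyte(reader)          # stream is exhausted
    except asyncio.IncompleteReadError as exc:
        return first, exc.partial
    return first, None


result = asyncio.run(demo())
```

Here result is (5, b''): the first call returns the integer 5 rather than
b'\x05', and the second call fails with an empty partial read.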

> But to be honest, you are probably better off managing the buffer yourself:
> Just call, e.g., stream.read(4096), it will return a buffer of up to 4k
> length, then you can iterate over the buffer byte by byte until the
> condition is met, repeat until the end of stream, or whatever.

StreamReader already manages a buffer. Managing a second buffer would
mean I'd need to copy all my data from one buffer to another.

But let's assume I went this way and iterated over my own buffer:

* I receive some bytes. Maybe it's exactly the amount I need, then I can parse
  it and discard the buffer.
* Or it's less than I need. I'd have to wait for more data and either restart my
  parser or remember the state from before.
* Or it's more than I need. I'd have to remove the parsed bytes from the buffer.
  Alternatively I could push back the unparsed bytes to my buffer.

This adds a lot of code complexity. And the code is likely slower than calling
readbyte() a couple of times; for my current use case, calling it once is usually
sufficient.
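
To illustrate the three cases above, here is a rough sketch of what a
hand-managed buffer for the SOCKS5 greeting might look like. GreetingParser and
its feed() method are hypothetical names, and restarting the parse on every
chunk is only one of several possible designs:

```python
class GreetingParser:
    """Hypothetical hand-rolled parser for the SOCKS5 greeting."""

    def __init__(self):
        self.buffer = bytearray()

    def feed(self, chunk):
        """Add a chunk; return the methods once complete, else None."""
        self.buffer += chunk
        if len(self.buffer) < 2:
            return None            # too little data: wait for more
        version, nmethods = self.buffer[0], self.buffer[1]
        if version != 5:
            raise ValueError(f'Bad version: {version}')
        if nmethods == 0:
            raise ValueError('Need at least one method')
        end = 2 + nmethods
        if len(self.buffer) < end:
            return None            # still incomplete: parse again on next chunk
        methods = bytes(self.buffer[2:end])
        del self.buffer[:end]      # too much data: keep the unparsed remainder
        return methods
```

The caller would still have to loop over stream.read(4096), handle end of
stream, and hand the leftover bytes in parser.buffer to whatever parses the next
message, all of which StreamReader already does internally.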

I like my current approach way better than managing my own buffer and thus
reinventing StreamReader.

Adding the new function as proposed would improve the code in both readability
and speed.

