Add function readbyte to asyncio.StreamReader

Hello,

I'm implementing a protocol where I need to read individual bytes until a condition is met (value & 0x80 == 0). My current approach is:

```python
value = (await reader.readexactly(1))[0]
```

To speed this up, I propose that a new function be added to asyncio.StreamReader:

```python
value = await reader.readbyte()
```

I duplicated readexactly and stripped out some parts. The code below appears to work:

```python
async def readbyte(self):
    # Re-raise any error already recorded on the stream
    if self._exception is not None:
        raise self._exception
    # Wait until at least one byte is buffered, or fail at end of stream
    while not self._buffer:
        if self._eof:
            raise EOFError()
        await self._wait_for_data('readbyte')
    # Take the first byte; indexing a bytearray yields an int
    data = self._buffer[0]
    del self._buffer[0]
    # Let the transport resume reading if the buffer dropped below the limit
    self._maybe_resume_transport()
    return data
```

To compare the speed, I received a 50 MiB file byte by byte (speedup = readexactly time / readbyte time - 1):

cpython-3.7.0:
    readexactly: 42.43 seconds
    readbyte   : 22.05 seconds
    speedup    : 92.4%

pypy3-v6.0.0:
    readexactly: 3.21 seconds
    readbyte   : 2.76 seconds
    speedup    : 16.3%

Thanks
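The post only states the stop condition (value & 0x80 == 0). A minimal sketch of such a loop using only the public readexactly API might look like this. It assumes the protocol encodes integers as base-128 varints with the least-significant 7-bit group first (the layout used by LEB128 and Protocol Buffers); read_varint is a hypothetical name.

```python
import asyncio


async def read_varint(reader: asyncio.StreamReader) -> int:
    """Decode one base-128 varint (assumed layout: low 7-bit group first)."""
    result = 0
    shift = 0
    while True:
        # readexactly(1) returns a 1-byte bytes object; [0] gives its int value
        value = (await reader.readexactly(1))[0]
        result |= (value & 0x7F) << shift
        if (value & 0x80) == 0:  # high bit clear: this was the last byte
            return result
        shift += 7


async def demo():
    reader = asyncio.StreamReader()
    reader.feed_data(bytes([0xAC, 0x02, 0x05]))  # 300, then 5
    reader.feed_eof()
    return [await read_varint(reader), await read_varint(reader)]
```

Every byte costs one awaited readexactly call plus a slice, which is the per-byte overhead the proposed readbyte() aims to cut.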

Well, even if it is worth it, i.e. your use case is not rare enough, I would suggest at least making it private; readexactly can call this specialised function when num == 1:

```python
async def _readbyte(self):
    ...  # must return bytes here to keep readexactly's return type

async def readexactly(self, num):
    if num == 1:
        return await self._readbyte()
    ...  # the rest stays the same
```

But to be honest, you are probably better off managing the buffer yourself: just call, e.g., stream.read(4096); it will return a buffer of up to 4 KiB, then you can iterate over the buffer byte by byte until the condition is met, and repeat until the end of the stream, or whatever.

On Sun, 22 Jul 2018 at 12:11, Jörn Heissler <python-ideas-2018@tutnicht.de> wrote:
-- 
Gustavo J. A. M. Carneiro
Gambit Research
"The universe is always one step beyond logic." -- Frank Herbert
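Gustavo's suggestion of reading chunks with stream.read(4096) and walking them byte by byte could look roughly like the sketch below. ChunkedByteReader is a hypothetical helper, not part of asyncio; it keeps a read position into the current chunk instead of deleting consumed bytes.

```python
import asyncio


class ChunkedByteReader:
    """Hypothetical helper that buffers chunks from a StreamReader itself."""

    def __init__(self, reader: asyncio.StreamReader, chunk_size: int = 4096):
        self._reader = reader
        self._chunk_size = chunk_size
        self._buf = b""
        self._pos = 0

    async def readbyte(self) -> int:
        # Refill only when the current chunk is exhausted
        if self._pos == len(self._buf):
            # read(n) returns up to n bytes, or b"" once the stream hit EOF
            self._buf = await self._reader.read(self._chunk_size)
            self._pos = 0
            if not self._buf:
                raise EOFError("stream ended")
        value = self._buf[self._pos]
        self._pos += 1
        return value


async def demo():
    reader = asyncio.StreamReader()
    reader.feed_data(b"\x01\x02")
    reader.feed_eof()
    wrapped = ChunkedByteReader(reader)
    first = await wrapped.readbyte()
    second = await wrapped.readbyte()
    try:
        await wrapped.readbyte()
        hit_eof = False
    except EOFError:
        hit_eof = True
    return first, second, hit_eof
```

Bytes left in the helper's own buffer are invisible to the underlying StreamReader, so once a stream is wrapped, every later read has to go through the helper; this is the duplicated buffering Jörn objects to below.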

On Mon, Jul 23, 2018 at 22:25:14 +0100, Gustavo Carneiro wrote:
> Well, even if it is worth it, i.e. your use case is not rare enough,
Reading a single byte is certainly a special case, but I think it's generic enough to warrant its own function. I believe many binary protocols out there would benefit from such a function. For example, the first SOCKS5 (RFC 1928) message could be parsed like this:

```python
async def read_methods(reader):
    if (version := await reader.readbyte()) != 5:
        raise Exception(f'Bad version: {version}')
    if (nmethods := await reader.readbyte()) == 0:
        raise Exception('Need at least one method')
    return await reader.readexactly(nmethods)
```
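Since asyncio.StreamReader has no readbyte(), here is a runnable sketch of the same SOCKS5 greeting parser using a small stand-in built on readexactly. read_byte is a hypothetical helper name; ValueError replaces the bare Exception, and the := operator needs Python 3.8 or newer.

```python
import asyncio


async def read_byte(reader: asyncio.StreamReader) -> int:
    # Hypothetical stand-in for the proposed reader.readbyte()
    return (await reader.readexactly(1))[0]


async def read_methods(reader: asyncio.StreamReader) -> bytes:
    # RFC 1928 greeting: VER (1 byte), NMETHODS (1 byte), METHODS (NMETHODS bytes)
    if (version := await read_byte(reader)) != 5:
        raise ValueError(f"Bad version: {version}")
    if (nmethods := await read_byte(reader)) == 0:
        raise ValueError("Need at least one method")
    return await reader.readexactly(nmethods)


async def parse(data: bytes) -> bytes:
    reader = asyncio.StreamReader()
    reader.feed_data(data)
    reader.feed_eof()
    return await read_methods(reader)
```

For example, parse(b"\x05\x02\x00\x02") yields the two offered methods, 0x00 (no authentication) and 0x02 (username/password).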
Maybe I wasn't clear in my intent: readbyte would not return a bytes object but an integer, i.e. the byte value. My current approach is this:

```python
value = (await reader.readexactly(1))[0]
```

I'd like to make it more readable (and faster at the same time):

```python
value = await reader.readbyte()
```
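The difference in return types, and how end of stream surfaces today, can be checked with the public API. In this sketch read_byte is a hypothetical stand-in for the proposed method. readexactly raises asyncio.IncompleteReadError, a subclass of EOFError, so callers catching EOFError would handle both the current idiom and the proposed readbyte (which raises plain EOFError).

```python
import asyncio


async def read_byte(reader: asyncio.StreamReader) -> int:
    # Hypothetical stand-in: indexing the 1-byte result yields an int, not bytes
    return (await reader.readexactly(1))[0]


async def demo():
    reader = asyncio.StreamReader()
    reader.feed_data(b"\x85")
    reader.feed_eof()
    value = await read_byte(reader)  # int 133, not b"\x85"
    try:
        await read_byte(reader)  # nothing left: stream is at EOF
    except asyncio.IncompleteReadError as exc:
        # exc.partial holds whatever bytes were read before EOF (none here)
        return value, exc.partial, isinstance(exc, EOFError)
```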
StreamReader already manages a buffer. Managing a second buffer would mean copying all my data from one buffer to another. But let's assume I went this way and iterated over my own buffer:

* I receive some bytes. Maybe it's exactly the amount I need; then I can parse it and discard the buffer.
* Or it's less than I need. I'd have to wait for more data and either restart my parser or remember its state from before.
* Or it's more than I need. I'd have to remove the parsed bytes from the buffer, or alternatively push the unparsed bytes back into my buffer.

This adds a lot of code complexity, and the code is likely slower than calling readbyte() a couple of times; for my current use case, calling it once is usually sufficient. I much prefer my current approach to managing my own buffer and thus reinventing StreamReader. Adding the new function as proposed would improve the code in both readability and speed.
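To make the bookkeeping in the list above concrete, here is a sketch of a parser that "remembers the state from before" across chunks of arbitrary size. It assumes, as an illustration only, that the value being parsed is a base-128 varint with the least-significant 7-bit group first; IncrementalVarintDecoder is a hypothetical name.

```python
class IncrementalVarintDecoder:
    """Hypothetical sketch: decode one varint from chunks of any size."""

    def __init__(self):
        self._result = 0
        self._shift = 0

    def feed(self, data: bytes):
        """Return (value, consumed) once complete, else (None, len(data))."""
        for i, byte in enumerate(data):
            self._result |= (byte & 0x7F) << self._shift
            if (byte & 0x80) == 0:  # last byte of this varint
                value = self._result
                self._result, self._shift = 0, 0  # ready for the next value
                return value, i + 1
            self._shift += 7
        # Chunk ran out mid-value: keep partial state and wait for more data
        return None, len(data)
```

The caller still has to carry data[consumed:] over to the next message, which is the "more than I need" case; awaiting one byte at a time from StreamReader avoids both pieces of state.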
