[issue36050] Why does http.client.HTTPResponse._safe_read use MAXAMOUNT

Bruce Merry report at bugs.python.org
Wed Feb 20 07:55:43 EST 2019


New submission from Bruce Merry <bmerry at gmail.com>:

While investigating poor HTTP read performance, I discovered that reading all the data from a response with a Content-Length goes via _safe_read, which reads in chunks of at most MAXAMOUNT (1 MiB) before stitching them together with b"".join. This can really hurt performance for responses larger than MAXAMOUNT, because
(a) the data has to be copied an additional time; and
(b) the join operation doesn't drop the GIL, so it limits multi-threaded scaling.
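
For reference, the loop in Lib/http/client.py looks roughly like this (a paraphrase from memory, not a verbatim copy):

    MAXAMOUNT = 1048576  # 1 MiB

    def _safe_read(self, amt):
        # Accumulate chunks of at most 1 MiB until amt bytes have
        # arrived, then pay for one more copy in the final join.
        s = []
        while amt > 0:
            chunk = self.fp.read(min(amt, MAXAMOUNT))
            if not chunk:
                raise IncompleteRead(b"".join(s), amt)
            s.append(chunk)
            amt -= len(chunk)
        return b"".join(s)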

I'm struggling to see any advantage in doing this chunking: it isn't saving memory either (in fact, while the join runs, the list of chunks and the joined copy coexist, so peak usage is briefly about double the payload size).
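
For comparison, here is a minimal chunk-free sketch (my suggestion, not what the stdlib currently does; it assumes self.fp is a buffered reader, so a short read can only mean the connection hit EOF early):

    def _safe_read(self, amt):
        # self.fp is an io.BufferedReader, so read(amt) returns fewer
        # than amt bytes only at EOF.
        data = self.fp.read(amt)
        if len(data) < amt:
            raise IncompleteRead(data, amt - len(data))
        return data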

To give an idea of the performance impact: changing MAXAMOUNT to a very large value took a multithreaded test of mine from 800 MB/s to 2.5 GB/s (at which point the network, rather than the copying, was the bottleneck).
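
The extra-copy cost can also be seen in isolation, without any networking; the following micro-benchmark is illustrative only (the 64 MiB payload and repeat count are arbitrary choices):

    import io
    import timeit

    N = 64 * 1024 * 1024   # 64 MiB payload (arbitrary test size)
    CHUNK = 1024 * 1024    # mirrors MAXAMOUNT
    payload = b"x" * N

    def chunked_read():
        # Mimic _safe_read: accumulate 1 MiB chunks, then join.
        src = io.BytesIO(payload)
        parts = []
        remaining = N
        while remaining:
            part = src.read(min(remaining, CHUNK))
            parts.append(part)
            remaining -= len(part)
        return b"".join(parts)

    def single_read():
        # One read, no extra copy at the end.
        return io.BytesIO(payload).read(N)

    print("chunked:", timeit.timeit(chunked_read, number=10))
    print("single: ", timeit.timeit(single_read, number=10))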

----------
components: Library (Lib)
messages: 336081
nosy: bmerry
priority: normal
severity: normal
status: open
title: Why does http.client.HTTPResponse._safe_read use MAXAMOUNT
versions: Python 3.7

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue36050>
_______________________________________

