<div dir="ltr"><div><div>Hi,<br><br></div>cStringIO was removed from Python 3. It seems the suggested replacement is io.BytesIO. But there is a problem: cStringIO.StringIO(b'data') didn't copy the data while io.BytesIO(b'data') makes a copy (even if the data is not modified later).<br>
<br>This means io.BytesIO is not well suited to cases where you want a read-only file-like interface for an existing byte string. Isn't that one of the main io.BytesIO use cases? Wrapping bytes in cStringIO.StringIO used to be almost free, but this is not true for io.BytesIO.<br>
<br>So making code 3.x compatible by ditching cStringIO can cause serious performance/memory regressions. One can change the code to build the data using BytesIO (without creating bytes objects in the first place), but that is not always possible or convenient.<br>
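<br>To illustrate the "build the data using BytesIO" option, here is a minimal sketch. The helper name build_payload and the sample chunks are made up for the example; it only shows writing pieces straight into the buffer instead of joining them into one bytes object and wrapping that afterwards.<br>

```python
import io


def build_payload(chunks):
    """Write chunks straight into a BytesIO (hypothetical helper).

    This avoids first joining the chunks into one big bytes object
    and then wrapping that object in BytesIO.
    """
    buf = io.BytesIO()
    for chunk in chunks:
        buf.write(chunk)
    buf.seek(0)  # rewind so readers start from the beginning
    return buf


stream = build_payload([b"header\n", b"body\n"])
first_line = stream.readline()
rest = stream.read()
```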
<br>I believe this problem affects tornado
(<a href="https://github.com/tornadoweb/tornado/issues/1110">https://github.com/tornadoweb/tornado/issues/1110</a>), Scrapy (this is how
I became aware of this issue), NLTK (anecdotal evidence - I tried to port some hairy NLTK module
to io.BytesIO, it became many times slower) and maybe pretty much every
IO-related project ported to Python 3.x (django - <a href="https://github.com/django/django/blob/fff7b507ef2f85bb47abd2ee32982682d7822ac4/django/http/request.py#L225">check</a>, werkzeug and frameworks based on it - <a href="https://github.com/mitsuhiko/werkzeug/blob/976b63cadf3d5482aa975df053fa458ff638e571/werkzeug/wrappers.py#L375">check</a>, requests - <a href="https://github.com/kennethreitz/requests/blob/6b21e5c8f0c8fafda661d80f4555ce530507bd68/requests/models.py">check</a> - they all wrap user data in BytesIO, and this may cause slowdowns and up to 2x memory usage in Python 3.x).<br>
<br></div>Do you know if there is a workaround? Maybe there is some stdlib part that I'm missing, or a module on PyPI? It is not that hard to write your own wrapper that won't do copies (or to port [c]StringIO to 3.x), but I wonder if there is an existing solution or plans to fix it in Python itself - this BytesIO use case looks quite important.<br>
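<br>For reference, a rough sketch of the kind of non-copying wrapper I have in mind. BytesReader is a made-up name, not an existing stdlib or PyPI class. It assumes the data is a bytes-like object, and it relies on io.RawIOBase to provide read() and readline() on top of readinto().<br>

```python
import io


class BytesReader(io.RawIOBase):
    """Hypothetical read-only file-like view over existing bytes.

    The data is held through a memoryview, so wrapping it does not
    duplicate the buffer; only the pieces actually read are copied out.
    """

    def __init__(self, data):
        super().__init__()
        self._view = memoryview(data)
        self._pos = 0

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            new_pos = offset
        elif whence == io.SEEK_CUR:
            new_pos = self._pos + offset
        elif whence == io.SEEK_END:
            new_pos = len(self._view) + offset
        else:
            raise ValueError("invalid whence: %r" % (whence,))
        if new_pos < 0:
            raise ValueError("negative seek position")
        self._pos = new_pos
        return self._pos

    def readinto(self, target):
        # Copy at most len(target) bytes starting at the current position.
        chunk = self._view[self._pos:self._pos + len(target)]
        n = len(chunk)
        target[:n] = chunk
        self._pos += n
        return n  # 0 signals end of data


data = b"first line\nsecond line\n"
reader = BytesReader(data)
line = reader.readline()
rest = reader.read()
reader.seek(0)
head = reader.read(5)
```

The default readline() on a raw stream reads one byte at a time, so for line-oriented code it is probably worth wrapping the object in io.BufferedReader, which copies only one buffer-sized chunk at a time.<br>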
</div>