[New-bugs-announce] [issue22003] BytesIO copy-on-write

David Wilson report at bugs.python.org
Fri Jul 18 00:25:37 CEST 2014

New submission from David Wilson:

This is a followup to the thread at https://mail.python.org/pipermail/python-dev/2014-July/135543.html , discussing the existing behaviour of BytesIO copying its source object, and how this regresses compared to cStringIO.StringI.

The goal of posting the patch on the list was to stimulate discussion of the approach. The patch itself obviously isn't ready for review, and I'm not in a position to dedicate time to it just now (although in a few weeks I'd love to give it full attention!).

Ignoring this quick implementation, are there any general comments around the approach?

My only concern is that it might keep large objects alive in a non-intuitive way in certain circumstances, though I can't think of any obvious ones immediately.
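To make the approach and the lifetime concern concrete, here is a toy pure-Python model. It is only a sketch, not the C code in cow.patch: the class name CowBytesIO and the is_shared property are invented for illustration, and it assumes the buffer shares the caller's bytes object until the first write.

```python
class CowBytesIO:
    """Toy copy-on-write buffer (hypothetical; not the real io.BytesIO)."""

    def __init__(self, initial=b""):
        if isinstance(initial, bytes):
            # Share the immutable source: no copy is made, but this
            # reference keeps `initial` alive for as long as we share it.
            self._shared = initial
            self._private = None
        else:
            # Mutable sources must be copied up front.
            self._shared = None
            self._private = bytearray(initial)
        self._pos = 0

    @property
    def is_shared(self):
        return self._shared is not None

    def read(self, n=-1):
        src = self._shared if self.is_shared else self._private
        end = len(src) if n < 0 else min(len(src), self._pos + n)
        chunk = bytes(src[self._pos:end])
        self._pos = end
        return chunk

    def write(self, data):
        if self.is_shared:
            # First write: take a private copy and drop the shared reference.
            self._private = bytearray(self._shared)
            self._shared = None
        end = self._pos + len(data)
        self._private[self._pos:end] = data
        self._pos = end
        return len(data)

    def getvalue(self):
        return self._shared if self.is_shared else bytes(self._private)


src = b"hello world"
buf = CowBytesIO(src)
print(buf.is_shared)      # still sharing the caller's object
buf.read(5)
buf.write(b"_")           # triggers the private copy
print(buf.is_shared, buf.getvalue())
```

In this model, a large source object stays referenced for as long as the buffer is read-only, even if the caller has dropped its own reference. That is the kind of non-obvious lifetime extension the concern above is about.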

I'm also interested in comments on the second half of that thread: "a natural extension of this is to do something very similar on the write side: instead of generating a temporary private heap allocation, generate (and freely resize) a private PyBytes object until it is exposed to the user, at which point, _getvalue() returns it, and converts it into an IO_SHARED buffer."

Making that work correctly involves quite a few interactions, in particular:

* How BytesIO would implement the buffer interface without causing the under-construction Bytes to become read-only

* Avoiding redundant copies and resizes -- we can't simply tack 25% slack on the end of the Bytes and then truncate it during getvalue() without likely triggering a copy and move. However, with careful measurement of allocator behavior, there are various tradeoffs that could be made, e.g. obmalloc won't move a <500 byte allocation if it shrinks by <25%. glibc malloc's rules are a bit more complex, though.

We could also add a private _PyBytes_SetSize() API to allow truncation to the final size during getvalue() without informing the allocator. Then we'd simply overallocate by up to 10% or 1-2 KB, and write off the loss of the slack space.
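A rough pure-Python model of that write-side idea follows. It is a sketch under stated assumptions, not the proposed C implementation. WriteSideSketch, MAX_SLACK and the resizes counter are hypothetical names. The slack rule (10% of the needed size, capped at 2048 bytes) is just one reading of "up to 10% or 1-2 KB". The bytearray stands in for the private, resizable PyBytes. Python cannot truncate in place without a copy, so getvalue() copies here, where the C version would only adjust the recorded size.

```python
class WriteSideSketch:
    """Toy model: grow a private buffer with slack, hand it off on getvalue()."""

    MAX_SLACK = 2048  # assumed cap, loosely "1-2 KB"

    def __init__(self):
        self._buf = bytearray()  # len(self._buf) plays the role of capacity
        self._size = 0           # logical length of the data written so far
        self._exported = None    # object already handed to the user, if any
        self.resizes = 0         # counts simulated reallocations

    def _slack(self, needed):
        # Assumed policy: 10% of the needed size, capped at MAX_SLACK.
        return min(needed // 10, self.MAX_SLACK)

    def write(self, data):
        # Any previously exported value is now stale; a real version
        # would have to un-share the buffer at this point.
        self._exported = None
        needed = self._size + len(data)
        if needed > len(self._buf):
            grow = needed + self._slack(needed) - len(self._buf)
            self._buf.extend(bytes(grow))
            self.resizes += 1
        self._buf[self._size:needed] = data
        self._size = needed
        return len(data)

    def getvalue(self):
        if self._exported is None:
            # Stand-in for truncating to the final size and sharing it.
            self._exported = bytes(self._buf[:self._size])
        return self._exported


w = WriteSideSketch()
w.write(b"a" * 100)   # one growth, with 10 bytes of slack
w.write(b"b" * 5)     # fits in the slack, so no further growth
print(w.resizes, len(w.getvalue()))
```

Repeated getvalue() calls with no intervening write return the same object in this model, which mirrors the intended "build privately, then share" handoff.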

Notably, this approach completely differs from the one documented in http://bugs.python.org/issue15381 . It's not clear to me which is better.

components: Library (Lib)
files: cow.patch
keywords: patch
messages: 223383
nosy: dw
priority: normal
severity: normal
status: open
title: BytesIO copy-on-write
type: performance
versions: Python 3.5
Added file: http://bugs.python.org/file35988/cow.patch

Python tracker <report at bugs.python.org>
