[Numpy-discussion] PEP 574 - zero-copy pickling with out of band data

Charles R Harris charlesr.harris at gmail.com
Mon Jul 2 19:16:00 EDT 2018


On Mon, Jul 2, 2018 at 3:03 PM, Antoine Pitrou <antoine at python.org> wrote:

>
> Hello,
>
> Some of you might know that I've been working on a PEP in order to
> improve pickling performance of large (or huge) data.  The PEP,
> numbered 574 and titled "Pickle protocol 5 with out-of-band data",
> allows participating data types to be pickled without any memory copy.
> https://www.python.org/dev/peps/pep-0574/
>
> The PEP already has an implementation, which is backported as an
> independent PyPI package under the name "pickle5".
> https://pypi.org/project/pickle5/
>
> I also have a working patch updating PyArrow to use the PEP-defined
> extensions to allow for zero-copy pickling of Arrow arrays - without
> breaking compatibility with existing usage:
> https://github.com/apache/arrow/pull/2161
>
> Still, it is obvious that one of the primary targets of PEP 574 is Numpy
> arrays, as the most prevalent datatype in the Python scientific
> ecosystem.  I'm personally satisfied with the current state of the PEP,
> but I'd like to have feedback from Numpy core maintainers.  I haven't
> tried (yet?) to draft a Numpy patch to add PEP 574 support, since that's
> likely to be more involved due to the complexity of Numpy and due to
> the core being written in C.  Therefore I would like some help
> evaluating whether the PEP is likely to be a good fit for Numpy.
>
>
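For readers new to the PEP, here is a minimal sketch of what out-of-band pickling looks like from Python. It assumes Python 3.8 or later, where protocol 5, pickle.PickleBuffer and the buffer_callback/buffers arguments are part of the standard library. A bytearray wrapped in PickleBuffer stands in for a participating data type such as an array. The helper names dumps_out_of_band and loads_out_of_band are hypothetical.

```python
import pickle  # protocol 5 and PickleBuffer require Python 3.8+

def dumps_out_of_band(obj):
    """Hypothetical helper: pickle obj with protocol 5, collecting buffers out of band."""
    buffers = []
    # buffers.append returns None (falsy), so each PickleBuffer is handed to
    # the list instead of being copied into the pickle stream.
    payload = pickle.dumps(obj, protocol=5, buffer_callback=buffers.append)
    return payload, buffers

def loads_out_of_band(payload, buffers):
    """Hypothetical helper: rebuild the object, consuming buffers in order."""
    return pickle.loads(payload, buffers=buffers)

# A 1 MiB writable buffer, wrapped so the pickler may send it out of band.
data = bytearray(1 << 20)
payload, buffers = dumps_out_of_band(pickle.PickleBuffer(data))
restored = loads_out_of_band(payload, buffers)

# The pickle stream stays tiny; the megabyte travelled separately.
print(len(payload), len(buffers))

# No copy was made: restored exposes the same memory as data.
data[0] = 42
print(memoryview(restored)[0])  # 42
```

Because the callback returns a falsy value, the pickler never serializes the buffer contents; a real transport would ship the buffers by some other means (shared memory, a socket with scatter/gather I/O) and pass them back to loads in the same order.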
Maybe somewhat off topic, but we have had trouble with a 2 GiB limit on
file writes on OS X. See https://github.com/numpy/numpy/issues/3858. Does
your implementation work around that?
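A minimal sketch of the usual workaround for a per-call write limit like the one in numpy issue 3858: split a large buffer into pieces below 2 GiB and write them one at a time. It assumes the limit applies to each individual write call. The helper name write_in_chunks and the 1 GiB default chunk size are hypothetical choices, not NumPy or pickle5 API.

```python
import io

# Hypothetical default: 1 GiB per call, safely under the 2 GiB limit
# reported for single large writes on OS X.
DEFAULT_CHUNK = 1 << 30

def write_in_chunks(f, data, chunk_size=DEFAULT_CHUNK):
    """Write a bytes-like object to binary file f in chunk_size pieces.

    Slicing a memoryview does not copy, so no extra memory is needed.
    """
    view = memoryview(data).cast("B")  # flat byte view of a contiguous buffer
    written = 0
    while written < len(view):
        # write() may accept fewer bytes than offered, so advance by its result.
        n = f.write(view[written:written + chunk_size])
        written += n
    return written

buf = io.BytesIO()
total = write_in_chunks(buf, b"0123456789", chunk_size=3)
print(total, buf.getvalue())  # 10 b'0123456789'
```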

Chuck
