[Python-Dev] PEP 574 (pickle 5) implementation and backport available

Matthew Rocklin mrocklin at gmail.com
Sat May 26 12:07:35 EDT 2018


Hi all,

I agree that compression is often a good idea when moving serialized
objects around on a network, but for what it's worth, as a library author
I would always set compress=False and then handle compression myself as a
separate step.  There are a few reasons for this:

   1. Bandwidth is often pretty good, especially intra-node, on high
   performance networks, or on decent modern disks (NVMe).
   2. I often use different compression technologies in different
   situations.  LZ4 is a great all-around default, but often snappy, blosc, or
   Zstandard are better suited.  This depends strongly on the characteristics
   of the data.
   3. Very often data isn't compressible, or is already in some
   compressed form, such as in images, and so compressing only hurts you.
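
As a rough sketch of what handling compression as a separate step could
look like: the helper names, the one-byte tag, and the 0.9 threshold below
are all made up for illustration, and zlib is only a standard-library
stand-in for LZ4, snappy, blosc, or Zstandard.  The point is just that the
caller picks the codec and can skip it when the data doesn't shrink.

```python
import os
import pickle
import zlib


def dumps_maybe_compressed(obj, min_ratio=0.9):
    """Pickle obj, then compress as a separate step only if it pays off.

    Hypothetical helper: the leading tag byte and min_ratio threshold are
    illustrative choices, not part of pickle or of any proposal.
    """
    raw = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    packed = zlib.compress(raw)  # stand-in codec; swap in whatever fits the data
    if len(packed) < min_ratio * len(raw):
        return b"Z" + packed  # compression helped, keep it
    return b"R" + raw  # incompressible data: send the raw pickle


def loads_maybe_compressed(blob):
    """Inverse of dumps_maybe_compressed."""
    tag, body = blob[:1], blob[1:]
    if tag == b"Z":
        body = zlib.decompress(body)
    elif tag != b"R":
        raise ValueError("unknown tag byte: %r" % tag)
    return pickle.loads(body)


compressible = [0] * 100_000          # highly repetitive
incompressible = os.urandom(100_000)  # random bytes, like already-compressed data

small = dumps_maybe_compressed(compressible)
large = dumps_maybe_compressed(incompressible)
```

Here the repetitive list ends up compressed, while the random bytes are
passed through untouched, which is the behaviour I'd want for images and
other already-compressed payloads.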

In general, my thought is that compression is a complex topic with enough
intricacies that setting a single sane default that works 70+% of the time
probably isn't possible (at least not with the applications that I get
exposed to).

Instead of baking a particular method into pickle.dumps, I would recommend
trying to solve this problem through documentation, pointing users to the
various compression libraries within the broader Python ecosystem, and
perhaps pointing to one of the many blog posts that discuss their strengths
and weaknesses.

Best,
-matt