[Distutils] greater compression on pypi?

Wes Turner wes.turner at gmail.com
Mon Aug 22 13:55:24 EDT 2016

On Monday, August 22, 2016, Daniel Holth <dholth at gmail.com> wrote:

> Some obvious ideas about how to enable greater compression for pypi,
> should anyone be motivated enough to do so.
> 1. If it's a zip, nest zips like so:
> setup.py
> (metadata)
> data.zip
> The metadata is easy to get to, and everything else requires a second
> unpack operation. data.zip is stored, and only compressed by the outer
> .zip. This could be done in a backwards compatible way.
> Wheel could be revised to put everything except *.dist-info inside a
> zipped *.data directory.
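
A rough sketch of that nested layout, in case it helps the discussion. This is only an illustration using the stdlib zipfile module; the helper names (build_nested_archive, read_metadata, read_payload) and the dict-of-bytes inputs are made up for the example and are not part of any existing packaging tool.

```python
import io
import zipfile


def build_nested_archive(metadata, payload):
    """Return the bytes of an outer zip holding metadata plus a stored data.zip.

    `metadata` and `payload` are hypothetical dicts mapping archive names to bytes.
    """
    # Inner archive: members are stored uncompressed (ZIP_STORED).
    inner_buf = io.BytesIO()
    with zipfile.ZipFile(inner_buf, "w", compression=zipfile.ZIP_STORED) as inner:
        for name, data in sorted(payload.items()):
            inner.writestr(name, data)

    # Outer archive: every member, including data.zip as a whole, is deflated.
    outer_buf = io.BytesIO()
    with zipfile.ZipFile(outer_buf, "w", compression=zipfile.ZIP_DEFLATED) as outer:
        for name, data in sorted(metadata.items()):
            outer.writestr(name, data)
        outer.writestr("data.zip", inner_buf.getvalue())
    return outer_buf.getvalue()


def read_metadata(archive_bytes, name):
    """Metadata needs only one unpack operation."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as outer:
        return outer.read(name)


def read_payload(archive_bytes, name):
    """Everything else needs a second unpack, through data.zip."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as outer:
        inner_bytes = outer.read("data.zip")
    with zipfile.ZipFile(io.BytesIO(inner_bytes)) as inner:
        return inner.read(name)
```

Because data.zip is compressed as a single stream, DEFLATE can find redundancy across the files inside it, which per-member compression in a flat zip cannot do.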


"""
PEX is just a class that manages requirements (often embedded within PEX
files as egg distributions in the .deps directory) and autoimports them
into the sys.path, then executes a prescribed entry point.

If you read the code closely, you'll notice that it relies upon
monkeypatching zipimport. Inside the twitter.common.python library we've
provided a recursive zip importer derived from Google's pure Python
that allows for depending upon eggs within eggs or zips (and so forth) so
that PEX files need not extract egg dependencies to disk a priori. This
even extends to C extensions (.so and .dylib files) which are written to
disk long enough to be dlopened before being unlinked.
 """ - https://pantsbuild.github.io/pex_design.html#pex-__main__py

find_eggs_in_zip and find_wheels_in_zip in
may be helpful for building a recursive zip importer
(This may or may not reduce total PyPI bandwidth, because of DEFLATE.)
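
For a feel of what walking zips within zips looks like without extracting anything to disk, here is a minimal sketch. It is not the PEX importer and not the find_*_in_zip helpers mentioned above; iter_nested_members and ARCHIVE_SUFFIXES are hypothetical names, and the code only lists members rather than importing them.

```python
import io
import zipfile

# Assumption for this sketch: these suffixes mark members worth descending into.
ARCHIVE_SUFFIXES = (".zip", ".egg", ".whl")


def iter_nested_members(archive_bytes, prefix=""):
    """Yield slash-joined paths for every member, descending into nested zips in memory."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        for info in zf.infolist():
            path = prefix + info.filename
            if info.filename.endswith(ARCHIVE_SUFFIXES):
                data = zf.read(info)
                # Only recurse when the member really is a zip archive.
                if zipfile.is_zipfile(io.BytesIO(data)):
                    yield from iter_nested_members(data, path + "/")
                    continue
            yield path
```

A real importer would also have to handle C extensions, which, as the quote says, need to exist on disk long enough to be dlopened.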

> 2. Sign the uncompressed data
> Check hashes and signatures against the .tar file instead of .tar.gz when
> doing pip install ... #sha256=nnn. For zip, check against a hash of all the
> hashes of the uncompressed members.
> 3. Go crazy
> pypi is now free to re-compress without additional input from the
> publisher. Both .gz and .lzma versions etc. could be offered.
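
A small sketch of ideas 2 and 3 together, assuming sha256 and the stdlib gzip/lzma/zipfile modules. The function names are invented for the example, and the exact "hash of hashes" scheme for zips (sorted member names, each name followed by the digest of its uncompressed bytes) is my assumption, since the proposal does not spell one out.

```python
import gzip
import hashlib
import io
import lzma
import zipfile


def sha256_hex(data):
    return hashlib.sha256(data).hexdigest()


def uncompressed_digest(blob):
    """Digest of the decompressed stream; the container is detected by magic bytes."""
    if blob[:2] == b"\x1f\x8b":            # gzip magic
        raw = gzip.decompress(blob)
    elif blob[:6] == b"\xfd7zXZ\x00":      # xz magic
        raw = lzma.decompress(blob)
    else:                                   # assume already uncompressed
        raw = blob
    return sha256_hex(raw)


def zip_digest(blob):
    """Hash over (name, sha256 of uncompressed bytes) for each member, in sorted order."""
    outer = hashlib.sha256()
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        for name in sorted(zf.namelist()):
            outer.update(name.encode("utf-8") + b"\0")
            outer.update(hashlib.sha256(zf.read(name)).digest())
    return outer.hexdigest()
```

With digests like these, the same published hash would validate a .gz download, an .xz download, or a zip re-compressed with different settings, which is what would free the index to re-compress on its own.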
