[Distutils] greater compression on pypi?

Wes Turner wes.turner at gmail.com
Mon Aug 22 13:55:24 EDT 2016


On Monday, August 22, 2016, Daniel Holth <dholth at gmail.com> wrote:

> Some obvious ideas about how to enable greater compression for pypi,
> should anyone be motivated enough to do so.
>
> 1. If it's a zip, nested zips like so,
>
> setup.py
> README
> (metadata)
> data.zip
>
> The metadata is easy to get to, and everything else requires a second
> unpack operation. data.zip is stored, and only compressed by the outer
> .zip. This could be done in a backwards compatible way.
>
> Wheel could be revised to put everything except *.dist-info inside a
> zipped *.data directory.
>
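A rough sketch of the layout in idea 1, using only the standard library zipfile module. The helper names (build_nested_zip, read_metadata) and the file names are made up for illustration; this is not an existing sdist or wheel format. The inner data.zip is written with ZIP_STORED, so the only compression pass is the outer DEFLATE.

```python
import io
import zipfile


def build_nested_zip(metadata, payload):
    """Return the bytes of an outer zip: metadata files plus a stored data.zip.

    metadata and payload are dicts mapping archive names to bytes.
    """
    # Inner archive: members are stored, not compressed.
    inner_buf = io.BytesIO()
    with zipfile.ZipFile(inner_buf, "w", compression=zipfile.ZIP_STORED) as inner:
        for name, data in sorted(payload.items()):
            inner.writestr(name, data)

    # Outer archive: metadata and data.zip are each deflated once.
    outer_buf = io.BytesIO()
    with zipfile.ZipFile(outer_buf, "w", compression=zipfile.ZIP_DEFLATED) as outer:
        for name, data in sorted(metadata.items()):
            outer.writestr(name, data)
        outer.writestr("data.zip", inner_buf.getvalue())
    return outer_buf.getvalue()


def read_metadata(archive_bytes, name):
    """Read a top-level member without unpacking data.zip."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as outer:
        return outer.read(name)
```

Whether this is actually smaller than a flat zip depends on how much redundancy there is across files, so it would need measuring on real packages.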

"""

PEX is just a class that manages requirements (often embedded within PEX
files as egg distributions in the .deps directory) and autoimports them
into the sys.path, then executes a prescribed entry point.

If you read the code closely, you'll notice that it relies upon
monkeypatching zipimport. Inside the twitter.common.python library we've
provided a recursive zip importer derived from Google's pure Python
zipimport
<http://code.google.com/appengine/articles/django10_zipimport.html> module
that allows for depending upon eggs within eggs or zips (and so forth) so
that PEX files need not extract egg dependencies to disk a priori. This
even extends to C extensions (.so and .dylib files) which are written to
disk long enough to be dlopened before being unlinked.
""" - https://pantsbuild.github.io/pex_design.html#pex-__main__py

find_eggs_in_zip and find_wheels_in_zip in
https://github.com/pantsbuild/pex/blob/master/pex/finders.py
may be helpful for building a recursive zip importer
(which may or may not reduce total PyPI bandwidth, because of DEFLATE).
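For illustration only, here is the member-lookup half of a recursive zip reader, done entirely in memory with the standard library. It is not how pex implements it and it is not a real importer (no sys.meta_path hook); read_nested and the path-list convention are assumptions of this sketch.

```python
import io
import zipfile


def read_nested(archive_bytes, path):
    """Read a member addressed through nested zips without touching disk.

    path is a list such as ["deps/inner.zip", "mod.py"]; every element
    except the last names a zip stored inside the previous archive.
    """
    data = archive_bytes
    for part in path:
        # Each step opens the current bytes as a zip and pulls out one member.
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            data = zf.read(part)
    return data
```

Each level of nesting costs a full read of the inner archive into memory, which is the trade-off against extracting to disk first.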


> 2. Sign the uncompressed data
>
> Check hashes and signatures against the .tar file instead of .tar.gz when
> doing pip install ... #sha256=nnn. For zip, check against a hash of all the
> hashes of the uncompressed members.
>
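One way the "hash of all the hashes" for a zip might look, as a sketch. The record format (sorted names, "name NUL digest newline") is an assumption made here so that the result is deterministic; it is not something pip or PyPI implements. zip_content_digest and make_zip are hypothetical helpers.

```python
import hashlib
import io
import zipfile


def zip_content_digest(archive_bytes):
    """Digest the uncompressed members of a zip, independent of compression.

    Assumed scheme: sha256 over "name\\0member-sha256\\n" records,
    taken in sorted name order.
    """
    outer = hashlib.sha256()
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        for name in sorted(zf.namelist()):
            member = hashlib.sha256(zf.read(name)).hexdigest()
            outer.update(name.encode("utf-8") + b"\0" + member.encode("ascii") + b"\n")
    return outer.hexdigest()


def make_zip(files, compression):
    """Build a zip in memory from a dict of name -> bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=compression) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()
```

With this, a stored zip and a deflated zip of the same files produce the same digest even though the archive bytes differ.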
> 3. Go crazy
>
> pypi is now free to re-compress without additional input from the
> publisher. Both .gz and .lzma versions etc. could be offered.
>
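A small sketch of why ideas 2 and 3 fit together: if the published hash covers the uncompressed .tar, the index can offer several compressed encodings and the same hash verifies all of them. The helper names are hypothetical, and lzma.compress here produces the .xz container by default, which stands in for the ".lzma" variant mentioned above.

```python
import gzip
import hashlib
import io
import lzma
import tarfile


def make_tar(files):
    """Build an uncompressed tar in memory from a dict of name -> bytes."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in sorted(files.items()):
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def uncompressed_sha256(blob, fmt):
    """Digest of the .tar payload, whatever the outer compression is."""
    decompress = {"gz": gzip.decompress, "xz": lzma.decompress}[fmt]
    return hashlib.sha256(decompress(blob)).hexdigest()


tar_bytes = make_tar({"pkg/__init__.py": b"", "setup.py": b"print('setup')\n"})
expected = hashlib.sha256(tar_bytes).hexdigest()  # what the publisher would sign
as_gz = gzip.compress(tar_bytes)   # one re-compressed copy
as_xz = lzma.compress(tar_bytes)   # another, made without the publisher
```

Both uncompressed_sha256(as_gz, "gz") and uncompressed_sha256(as_xz, "xz") equal expected, while the hashes of the compressed files themselves differ.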