[Distutils] Second draft of a plan for a new source tree / sdist format

Nathaniel Smith njs at pobox.com
Wed Oct 28 05:18:23 EDT 2015

On Tue, Oct 27, 2015 at 7:00 AM, David Cournapeau <cournape at gmail.com> wrote:
> On Tue, Oct 27, 2015 at 1:12 PM, Daniel Holth <dholth at gmail.com> wrote:
>> The drawback of .zip is file size, since it compresses each file
>> individually rather than giving the compression algorithm a larger input;
>> it's a great format otherwise. It's ubiquitous, including Apple iOS packages,
>> Java, and word processor file formats. And most Python packages are small.
> I don't really buy the indexing advantages, especially w/ the current
> implementation of zipfile in python (e.g. loading the whole set of archives
> at creation time)

Can you elaborate on what you mean? AFAICT from a quick skim of the
source code, zipfile does eagerly read in the table of contents for
the zip file (i.e., it reads the list of files and their
metadata), but no actual files are decompressed until you ask for them
individually, and when you do request a specific file, it can be
located in O(1) time via the offset recorded in the table of contents,
without reading any of the other files. This is really different from
.tar.gz, where you have to decompress the entire archive just to get a
list of files, and then you need to decompress the stream again from
the start each time you want to access a single file inside.
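
To make the access pattern concrete, here is a minimal sketch using only
the standard library. The helper names (build_zip, list_and_read) and
the member names are made up for illustration; the point is just that
opening the archive gives you the listing, and reading one member does
not require decompressing the others.

```python
import io
import zipfile


def build_zip(members):
    """Build an in-memory zip archive from a {name: bytes} mapping.

    Hypothetical helper, only here so the example is self-contained.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    buf.seek(0)
    return buf


def list_and_read(fileobj, wanted):
    """Return the member names and the contents of one member."""
    with zipfile.ZipFile(fileobj) as zf:
        # The table of contents is read when the ZipFile is opened,
        # so listing names needs no decompression of member data.
        names = zf.namelist()
        # Only the requested member is decompressed here.
        data = zf.read(wanted)
    return names, data


members = {
    "pkg/__init__.py": b"",
    "pkg/big.py": b"x = 1\n" * 10000,
    "setup.py": b"print('hello')\n",
}
names, data = list_and_read(build_zip(members), "setup.py")
print(names)
print(data)
```

With a .tar.gz the equivalent listing step would have to walk the whole
compressed stream, since tar has no central table of contents.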

(Regarding the size thing, yeah, .tar.gz is smaller, and .tar.bz2
smaller than that, and .tar.xz smaller again, ... but this doesn't
strike me as an argument for throwing up our hands and leaving the
choice to individual projects, because it's not like they know what
the optimal trade-off is either. IMO we should pick one, and zip is
Good Enough.)
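
For a rough feel for the size trade-off, here is a small sketch that
packs the same set of files as a .zip and as a .tar.gz, in memory, and
compares the sizes. The file names and contents are invented (200 tiny,
similar modules), and the helpers zip_size and targz_size are
hypothetical, so treat the result as an illustration of the per-file
compression effect rather than a benchmark of real packages.

```python
import io
import tarfile
import zipfile


def zip_size(members):
    """Size in bytes of a deflate-compressed zip holding `members`."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    return len(buf.getvalue())


def targz_size(members):
    """Size in bytes of a gzip-compressed tar holding `members`."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tf:
        for name, data in members.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    return len(buf.getvalue())


# Made-up input: many small files with very similar contents, which is
# the case where compressing each file individually costs the most.
members = {
    f"pkg/mod_{i:03d}.py": f"def f_{i}():\n    return {i}\n".encode()
    for i in range(200)
}

zsize = zip_size(members)
tsize = targz_size(members)
print("zip:", zsize, "bytes")
print("tar.gz:", tsize, "bytes")
```

On input like this the .tar.gz comes out smaller, because gzip sees the
whole tar stream at once while zip compresses each member separately
and stores per-member headers uncompressed.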


Nathaniel J. Smith -- http://vorpus.org
