[Distutils] PEP 527 - Removing Un(der)used file types/extensions on PyPI

tritium-list at sdamon.com tritium-list at sdamon.com
Tue Aug 23 17:00:35 EDT 2016


I am gung ho on everything in this PEP except the sdist format section.

Yes, .tar, .tar.bz2, .tar.xz, .tar.Z, .tgz, and .tbz can probably go without much contest.  I do agree, though, with some of the arguments in the thread that inspired this PEP: if there is to be one, and only one, sdist format, it should be .zip, NOT .tar.gz.  If dropping .tar.gz is too big a pill to swallow, then both .tar.gz and .zip should survive, with encouragement to use .zip in the future.

> 
> File Extensions
> ---------------
> 
> Currently ``sdist`` supports a wide variety of file extensions like ``.tar.gz``,
> ``.tar``, ``.tar.bz2``, ``.tar.xz``, ``.zip``, ``.tar.Z``, ``.tgz``, and
> ``.tbz``. However, of those the only extensions which get anything more than
> negligible usage are ``.tar.gz`` with 444,338 sdists currently, ``.zip`` with
> 58,774 sdists currently, and ``.tar.bz2`` with 3,265 sdists currently.
> 
> Having multiple formats accepted requires tooling both within PyPI and outside
> of PyPI to handle all of the various extensions that *might* be used (even if
> nobody is currently using them). This doesn't only affect PyPI, but ripples out
> throughout the ecosystem. In addition, the different formats all have different
> requirements for what optional C libraries Python was linked against and
> different requirements for what versions of Python they support. In addition,
> multiple formats also create a weird situation where there may be two
> ``sdist`` files for a particular project/release with subtly different content.
> 
> It's easy to advocate that anything outside of ``.tar.gz``, ``.zip``, and
> ``.tar.bz2`` should be disallowed. Outside of a tiny handful, nobody has
> actively been uploading these other types of files in the ~15 years of PyPI's
> existence so they've obviously not been particularly useful. In addition, while
> ``.tar.xz`` is theoretically a nicer format than the other ``.tar.*`` formats
> due to the better compression ratio achieved by LZMA, it is only available in
> Python 3.3+ and has an optional dependency on the lzma C library.
> 
> Looking at the three extensions we *do* have in current use, it's also fairly
> easy to conclude that ``.tar.bz2`` can be disallowed as well. It has a fairly
> small number of files ever uploaded with it and it requires an additional
> optional C library to handle the bzip2 compression.
> 
> Finally we get down to ``.tar.gz`` and ``.zip``. Looking at the pure numbers
> for these two, we can see that ``.tar.gz`` is by far the most uploaded format,
> with 444,338 total uploaded compared to ``.zip``'s 58,774 and on POSIX
> operating systems ``.tar.gz`` is also the default produced by all currently
> released versions of Python and setuptools. In addition, these two file types
> both use the same C library (``zlib``) which is also required for
> ``bdist_wheel`` and ``bdist_egg``. The two wrinkles with deciding between
> ``.tar.gz`` and ``.zip`` are that while on POSIX operating systems ``.tar.gz``
> is the default, on Windows ``.zip`` is the default and the ``bdist_wheel``
> format also uses zip.
> 
> This PEP proposes that we drop the use of ``.zip`` extensions for sdists on
> PyPI and standardize around ``.tar.gz``. For both extensions there are going to
> be automation designed by end users which are making assumptions about what the
> file extension produced by the ``sdist`` command will be. Changing either
> default will break some number of those, so by changing the default of ``.zip``
> to ``.tar.gz`` we minimize the amount of breakage by taking the smaller number
> of users and making them match the larger number. In addition, it's more likely
> to see Windows users upgrade their setuptools and Python releases on a faster
> timescale than POSIX users. POSIX users often get their Python and setuptools
> from their OS vendor and are discouraged or actively prevented from upgrading
> them outside of complete OS upgrades while Windows users *must* install Python
> and setuptools on their own, and thus are more able to upgrade those pieces
> without triggering a complete OS upgrade.
> 
> While it is true that switching to ``.zip`` would align ``sdist`` with
> ``bdist_wheel`` in terms of format, this is not a very large benefit because
> both formats are able to be manipulated with the Python standard library just
> as easily and both require the same C library (``zlib``). It is also true that
> Windows has support for ``.zip`` files out of the box but requires third party
> software for ``.tar.gz``, however only 0.6% of downloads for sdists on PyPI are
> initiated by browsers and we can assume that only a fraction of those 0.6% are
> Windows users who want to manually extract the file and do not have a means of
> extracting a ``.tar.gz``, particularly since Python itself can be used to
> extract a ``.tar.gz`` via the command line since version 3.4. In addition, the
> use of ``.tar.gz`` will result in smaller sdists which will reduce the amount
> of bandwidth and disk space consumed by ``sdist`` files.
> 
> 
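
For reference, the quoted claim that both formats can be handled with the standard library alone can be sketched roughly as below. This is only an illustration, not anything taken from the PEP or from PyPI's tooling: the helper names build_archives and list_files and the base name "example-1.0" are made up for the example, and the only library calls assumed are shutil.make_archive, zipfile.ZipFile, and tarfile.open.

```python
import pathlib
import shutil
import tarfile
import zipfile


def build_archives(src_dir, out_dir, base="example-1.0"):
    """Pack src_dir as both a .zip and a .tar.gz; return the two paths.

    `base` is a hypothetical project-version name used only for this sketch.
    out_dir should not be inside src_dir, or the archives would pack themselves.
    """
    target = str(pathlib.Path(out_dir) / base)
    zip_path = shutil.make_archive(target, "zip", root_dir=src_dir)
    tgz_path = shutil.make_archive(target, "gztar", root_dir=src_dir)
    return zip_path, tgz_path


def list_files(archive_path):
    """Return the sorted regular-file names stored in a .zip or .tar.gz."""
    if str(archive_path).endswith(".zip"):
        with zipfile.ZipFile(archive_path) as zf:
            # Directory entries in a zip end with "/"; skip them.
            names = [n for n in zf.namelist() if not n.endswith("/")]
    else:
        with tarfile.open(archive_path, "r:gz") as tf:
            names = [m.name for m in tf.getmembers() if m.isfile()]
    # Tar members created from "." carry a leading "./"; normalise it away.
    return sorted(n[2:] if n.startswith("./") else n for n in names)
```

Calling list_files on both outputs of build_archives should give the same listing, which is the sense in which neither format is harder to work with from Python. The command-line route the PEP mentions is the tarfile module's own interface, e.g. "python -m tarfile -e some-sdist.tar.gz" on 3.4 or later.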


