[Distutils] Deprecating little used file types/extensions on PyPI?

Donald Stufft donald at stufft.io
Mon Aug 15 15:09:03 EDT 2016


I'd like to restrict what folks can upload to PyPI in an effort to help narrow
the scope down and to enable a more consistent experience for everyone.

First off, we currently allow people to upload sdist, bdist_wheel, bdist_egg,
bdist_dmg, bdist_dumb, bdist_msi, bdist_rpm, and bdist_wininst. However, I think
that we should try to get rid of support for most of these. For reference, the
current number of files uploaded for each file type is:

* sdist: 506,585
* bdist_wheel: 81,207
* bdist_egg: 48,282
* bdist_wininst: 14,002
* bdist_dumb: 5,502
* bdist_msi: 497
* bdist_rpm: 464
* bdist_dmg: 45
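
For a sense of scale, here is a quick sketch that turns the counts above into
each type's share of all uploads. The dict is just the list above; the helper
name is made up for this example.

```python
# Upload counts per file type, copied from the list above.
UPLOAD_COUNTS = {
    "sdist": 506_585,
    "bdist_wheel": 81_207,
    "bdist_egg": 48_282,
    "bdist_wininst": 14_002,
    "bdist_dumb": 5_502,
    "bdist_msi": 497,
    "bdist_rpm": 464,
    "bdist_dmg": 45,
}


def upload_shares(counts):
    """Return each file type's share of all uploads as a percentage."""
    total = sum(counts.values())
    return {kind: 100.0 * n / total for kind, n in counts.items()}


if __name__ == "__main__":
    for kind, pct in upload_shares(UPLOAD_COUNTS).items():
        print(f"{kind:>14}: {pct:6.2f}%")
```

Run against these numbers, sdists come out at roughly 77% of all uploads, while
bdist_dumb, bdist_msi, bdist_rpm, and bdist_dmg together come to just under 1%.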

Out of all of these, I think that we can easily remove bdist_dmg, bdist_rpm,
and bdist_dumb. I also believe that there is a strong case for removing
bdist_msi and bdist_wininst. I also think we should consider removing
bdist_egg.

To be clear, when I say "remove", I mean disallow new uploads, but do not
delete the existing files.

Looking at each file type:

I think that bdist_dumb is a pretty easy one to remove. Its format isn't very
useful to begin with: it takes the full path to each installed file and stores
that path in the archive. So if I install something to
/opt/mycoolproject/lib/python3.5/site-packages/froblib/ then the archive will
have paths that look like opt/mycoolproject/lib/python3.5/site-packages/froblib/...
I think this is obviously not very useful, and not many people have uploaded
any bdist_dumb files at all. They are also problematic because they have the
same file extension as sdists, so pip doesn't really have a great way to
differentiate between bdist_dumbs and sdists except by trying to guess whether
the filename contains one of distutils' ad hoc platform tags.
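
To illustrate how thin that signal is, here's a rough sketch of the kind of
filename guess a tool has to make. It's a hypothetical heuristic, not pip's
actual code: it assumes bdist_dumb names look like name-version.platform.ext,
the PLATFORM_SUFFIX pattern only covers a few common distutils platform names
(linux-x86_64, win32, win-amd64, macosx-10.9-x86_64), and the helper names are
made up for this example.

```python
import re

# Archive extensions that both sdists and bdist_dumb files can use.
ARCHIVE_EXTENSIONS = (".tar.gz", ".tar.bz2", ".tar.xz", ".tgz", ".tar", ".zip")

# Hypothetical pattern for a few common distutils-style platform names.
# It is deliberately incomplete; real platform names vary a lot.
PLATFORM_SUFFIX = re.compile(
    r"\.(?:linux-[a-z0-9_]+|win32|win-amd64|macosx-\d+\.\d+-[a-z0-9_]+)$"
)


def strip_archive_extension(filename):
    """Remove a known archive extension, or return None if there is none."""
    for ext in ARCHIVE_EXTENSIONS:
        if filename.endswith(ext):
            return filename[: -len(ext)]
    return None


def looks_like_bdist_dumb(filename):
    """Guess whether an archive is a bdist_dumb rather than an sdist.

    The only available signal is a platform tag appended after the
    name-version part of the filename, so this is purely a heuristic.
    """
    stem = strip_archive_extension(filename)
    if stem is None:
        return False
    return PLATFORM_SUFFIX.search(stem) is not None
```

Anything outside the pattern, such as a less common platform name, would be
misread as an sdist, which is exactly the fragility described above.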

Looking at bdist_rpm, practically nobody has ever used it, with a total of 464
files ever uploaded for it. It's not super useful to be able to upload RPMs to
PyPI since it's not an RPM repository, so people have to download them and then
install them manually. It's also a bit weird to have support for RPMs but not
for all of the other package formats that people might want.

Next we have bdist_dmg, bdist_msi, and bdist_wininst. I'm lumping these together
because they're all OS-specific installers for OSs that don't already have some
sort of repository. This lack of a repository format for them means that random
downloads are already the norm for people using these systems. For these, I
think the usage numbers for bdist_dmg and bdist_msi easily suggest that they
are not very important to continue to support, but it would be weird to
eliminate them without also eliminating bdist_wininst. The wininst format has
the most usage out of all of the seldom-used formats; however, when we look at
the downloads for the last 30 days, only 0.42% of the downloads were for wininst
files, so I think that it's pretty safe to remove them. In the past these were
more important due to the lack of real binary packages on Windows, but in 2016
we have wheel, which is a better solution. If, however, we want to keep them,
then I think it's pretty safe to remove them from our /simple/ pages and any
future repository pages, and to modify our mirroring tooling to stop mirroring
them. IOW, to treat them as some sort of "additional upload" rather than as
release uploads to PyPI.
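
A minimal sketch of what the "additional upload" treatment could mean for the
/simple/ pages, assuming files are classified by extension: bdist_wininst
produces .exe files and bdist_msi produces .msi files, while .dmg for
bdist_dmg is an assumption. ProjectFile, is_additional_upload, and
simple_page_links are hypothetical names, not PyPI's code.

```python
from dataclasses import dataclass
from html import escape

# Extensions produced by the OS-specific installer commands:
# bdist_wininst -> .exe, bdist_msi -> .msi, and (assumed) bdist_dmg -> .dmg.
INSTALLER_EXTENSIONS = (".exe", ".msi", ".dmg")


@dataclass(frozen=True)
class ProjectFile:
    """Hypothetical record for one uploaded file."""
    filename: str
    url: str


def is_additional_upload(f):
    """True for OS installers that would be kept off /simple/ and out of mirrors."""
    return f.filename.lower().endswith(INSTALLER_EXTENSIONS)


def simple_page_links(files):
    """Render /simple/-style anchor tags, skipping "additional uploads"."""
    return [
        f'<a href="{escape(f.url)}">{escape(f.filename)}</a>'
        for f in files
        if not is_additional_upload(f)
    ]
```

The same is_additional_upload check could gate what mirroring tools fetch, so
the installers stay downloadable from PyPI without being part of a release.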

Finally, bdist_egg is quite possibly the trickiest one to justify. A fair
number of people still upload eggs, even though we have the wheel format.
However, I think that we should (and generally do) consider eggs to be
deprecated, and while we don't want to break existing packages by removing them,
we should block further uploads for them. Looking again at the download numbers,
eggs represented only 1.8% of total downloads in the last 30 days, while wheels
represented 41% and sdists represented 56%. Furthermore, I think we should do
this with the expectation that any new repository API will no longer include
egg files and will include only sdists and wheels.

Doing all of this would leave us with:

* The /simple/ repository only having sdists, wheels, and eggs, with new egg
  uploads disallowed.
* The hypothetical repository 2.0 API only having sdists and wheels.
* *MAYBE* a "related files" area where people could upload things like
  Windows/OSX installers.
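
Expressed as data, that end state might look something like the sketch below.
The FileType enum, the set names, and check_upload are hypothetical; the
enum values are the upload types listed earlier, and the *MAYBE* "related
files" area is left out.

```python
from enum import Enum


class FileType(Enum):
    """The upload file types PyPI currently accepts."""
    SDIST = "sdist"
    WHEEL = "bdist_wheel"
    EGG = "bdist_egg"
    WININST = "bdist_wininst"
    MSI = "bdist_msi"
    DMG = "bdist_dmg"
    RPM = "bdist_rpm"
    DUMB = "bdist_dumb"


# New uploads accepted under the proposal.
UPLOADABLE = frozenset({FileType.SDIST, FileType.WHEEL})

# Files listed on /simple/: existing eggs stay visible, nothing is deleted.
SIMPLE_INDEX = frozenset({FileType.SDIST, FileType.WHEEL, FileType.EGG})

# Files exposed by the hypothetical repository 2.0 API.
REPOSITORY_V2 = frozenset({FileType.SDIST, FileType.WHEEL})


def check_upload(filetype):
    """Return the FileType for a new upload, or raise ValueError if rejected.

    FileType(...) itself raises ValueError for unknown filetype strings.
    """
    ft = FileType(filetype)
    if ft not in UPLOADABLE:
        raise ValueError(f"new {ft.value} uploads are no longer accepted")
    return ft
```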

On a related but different note, I'd also like to restrict the acceptable
extensions for sdists. Currently distutils allows people to generate .tar,
.tar.gz, .tgz, .tar.bz2, .tbz, .zip, .tar.xz, .tar.Z and possibly others. This
is a bit problematic because each of those types requires a different set of
optional libraries (which may or may not exist depending on Python version), so
you can't really rely on a lot of them (for example, while .tar.xz might give
you better compression, it's not usable before Python 3.3).
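
A small sketch of that dependency, assuming the standard library's tarfile and
zipfile modules do the unpacking: it checks which of the optional modules
(zlib, bz2, lzma) can actually be imported in the running interpreter. The
mapping and helper names are made up for this example.

```python
import importlib

# Optional stdlib compression module needed to unpack each sdist extension.
# Plain .tar needs none. .zip maps to zlib because sdist zips are normally
# deflate-compressed. .tar.Z is left out: tarfile has no support for it.
REQUIRED_MODULE = {
    ".tar": None,
    ".tar.gz": "zlib",
    ".tgz": "zlib",
    ".zip": "zlib",
    ".tar.bz2": "bz2",
    ".tbz": "bz2",
    ".tar.xz": "lzma",
}


def module_available(name):
    """True if the module imports; bz2/lzma fail here if their C part is missing."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


def usable_sdist_extensions():
    """Sdist extensions this interpreter could unpack, sorted."""
    return sorted(
        ext
        for ext, module in REQUIRED_MODULE.items()
        if module is None or module_available(module)
    )


if __name__ == "__main__":
    print(usable_sdist_extensions())
```

Actually importing the module, rather than just locating it, matters here:
bz2 and lzma are Python wrappers that only fail at import time when their
underlying C extension wasn't built.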

Looking at the numbers again, the current number of sdist files for each
extension is:

* .tar.gz: 444,338
* .zip: 58,774
* .tar.bz2: 3,265
* .tgz: 217
* Everything Else: 0

These results are not particularly surprising since .tar.gz is the default
format that distutils creates. What I would like to do is, similarly to above,
simply stop allowing new uploads for anything but .tar.gz. This will have a few
positive effects:

* Going forward, only one of the optional C libraries that Python can be
  compiled against will be required for someone to install something from
  PyPI using pip: zlib, which is available almost universally.

* It will reduce confusion because people will not be able to upload two
  different source releases for the same version (e.g. example-1.0.tar.gz and
  an archive of the same version in another format).

* It will make tooling around PyPI easier, because it will only have to deal
  with one file extension for sdists.
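
A minimal sketch of the upload check this implies, covering both the single
extension and the one-source-release-per-version points. It assumes the part
of the filename before the extension identifies the release (name-version);
split_sdist_filename and validate_sdist_upload are hypothetical names, not
PyPI's code.

```python
# Extensions distutils can generate for sdists (per the list above);
# only .tar.gz stays allowed for new uploads.
SDIST_EXTENSIONS = (".tar.gz", ".tar.bz2", ".tar.xz", ".tar.Z",
                    ".tgz", ".tbz", ".tar", ".zip")
ALLOWED_SDIST_EXTENSION = ".tar.gz"


def split_sdist_filename(filename):
    """Split 'example-1.0.tar.gz' into ('example-1.0', '.tar.gz')."""
    for ext in SDIST_EXTENSIONS:
        if filename.endswith(ext):
            return filename[: -len(ext)], ext
    raise ValueError(f"not a recognised sdist filename: {filename!r}")


def validate_sdist_upload(filename, existing_sdists):
    """Return the name-version stem, or raise ValueError if rejected.

    existing_sdists holds sdist filenames already uploaded for the project;
    older uploads may still use any of the legacy extensions.
    """
    stem, ext = split_sdist_filename(filename)
    if ext != ALLOWED_SDIST_EXTENSION:
        raise ValueError(f"sdists must use {ALLOWED_SDIST_EXTENSION}, got {ext}")
    for existing in existing_sdists:
        if split_sdist_filename(existing)[0] == stem:
            raise ValueError(f"an sdist for {stem} already exists: {existing}")
    return stem
```

This ignores project name normalization (Example vs example), which a real
check would have to handle before comparing stems.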


Donald Stufft
