[Distutils] Deprecating little used file types/extensions on PyPI?
graffatcolmingov at gmail.com
Mon Aug 15 15:22:51 EDT 2016
On Mon, Aug 15, 2016 at 2:09 PM, Donald Stufft <donald at stufft.io> wrote:
> I'd like to restrict what folks can upload to PyPI in an effort to help narrow
> the scope down and to enable a more consistent experience for everyone.
> First off, we currently allow people to upload sdist, bdist_wheel, bdist_egg,
> bdist_dmg, bdist_dumb, bdist_msi, bdist_rpm, and bdist_wininst. However I think
> that we should try to get rid of support for most of these. Just for reference
> currently the number of files uploaded for each type of file looks like:
> * sdist: 506,585
> * bdist_wheel: 81,207
> * bdist_egg: 48,282
> * bdist_wininst: 14,002
> * bdist_dumb: 5,502
> * bdist_msi: 497
> * bdist_rpm: 464
> * bdist_dmg: 45
> Out of all of these, I think that we can easily remove bdist_dmg, bdist_rpm,
> and bdist_dumb. I also believe that there is a strong case for removing
> bdist_msi and bdist_wininst. I also think we should consider removing
> First of all, when I say "remove", I mean disallow new uploads, but do not
> delete the existing files.
> Looking at each file type:
> I think that bdist_dumb is a pretty easy one to remove. Its format is such
> that it's basically not a very useful format to begin with. It takes the full
> path to the files and stores them in the repository. So if I install something
> to /opt/mycoolproject/lib/python3.5/site-packages/froblib/ then it will have
> paths that look like opt/mycoolproject/lib/python3.5/site-packages/froblib/...
> I think this is obviously not very useful and not many people have uploaded any
> bdist_dumb files at all. They are also problematic because they have the same
> file extension as sdists, so pip doesn't really have a great way to
> differentiate between bdist_dumbs and sdists except by trying to guess if it
> contains one of distutils's ad hoc platform tags.
> Looking at bdist_rpm, practically nobody has ever used it with a total of 464
> files ever uploaded for it. It's not super useful to be able to upload rpms to
> PyPI since it's not an RPM repository so people have to manually download them
> and then install them manually. It's also a bit weird to have support for RPMs
> but not for all of the other package formats that people might want.
> Next we have bdist_dmg, bdist_msi, and bdist_wininst. I'm lumping these together
> because they're all OS specific installers for OSs that don't already have some
> sort of repository. This lack of a repository format for them means that random
> downloads are already the norm for people using these systems. For these, I
> think the usage numbers for bdist_dmg and bdist_msi easily suggest that they
> are not very important to continue to support, but it would be weird to
> eliminate them without also eliminating bdist_wininst. The wininst format has
> the most usage out of all of the seldom used formats, however when we look at
> the downloads for the last 30 days only 0.42% of the downloads were for wininst
> files, so I think that it's pretty safe to remove them. I think in the past,
> these were more important due to the lack of real binary packages on Windows,
> but I think in 2016 we have wheel, and wheel is a better solution. If however
> we want to keep them, then I think it's pretty safe to remove them from our
> /simple/ pages and any future repository pages and modify our mirroring tooling
> to stop mirroring them. IOW, to treat them as some sort of "additional upload"
> rather than release uploads to PyPI.
> Finally, bdist_egg is quite possibly the trickiest one to justify. A fair
> number of people still upload eggs, even though we have the wheel format.
> However, I think that we should (and generally do) consider eggs to be
> deprecated and while we don't want to break existing packages by removing them,
> we should block further uploads for them. Looking again at the download numbers
> eggs represented only 1.8% of total downloads in the last 30 days while wheels
> represented 41% and sdists represented 56%. Furthermore, I think we should do
> this with the expectation that any new repository API will no longer include
> egg files in them, and will just be sdists and wheels.
> Doing all of this would leave us with:
> * The /simple/ repository only having sdists, wheels, and eggs and disallowing
> new eggs to be uploaded.
> * The hypothetical repository 2.0 api only having sdists and wheels.
> * *MAYBE* Having "related files" so that people could upload things like
> Windows/OSX Installers.
> On a related but different note, I'd also like to restrict the acceptable
> extensions for sdists. Currently distutils allows people to generate .tar,
> .tar.gz, .tgz, .tar.bz2, .tbz, .zip, .tar.xz, .tar.Z and possibly others. This
> is a bit problematic because each of those types requires a different set of
> optional libraries (which may or may not exist depending on Python version) so
> you can't really use a lot of them (for example, while .tar.xz might give you
> better compression, it's also not usable before Python 3).
> Looking at numbers again, the current number of projects for each file ext are:
> * .tar.gz: 444,338
> * .zip: 58,774
> * .tar.bz2: 3,265
> * .tgz: 217
> * Everything Else: 0
> These results are not particularly surprising since .tar.gz is the default
> format that distutils creates. What I would like to do is, similarly to above,
> simply stop allowing new uploads for anything but .tar.gz. This will have a few
> positive effects:
> * It will make it so that (going forward) there is only a single one of the
> optional C libraries that Python can be compiled against that is required for
> someone to install something from PyPI using pip. This would be the zlib
> library which is available almost universally.
> * It will reduce confusion because people will not be able to upload two
> different source releases for the same version (e.g. example-1.0.tar.gz and
> * It will make tooling around PyPI easier, because it'll only have to deal
> with one file extension for sdists.
My only thought is how we convey this message to users. I wonder if it
would be beneficial to have Twine cut a release that warns users when
they are uploading something that will be unsupported, then have
Warehouse/PyPI start returning a 415 (Unsupported media type)
approximately a few weeks/month later.
I'm +1 for restricting the kinds of things people can upload though.
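To make the Twine idea a bit more concrete, here is a rough sketch of what a
client-side warning could look like. This is not Twine's actual code; the
function name check_upload and the ALLOWED_SUFFIXES tuple are made up for
illustration, and the policy is just my reading of the proposal above (wheels
plus .tar.gz sdists).

```python
import warnings

# Hypothetical policy based on the proposal quoted above: only wheels and
# .tar.gz sdists would remain uploadable. Neither name is a real Twine API.
ALLOWED_SUFFIXES = (".whl", ".tar.gz")


def check_upload(filename):
    """Return True if the file would still be accepted under the proposal.

    Anything else triggers a warning instead of an error, which mirrors a
    grace period before the server starts rejecting the upload outright.
    """
    # Caveat: bdist_dumb archives share the .tar.gz extension with sdists,
    # so a check on the extension alone cannot tell the two apart.
    if filename.lower().endswith(ALLOWED_SUFFIXES):
        return True
    warnings.warn(
        f"{filename}: this file type would no longer be accepted by PyPI",
        FutureWarning,
    )
    return False
```

After the grace period the same check could move server-side, with
Warehouse/PyPI answering 415 instead of warning.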