[Distutils] Deprecating little used file types/extensions on PyPI?

Donald Stufft donald at stufft.io
Mon Aug 15 17:39:10 EDT 2016


> On Aug 15, 2016, at 4:20 PM, Barry Warsaw <barry at python.org> wrote:
> 
> On Aug 15, 2016, at 03:09 PM, Donald Stufft wrote:
> 
>> Out of all of these, I think that we can easily remove bdist_dmg, bdist_rpm,
>> and bdist_dumb. I also believe that there is a strong case for removing
>> bdist_msi and bdist_wininst. I also think we should consider removing
>> bdist_egg.
> 
> +1 for all of these, or maybe with +0 for bdist_egg in the short term.  My
> only concern is that the deprecation period has been long enough, but I do
> agree that they should eventually be prohibited.


Yea, egg is sort of the stickiest thing here, but I also think it’s a good
idea, at least in the long term. We can play around with timescales as it
suits us of course.

>> On a related but different note, I'd also like to restrict the acceptable
>> extensions for sdists. Currently distutils allows people to generate .tar,
>> .tar.gz, .tgz, .tar.bz2, .tbz, .zip, .tar.xz, .tar.Z and possibly
>> others. This is a bit problematic because each of those types requires a
>> different set of optional libraries (which may or may not exist depending on
>> Python version) so you can't really use a lot of them (for example, while
>> .tar.xz might give you better compression, it's also not usable before Python
>> 3).
>> 
>> Looking at numbers again, the current number of projects for each file ext
>> are:
>> 
>> * .tar.gz: 444,338
>> * .zip: 58,774
>> * .tar.bz2: 3,265
>> * .tgz: 217
>> * Everything Else: 0
> 
> I'm a little less certain about this.  It's probably okay, although since the
> .tgz format is the same as .tar.gz, the tools argument doesn't hold for that
> extension.  I know compression nerds like to debate the merits of gz, bz2, and
> xz, but I doubt with the size of most of these files, it's going to make any
> difference.

Right, it’s a few reasons rolled up together:

* Single C library makes it easier to explain to people what they need capability
  wise to install a library (and the fact zlib is near universal doesn’t hurt).

* One extension (versus many) makes expectations easier, like your debian/watch
  file, but also things like ensuring a project has only *one* sdist per version
  instead of allowing 12.

* Variances breed bugs, particularly when you have disparate systems all working
  on this data. For example, your nose2 debian/watch file has a bug: it doesn’t
  support nose2-1.0.tar, which is currently supported by distutils/PyPI/pip. Of
  course the more variances you have the more likely you are to get bugs where
  different systems support different extensions, so narrowing it down to only
  .tar.gz and .tgz is pretty safe, but practically nobody uses .tgz anyways so
  why bother?
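
To make the nose2 example concrete, here is a rough sketch in Python that runs
the file-name part of the quoted watch pattern against a few sdist names. It
assumes the pattern has to match the whole file name (uscan's real matching
rules may differ), and the helper name watch_version is made up for
illustration.

```python
import re

# File-name part of the debian/watch pattern quoted in this thread
# (the pypi.debian.net URL prefix is dropped).
WATCH_PATTERN = re.compile(
    r"nose2-(.+)\.(?:zip|tgz|tbz|txz|(?:tar\.(?:gz|bz2|xz)))"
)


def watch_version(filename):
    """Return the captured version, or None if the pattern does not match.

    Hypothetical helper; assumes the whole file name must match.
    """
    match = WATCH_PATTERN.fullmatch(filename)
    return match.group(1) if match else None


for name in ("nose2-1.0.tar.gz", "nose2-1.0.zip", "nose2-1.0.tar"):
    print(name, "->", watch_version(name))
```

Under that assumption the .tar.gz and .zip names yield a version and the plain
.tar name yields None, which is the mismatch described above.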

> 
> FWIW, restricting the extensions would simplify Debian's template for the
> PyPI debian/watch file (e.g. the pattern we use to download packages):
> 
> https://pypi.debian.net/nose2/nose2-(.+)\.(?:zip|tgz|tbz|txz|(?:tar\.(?:gz|bz2|xz)))
> 
> (Ignore the pypi.debian.net redirector.)
> 
> That would be a mild win.

Yea :D This is exactly the sort of thing that I think this makes better. There is
a lot of tooling built up around PyPI, and the less flexible PyPI is, the easier a
time those tools will have working with its data. Of course, it is a
balancing act, but I think this is a pretty safe place to skew towards mandating
certain things.
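
As a sketch of what that buys downstream tools: if PyPI only accepted .tar.gz
sdists, the file-name part of a watch-style pattern could collapse to a single
extension. The pattern and helper below are hypothetical, not what Debian
actually ships, and again assume the whole file name must match.

```python
import re

# Hypothetical simplified pattern, assuming .tar.gz is the only sdist extension.
SDIST_PATTERN = re.compile(r"nose2-(.+)\.tar\.gz")


def sdist_version(filename):
    """Return the version from a .tar.gz sdist name, or None otherwise."""
    match = SDIST_PATTERN.fullmatch(filename)
    return match.group(1) if match else None


print(sdist_version("nose2-1.0.tar.gz"))
print(sdist_version("nose2-1.0.zip"))
```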

—
Donald Stufft




