[Distutils] Immutable Files on PyPI

holger krekel holger at merlinux.eu
Mon Sep 29 11:04:55 CEST 2014


On Mon, Sep 29, 2014 at 10:46 +0200, M.-A. Lemburg wrote:
> On 28.09.2014 23:59, Donald Stufft wrote:
> > 
> >> On Sep 28, 2014, at 5:36 PM, M.-A. Lemburg <mal at egenix.com> wrote:
> >>
> >> On 28.09.2014 21:31, Donald Stufft wrote:
> >>> Hello All!
> >>>
> >>> I'd like to discuss the idea of moving PyPI to having immutable files. This
> >>> would mean that once you publish a particular file you can never reupload that
> >>> file again with different contents. This would still allow deleting the file or
> >>> reuploading it if the checksums match what was there prior.
> >>>
> >>> This would be good for a few reasons:
> >>>
> >>> * It represents "best practices" for version numbers. Ideally if two people
> >>>  have version "2.1" of a project, they'll have the same code, however as it
> >>>  stands two people installing at two different times could have two very
> >>>  different versions.
> >>>
> >>> * This will make improving the PyPI infrastructure easier, in particular it
> >>>  will make it simpler to move away from using a glusterfs storage array and
> >>>  switch to a redundant set of cloud object stores.
> >>>
> >>>
> >>> In the past this was brought up and a few points were brought against it, those
> >>> were:
> >>>
> >>> 1. That authors could simply change files that were hosted outside PyPI
> >>>   anyway, so it didn't really do much.
> >>>
> >>> 2. That it was too hard to test a release prior to uploading it due to the
> >>>   nature of distutils requiring you to build the release in the same command
> >>>   as the upload.
> >>>
> >>> With the fact that pip no longer hits external URLs by default, I believe that
> >>> the first item is no longer that large of a factor. People can do whatever they
> >>> want on external URLs of course; however, if something is coming from PyPI,
> >>> as end users should now be aware, they can know it is immutable.
> >>>
> >>> Now that there is twine, which allows uploading already created packages, I
> >>> also believe that the second item is no longer a concern. People can easily
> >>> create a distribution using ``setup.py sdist``, test it, and then upload that
> >>> exact thing they tested using ``twine upload <path to sdist>``.
> >>
> >> -1.
> >>
> >> It does happen that files need to be reuploaded because of a bug
> >> in the release process and how people manage their code is really
> >> *their* business, not that of PyPI.
> > 
> > Can you describe a reasonable hypothetical situation where this would occur
> > often enough as to be something that is likely to happen on a consistent
> > basis? Originally the problem was there was little ability to easily upload
> > pre-created files so there was a reasonable chance that there may be a
> > packaging bug that didn’t get exposed until you actually packaged + released.
> >
> > With the advent of twine though it’s now possible to test the exact bits that
> > get uploaded to PyPI making that particular issue no longer a problem.
> >
> > However, the fact that the files are not immutable *does* cause a number of
> > problems that need to be worked around in the mirroring infrastructure, the
> > CDN, and for scaling PyPI out and removing the glusterfs component.
> 
> You are missing out on cases, where the release process causes files to
> be omitted, human errors where packagers forget to apply changes to
> e.g. documentation files, version files, change logs, etc., where
> packagers want to add information that doesn't affect the software
> itself, but meta information included in the distribution files.

I've had such cases myself.  That's the only real caveat I see with the
proposal.  Then again, wheels don't allow uploading docs/changelogs today.
And PyPI would continue to allow changing metadata [*].  I also see the
advantage of immutability of the (filename->content) relation so I
am +0 on the proposal currently.

> Such changes often do not affect the software itself, and so are not
> detected by software tests.
> 
> If I understand you correctly, you are essentially suggesting that it
> becomes impossible to ever delete anything uploaded to PyPI, i.e.
> turning PyPI into a WORM.

No, Donald said deleting would be fine.  But you couldn't then re-upload
to the same filename with a different checksum because PyPI would remember
the checksum first published under that filename.
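
As a rough illustration of that rule (a hypothetical sketch, not PyPI's
actual implementation; the class name, the method names and the choice of
SHA-256 as the checksum are all assumptions made for the example):

```python
import hashlib


class ImmutableFileIndex:
    """Toy model of the proposed rule; every name here is hypothetical."""

    def __init__(self):
        self._digests = {}  # filename -> hex digest, kept even after deletion
        self._live = set()  # filenames that are currently downloadable

    def upload(self, filename, content):
        digest = hashlib.sha256(content).hexdigest()
        known = self._digests.get(filename)
        if known is not None and known != digest:
            # Same filename, different bytes: refused under the proposal.
            raise ValueError(
                f"{filename} was already published with different contents"
            )
        self._digests[filename] = digest
        self._live.add(filename)

    def delete(self, filename):
        # The file goes away, but its remembered digest does not.
        self._live.discard(filename)

    def is_available(self, filename):
        return filename in self._live
```

With this model, uploading "pkg-2.1.tar.gz", deleting it and uploading the
identical bytes again succeeds, while a later upload of different bytes
under the same filename raises ValueError.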

> This would mean that package authors could never correct mistakes,
> remove broken package distribution files, ones which they may be
> forced to remove for legal reasons, ones which they find are infected
> with a virus or trojan, ones which they uploaded for fun or
> by mistake.

In this case you would just delete the release under Donald's proposal.

best,
holger

[*] In some way, retroactively changing the license in release metadata
    is also questionable.  Maybe it should just be made clear that the
    "license" PyPI metadata is not reliable and one needs to check with
    the release file itself.  I've had a number of companies contact
    me over related licensing issues of my PyPI-published software.


> This doesn't have anything to do with making the user experience
> a better one. It is ignorant to assume that package authors who
> sometimes delete distribution files, or at least want to have the
> possibility to do so, don't care for their users. We are in
> Python land, so most authors will know what they are doing and
> do care for their users.
> 
> After all: Why do you think I'm arguing against this proposal ?
> Because I want users of our packages to get the best experience
> they can get, by downloading complete, correct and working
> distribution files.
> 
> This whole idea also has another angle, namely a legal one:
> the PSF doesn't own the distribution files it hosts on PyPI.
> 
> So far, the argument to not fix the much too broad license on PyPI
> was that authors were able to delete files on PyPI to work around
> the unneeded "irrevocable" part of that license.
> 
> With the suggested change, authors would have to give up complete
> control over their distribution files to the PSF in order for their
> packages to be installable by pip using its default settings.
> 
> This kind of lock-in and removal of author rights is not something
> I can support as PSF director. Those authors are the ones that have
> created a large part of our Python ecosystem and they are the ones that
> have put in work to get Python to where it is now: one of the best
> integrated programming languages you can find. We owe a lot to those
> authors and need to care for them.
> 
> Finally, changes such as the above will result in more authors
> switching to alternative hosting platforms such as conda/binstar.org
> or plain github clone + setup.py install (which is becoming increasingly
> popular). Do you really believe that this will make the user experience
> a better one in the long run ?
> 
> If we want to make it attractive for package authors to host their
> packages on PyPI, we have to give them flexibility, respect their
> rights and be welcoming.
> 
> -- 
> Marc-Andre Lemburg
> eGenix.com

