On Sep 28, 2014, at 5:36 PM, M.-A. Lemburg <mal@egenix.com> wrote:

On 28.09.2014 21:31, Donald Stufft wrote:
Hello All!

I'd like to discuss the idea of moving PyPI to having immutable files. This
would mean that once you publish a particular file you can never reupload it
with different contents. Deleting the file, or reuploading it when the checksum
matches what was there before, would still be allowed.
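
As a rough illustration of that rule, here is a minimal sketch. It assumes a
hypothetical in-memory store keyed by filename; the class and method names are
made up for illustration, the choice of SHA-256 is an assumption, and none of
this is PyPI's actual code.

```python
import hashlib


class ImmutableFileStore:
    """Hypothetical in-memory model of the proposed upload rule."""

    def __init__(self):
        # filename -> SHA-256 hex digest of the first upload. Digests are
        # kept after deletion so a name cannot come back with new contents.
        self._digests = {}
        # Filenames that are currently available for download.
        self._live = set()

    def upload(self, filename, content):
        """Accept a first upload, or a reupload whose checksum matches."""
        digest = hashlib.sha256(content).hexdigest()
        previous = self._digests.get(filename)
        if previous is not None and previous != digest:
            raise ValueError(
                f"{filename} was already published with different contents"
            )
        self._digests[filename] = digest
        self._live.add(filename)
        return digest

    def delete(self, filename):
        """Remove the file from the index; its digest is remembered."""
        self._live.discard(filename)

    def exists(self, filename):
        return filename in self._live
```

Under this sketch, deleting ``pkg-2.1.tar.gz`` and uploading the identical
bytes again succeeds, while uploading different bytes under the same name
raises ``ValueError``.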

This would be good for a few reasons:

* It represents "best practices" for version numbers. Ideally if two people
 have version "2.1" of a project, they'll have the same code. As it stands,
 however, two people installing at two different times could end up with
 very different code under the same version number.

* This will make improving the PyPI infrastructure easier. In particular, it
 will make it simpler to move away from using a glusterfs storage array and
 switch to a redundant set of cloud object stores.


When this was brought up in the past, a few points were raised against it.
Those were:

1. That authors could simply change files that were hosted somewhere other
  than PyPI anyway, so it didn't really do much.

2. That it was too hard to test a release prior to uploading it due to the
  nature of distutils requiring you to build the release in the same command
  as the upload.

Now that pip no longer hits external URLs by default, I believe that the first
item is no longer that large of a factor. People can of course do whatever they
want on external URLs. However, if something is coming from PyPI, which end
users should now be aware of, they can know it is immutable.

Now that there is twine, which allows uploading already created packages, I
also believe that the second item is no longer a concern. People can easily
create a distribution using ``setup.py sdist``, test it, and then upload that
exact thing they tested using ``twine upload <path to sdist>``.
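
A small sketch of the "upload exactly what you tested" idea follows. The
helper names are hypothetical and only ``twine upload <path>`` comes from the
text above. It records a digest of the sdist at test time and refuses to build
the upload command if the file has changed since.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of the file at `path`."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def upload_command(path, tested_digest):
    """Build the twine command for `path`, but only if the file is still
    byte-for-byte identical to what was tested."""
    if file_digest(path) != tested_digest:
        raise RuntimeError(f"{path} changed after it was tested")
    # The caller decides when to run this, e.g. with subprocess.run().
    return ["twine", "upload", str(path)]
```

The intended use is to call ``file_digest()`` right after building the sdist,
run the tests against that file, and then pass the recorded digest to
``upload_command()`` before actually running the command.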

-1.

It does happen that files need to be reuploaded because of a bug
in the release process, and how people manage their code is really
*their* business, not that of PyPI.

Can you describe a reasonable hypothetical situation where this would occur
on a consistent basis? Originally the problem was that there was little ability
to easily upload pre-created files, so there was a reasonable chance that a
packaging bug would not be exposed until you actually packaged + released.

With the advent of twine, though, it’s now possible to test the exact bits that
get uploaded to PyPI, so that particular issue is no longer a problem.

However, the fact that the files are not immutable *does* cause a number of
problems that need to be worked around in the mirroring infrastructure, the
CDN, and for scaling PyPI out and removing the glusterfs component.


FWIW, I am getting increasingly annoyed at how PyPI and pip try to dictate
the way package authors are supposed to build, manage and host their
Python packages and release process. Can we please stop this?

I recognize your annoyance, however I think that the changes that have been
made are overall good changes that negatively affect a minor subset of people
and positively affect a much wider group of people. Speaking as one of the
people who are pushing the hardest for the kinds of changes that I assume you’re
talking about, I do try and figure out ways to continue to enable the “alternative”
methods of doing things while still allowing forward progress on making things
better for the masses.

If there’s something more I could have done to ease that pain, other than
*not* making changes at all, then I would be grateful to hear it! I don’t want
to make these changes painful for people where that can be helped.

---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA