[Distutils] PyPI is a sick sick hoarder

Donald Stufft donald at stufft.io
Fri May 15 21:19:37 CEST 2015


> On May 15, 2015, at 2:57 PM, Robert Collins <robertc at robertcollins.net> wrote:
> 
> So, I am working on pip issue 988: pip doesn't resolve packages at all.
> 
> This is O(packages^alternatives_per_package): if you are resolving 10
> packages with 10 versions each, there are approximately 10^10 or 10G
> combinations. 10 packages with 100 versions each - 10^100.
> 
> So - it's going to depend pretty heavily on some good heuristics in
> whatever final algorithm makes its way in, but the problem is
> exacerbated by PyPI's nature.
> 
> Most Linux (all that I'm aware of) distributions have at most 5
> versions of a package to consider at any time - installed (might be
> None), current release, current release security updates, new release
> being upgraded to, new release being upgraded to's security updates.
> And their common worst case is actually 2 versions: installed==current
> release and one new release present. They map alternatives out into
> separate packages (e.g. when an older soname is deliberately kept
> across an ABI incompatibility, you end up with 2 packages, not 2
> versions of one package). So when comparing pip's challenge to apt's:
> apt has ~20-30K packages, with alternatives ~= 2, or
> pip has ~60K packages, with alternatives ~= 5.7 (I asked dstufft)
> 
> Scaling the number of packages is relatively easy; scaling the number
> of alternatives is harder. Even 300 packages (the dependency tree for
> openstack) is ~2.4T combinations to probe.
> 
> I wonder if it makes sense to give some back-pressure to people, or at
> the very least encourage them to remove distributions that:
> - they don't support anymore
> - have security holes
> 
> If folk consider PyPI a sort of historical archive then perhaps we
> could have a feature to select 'supported' versions by the author, and
> allow a query parameter to ask for all the versions.
> 
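
For a feel of the numbers quoted above, here is a minimal Python sketch of
the worst case, assuming a resolver with no heuristics has to consider every
way of picking one version per package. The function name is made up for
illustration and is not anything in pip.

```python
from math import prod


def candidate_combinations(versions_per_package):
    """Worst-case count of version sets, choosing one version per package.

    versions_per_package is a list with one entry per package, giving how
    many versions of that package are available to choose from.
    """
    return prod(versions_per_package)


# 10 packages with 10 versions each, as in the quoted example.
print(candidate_combinations([10] * 10))  # 10000000000

# 10 packages with 2 alternatives each, the common distro worst case.
print(candidate_combinations([2] * 10))  # 1024
```

Real resolvers prune most of this space, so the count is only an upper bound
on what a naive search would have to probe.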

There have been a handful of projects which would only keep the latest N
versions uploaded to PyPI. I know this primarily because it has caused
people a decent amount of pain over time. It’s common for people’s deployments
to use a requirements.txt file with a pin like ``foo==1.0`` and to just continue
to pull from PyPI. Deleting the old files breaks anyone doing that, so it would
require either having people bundle their deps in their repositories or
some way to get at those old versions. Personally I think that we shouldn’t
go deleting the old versions or encouraging people to do that.
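
To make the breakage concrete, here is a rough Python sketch, assuming a
requirements file that only contains exact ``==`` pins. The function name and
the ``available`` mapping are hypothetical stand-ins for whatever versions an
index still hosts; this is not how pip actually parses requirements.

```python
def unsatisfiable_pins(requirements_text, available):
    """Return the exact pins that the index can no longer satisfy.

    requirements_text: contents of a requirements.txt-style file.
    available: dict mapping project name -> set of version strings that
               are still hosted (an illustrative stand-in for the index).
    """
    missing = []
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # this sketch only looks at exact pins
        name, version = (part.strip() for part in line.split("==", 1))
        if version not in available.get(name, set()):
            missing.append(f"{name}=={version}")
    return missing


requirements = "foo==1.0\n"

# While 1.0 is still hosted, the pin resolves.
print(unsatisfiable_pins(requirements, {"foo": {"1.0", "1.1", "2.0"}}))  # []

# Once the old file is deleted, the same deployment breaks.
print(unsatisfiable_pins(requirements, {"foo": {"1.1", "2.0"}}))  # ['foo==1.0']
```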

---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

