On Jun 6, 2014, at 7:46 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:


On 7 Jun 2014 06:08, "Donald Stufft" <donald@stufft.io> wrote:
>
>
> On Jun 6, 2014, at 9:41 AM, holger krekel <holger@merlinux.eu> wrote:
> >
> > Once you care about ACLs for indexes and releases you have a number
> > of issues to consider; it's hardly related to PEP 470/PEP 438.
>
> It is related, because it means that the exact same mechanisms can be used;
> people don’t have to learn two different ways of specifying externally hosted
> projects. In fact, it also teaches them how to specify mirrors and the like as well,
> something that any devpi user is already going to have to learn how to do.

This is the key benefit of PEP 470 from my perspective: some aspects of the Python packaging ecosystem suffer from a bad case of "too many ways to do it", and if we're ever going to fix that, we need to be ruthless in culling redundant concepts.

Specifying custom indexes is a feature with a lot of use cases - local mirrors and private indexes being two of the big ones. By contrast, external references from the simple API duplicate a small subset of the custom index functionality in a way that introduces a whole slew of new concepts that still need to be documented and learned, even if the advice is "don't use that, use custom indexes instead".
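
(To make the "slew of new concepts" point concrete: a rough sketch, following the PEP 438 model rather than pip's actual code, of the per-link bookkeeping a client does when external references are in play, next to what a custom index needs. The function names and labels are purely illustrative.)

    def classify_pep438_link(href, rel):
        # Classify a link found on a PyPI /simple/ project page under the
        # PEP 438 model (simplified; pip's real logic has more cases).
        if rel == "internal":
            return "internal"              # hosted on PyPI, always installable
        if rel in ("homepage", "download"):
            return "scrape"                # an external page that may be crawled for more files
        if "#md5=" in href:
            return "external-verifiable"   # direct external link with a hash (--allow-external)
        return "external-unverified"       # nothing to verify against (--allow-unverified too)

    def classify_custom_index_link(href):
        # With a plain custom index there is only one concept to learn:
        # every link on the project page is simply a candidate file.
        return "candidate-file"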

As far as devpi goes, if it's only mirroring links rather than externally hosted files today, then in the future it will still automatically mirror the external index URLs. Dependency update scanners could follow those links automatically, even if pip install doesn't check them by default.

One other nice consequence of PEP 470 is that it should make it easier for organisations to flag and investigate cases where they're relying on an upstream source other than PyPI, regardless of whether they care about the details of their dependencies' hosting for speed, reliability or legal reasons.

From a migration perspective, how hard would it be to automate generation of a custom index page on pythonhosted.org for projects currently relying on external references? That would still let us make the client changes without needing to special case PIL.



Not very difficult. My current crawl script could generate a minimal one with some minor modifications (it’d have to save the whole URL instead of just the filename), and it would take about 3 hours to process. That process would also weed out links which have died and the like. The downside is that these files wouldn’t be verified, so it would be an external + unsafe index, since we don’t have hash information to make them safe. Of course this would be the case for PIL anyways, which easily makes up most of this traffic, so this could just end up in the wash as far as how “safe” it is.
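
Something along these lines would do it (a hand-waving sketch, not the actual crawl script; the directory layout and example data are made up):

    import os
    import html

    def write_simple_index(crawl_results, outdir):
        # crawl_results: mapping of project name -> list of external file URLs
        # pulled out of the crawl. With no hash information available the links
        # carry no #md5= fragment, so installers treat everything as unverified.
        for project, urls in crawl_results.items():
            project_dir = os.path.join(outdir, project.lower())
            os.makedirs(project_dir, exist_ok=True)
            links = "\n".join(
                '<a href="%s">%s</a><br/>'
                % (html.escape(url), html.escape(url.rsplit("/", 1)[-1]))
                for url in urls
            )
            page = (
                "<!DOCTYPE html>\n<html><head><title>Links for %s</title></head>\n"
                "<body><h1>Links for %s</h1>\n%s\n</body></html>\n"
                % (html.escape(project), html.escape(project), links)
            )
            with open(os.path.join(project_dir, "index.html"), "w") as f:
                f.write(page)

    # e.g. write_simple_index({"PIL": ["http://example.com/Imaging-1.1.7.tar.gz"]}, "simple")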

Also, it occurred to me that while the latest/any split matters for new users, we still need to consider the impact on projects which have pinned dependencies on older versions of packages that were previously externally hosted, but have moved to PyPI for more recent releases. I still think dropping the external reference feature from the simple API in favour of improving the custom index support is the right thing to do, but a couple of *client side* examples of handling the migration could help clarify the consequences for the existing users that may be affected.



Right, this was one of the reasons my old numbers had a split at 50%; part of the idea was that a project with less than some percentage of its files hosted on PyPI had a smaller “breakage” surface, even for old pinned versions. I can get these numbers again if they’d be useful, though I’m not sure whether they should go in the PEP or not; it’s already kind of heavy on the numbers I think, and I’m not sure whether additional numbers would make it more or less confusing.

What do you mean by client side examples of handling the migration? I’m assuming you mean something other than the examples which show how to utilize the new indexes?

For example, perhaps we should keep "--allow-all-external", but have it mean that pip automatically adds any custom index URLs given for the requested packages. Even if it emitted a deprecation warning, clients using it would keep working in the face of the proposed changes to the simple API link handling.

Well, it’d actually expand what --allow-all-external means, since it’d also allow those unsafely hosted files. I’m not sure it’d be a good idea to silently (or even with a warning) upgrade an option from “do this to allow all safely hosted files” to “do this to allow a whole bunch of legacy and unsafely hosted files”. The one upside is that we’d link directly to files instead of relying on scraping, so you’d have to actually rely on an unsafe file to be at risk, but it still makes me nervous.

It’s possible we could add a flag for this, but I’m not sure how useful it’d be since it’d only be in pip 1.6+, and unless people upgrade to that immediately they’ll have to specify the old --extra-index-url / --find-links method anyways.

I guess the question would be: how bad would it be to have pip install --extra-index-url https://legacy-unsafe.pythonhosted.org/ PIL vs. pip install --allow-legacy-unsafe PIL?
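
For reference, the semantic difference I’m worried about looks roughly like this (a simplified sketch, not pip’s real logic; the dict keys are just illustrative):

    def allowed_today(link, allow_all_external, allow_unverified_projects):
        # Current PEP 438 behaviour: --allow-all-external only admits external
        # files that still carry a hash on the PyPI simple page; anything
        # unverifiable needs a separate per-project opt-in on top of that.
        if link["hosted_on_pypi"]:
            return True
        if not allow_all_external:
            return False
        if link["has_hash"]:
            return True
        return link["project"] in allow_unverified_projects

    def allowed_proposed(link, allow_all_external):
        # Proposed reinterpretation: the discovered external index gets added
        # automatically, so everything it serves, hashed or not, comes in
        # under the one flag.
        return link["hosted_on_pypi"] or allow_all_external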

Regards,
Nick.



-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA