<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Oct 26, 2015 at 11:41 PM, Nathaniel Smith <span dir="ltr"><<a href="mailto:njs@pobox.com" target="_blank">njs@pobox.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div class=""><div class="h5">On Mon, Oct 26, 2015 at 4:41 AM, Donald Stufft <<a href="mailto:donald@stufft.io">donald@stufft.io</a>> wrote:<br>

> On October 26, 2015 at 3:36:47 AM, Nathaniel Smith (<a href="mailto:njs@pobox.com">njs@pobox.com</a>) wrote:<br>

>> TL;DR<br>

>> -----<br>

>><br>

>> If we:<br>

>><br>

>> - implement a real resolver, and<br>

>> - add a notion of a per-project namespace of distribution names, that<br>

>> are collected under the same PyPI registration and come from the same<br>

>> sdist, and<br>

>> - add Conflicts:, and Provides:,<br>

>><br>

>> then we can elegantly solve a collection of important and difficult<br>

>> problems, and we can retroactively pun the old extras system onto the<br>

>> new system in a way that preserves 100% compatibility with all<br>

>> existing packages.<br>

>><br>

>> I think?<br>

>><br>

>> What do you think?<br>

><br>

> My initial reaction when I started reading your idea was that I didn't see a<br>

> point in having something like foo[bar] be a "real" package when you could just<br>

> as easily have foo-bar. However, as I continued to read through the idea it<br>

> started to grow on me. I think I need to let it percolate in my brain a little<br>

> bit, but there may be a non-crazy (or at least, crazy in a good way) idea here<br>

> that could push things forward in a nice way.<br>

<br>

</div></div>Oh good, at least I'm not the only one :-).<br>

<br>

I'd particularly like to hear Robert's thoughts when he has time,<br>

since the details depend strongly on some assumptions about how a real<br>

resolver would work.<br>

<span class=""><br>

> Some random thoughts:<br>

><br>

> * Reusing the extra syntax is nice because it doesn't require end users to<br>

>   learn any new concepts, however we shouldn't take a new syntax off the table<br>

>   either if it makes the feature easier to implement with regard to backwards<br>

>   compatibility. Something like numpy{mkl,some-other-thing} could work just as<br>

>   well too. We'll need to make sure that whatever symbols we choose can be<br>

>   represented on all the major FS we care about and that they are ideally not<br>

>   ugly in a URL too. Of course, the filename and user interface symbols don't<br>

>   *need* to match. It could just as easily expand numpy[mkl] out to numpy#mkl<br>

>   or whatever which should make it easier to come up with a nice scheme.<br>

<br>

</span>Right -- obviously it would be *nice* to keep the number of concepts<br>

down, and to avoid skew between filenames and user interface (because<br>

users do see filenames), but if these turn out to be impossible then<br>

there are still options that would let us save the<br>

per-project-package-namespace idea.<br>

<span class=""><br>

> * Provides is a bit of an odd duck, I think in my head I've mostly come to<br>

>   terms with allowing unrestricted Provides when you've already installed the<br>

>   package doing the Providing but completely ignoring the field when pulling<br>

>   data from a repository. Our threat model assumes that once you've selected to<br>

>   install something then it's generally safe to trust (though we still do try<br>

>   to limit that). The problem with Provides mostly comes into play when you<br>

>   will respect the Provides: field for any random package on PyPI (or any other<br>

>   repo).<br>

<br>

</span>Yeah, I'm actually not too worried about malicious use either in<br>

practice, for the reason you say. But even so I can think of two good<br>

reasons we might want to be careful about stating exactly when<br>

"Provides:" can be trusted:<br>

<br>

1) if you have neither scipy nor numpy installed, and you do 'pip<br>

install scipy', and scipy depends on the pure virtual package<br>

'numpy[abi-2]' which is only available as a Provides: on the concrete<br>

package 'numpy', then in this case the resolver has to take Provides:<br>

into account when pulling data from the repo -- if it doesn't, then<br>

it'll ignore the Provides: on 'numpy' and say that scipy's<br>

dependencies can't be satisfied. So for this use case to work, we<br>

actually do need to be able to sometimes trust Provides: fields.<br>

<br>

2) the basic idea of a resolver is that it considers a whole bunch of<br>

possible configurations for your environment, and picks the<br>

configuration that seems best. But if we pay attention to different<br>

metadata when installing as compared to after installation, then this<br>

skew makes it possible for the algorithm to pick a configuration that<br>

looks good a priori but is broken after installation. E.g. for a<br>

simple case:<br>

<br>

  Name: a<br>

  Conflicts: some-virtual-package<br>

<br>

  Name: b<br>

  Provides: some-virtual-package<br>

<br>

'pip install a b' will work, because the resolver ignores the<br>

Provides: and treats the packages as non-conflicting -- but then once<br>

installed we have a broken system. This is obviously an artificial<br>

example, but creating the possibility of such messes just seems like<br>

the kind of headache we don't need. So I think whatever we do with<br>

Provides:, we should do the same thing both before and after<br>

installation.<br></blockquote><div><br></div><div>Another simple solution for this particular case is to add conflict rules between packages that provide the same requirement (that's what PHP's Composer does, IIRC).</div><div><br></div><div>Safety against malicious forks is handled quite explicitly in Composer; we may want to look at how they do it when considering solutions (e.g. <a href="https://github.com/composer/composer/issues/2690">https://github.com/composer/composer/issues/2690</a>, though it has changed a bit since then).</div><div><br></div><div>Adding the provides/conflicts concepts to pip's resolver will complicate it quite significantly, both in running time (at that point you are solving an NP-complete problem) and in implementation. But we also know that this is doable for real-world cases, even in pure Python (Composer handles all the cases you mention, and is written in pure PHP).</div><div><br></div><div>David</div><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
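<br>
For the a/b example earlier in this message, here is a minimal sketch of the skew between a checker that ignores Provides: and one that honors it. The Package class, the find_conflicts function and the honor_provides flag are hypothetical names invented for illustration; none of this is pip's actual data model or resolver.<br>
<pre>
```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Package:
    # Hypothetical, simplified stand-in for a distribution's metadata.
    name: str
    provides: frozenset = field(default_factory=frozenset)
    conflicts: frozenset = field(default_factory=frozenset)


def find_conflicts(packages, honor_provides):
    """Return sorted (package, conflicting_name) pairs in a candidate set."""
    # Map every name on offer to the packages offering it. Provides:
    # entries count only when the checker honors that field.
    offered = {}
    for pkg in packages:
        names = {pkg.name}
        if honor_provides:
            names |= set(pkg.provides)
        for name in names:
            offered.setdefault(name, set()).add(pkg.name)

    clashes = set()
    for pkg in packages:
        for target in pkg.conflicts:
            # A package is not considered to conflict with itself.
            if offered.get(target, set()) - {pkg.name}:
                clashes.add((pkg.name, target))
    return sorted(clashes)


a = Package("a", conflicts=frozenset({"some-virtual-package"}))
b = Package("b", provides=frozenset({"some-virtual-package"}))

# View that ignores Provides: when reading the repository: no conflict found.
print(find_conflicts([a, b], honor_provides=False))
# View that honors Provides: after installation: the same set is broken.
print(find_conflicts([a, b], honor_provides=True))
```
</pre>
Running the same check with one consistent setting before and after installation is what removes the skew; which setting is right is a separate question.<br>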

<br>

A simple safe rule is to say that Provides: is always legal iff a<br>

package's Name: and Provides: have a matching BASE, and always illegal<br>

otherwise, making the package just invalid, like if the METADATA were<br>

written in Shift-JIS or something. This rule is trivial to statically<br>

check/enforce, and could always be relaxed more later.<br>
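<br>
A minimal sketch of such a static check, assuming that BASE is the part of a distribution name before the first '[' (so the BASE of 'numpy[abi-2]' is 'numpy') and that names compare case-insensitively. Both assumptions, and the function names base_name and provides_is_legal, are illustrative only.<br>
<pre>
```python
def base_name(dist_name: str) -> str:
    """Return the assumed BASE: everything before the first '['."""
    return dist_name.split("[", 1)[0].strip().lower()


def provides_is_legal(name: str, provides: list) -> bool:
    """True iff every Provides: entry shares its BASE with Name:."""
    base = base_name(name)
    return all(base_name(entry) == base for entry in provides)


# The concrete package 'numpy' may provide a virtual package in its own namespace.
print(provides_is_legal("numpy", ["numpy[abi-2]"]))
# A differently named project may not claim a name in numpy's namespace.
print(provides_is_legal("evil-fork", ["numpy[abi-2]"]))
```
</pre>
Because the check only looks at a single package's own metadata, it can be run at upload time or at install time without consulting the rest of the index.<br>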

<span class=""><br>

> * The upgrade mess around extras as they stand today could also be solved just<br>

>   by recording what extras (if any) were selected to be installed so that we<br>

>   keep a consistent view of the world. Your proposal is essentially doing that,<br>

>   just by (ab)using the fact that by installing a package we essentially get<br>

>   that aspect of it for "free".<br>

<br>

</span>Right -- you certainly could implement a database of installed extras<br>

to go alongside the database of installed packages, but it seems like<br>

it just makes things more complicated with minimal benefit. E.g., you<br>

have to add special case code to the resolver to check both databases,<br>

and then you have to add more special case code to 'pip freeze' to<br>

make sure *it* checks both databases... this kind of stuff adds up.<br>

<span class=""><br>

> * Would this help at all with differentiating between SSE2 and SSE3 builds and<br>

>   things like that? Or does that need something more automatic to be really<br>

>   usable?<br>

<br>

</span>I'm not convinced that SSE2 versus SSE3 is really worth trying to<br>

handle automatically, just because we have more urgent issues and<br>

everyone else in the world seems to get by okay without special<br>

support for this in their package system (even if it's not always<br>

optimal). But if we did want to do this then my intuition is that it'd<br>

be more elegant to do it via the wheel platform/architecture field,<br>

since this actually is a difference in architectures? So you could<br>

have one wheel for the "win32" platform and another wheel for the<br>

"win32sse3" platform, and the code in the installer that figures out<br>

which wheels are compatible would know that both of these are<br>

compatible with the machine it was running on (or not), and that<br>

win32sse3 is preferable to plain win32.<br>
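<br>
A minimal sketch of that selection step, assuming the installer holds an ordered list of platform tags it supports, most preferred first. The "win32sse3" tag is the hypothetical one from the paragraph above and pick_wheel is an illustrative name, not an existing pip or wheel API.<br>
<pre>
```python
def pick_wheel(available, supported):
    """Pick the available platform tag that ranks best on this machine.

    `supported` is ordered from most to least preferred; tags that are
    not in it are treated as incompatible.
    """
    rank = {tag: position for position, tag in enumerate(supported)}
    compatible = [tag for tag in available if tag in rank]
    if not compatible:
        return None
    return min(compatible, key=rank.__getitem__)


wheels_on_index = ["win32", "win32sse3"]

# A machine with SSE3 accepts both tags but prefers the SSE3 build.
print(pick_wheel(wheels_on_index, ["win32sse3", "win32"]))
# A machine without SSE3 only accepts the plain build.
print(pick_wheel(wheels_on_index, ["win32"]))
```
</pre>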

<span class=""><br>

> * PEP 426 (I think it was?) has some extra syntax for extras which could<br>

>   probably be really nice here, things like numpy[*] to get *all* of the extras<br>

>   (though if they are real packages, what even is "all"?). It also included<br>

>   (though this might have been only in my head) default to installed packages<br>

>   which meant you could do something like split numpy into numpy[abi2] and<br>

>   numpy[abi3] packages and have the different ABIs actually contained within<br>

>   those other packages. Then you could have your top level package default to<br>

>   installing abi3 and abi2 so that ``pip install numpy`` is equivalent to<br>

>   ``pip install numpy[abi2,abi3]``. The real power there, is that people can<br>

>   trim down their install a bit by then doing ``pip install numpy[-abi2]`` if<br>

>   they don't want to have that on-by-default feature.<br>

<br>

</span>Hmm, right, I'm not thinking of a way to *quite* duplicate this.<br>

<br>

One option would be to have a numpy[all] package that just depends on<br>

all the other extras packages -- for the traditional 'extra' cases<br>

this could be autogenerated by setuptools at build time and then be a<br>

regular package after that, and for next-generation build systems that<br>

had first-class support for these [] packages, it would be up to the<br>

build system / project whether to generate such an [all] package and<br>

what to include in it if they did. But that doesn't give you the<br>

special all-except-for-one behavior.<br>

<br>

The other option that jumps to mind is what Debian calls "recommends",<br>

which act like a soft-dependency: in Debian, if numpy recommends:<br>

numpy[abi-2] and numpy[abi-3], then 'apt-get install numpy' would give<br>

you all three of them by default, just like if numpy required them --<br>

but for recommends: you can also say something like 'apt-get install<br>

numpy -numpy[abi-3]' if you want numpy without the abi-3 package, or<br>

'apt-get install --no-install-recommends numpy' if you want a fully minimal<br>

install, and this is okay because these are only *recommendations*,<br>

not actual requirements. I don't see any fundamental reason why we<br>

couldn't add something like this to pip, though it's probably not that<br>

urgent.<br>

<br>

My guess is that these two solutions together would pretty much cover<br>

the relevant use cases?<br>
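<br>
A minimal sketch of how "recommends" could feed into an install plan, assuming plain dictionaries for the dependency data. The function plan_install and its parameters are hypothetical and do not correspond to any existing pip option.<br>
<pre>
```python
def plan_install(requested, requires, recommends, excluded=(), use_recommends=True):
    """Return the sorted list of packages to install.

    Hard requirements are always followed. Recommendations are followed
    unless they are switched off globally or excluded by name.
    """
    excluded = set(excluded)
    plan, todo = set(), list(requested)
    while todo:
        name = todo.pop()
        if name in plan:
            continue
        plan.add(name)
        todo.extend(requires.get(name, ()))
        if use_recommends:
            todo.extend(r for r in recommends.get(name, ()) if r not in excluded)
    return sorted(plan)


requires = {}
recommends = {"numpy": ["numpy[abi-2]", "numpy[abi-3]"]}

# Default: recommendations are installed as if they were requirements.
print(plan_install(["numpy"], requires, recommends))
# Opting out of a single recommendation.
print(plan_install(["numpy"], requires, recommends, excluded=["numpy[abi-3]"]))
# Fully minimal install.
print(plan_install(["numpy"], requires, recommends, use_recommends=False))
```
</pre>
In this sketch an excluded package is still installed if something else hard-requires it, and a numpy[all] package would simply be an entry in requires that lists the other extras packages.<br>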

<span class="im"><br>

-n<br>

<br>

--<br>

Nathaniel J. Smith -- <a href="http://vorpus.org" rel="noreferrer" target="_blank">http://vorpus.org</a><br>

</span><div class=""><div class="h5">_______________________________________________<br>

Distutils-SIG maillist  -  <a href="mailto:Distutils-SIG@python.org">Distutils-SIG@python.org</a><br>

<a href="https://mail.python.org/mailman/listinfo/distutils-sig" rel="noreferrer" target="_blank">https://mail.python.org/mailman/listinfo/distutils-sig</a><br>

</div></div></blockquote></div><br></div></div>