On Jun 16, 2017, at 6:05 PM, Thomas Kluyver <thomas@kluyver.me.uk> wrote:
On Fri, Jun 16, 2017, at 10:48 PM, Nathaniel Smith wrote:
The messy complications come from prepare_wheel_metadata and get_prepare_wheel_input_files, which isn't surprising, since those are the two hooks where we're squinting into our crystal ball to try and guess what will be useful for software that doesn't exist yet, but will later, maybe, we hope.
I'm not exactly clear on what use cases the prepare_wheel_metadata hook satisfies - that was in the spec before I was particularly involved in it.
Basically it exists because when we’re resolving dependencies to decide what set of projects/versions to install, we need a way to take a sdist (because a wheel might not exist!) and decide what dependencies it needs. We might ultimately throw that result away, because its dependencies may conflict with something else and we can’t use this particular version at all. So we might end up cycling through 10 different versions of the same project looking for a version whose dependencies don’t conflict with the rest of what we’re trying to install.

Without prepare_wheel_metadata our only real option is to just build the wheel and inspect that. For a tool like flit that is probably fine: it is pure-Python only, so building a wheel is going to be exceedingly quick. However, for something like Scikit, where building the wheel might take tens of minutes or longer, that can degrade things very quickly. If you imagine something that takes even 5 minutes to go from sdist to wheel, and we need to do that 10 times while looking for a version that we can use, resolution ends up taking close to an hour just because of that package alone. This cost is tempered somewhat by the fact that in the ideal case we can cache those wheels so future resolution can be extremely quick; however, not everyone can or will run with caching enabled, and even then, ``pip install …`` taking an hour even just the first time is still a pretty horrible user experience.

The goal of prepare_wheel_metadata, then, is basically to let us ask a sdist “what dependencies would you have, if we were to build you as a wheel” without having to go through the entire build process. This will ideally be much, much faster.

It’s not *just* dependencies either: in many cases we don’t know the name or version of something because we’re just given a fairly generic tarball (for instance `master.zip` from GitHub). We need a way to get the name/version from that tarball. That information would exist in the wheel, but then we fall into the same “but what if things take forever to build” problem.

Another possible solution is to go down the path of trying to make a sdist 2.0 that has this metadata in a static format so we can promise that we can get it without needing to build anything. However, not only is that a significant chunk of extra work, but some folks (Nathaniel I think?) have indicated that some projects simply can’t determine their dependencies statically at sdist build time, because they’re going to change at build time (I think the example was something that can build against Numpy >=X, but once built against X+5, it has to have Numpy >=X+5 at runtime).
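To make the cost argument concrete, here is a rough sketch of how a frontend might prefer a metadata hook and only fall back to a full build when the backend doesn't offer one. Everything here is hypothetical and simplified: the two backend classes, the `get_metadata` and `first_compatible` helpers, and the hook signatures (which take a version string standing in for an unpacked sdist) are illustrations only, not the interface under discussion.

```python
class WheelOnlyBackend:
    """Hypothetical backend whose only way to produce metadata is a full build."""

    def __init__(self, requirements_by_version):
        self.requirements_by_version = requirements_by_version
        self.wheel_builds = 0  # counts simulated (expensive) wheel builds

    def _metadata(self, version):
        return {
            "name": "example",
            "version": version,
            "requires": self.requirements_by_version[version],
        }

    def build_wheel(self, version):
        # Stands in for a build that could take minutes.
        self.wheel_builds += 1
        return self._metadata(version)


class MetadataBackend(WheelOnlyBackend):
    """Hypothetical backend that can also report metadata without building."""

    def prepare_wheel_metadata(self, version):
        return self._metadata(version)


def get_metadata(backend, version):
    """Use the cheap hook if the backend has one, else build the wheel."""
    hook = getattr(backend, "prepare_wheel_metadata", None)
    if hook is not None:
        return hook(version)
    return backend.build_wheel(version)


def first_compatible(backend, versions, is_compatible):
    """Walk candidate versions until one has acceptable dependencies."""
    for version in versions:
        if is_compatible(get_metadata(backend, version)):
            return version
    return None


# Ten candidates, newest first; only the oldest avoids the conflicting pin.
versions = [f"1.{n}" for n in range(9, -1, -1)]
requirements = {v: ["numpy>=2"] for v in versions}
requirements["1.0"] = ["numpy>=1"]


def accepts(metadata):
    return "numpy>=2" not in metadata["requires"]


slow = WheelOnlyBackend(requirements)
fast = MetadataBackend(requirements)
slow_choice = first_compatible(slow, versions, accepts)
fast_choice = first_compatible(fast, versions, accepts)
print(slow_choice, slow.wheel_builds)
print(fast_choice, fast.wheel_builds)
```

Both runs pick the same version, but the first pays for ten simulated builds (about 50 minutes at 5 minutes each) while the second pays for none.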
I do think we've hashed out a concrete need for prepare_build_files (or whatever it ends up called): to copy the files to a build directory without either copying large amounts of unnecessary data or figuring out what belongs in an sdist. The only alternative would be to require backends to avoid leaving clutter in the working directory, so they would take care of any necessary copy step rather than exposing a separate hook. People insist that trusting backends is a non-starter, though.
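As a rough illustration of the copy step described above, a backend-side hook might look something like the following. This is only a sketch under assumptions: the signature of `prepare_build_files`, the `needed` list of relative paths, and the `demo` helper are all made up for the example, and a real backend would work out the list of needed files from its own configuration.

```python
import shutil
import tempfile
from pathlib import Path


def prepare_build_files(source_dir, build_dir, needed):
    """Copy only the listed relative paths from source_dir into build_dir.

    Hypothetical hook: `needed` is whatever the backend knows it requires
    to build, which may be less than what belongs in an sdist.
    """
    source_dir, build_dir = Path(source_dir), Path(build_dir)
    for relative in needed:
        src = source_dir / relative
        dst = build_dir / relative
        dst.parent.mkdir(parents=True, exist_ok=True)
        if src.is_dir():
            shutil.copytree(src, dst, dirs_exist_ok=True)  # Python 3.8+
        else:
            shutil.copy2(src, dst)


def demo():
    """Build a tiny fake source tree with clutter and stage only what is needed."""
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "source"
        build = Path(tmp) / "build"
        (source / "pkg").mkdir(parents=True)
        (source / "pkg" / "__init__.py").write_text("")
        (source / "pyproject.toml").write_text("")
        (source / "data").mkdir()
        (source / "data" / "huge.bin").write_bytes(b"\0" * 1024)  # unneeded clutter

        prepare_build_files(source, build, ["pyproject.toml", "pkg"])
        return sorted(
            p.relative_to(build).as_posix() for p in build.rglob("*") if p.is_file()
        )


staged = demo()
print(staged)
```

The frontend then builds from the staging directory, so the unneeded data is never copied and nothing is left behind in the working directory.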
The other solution of course is to just say that all backends need to be able to no-op copy a sdist from an existing sdist. So we have three options really:

1) Require backends to be able to no-op copy a sdist from an existing unpacked sdist. However Thomas is against this, as it would make flit’s job harder.
2) Require frontends to trust that backends are going to DTRT with regards to in-place builds and isolation. However the other pip devs and I were against this, as we feel it is important for pip to do its job.
3) Add a hook that lets the backend copy the files it needs into a staging area, without the need to prepare a full sdist. This is basically (1), except with some compromises to ease the use cases that Thomas says make flit’s job harder.

Nathaniel is against (3) for simplicity’s sake, and personally I would prefer (1) because I think it is simpler and because I think that it shouldn’t be *too* much additional effort for a backend to make a no-op build_sdist in the case where it is being made out of something that is already a sdist. That being said, I am OK with (3) since Thomas believes that is better for flit than (1) and I don’t have any evidence to refute or back up that claim personally.

— Donald Stufft