What metadata does pip actually need about sdists?
Hi all, I'm finding it impossible to keep track of that other thread, and I guess I'm probably not the only one, so I figured I'd try splitting a few of the more specific discussions out :-).

One thing that seems to be a key issue, but where I remain very confused, is the question of what pip actually needs from an sdist. (Not PyPI, just pip or other package install tools.) Right now, IIUC, there are three times that pip install touches sdist-related metadata:

1) it uses the name+version that are embedded in the sdist filename to select an sdist from an index like PyPI

2) after unpacking this sdist it then calls 'setup.py egg_info' to get the full metadata for the wheel (or wheel equivalent) that this sdist will eventually produce. Specifically what it does with this is extract the setup_requires and install_requires fields, and uses them to go find other wheels/sdists that also need to be installed

3) eventually it actually builds the package, and this produces a wheel (or wheel equivalent) that has its own metadata (which often matches the metadata from egg_info in step (2), but not always)

Is that a correct description of current behavior? Is there anything that pip ever looks at besides name, version, and dependencies?

Paul says that this is broken, and that pip gets lots of bug reports that "can be traced back to needing to run setup.py egg-info to get metadata" [1]. Since AFAICT the only metadata that pip actually touches is name, version, and dependencies, and it already knows the name and version before it runs egg_info, I assume that what this means is that it's crucial for pip to have static access to dependency information? OTOH in another email Paul says that name and version are the minimum he wants [2], so maybe I'm reading too much into this :-).
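To make steps (1) and (2) a bit more concrete, here's a rough sketch of the general shape of what happens -- this is an illustration only, not pip's actual code; the helper names are invented and the filename parsing is deliberately naive:

```python
# Rough illustration of steps (1) and (2) above -- not pip's actual code.
import os
import subprocess

def name_version_from_sdist_filename(filename):
    # Step (1): derive name and version from a filename like "foo-1.2.3.tar.gz".
    # Deliberately naive: assumes the version contains no hyphens.
    base = filename
    for ext in (".tar.gz", ".tar.bz2", ".zip"):
        if base.endswith(ext):
            base = base[: -len(ext)]
            break
    name, _, version = base.rpartition("-")
    return name, version

def requirements_from_unpacked_sdist(sdist_dir, python="python"):
    # Step (2): run "setup.py egg_info" and read the requires.txt it writes
    # into the generated *.egg-info directory (extras sections ignored here).
    subprocess.check_call([python, "setup.py", "egg_info"], cwd=sdist_dir)
    for entry in os.listdir(sdist_dir):
        if entry.endswith(".egg-info"):
            req_path = os.path.join(sdist_dir, entry, "requires.txt")
            if os.path.exists(req_path):
                with open(req_path) as f:
                    return [line for line in f.read().splitlines() if line]
    return []
```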
From the discussion so far, it sounds like the particularly crucial question is whether pip needs to statically know dependencies before building a wheel.
Trying to reason through from first principles, I can't see any reason why it would. It would be somewhat convenient if sdists did list their binary dependencies: if that were the case, then pip could take a strictly phased approach:

1) solve the complete dependency graph to find a set of packages to install / remove

2) for all the packages-to-be-installed that are sdists, turn them into wheels

3) install all the wheels

OTOH if sdists have only name and version statically, but not dependency information, then you need to do something like the following (sketched in code below):

1) create a fake dependency graph that contains accurate information for all known wheels, and for each sdist add a fake node that has the right name and version number but pretends not to have any dependencies.

2) solve this graph to find the set of packages to install / remove

3) if any of the packages-to-be-installed are sdists, then fetch them, run egg_info or build them or whatever to get their real dependencies, add these to the graph, and go to step 1

4) else, we have wheels for everything; install them.

(This works because dependencies are constraints -- adding dependencies can only reduce the space of possible solutions, never enlarge it. Also, because by the time we decide to fetch and build any sdists, we already know that we're very likely to want to install them, so the performance penalty for building packages we turn out not to want is not high. And, crucially, we know that there exists some set of dependency metadata which would convince us to install these sdists, and dependency metadata is under the package author's control, so we already have established a trust route to the author of this package -- if they don't declare any dependencies, then we'll be installing and running arbitrary code of theirs, so running arbitrary code to check their dependencies doesn't require any additional trust.)

But there's often a large difference between what we work out from first principles and how things actually work :-). Is there anything I'm missing in the analysis above? Do the relevant pip maintainers even read this mailing list? :-)

-n

[1] https://mail.python.org/pipermail/distutils-sig/2015-October/026960.html
[2] https://mail.python.org/pipermail/distutils-sig/2015-October/026942.html

--
Nathaniel J. Smith -- http://vorpus.org
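Here's a minimal sketch of that second, iterative approach. The `solve`, `is_sdist`, and `get_real_deps` callables are stand-ins for a real resolver, the wheel-vs-sdist check, and the fetch-and-run-egg_info/build step; none of this is pip's actual API.

```python
def resolve_with_sdists(root_requirements, solve, is_sdist, get_real_deps):
    """Iterative resolution when sdist dependencies can only be learned by
    fetching/building: solve, learn real deps for any chosen sdists, re-solve,
    and stop once a pass learns nothing new."""
    learned = {}  # candidate -> dependencies discovered so far
    while True:
        # Steps 1-2: solve using accurate wheel metadata plus whatever we've
        # learned; sdists we haven't inspected yet are treated as dependency-free.
        solution = solve(root_requirements, learned)
        new_info = False
        for candidate in solution:
            if is_sdist(candidate) and candidate not in learned:
                # Step 3: fetch/build this sdist (e.g. run egg_info) to get its
                # real dependencies, and feed them back into the next solve.
                learned[candidate] = get_real_deps(candidate)
                new_info = True
        if not new_info:
            # Step 4: every chosen sdist's dependencies are known; build the
            # remaining wheels and install.
            return solution
```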
On 11 October 2015 at 05:31, Nathaniel Smith <njs@pobox.com> wrote:
Do the relevant pip maintainers even read this mailing list? :-)
Donald and I are pip maintainers (not the only ones, but probably the ones who make the most noise on this list :-)). The comments I've made that you quoted are based mostly on how I recall pip works, it's unfortunately been some time since I looked into pip's dependency resolution and install code (it's a bit of a pit of snakes) and I haven't had time to check the details when commenting on this thread, so I don't claim my comments are authoritative.

I think your description of the behaviour is right.

As regards what pip could do, technically you are of course correct (it's possible, it just needs someone willing to make the code changes). I don't know, however, if the approach you're proposing fits with how we currently envisage pip developing. Specifically, my personal goal [1] is that we get to a point where pip can do all of the dependency resolution steps, getting to a point where it knows exactly what it will install, *before* it starts downloading packages, running build steps, etc. (And I don't see that sitting well with your proposal).

The reason I want that is so that I would like to be able to see what pip plans on doing, and audit it before it begins (a "pip install --dry-run" type of step). Typically, this lets me avoid getting part-way through an install only to find that I need a project that doesn't have a wheel and I can't build from the sdist. That's a real problem I deal with a lot - at the moment I work with "pip list --outdated", check the listing to see what provides wheels, and update just those packages individually. But that's messy and manual, and not always 100% effective - and I'd like to be able to do better.

I also don't think it's reasonable to focus solely on what *pip* requires. Just as important (IMO) is allowing PyPI to provide rich metadata. In particular, I write a lot of programs that query PyPI for package information. There are a lot of useful scripts I would like to be able to write that need to query PyPI for the dependency details of a package, and that simply isn't possible at the moment. Most of these are ad-hoc or personal-use scripts, and a lot of things I simply haven't done because I can't. So it's hard to quantify the benefits.

But currently, the wonderful Metadata 2.0 ideal future promises me static dependency data on PyPI. That will only happen if we get enough people interested in implementing it, but I live in hope. If I understand your proposals, they are closing off that option, and explicitly stating that we'll never have dependency metadata on PyPI for projects that don't publish wheels. Maybe you're right, and it'll never be possible. But I don't want to give up that dream without being sure it's necessary - which is why I push back so hard on the idea.

Paul

[1] Which is not backed up by me having the time to write the code, so the only relevance here is that I'm inclined to support proposals that work towards that goal.
On 12 October 2015 at 00:49, Paul Moore <p.f.moore@gmail.com> wrote:
On 11 October 2015 at 05:31, Nathaniel Smith <njs@pobox.com> wrote:
Do the relevant pip maintainers even read this mailing list? :-)
Donald and I are pip maintainers (not the only ones, but probably the ones who make the most noise on this list :-)). The comments I've made that you quoted are based mostly on how I recall pip works, it's unfortunately been some time since I looked into pip's dependency resolution and install code (it's a bit of a pit of snakes) and I haven't had time to check the details when commenting on this thread, so I don't claim my comments are authoritative.
I think your description of the behaviour is right.
I have this on my to-read-threads list, just haven't had time to do it justice yet. FWIW I am *very* familiar with the dependency/install code at the moment, having overhauled it for:

- wheel caching
- working proof of concept static-dependencies-in-metadata handling
- working proof of concept full resolver

And various related bits.
...The reason I want that is so that I would like to be able to see what pip plans on doing, and audit it before it begins (a "pip install --dry-run" type of step). Typically, this lets me avoid getting part-way through an install only to find that I need a project that doesn't have a wheel and I can't build from the sdist. That's a real problem I deal with a lot - at the moment I work with "pip list --outdated", check the listing to see what provides wheels, and update just those packages individually. But that's messy and manual, and not always 100% effective - and I'd like to be able to do better.
'pip install --only-binary :all: THINGTOINSTALL' will do exactly what you want I think in that it will only download wheels. -Rob
On 12 October 2015 at 08:22, Paul Moore <p.f.moore@gmail.com> wrote:
On 11 October 2015 at 19:37, Robert Collins <robertc@robertcollins.net> wrote:
'pip install --only-binary :all: THINGTOINSTALL' will do exactly what you want I think in that it will only download wheels.
ooh, I didn't spot that getting added! Many thanks for that.
Paul
You're welcome :)

-Rob

--
Robert Collins <rbtcollins@hp.com>
Distinguished Technologist
HP Converged Cloud
On Sun, Oct 11, 2015 at 4:49 AM, Paul Moore <p.f.moore@gmail.com> wrote: [...]
As regards what pip could do, technically you are of course correct (it's possible, it just needs someone willing to make the code changes). I don't know, however, if the approach you're proposing fits with how we currently envisage pip developing. Specifically, my personal goal [1] is that we get to a point where pip can do all of the dependency resolution steps, getting to a point where it knows exactly what it will install, *before* it starts downloading packages, running build steps, etc.
Thanks for stating this so clearly. Unfortunately I just don't see any way this can possibly be achieved. It seems to unambiguously rule out the various "compromise" proposals from the other thread (e.g., your suggestion that packages would have to specify most dependencies statically, but would have the option of adding some extra ones at build time, would not accomplish the goal stated above).

To accomplish this, it really is necessary that we be able to *exactly* predict the full dependency set (and also environment specifiers and potential external system requirements and ... -- anything that possibly affects whether a wheel is installable) before we download any packages or build any wheels. But as discussed at length in the other thread, it's a fact of life that the same source release may be configured in different ways that create different resulting dependencies. NumPy is one example of this, but it's hardly unusual -- pretty much any optional dependency on a C library works like this. And these kinds of issues will presumably only get more complicated if we start adding more complex dependency structures (e.g. "provides", per-package abi compatibility tracking, ...).

Are you aware of any other systems that have accomplished anything like this? From a quick skim, it looks like .deb, .rpm, gentoo, freebsd ports, and ruby gems all allow for arbitrary code execution inside "source" packages when specifying dependencies, and none of the systems I looked at have the property that you're looking for. This doesn't prove that it's impossible, but...

I do see one clear path to accomplish what you want:

1) enable linux wheels on pypi

2) build an autobuilder infrastructure for pypi

3) now that 95% of packages have wheels, flip the switch so that pip ignores sdists when auto-installing dependencies

This strategy at least has the advantage that it only requires we do things that have been done before and we know are possible :-).

And the alternative is -- what? As far as pip goes, it sounds like we all agree that there's a perfectly sound algorithm for solving the installation problem without access to static dependencies (i.e., wrapping the solve-then-build cycle in a loop); it's just that this leads to a bit more complicated code in pip. But pip clearly has to implement and debug this code no matter what, because we are committed to handling traditional sdists for some years yet. It seems like the best we can hope for is that if we impose these constraints on wheels, and we somehow manage to make this work at all (which seems to be a research problem), then we might eventually be able to replace some working code with some slightly simpler working code. (Plus whatever other benefits there are of having static dependency metadata on pypi, like interesting global static analyses of the package ecosystem.)

I know "debian packagers think source wheels with static metadata would be great" was cited in the other thread as an example of an advantage, but note that for numpy, the dependencies in the configuration that debian will want to use are exactly *not* the dependencies in the configuration we'll put on pypi, because the pypi version wants to be self-contained inside the pypi ecosystem but the debian version wants to rely as much as possible on debian-distributed libraries. So even if numpy has a "source wheel" that has static dependency metadata, then this will be exactly what debian *can't* use; they'll need a traditional non-static-metadata source release instead.
I dunno -- I'm sure there must exist some other ways forward that don't require dropping the dream of static dependencies. At one extreme, we had a birds-of-a-feather at SciPy this year on "the future of numpy", and the most vocal audience contingent was in favor of numpy simply dropping upstream support for pip/wheels/pypi entirely and requiring all downstream packages/users to switch to conda or building by hand. It sounds like a terrible idea to me. But I would find it easier to believe in the pypi/pip ecosystem if there were some concrete plan for how this all-static world was actually going to work, and that it wasn't just chasing rainbows.

-n

--
Nathaniel J. Smith -- http://vorpus.org
it's a fact of life that the same source release may be configured in different ways that create different resulting dependencies. NumPy is one example of this, but it's hardly unusual
I've tried to respond to this point twice.. but let me try again. : ) If I'm confused, please help me. aren't the "different resulting dependencies" non-python, non-pypi system dependencies? the dependencies the sdist would cover are the python/pypi dependencies, and they don't usually vary based on building extensions as for representing non-python distro dependencies and build variances, I mentioned before that PEP426 extensions might be used to cover that, but that's still an open question I think.. --Marcus
On Sun, Oct 11, 2015 at 10:12 PM, Marcus Smith <qwcode@gmail.com> wrote:
it's a fact of life that the same source release may be configured in different ways that create different resulting dependencies. NumPy is one example of this, but it's hardly unusual
I've tried to respond to this point twice.. but let me try again. : ) If I'm confused, please help me.
aren't the "different resulting dependencies" non-python, non-pypi system dependencies?
the dependencies the sdist would cover are the python/pypi dependencies, and they don't usually vary based on building extensions
as for representing non-python distro dependencies and build variances, I mentioned before that PEP426 extensions might be used to cover that, but that's still an open question I think..
Ah, sorry, there was some discussion of this buried deep inside the giant thread :-)

For NumPy/SciPy/scikit-learn/etc. we have the problem that we need to somehow distribute the standard linear algebra libraries like BLAS and LAPACK. For e.g. debian packages, or a self-built version on Linux, these are generally treated as distro dependencies -- the user is responsible for making sure that they're (somehow) installed, and then we link to them directly. For wheels, though, we can't assume this -- not just because there's no spec for distro dependencies, but because even if there were, it'd probably be fragile (no guarantee that different distros have compatible ABIs) and it certainly wouldn't work on Windows or OS X. So instead, the current plan is that we're going to drop the libraries inside a wheel and upload it to PyPI: https://mail.python.org/pipermail/distutils-sig/2015-October/027164.html and our wheels on pypi will have an install_requires on this special library wheel. Compared to the PEP426-extensions idea, this has the advantages that (a) it doesn't require a PEP, (b) there's no question about whether it will work :-).

Another example of where this might arise is if a package [1] wants to optionally use dynd as its array backend instead of numpy [2], so it would need to optionally link to the dynd C API instead of or in addition to the numpy API. But dynd is a young and rapidly changing project, so you might want to support this as an experimental build configuration while still uploading "official" wheels that depend on numpy only. But the configuration that does link to dynd would still want to fetch dynd using python-level dependencies [3].

Does that help?

-n

[1] This isn't terribly hypothetical, pandas is one of the most popular data processing libraries in the world and they have an experimental branch doing exactly this: https://github.com/pydata/pandas/issues/8643
[2] https://github.com/libdynd/libdynd
[3] https://pypi.python.org/pypi/dynd

--
Nathaniel J. Smith -- http://vorpus.org
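As a concrete (and purely hypothetical) illustration of that plan, a package's setup.py might simply declare an install_requires on the library wheel. "openblas-libs" below is an invented name standing in for whatever the real library wheel ends up being called:

```python
# Hypothetical setup.py sketch; "openblas-libs" is a placeholder name for a
# wheel that carries nothing but the shared BLAS/LAPACK libraries.
from setuptools import setup, Extension

setup(
    name="somepkg",
    version="1.0",
    ext_modules=[Extension("somepkg._linalg", sources=["somepkg/_linalg.c"])],
    install_requires=[
        "numpy",
        "openblas-libs",  # the special library wheel described above
    ],
)
```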
So instead, the current plan is that we're going to drop the libraries inside a wheel and upload it to PyPI:
aha... ok, now it's clearer where you're coming from. but using what platform in the wheel tag? linux wheels are blocked currently

regardless of the tagging though, I'm not so sure about using PyPI like this to hold all these "python" wrapper packages for binary extensions
On Sun, Oct 11, 2015 at 10:55 PM, Marcus Smith <qwcode@gmail.com> wrote:
So instead, the current plan is that we're going to drop the libraries inside a wheel and upload it to PyPI:
aha... ok, now it's clearer where you're coming from. but using what platform in the wheel tag? linux wheels are blocked currently
Windows is the first goal, likely OS X after that (OS X provides a bit more in the base system so distributing these external dependencies isn't as urgent, but it would be good eventually), and yeah, doing this for Linux would obviously require unblocking Linux wheels.

(We do have a plan for unblocking them, but the critical person is a bit busy so I'm not sure how long it will be before it can be unveiled :-). It's not really a secret, the plan is just: take advantage of Continuum and Enthought's hard-won knowledge from distributing hundreds of thousands of pre-built python package binaries on Linux to define a named "common Linux platform" abi tag that is known to be abi-compatible with >99.99% of Linux installs out there. This doesn't prevent the use of other abi tags -- e.g. distro-specific ones -- or the development of more detailed metadata about "distro dependencies", and wouldn't require that every single Linux box be willing to install wheels with this abi tag, but it would at least bring Linux support up to parity with the other platforms.)
regardless of the tagging though, I'm not so sure about using PyPI like this to hold all these "python" wrapper packages for binary extensions
If someone else invents a better solution before this is implemented then of course we'd switch to that instead...

-n

--
Nathaniel J. Smith -- http://vorpus.org
On 12 October 2015 at 07:14, Nathaniel Smith <njs@pobox.com> wrote:
On Sun, Oct 11, 2015 at 10:55 PM, Marcus Smith <qwcode@gmail.com> wrote:
So instead, the current plan is that we're going to drop the libraries inside a wheel and upload it to PyPI:
aha... ok, now it's clearer where you're coming from. but using what platform in the wheel tag? linux wheels are blocked currently
Windows is the first goal, likely OS X after that (OS X provides a bit more in the base system so distributing these external dependencies isn't as urgent, but it would be good eventually), and yeah, doing this for Linux would obviously require unblocking Linux wheels.
I'm running out of time to keep up with this mega-thread, so I'm starting to only skim at this point. Apologies if I've missed something that has already been said as a result...

My view is that if you distribute a library (e.g., openblas) in a wheel, then it's essentially a "Python dependency" at that point, insofar as it can participate in the standard package dependency process in exactly the same way as (say) numpy itself does. So if you use that approach, I no longer see *any* reason why you can't work within the existing dependency process. Is that right?
We do have a plan for unblocking them, but the critical person is a bit busy so I'm not sure how long it will be before it can be unveiled :-).
Obviously, as Marcus said, you need Linux wheels unblocked, and that means being able to specify compatibility cleanly, but as you say here, you're looking at that. Paul
On October 12, 2015 at 12:54:07 AM, Nathaniel Smith (njs@pobox.com) wrote:

I dunno -- I'm sure there must exist some other ways forward that don't require dropping the dream of static dependencies. At one extreme, we had a birds-of-a-feather at SciPy this year on "the future of numpy", and the most vocal audience contingent was in favor of numpy simply dropping upstream support for pip/wheels/pypi entirely and requiring all downstream packages/users to switch to conda or building by hand. It sounds like a terrible idea to me. But I would find it easier to believe in the pypi/pip ecosystem if there were some concrete plan for how this all-static world was actually going to work, and that it wasn't just chasing rainbows.

FTR, I plan on making some sort of auto builder for PyPI so it’s possible that we can get pip to a point where almost all things it downloads are binary wheels and we don’t need to worry too much about needing to optimize the sdist case.

I also think that part of the problem with egg-info and setup.py’s and pip’s attempted use of them is just down to the setup.py interface being pretty horrible and setup.py’s serving too many masters where it needs to be both VCS entry point, sdist metadata entry point, installation entry point, and wheel building entry point.

I also think that it would be a terrible idea to have the science stack leave the “standard” Python packaging ecosystem and go their own way and I think it’d make the science packages essentially useless to the bulk of the non science users. I think a lot of the chasing rainbows like stuff comes mostly from: We have some desires from our experiences but we haven’t yet taken the time to fully flesh out what the impact of those desires are, nor are any of us science stack users (or contributors) to my knowledge, so we don’t deal with the complexities of that much [1].

One possible idea that I’ve thought of here, which may or may not be a good idea:

Require packages to declare up front conditional dependencies (I’m assuming the list of dependencies that a project *could* have is both finite and known ahead of time) and let them give groups of these dependencies a name. I’m thinking something similar to setuptools extras where you might be able to put a list of dependencies to a named group. The build interface could include a way for the thing that’s calling the build tool to say “I require the feature represented by this named group of dependencies”[2], and then the build tool can hard fail if it detects it can’t be built in a way that requires those dependencies at runtime. When the final build is done, it could put into the Wheel a list of all of the additional named groups of dependencies it built with. The runtime dependencies of a wheel would then be the combination of all of those named groups of dependencies + the typical install_requires dependencies. This could possibly even be presented nicely on PyPI as a sort of “here are the extra features you can get with this library, and what that does to its dependencies”.

Would something like that solve Numpy’s dependency needs? Or is it the case that you don’t know ahead of time what the entire list of the dependency specifiers could be (or that it would be unreasonable to have to declare them all up front?). I think I recall someone saying that something might depend on something like “Numpy >= 1.0” in their sdist, but once it’s been built then they’ll need to depend on something like “Numpy >= $VERSION_IT_WAS_BUILT_AGAINST”. If this is something that’s needed, then we might not be able to satisfy this particular thing.

I think I should point out too, that I’m not dead set on having static dependency information inside of a sdist/source wheel/whatever. What I am dead set on having is that all of the metadata *inside* of the sdist/source wheel/whatever should be static and it should include as much as possible which isn’t specific to a particular wheel. This means that things like name, version, summary, description, project URLs, etc. -- these are obviously (to me anyways) not going to be specific to a wheel and should be kept static inside of the sdist (and then copied over to the resulting wheel as static as well [3]) and you simply can’t get information out of a sdist that is inherent to wheels. Obvious (again, to me) examples of data like that are things like build number, ABI the wheel was compiled against, etc. Is the list of runtime dependencies one of the things that are specific to a particular wheel? I don’t know, it’s not obvious to me but maybe it is. [4]

[1] That being said, I’d love it if someone who does deal with these things would become more involved with the “standard” ecosystem so we can have experts who deal with that side of things as well, because I do think it’s an important use case.

[2] And probably: “Even if you can do the feature suggested by this named group of dependencies, I don’t want it”

[3] This rule should be set up so that we can have an assertion put into place that this data will remain the exact same when building a wheel from a sdist.

[4] I think there’s possibly some confusion in what is causing problems. I think that the entirety of ``setup.py`` is full of problems with executing in different environments and isn’t very well designed to enable robust packages. A better designed interface would likely resolve a good number of the problems that pip currently has either way. It is true however that needing to download the sdist prior to resolving dependencies makes things a lot slower, but I also think that doing it correctly is more important than doing it quickly. We could possibly resolve some of this by pushing more people to publish Wheels, expanding the cases where Wheels are possible to be uploaded, and creating a wheel builder service. These things could make it possible to reduce the situations where you *need* to download + build locally to do a dependency resolution. I think that something like the pip wheel building cache can reduce the need for this as well, since we’ll already have built wheels available locally in the wheel cache that we can query for dependency information.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
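To make the "named groups of conditional dependencies" idea above a bit more concrete, here is one possible shape for it, sketched in Python. The group names, version numbers, and the way the build records its choices are all invented for illustration; nothing like this exists today.

```python
# Invented sketch of the idea above: dependencies declared up front in named
# groups, with the build recording which groups were actually enabled.
CONDITIONAL_DEPENDENCY_GROUPS = {
    # group name -> extra runtime dependencies if built with that feature
    "dynd-backend": ["dynd >= 0.7"],
    "mkl": ["mkl-runtime"],
}
INSTALL_REQUIRES = ["numpy >= 1.8"]  # unconditional runtime dependencies

def wheel_runtime_requirements(groups_built_with):
    """What the resulting wheel would declare at runtime: install_requires
    plus the groups the build actually enabled."""
    requires = list(INSTALL_REQUIRES)
    for group in groups_built_with:
        requires.extend(CONDITIONAL_DEPENDENCY_GROUPS[group])
    return requires

print(wheel_runtime_requirements(["mkl"]))
# -> ['numpy >= 1.8', 'mkl-runtime']
```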
On Sun, Oct 11, 2015 at 10:49 PM, Donald Stufft <donald@stufft.io> wrote:
FTR, I plan on making some sort of auto builder for PyPI so it’s possible that we can get pip to a point where almost all things it downloads are binary wheels and we don’t need to worry too much about needing to optimize the sdist case.
That would be 1000% awesomesauce.
I also think that part of the problem with egg-info and setup.py’s and pip’s attempted use of them is just down to the setup.py interface being pretty horrible and setup.py’s serving too many masters where it needs to be both VCS entry point, sdist metadata entry point, installation entry point, and wheel building entry point.
Yeah, I'm glad that Robert linked https://pip.pypa.io/en/latest/reference/pip_install/#build-system-interface because I hadn't seen it, and it's great that it's documented, but what reading that documentation mostly convinced me is that I definitely could not write a new implementation of that interface and get it right without significant pain and trial-and-error :-). I mean... you have to support both --single-version-externally-managed enabled and disabled? develop has to know the right arcane magic to get the directory added to sys.path? (how do you know where in sys.path to add the magic?) what is egg_info even supposed to do -- the linked-to docs don't actually define what egg metadata looks like, and IIRC it's not standardized, is it? (as opposed to the dist-info used in wheels) Like, obviously all that should be needed to implement a new build system is the ability to generate a wheel, right? Wheels are well-documented and one honking great idea.

My current best guess at where we want to end up with sdists is:

- there's a static file like the _pypackage/_pypackage.cfg in my original proposal (ignore the name, we can change it, I don't care)

- it's written in some simple well-specified language like json or toml (if you look at the spec, toml is basically json encoded in an INI-style syntax -- it's definitely nowhere near as complex as yaml)

- it includes static metadata for name, version (required for pypi uploads, optional otherwise to accommodate VCS checkouts), description, long_description, authors, URLs, trove classifiers, etc. etc., but not install_depends

- it also includes the setup-dependencies / setup-dependencies-dynamic / build-wheel / build-in-place entries/hooks from my original proposal (or the moral equivalent)

- it lives inside the single top-level directory next to the source code (instead of having a meta/ + source/ split, because that way it works for VCS checkouts too and AFAICT there aren't any other advantages to this split. We can afford to steal a single name -- everyone is used to having a setup.py or Makefile or Gemfile or whatever cluttering up the top-level of their source, and this is just a replacement for that.)

(A rough sketch of what such a file's contents might look like follows below.)

That gives you static metadata for PyPI and solves the setup_depends problem while simultaneously cleaning up the "fugly" setup.py interface so that it becomes easy to write your own build system. (I think instead of just arguing about tests, we should have big debates here about which of the 3-4 highly-regarded competing build systems we should be recommending to newbies. But this will only happen if we make designing a new build system something that dozens of people do and a few take seriously, which requires making it accessible.)

I *think* the above interface also makes it possible to write a shim in either direction (a setup.py that speaks the current "fugly" interface to old versions of pip while using the new files to actually do its work, or a new-style sdist that delegates all the actual work to setup.py), and even for setuptools to start generating sdists that are simultaneously both old-style and new-style for compatibility. So I think we can at least solve the metadata-for-PyPI problem, the setup_requires problem, and the let-a-thousand-build-systems-bloom problem now, without solving the static-install_requires-in-sdists problem (but also without creating any particular obstacles to solving it in the future, if it's even possible).
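For concreteness, here is roughly what the parsed contents of such a file might look like, shown as a Python dict to stay neutral on the json-vs-toml question. Every key name and hook value below is a placeholder based on the bullets above, not a finished spec.

```python
# Hypothetical parsed form of the proposed static file; key names and the
# "module:function" hook convention are placeholders, not a specification.
PYPACKAGE_METADATA = {
    "name": "somepkg",
    "version": "1.0",        # required for PyPI uploads, optional for VCS checkouts
    "description": "One-line summary",
    "long_description": "Contents of README or similar",
    "authors": ["A. Developer <a.developer@example.com>"],
    "urls": {"homepage": "https://example.com/somepkg"},
    "classifiers": ["Programming Language :: Python :: 3"],
    # deliberately no install_requires here
    "setup-dependencies": ["setuptools", "wheel"],
    "setup-dependencies-dynamic": "somepkg_build:extra_setup_deps",
    "build-wheel": "somepkg_build:build_wheel",
    "build-in-place": "somepkg_build:build_in_place",
}
```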
And in the mean time we can keep working on autobuilders and linux wheels that will remove much of the demand for automagic sdist building -- it's difficult to make predictions, especially about the future, so who knows, maybe in the end we'll decide we just don't care that much about static-install_requires-in-sdists.
I also think that it would be a terrible idea to have the science stack leave the “standard” Python packaging ecosystem and go their own way and I think it’d make the science packages essentially useless to the bulk of the non science users. I think a lot of the chasing rainbows like stuff comes mostly from: We have some desires from our experiences but we haven’t yet taken the time to fully flesh out what the impact of those desires are, nor are any of us science stack users (or contributors) to my knowledge, so we don’t deal with the complexities of that much [1].
One possible idea that I’ve thought of here, which may or may not be a good idea:
Require packages to declare up front conditional dependencies (I’m assuming the list of dependencies that a project *could* have is both finite and known ahead of time) and let them give groups of these dependencies a name. I’m thinking something similar to setuptools extras where you might be able to put a list of dependencies to a named group. The build interface could include a way for the thing that’s calling the build tool to say “I require the feature represented by this named group of dependencies”[2], and then the build tool can hard fail if it detects it can’t be built in a way that requires those dependencies at runtime. When the final build is done, it could put into the Wheel a list of all of the additional named groups of dependencies it built with. The runtime dependencies of a wheel would then be the combination of all of those named groups of dependencies + the typical install_requires dependencies. This could possibly even be presented nicely on PyPI as a sort of “here are the extra features you can get with this library, and what that does to its dependencies”.
Would something like that solve Numpy’s dependency needs?
The answer is, I don't know! I am not at all confident in my ability to anticipate every possible difficult case, and NumPy is really not the only complicated library out there, it just gets most of the press. :-) Your idea definitely sounds like something worth exploring as part of the general debate around things like "Provides:" and extras, but I'd be hesitant to commit to locking something in right now. Also:
Or is it the case that you don’t know ahead of time what the entire list of the dependency specifiers could be (or that it would be unreasonable to have to declare them all up front?). I think I recall someone saying that something might depend on something like “Numpy >= 1.0” in their sdist, but once it’s been built then they’ll need to depend on something like “Numpy >= $VERSION_IT_WAS_BUILT_AGAINST”. If this is something that’s needed, then we might not be able to satisfy this particular thing.
Yes, it is true right now that packages that use the NumPy C API should have a wheel dependency on Numpy >= $VERSION_IT_WAS_BUILT_AGAINST.

We'd like it if wheels/pip had even more flexible metadata in this regard that could express things like "package A needs numpy ABI version 3, and numpy 1.8, 1.9, and 1.10 say that they can provide ABI version 3, but 1.11 and 1.7 don't", but we shouldn't worry about the exact details of that kind of system right now -- I'm thinking any more flexible solution to the NumPy ABI problem would be at least as problematic for pip's sdist resolution rules as the current situation, and it sounds like the current situation is already bad enough to rule out a lot of cases.

-n

--
Nathaniel J. Smith -- http://vorpus.org
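A tiny sketch of that pinning convention, just restating it as code (this is not any existing tool's behavior): at build time the installed NumPy version becomes the lower bound that the built wheel declares.

```python
# Sketch: a wheel built against a given NumPy records that version as its
# minimum runtime requirement, since it may use ABI features missing from older
# releases, while newer releases keep the old ABI working.
import numpy

def numpy_runtime_requirement():
    return "numpy >= {}".format(numpy.__version__)

print(numpy_runtime_requirement())  # e.g. "numpy >= 1.10.1" on a 1.10.1 build box
```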
2) after unpacking this sdist it then calls 'setup.py egg_info' to get the full metadata for the wheel
I wouldn't say "get the full metadata for the wheel". It's not a wheel yet. `egg_info` is run so we can use the pkg_resources api to find the dependencies.
Specifically what it does with this is extract the setup_requires and install_requires fields
specifically, we call `requires` on pkg_resources distribution objects https://github.com/pypa/pip/blob/develop/pip/req/req_set.py#L583
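For reference, the pkg_resources side of that pattern looks roughly like this -- a hand-written sketch of the approach, not the exact code behind the link:

```python
# Rough sketch of reading dependencies out of a generated *.egg-info directory
# via pkg_resources, similar in spirit to what the linked pip code does.
import os
from pkg_resources import Distribution, PathMetadata

def requires_from_egg_info(egg_info_dir):
    base_dir = os.path.dirname(egg_info_dir)
    metadata = PathMetadata(base_dir, egg_info_dir)
    dist = Distribution(
        location=base_dir,
        metadata=metadata,
        project_name=os.path.basename(egg_info_dir)[: -len(".egg-info")],
    )
    return dist.requires()  # list of parsed Requirement objects
```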
3) eventually it actually builds the package, and this produces a wheel (or wheel equivalent) that has its own metadata (which often matches the metadata from egg_info in step (2), but not always)
"not always"? not following your point they're 2 different formats, but should contain the same essential information. here's the wheel code that does the conversion https://bitbucket.org/pypa/wheel/src/1cb7374c9ea4d5c82992f1d9b24cf7168ca87707/wheel/metadata.py?at=default&fileviewer=file-view-default#metadata.py-90
name and version before it runs egg_info, I assume that what this means is that it's crucial for pip to have static access to dependency information?
yes
It would be somewhat convenient if sdists did list their binary dependencies:
not sure about your insertion of "binary" here. pip is concerned with finding python project dependencies (i.e. name and version constraints) in the sdist and then based on the current install environment, it will further constrain the wheels chosen based on architecture and python implementation. and to be perfectly clear, none of this deals with non-python OS/Distro requirements.
3) if any of the packages-to-be-installed are sdists, then fetch them, run egg_info or build them or whatever to get their real dependencies, add these to the graph, and go to step 1
this is the pain we don't want in the future.

since we're simply talking about name/version constraints (not platform/python), it's hard to conceive that we'd agree on an sdist spec that didn't include that.
Do the relevant pip maintainers even read this mailing list? :-)
I try : )
since we're simply talking about name/version constraints (not platform/python)
let me clarify that... since I imagine someone might bring up markers in response... markers allow you to vary your name/version dependencies by environment (for example, by platform and python version), but to be clear, it's still not a "binary" dependency declaration... i.e. the dependency is still declared by name and a version specifier alone.
On Sun, Oct 11, 2015 at 10:48 AM, Marcus Smith <qwcode@gmail.com> wrote:
2) after unpacking this sdist it then calls 'setup.py egg_info' to get the full metadata for the wheel
I wouldn't say "get the full metadata for the wheel". It's not a wheel yet. `egg_info` is run so we can use the pkg_resources api to find the dependencies.
Specifically what it does with this is extract the setup_requires and install_requires fields
specifically, we call `requires` on pkg_resources distribution objects https://github.com/pypa/pip/blob/develop/pip/req/req_set.py#L583
3) eventually it actually builds the package, and this produces a wheel (or wheel equivalent) that has its own metadata (which often matches the metadata from egg_info in step (2), but not always)
"not always"? not following your point they're 2 different formats, but should contain the same essential information. here's the wheel code that does the conversion https://bitbucket.org/pypa/wheel/src/1cb7374c9ea4d5c82992f1d9b24cf7168ca87707/wheel/metadata.py?at=default&fileviewer=file-view-default#metadata.py-90
I just meant that due to the fact that you're running two chunks of arbitrary code, there's no way to statically guarantee that the metadata produced in the two calls to 'setup.py' is going to match. In the message here: https://mail.python.org/pipermail/distutils-sig/2015-October/027161.html I linked to some packages whose setup.py's actually report different dependency information in the egg_info and build phases. I'm not saying this is a good or bad or even particularly relevant thing, just observing that it's a fact about how the current system works :-). (And this is part of why my draft PEP that set all this off just has a single atomic "build a wheel" operation instead of splitting this into two phases.)
name and version before it runs egg_info, I assume that what this means is that it's crucial for pip to have static access to dependency information?
yes
It would be somewhat convenient if sdists did list their binary dependencies:
not sure about your insertion of "binary" here. pip is concerned with finding python project dependencies (i.e. name and version constraints) in the sdist and then based on the current install environment, it will further constrain the wheels chosen based on architecture and python implementation. and to be perfectly clear, none of this deals with non-python OS/Distro requirements.
I didn't mean anything in particular by "binary", sorry if that threw you off. It is probably because I am still finding it useful to think of source packages and binary packages as logically distinct, i.e., a source package is a black box that can produce a binary package and when I have a dependency it names a binary package that I want (which can be obtained either by downloading it or by finding a source package that will produce that binary package and building it).
3) if any of the packages-to-be-installed are sdists, then fetch them, run egg_info or build them or whatever to get their real dependencies, add these to the graph, and go to step 1
this is the pain we don't want in the future.
since we're simply talking about name/version constraints (not platform/python), it's hard to conceive that we'd agree on an sdist spec that didn't include that.
Do the relevant pip maintainers even read this mailing list? :-)
I try : )
Sorry, I realized that may have come across as much more negative than I intended :-). I just realized at the end of typing this whole thing that I wasn't even sure if I was sending it to the right place :-).

--
Nathaniel J. Smith -- http://vorpus.org
participants (5)
- Donald Stufft
- Marcus Smith
- Nathaniel Smith
- Paul Moore
- Robert Collins