
On Sun, Oct 11, 2015 at 10:49 PM, Donald Stufft <donald@stufft.io> wrote:
> FTR, I plan on making some sort of auto builder for PyPI so it’s possible that we can get pip to a point where almost all things it downloads are binary wheels and we don’t need to worry too much about needing to optimize the sdist case.
That would be 1000% awesomesauce.
> I also think that part of the problem with egg-info and setup.py’s and pip’s attempted use of them is just down to the setup.py interface being pretty horrible, and setup.py serving too many masters: it needs to be the VCS entry point, sdist metadata entry point, installation entry point, and wheel-building entry point all at once.
Yeah, I'm glad that Robert linked https://pip.pypa.io/en/latest/reference/pip_install/#build-system-interface because I hadn't seen it, and it's great that it's documented, but what reading that documentation mostly convinced me is that I definitely could not write a new implementation of that interface and get it right without significant pain and trial-and-error :-). I mean... you have to support both --single-version-externally-managed enabled and disabled? develop has to know the right arcane magic to get the directory added to sys.path? (how do you know where in sys.path to add the magic?) what is egg_info even supposed to do -- the linked-to docs don't actually define what egg metadata looks like, and IIRC it's not standardized, is it? (as opposed to the dist-info used in wheels)

Like, obviously all that should be needed to implement a new build system is the ability to generate a wheel, right? Wheels are well-documented and one honking great idea.

My current best guess at where we want to end up with sdists is (there's a rough sketch of what I mean further down):

- there's a static file like the _pypackage/_pypackage.cfg in my original proposal (ignore the name, we can change it, I don't care)

- it's written in some simple well-specified language like json or toml (if you look at the spec, toml is basically json encoded in an INI-style syntax -- it's definitely nowhere near as complex as yaml)

- it includes static metadata for name, version (required for PyPI uploads, optional otherwise to accommodate VCS checkouts), description, long_description, authors, URLs, trove, etc. etc., but not install_depends

- it also includes the setup-dependencies / setup-dependencies-dynamic / build-wheel / build-in-place entries/hooks from my original proposal (or the moral equivalent)

- it lives inside the single top-level directory next to the source code (instead of having a meta/ + source/ split, because that way it works for VCS checkouts too, and AFAICT there aren't any other advantages to this split. We can afford to steal a single name -- everyone is used to having a setup.py or Makefile or Gemfile or whatever cluttering up the top level of their source, and this is just a replacement for that.)

That gives you static metadata for PyPI and solves the setup_depends problem while simultaneously cleaning up the "fugly" setup.py interface so that it becomes easy to write your own build system. (I think instead of just arguing about tests, we should have big debates here about which of the 3-4 highly-regarded competing build systems we should be recommending to newbies. But this will only happen if we make designing a new build system something that dozens of people do and a few take seriously, which requires making it accessible.)

I *think* the above interface also makes it possible to write a shim in either direction (a setup.py that speaks the current "fugly" interface to old versions of pip while using the new files to actually do its work, or a new-style sdist that delegates all the actual work to setup.py), and even for setuptools to start generating sdists that are simultaneously both old-style and new-style for compatibility.

So I think we can at least solve the metadata-for-PyPI problem, the setup_requires problem, and the let-a-thousand-build-systems-bloom problem now, without solving the static-install_requires-in-sdists problem (but also without creating any particular obstacles to solving it in the future, if it's even possible).
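To make that concrete, here's a rough and purely hypothetical sketch of what such a file might look like -- every field name and hook below is invented for illustration, not a spec:

    # _pypackage/_pypackage.cfg (name TBD) -- hypothetical sketch, toml syntax
    [package]
    name = "example"
    version = "1.0"  # required for PyPI uploads, optional for VCS checkouts
    description = "An example package"
    authors = ["Jane Doe <jane@example.com>"]
    # deliberately no install_requires here -- runtime dependencies
    # belong to the built wheel, not to the sdist metadata

    [build]
    # static list of packages needed before the build hooks can run
    setup-dependencies = ["examplebuildtool"]
    # hook for computing extra setup dependencies dynamically
    setup-dependencies-dynamic = "examplebuildtool.hooks:setup_deps"
    # hooks that an installer like pip would call
    build-wheel = "examplebuildtool.hooks:build_wheel"
    build-in-place = "examplebuildtool.hooks:build_in_place"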
And in the meantime we can keep working on autobuilders and Linux wheels that will remove much of the demand for automagic sdist building -- it's difficult to make predictions, especially about the future, so who knows, maybe in the end we'll decide we just don't care that much about static-install_requires-in-sdists.
> I also think that it would be a terrible idea to have the science stack leave the “standard” Python packaging ecosystem and go their own way, and I think it’d make the science packages essentially useless to the bulk of the non-science users. I think a lot of the chasing-rainbows-like stuff comes mostly from: we have some desires from our experiences, but we haven’t yet taken the time to fully flesh out what the impact of those desires is, nor are any of us science stack users (or contributors) to my knowledge, so we don’t deal with the complexities of that much [1].
> One possible idea that I’ve thought of here, which may or may not be a good idea:
> Require packages to declare their conditional dependencies up front (I’m assuming the list of dependencies that a project *could* have is both finite and known ahead of time) and let them give groups of these dependencies a name. I’m thinking something similar to setuptools extras, where you might be able to attach a list of dependencies to a named group. The build interface could include a way for the thing that’s calling the build tool to say “I require the feature represented by this named group of dependencies”[2], and then the build tool can hard-fail if it detects it can’t be built in a way that requires those dependencies at runtime. When the final build is done, it could put into the wheel a list of all of the additional named groups of dependencies it was built with. The runtime dependencies of a wheel would then be the combination of all of those named groups of dependencies + the typical install_requires dependencies. This could possibly even be presented nicely on PyPI as a sort of “here are the extra features you can get with this library, and what that does to its dependencies”.
> Would something like that solve Numpy’s dependency needs?
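(If I'm understanding the idea correctly, in the hypothetical config sketch from above it would look something like this -- again, every name here is invented for illustration:)

    # hypothetical named dependency groups, in the spirit of setuptools extras
    [dependency-groups]
    # building with MKL support would add these runtime deps to the wheel
    mkl = ["example-mkl-runtime >= 11"]
    # building with OpenBLAS support would add these instead
    openblas = ["example-openblas >= 0.2"]

    # the caller asks the build tool for, say, the "mkl" feature; the tool
    # hard-fails if it can't comply, and otherwise records "mkl" in the wheel,
    # so the wheel's runtime deps = install_requires + the "mkl" group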
The answer is, I don't know! I am not at all confident in my ability to anticipate every possible difficult case, and NumPy is really not the only complicated library out there; it just gets most of the press. :-) Your idea definitely sounds like something worth exploring as part of the general debate around things like "Provides:" and extras, but I'd be hesitant to commit to locking something in right now. Also:
> Or is it the case that you don’t know ahead of time what the entire list of the dependency specifiers could be (or that it would be unreasonable to have to declare them all up front)? I think I recall someone saying that something might depend on something like “Numpy >= 1.0” in their sdist, but once it’s been built then they’ll need to depend on something like “Numpy >= $VERSION_IT_WAS_BUILT_AGAINST”. If this is something that’s needed, then we might not be able to satisfy this particular thing.
Yes, it is true right now that packages that use the NumPy C API should have a wheel dependency on Numpy >= $VERSION_IT_WAS_BUILT_AGAINST. We'd like it if wheels/pip had even more flexible metadata in this regard that could express things like "package A needs numpy ABI version 3, and numpy 1.8, 1.9, and 1.10 say that they can provide ABI version 3, but 1.11 and 1.7 don't", but we shouldn't worry about the exact details of that kind of system right now -- I'm thinking any more flexible solution to the NumPy ABI problem would be at least as problematic for pip's sdist resolution rules as the current situation, and it sounds like the current situation is already bad enough to rule out a lot of cases.

-n

--
Nathaniel J. Smith -- http://vorpus.org
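P.S.: For concreteness, the build-time pinning described above boils down to something like the following running inside a build-wheel hook -- the function name is invented, and this is only a sketch:

    # hypothetical fragment of a build-wheel hook: pin the wheel's runtime
    # requirement to (at least) the NumPy version we're building against
    import numpy

    def wheel_runtime_requirements():
        # e.g. building against numpy 1.9.2 yields "numpy >= 1.9.2"
        return ["numpy >= {}".format(numpy.__version__)]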