[Distutils] What metadata does pip actually need about sdists?

Nathaniel Smith njs at pobox.com
Mon Oct 12 10:24:35 CEST 2015

On Sun, Oct 11, 2015 at 10:49 PM, Donald Stufft <donald at stufft.io> wrote:
> FTR, I plan on making some sort of auto builder for PyPI so it’s possible
> that we can get pip to a point where almost all things it downloads are
> binary wheels and we don’t need to worry too much about needing to optimize
> the sdist case.

That would be 1000% awesomesauce.

> I also think that part of the problem with egg-info and
> setup.py’s and pip’s attempted use of them is just down to the setup.py
> interface being pretty horrible and setup.py’s serving too many masters
> where it needs to be both VCS entry point, sdist metadata entry point,
> installation entry point, and wheel building entry point.

Yeah, I'm glad that Robert linked to that documentation, because I
hadn't seen it, and it's great that it exists, but what reading it
mostly convinced me of is that I definitely could not write a new
implementation of that interface and get it right without significant
pain and trial-and-error :-). I mean... you have to support both
--single-version-externally-managed enabled and disabled? develop has
to know the right arcane magic to get the directory added to sys.path?
(how do you know where in sys.path to add the magic?) what is egg_info
even supposed to do -- the linked-to docs don't actually define what
egg metadata looks like, and IIRC it's not standardized, is it? (as
opposed to the dist-info used in wheels)

Like, obviously all that should be needed to implement a new build
system is the ability to generate a wheel, right? Wheels are
well-documented and one honking great idea.
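To make that concrete, here's a minimal sketch of what a build system
reduced to "just produce a wheel" could look like -- the hook name, the
metadata contents, and the package name are all invented for
illustration, not any real interface:

```python
import tempfile
import zipfile
from pathlib import Path

def build_wheel(source_dir: str, wheel_dir: str) -> str:
    """Hypothetical single-hook build interface: consume a source tree,
    write a wheel into wheel_dir, return the wheel's filename.  All the
    names and metadata fields here are invented for illustration."""
    name, version = "example", "1.0"  # would come from static metadata
    wheel_name = f"{name}-{version}-py3-none-any.whl"
    dist_info = f"{name}-{version}.dist-info"
    with zipfile.ZipFile(Path(wheel_dir) / wheel_name, "w") as whl:
        # copy pure-Python modules from the source tree into the wheel
        for py in Path(source_dir).glob("*.py"):
            whl.writestr(py.name, py.read_text())
        # wheels carry their own standardized dist-info metadata
        whl.writestr(f"{dist_info}/METADATA",
                     f"Metadata-Version: 2.0\nName: {name}\nVersion: {version}\n")
        whl.writestr(f"{dist_info}/WHEEL",
                     "Wheel-Version: 1.0\nRoot-Is-Purelib: true\nTag: py3-none-any\n")
    return wheel_name

# demo: build a wheel from a one-module source tree
src, out = Path(tempfile.mkdtemp()), Path(tempfile.mkdtemp())
(src / "mod.py").write_text("X = 1\n")
built = build_wheel(str(src), str(out))
contents = zipfile.ZipFile(out / built).namelist()
```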

My current best guess at where we want to end up with sdists is:

- there's a static file like the _pypackage/_pypackage.cfg in my
original proposal (ignore the name, we can change it, I don't care)
- it's written in some simple well-specified language like json or
toml (if you look at the spec, toml is basically json encoded in an
INI-style syntax -- it's definitely nowhere near as complex as yaml)
- it includes static metadata for name, version (required for PyPI
uploads, optional otherwise to accommodate VCS checkouts), description,
long_description, authors, URLs, trove classifiers, etc. etc., but not
install_requires
- it also includes the setup-dependencies / setup-dependencies-dynamic
/ build-wheel / build-in-place entries/hooks from my original proposal
(or the moral equivalent)
- it lives inside the single top-level directory next to the source
code (instead of having a meta/ + source/ split, because that way it
works for VCS checkouts too and AFAICT there aren't any other
advantages to this split. We can afford to steal a single name --
everyone is used to having a setup.py or Makefile or Gemfile or
whatever cluttering up the top-level of their source, and this is just
a replacement for that.)

That gives you static metadata for PyPI and solves the setup_depends
problem while simultaneously cleaning up the "fugly" setup.py
interface so that it becomes easy to write your own build system. (I
think instead of just arguing about tests, we should have big debates
here about which of the 3-4 highly-regarded competing build systems we
should be recommending to newbies. But this will only happen if we
make designing a new build system something that dozens of people do
and a few take seriously, which requires making it accessible.)

I *think* the above interface also makes it possible to write a shim
in either direction (a setup.py that speaks the current "fugly"
interface to old versions of pip while using the new files to actually
do its work, or a new-style sdist that delegates all the actual work
to setup.py), and even for setuptools to start generating sdists that
are simultaneously both old-style and new-style for compatibility.

So I think we can at least solve the metadata-for-PyPI problem, the
setup_requires problem, and the let-a-thousand-build-systems-bloom
problems now, without solving the static-install_requires-in-sdists
problem (but also without creating any particular obstacles to solving
it in the future, if it's even possible). And in the mean time we can
keep working on autobuilders and linux wheels that will remove much of
the demand for automagic sdist building -- it's difficult to make
predictions, especially about the future, so who knows, maybe in the
end we'll decide we just don't care that much about it anyway.

> I also think that it would be a terrible idea to have the science stack
> leave the “standard” Python packaging ecosystem and go their own way and I
> think it’d make the science packages essentially useless to the bulk of the
> non science users. I think a lot of the chasing rainbows like stuff comes
> mostly from: We have some desires from our experiences but we haven’t yet
> taken the time to fully flesh out what the impact of those desires are, nor
> are any of us science stack users (or contributors) to my knowledge, so we
> don’t deal with the complexities of that much [1].
> One possible idea that I’ve thought of here, which may or may not be a good
> idea:
> Require packages to declare up front conditional dependencies (I’m assuming
> the list of dependencies that a project *could* have is both finite and
> known ahead of time) and let them give groups of these dependencies a name.
> I’m thinking something similar to setuptools extras where you might be able
> to put a list of dependencies to a named group. The build interface could
> include a way for the thing that’s calling the build tool to say “I require
> the feature represented by this named group of dependencies”[2], and then
> the build tool can hard fail if it detects it can’t be built in a way that
> requires those dependencies at runtime. When the final build is done, it
> could put into the Wheel a list of all of the additional named groups of
> dependencies it built with. The runtime dependencies of a wheel would then
> be the combination of all of those named groups of dependencies + the
> typical install_requires dependencies. This could possibly even be presented
> nicely on PyPI as a sort of “here are the extra features you can get with
> this library, and what that does to its dependencies”.
> Would something like that solve Numpy’s dependency needs?
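Mechanically, the named-groups idea in the quoted proposal might look
something like this sketch (package names and version pins invented for
illustration):

```python
# All dependency groups are declared up front in the sdist; the build
# records which groups it actually built with, and the wheel's runtime
# dependencies are the union of those groups plus install_requires.
install_requires = ["requests >= 2.0"]

dependency_groups = {       # declared ahead of time, like extras
    "mkl": ["mkl >= 11"],
    "hdf5": ["h5py >= 2.5"],
}

def wheel_runtime_deps(groups_built_with):
    deps = list(install_requires)
    for group in groups_built_with:
        # an undeclared group is a hard failure (KeyError), matching
        # the "finite and known ahead of time" assumption above
        deps.extend(dependency_groups[group])
    return deps
```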

The answer is, I don't know! I am not at all confident in my ability
to anticipate every possible difficult case, and NumPy is really not
the only complicated library out there, it just gets most of the
press. :-) Your idea definitely sounds like something worth exploring
as part of the general debate around things like "Provides:" and
extras, but I'd be hesitant to commit to locking something in right
now. Also:

> Or is it the case
> that you don’t know ahead of time what the entire list of the dependency
> specifiers could be (or that it would be unreasonable to have to declare
> them all up front?). I think I recall someone saying that something might
> depend on something like “Numpy >= 1.0” in their sdist, but once it’s been
> built then they’ll need to depend on something like “Numpy >=
> $VERSION_IT_WAS_BUILT_AGAINST”. If this is something that’s needed, then we
> might not be able to satisfy this particular thing.

Yes, it is true right now that packages that use the NumPy C API
should have a wheel dependency on Numpy >=
$VERSION_IT_WAS_BUILT_AGAINST. We'd like it if wheels/pip had even
more flexible metadata in this regard that could express things like
"package A needs numpy ABI version 3, and numpy 1.8, 1.9, and 1.10 say
that they can provide ABI version 3, but 1.11 and 1.7 don't", but we
shouldn't worry about the exact details of that kind of system right
now -- I'm thinking any more flexible solution to the NumPy ABI
problem would be at least as problematic for pip's sdist resolution
rules as the current situation, and it sounds like the current
situation is already bad enough to rule out a lot of cases.
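As an illustration of that kind of ABI metadata (the ABI numbers below
are made up for the example, not real NumPy history):

```python
# Each numpy release would declare which C ABI version it provides, and
# a wheel compiled against ABI 3 is satisfied by any release that
# provides ABI 3 -- regardless of release version ordering.
PROVIDED_ABI = {"1.7": 2, "1.8": 3, "1.9": 3, "1.10": 3, "1.11": 4}

def satisfies(numpy_version: str, needed_abi: int) -> bool:
    return PROVIDED_ABI.get(numpy_version) == needed_abi

# in this made-up table, 1.8-1.10 provide ABI 3; 1.7 and 1.11 don't
compatible = [v for v in PROVIDED_ABI if satisfies(v, 3)]
```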


Nathaniel J. Smith -- http://vorpus.org
