[Distutils] Towards a simple and standard sdist format that isn't intertwined with distutils

Paul Moore p.f.moore at gmail.com
Thu Oct 8 13:46:30 CEST 2015


On 8 October 2015 at 11:18, Oscar Benjamin <oscar.j.benjamin at gmail.com> wrote:
> On 7 October 2015 at 22:41, Paul Moore <p.f.moore at gmail.com> wrote:
>> On 7 October 2015 at 22:28, Nathaniel Smith <njs at pobox.com> wrote:
>>> Maybe I have misunderstood: does it actually help pip at all to have
>>> static access to name and version, but not to anything else? I've been
>>> assuming not, but I don't think anyone's pointed to any examples yet
>>> of the problems that pip is encountering due to the lack of static
>>> metadata -- would this actually be enough to solve them?
>>
>> The principle I am working on is that *all* metadata in a source wheel
>> should be statically available - that's not just for pip, but for all
>> other consumers, including distro packagers. What's not set in stone
>> is precisely what (subsets of) metadata are appropriate for source
>> wheels as opposed to (binary) wheels.
>
> A concrete example would be whether or not the numpy source wheel
> depends on pyopenblas. Depending on how numpy is built the binary
> wheel may or may not depend on pyopenblas. It doesn't make any sense
> to say that the numpy source release depends on pyopenblas so what
> should be the dependencies of the source wheel?

Well, I said this previously but I don't have any objections to the
idea that binary wheels have additional dependencies - so the source
wheel doesn't depend on pyopenblas but the binary does.

But as I understand it, this is currently theoretical - there isn't
yet any pyopenblas to validate these speculations against. I say this not
because I think the approach is invalid, but because I think there are
probably a lot of untested questions that need answering.

Let's expand the scenario a bit.

The user (presumably) still just says "python -m pip install numpy".
What happens then?

1. Assume there's a binary wheel that's compatible with the user's platform.
1a. If there are multiple compatible binary wheels, pip chooses the
"most compatible" so we're safe to assume there's only one. [1]
2. Looking at the dependencies, say it depends on pyopenblas. So pip
needs to install pyopenblas.
2a. If there's a compatible wheel for pyopenblas, pip installs that too.
2b. If there's no compatible pyopenblas wheel, pip falls back to a
source wheel, builds it, and uses that. If the build fails, the whole
numpy install fails.
3. If there's no compatible numpy binary wheel, pip gets the source
wheel and builds it. There's no user interaction possible here [2], so
the build uses whatever defaults the numpy build process identifies as
"most appropriate" for the user's platform. This may be simply a
lowest common denominator, or it may do some form of introspection of
the user's system to get the best possible build. Either way, a wheel
is generated that's known to work on the user's system, so there
should be no additional dependencies injected at this point, and pip
will use that wheel directly.
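The flow above can be sketched in Python. To be clear, this is NOT
pip's actual code - the toy index layout and every helper name here are
invented purely to illustrate the ordering of the steps:

```python
# Illustrative sketch of steps 1-3 above. NOT pip's real code: the toy
# index layout and all helper names are invented for this example.

def pick_wheel(wheels, platform_tags):
    # Step 1/1a: compatibility matching only -- dependencies are NOT
    # considered here (see footnote [1]). "Most compatible" means the
    # wheel whose tag comes earliest in the platform's preference order.
    compatible = [w for w in wheels if w["tag"] in platform_tags]
    compatible.sort(key=lambda w: platform_tags.index(w["tag"]))
    return compatible[0] if compatible else None

def build_from_source(sdist, platform_tags):
    # Step 3: a non-interactive build using the project's own defaults.
    # The result is known to work here, so no new dependencies appear.
    return {"tag": platform_tags[0], "requires": sdist["requires"]}

def install(project, index, platform_tags):
    wheel = pick_wheel(index[project]["wheels"], platform_tags)
    if wheel is None:
        wheel = build_from_source(index[project]["sdist"], platform_tags)
    installed = [project]
    # Step 2: only the *chosen* wheel's dependencies are resolved; a
    # binary wheel may carry deps (pyopenblas) that the sdist doesn't.
    for dep in wheel["requires"]:
        installed.extend(install(dep, index, platform_tags))
    return installed

# Toy index: numpy's binary wheel depends on pyopenblas (step 2);
# pyopenblas itself only has a source release (step 2b).
INDEX = {
    "numpy": {"wheels": [{"tag": "win32", "requires": ["pyopenblas"]}],
              "sdist": {"requires": []}},
    "pyopenblas": {"wheels": [], "sdist": {"requires": []}},
}
print(install("numpy", INDEX, ["win32", "any"]))  # ['numpy', 'pyopenblas']
```

Note how the recursion makes footnote [2]'s constraint visible: by the
time build_from_source runs for numpy itself, the dependency loop for
numpy has not yet seen anything the build might newly require.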

The only constraint here is that a binary numpy wheel built with the
default options on a given machine from a numpy source wheel cannot
have extra dependencies that aren't known to be already satisfied by
the user's system, because by the time pip generates a wheel from the
source wheel, it's finished doing dependency resolution so any new
dependencies won't get checked.

I don't see it as a problem for any hypothetical new build system to
conform to this constraint - by default a built wheel must work on the
system it's built on. All it means is that building binaries with
additional dependencies must be done manually, by supplying options
that describe your intent.

[1] Dependencies are *not* considered as part of the compatibility
matching, so it's correct that this step happens before the dependency
checks. Maybe you're assuming that if there are two wheels, one
depending on pyopenblas and one not, then if the user doesn't have
pyopenblas installed the wheel that doesn't depend on it will be used?
But that's not how pip works.

[2] When pip runs an install, it does so non-interactively. Whatever
command pip uses to build a wheel ("python setup.py bdist_wheel" at
the moment) must run without user interaction and produce a wheel that
is compatible with the user's environment.

So unless I'm mistaken about what you're saying, I don't see any issue
here. Unless you're saying that you're not willing to work under some
of the constraints I describe above - but in that case, you need pip's
compatibility matching, dependency resolution, or automated wheel
build processes to change. That's fine but to move the discussion
forwards, we'd then need to understand (and agree with) whatever
changes you need in pip. At the moment, I'm not aware that anyone has
asked for substantive changes to pip's behaviour in these areas as
part of this proposal.

> One possibility which I think is what Nathaniel is getting at is that
> there is a source release and then that could be used to generate
> different possible source wheels each of which would correspond to a
> particular configuration of numpy. Each source wheel would correspond
> to one binary wheel and have all static metadata but there still needs
> to be a separate source release that is used to generate the different
> source wheels.

That's possible, but what would these multiple source wheels be
called? They couldn't all be called "numpy", as then how would the
user say which one they wanted? Pip can't decide. They can't be called numpy
and distinguished by versions, as then how would you decide whether
"numpy with openblas" is "newer" or "older" than "numpy with MKL"?
That's the issue with Christoph Gohlke's current means of versioning
his MKL builds.
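To make that concrete: even if flavours were encoded as PEP 440 local
version labels (say, hypothetical "1.10+mkl" and "1.10+openblas"
strings), you'd get an ordering, but a meaningless one - alphanumeric
local segments compare lexicographically, so the "winner" is decided by
spelling:

```python
# Hypothetical illustration: PEP 440 local version labels would impose
# an ordering on numpy "flavours", but an arbitrary one. Alphanumeric
# local segments compare lexicographically, so for these labels a plain
# tuple comparison mimics the rule.
mkl_build = ("1.10", "mkl")            # stands in for "1.10+mkl"
openblas_build = ("1.10", "openblas")  # stands in for "1.10+openblas"
print(mkl_build < openblas_build)  # True: "+mkl" is "older" than
                                   # "+openblas" purely because "m" < "o"
```

No installer could infer from that ordering which build the user
actually wants.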

So you're looking at multiple PyPI projects, one for each "flavour" of
numpy. Or you're looking at changes to how PyPI and pip define a
"project". Neither of those options sounds particularly straightforward
to me.

> The step that turns a source wheel into a binary wheel would be
> analogous to the ./configure step in a typical makefile project.
> ./configure is used to specify the options corresponding to all the
> different ways of compiling and installing the project. After running
> ./configure the command "make" is unparametrised and performs the
> actual compilation: this step is analogous to converting a source
> wheel to a binary wheel.

But the Python (PyPI/pip) model is different from the autoconf
"typical makefile project" model. There's no configure step. If you're
proposing that we add one, then that's a pretty major change in
structure and would have some fairly wide-ranging impacts (on PyPI and
pip, and also on 3rd party projects like bandersnatch and devpi). I
don't think we're even close to understanding how we'd manage such a
change.

> I think this satisfies all of the requirements for static metadata and
> one-to-one correspondence of source wheels and binary wheels. If numpy
> followed this then I imagine that there would be a single source wheel
> on PyPI corresponding to the one configuration that would be used
> consistently there. However numpy still needs to separately release
> the code in a form that is also usable in all of the many other
> contexts that it is already used. IOW they will need to continue to
> issue source releases in more or less the same form as today. It makes
> sense for PyPI to host the source release archives on the project page
> even if pip will simply ignore them.

So you're talking about numpy only supporting one configuration via
PyPI, and expecting any other configurations to be made available only
via other channels? I guess you could do that, but I hope you won't.
It feels to me like giving up before we've properly tried to
understand the issues.

Paul

