Re: [Distutils] Towards a simple and standard sdist format that isn't intertwined with distutils

2 Oct 2015

On 2 October 2015 at 23:15, Nathaniel Smith <njs@pobox.com> wrote:
...
"Project" is a pretty messy concept. Obviously in simple cases there's
a one-to-one mapping between project <-> wheel <-> importable package,
but this breaks down quickly in edge cases.
I mistakenly used "project" in an attempt to avoid confusion resulting
from me using the word "distribution" as a more general term than the
way you were using "source distribution" or "binary distribution".
Clearly I failed and made things more confusing.

I use the term "distribution" in the sense used here
https://packaging.python.org/en/latest/glossary/#term-distribution-package.
Note that this is in contrast to the terms "source distribution" and
"binary distribution" or "built distribution" on the same page.

Sorry for confusing things. I'll stick to the terminology as in the
PUG glossary from now on.
...
Consider a project that builds multiple wheels out of the
same source tree. You obviously can't expect that all of these
packages will have the same dependencies.
Correct. But a distribution can and should (I believe) have the same
dependencies for all of the source and built distributions derived
from it.
...
This situation is not common today for Python packages, but the only
reason for that is that distutils makes it really hard to do -- it's
extremely common in other package ecosystems, and the advantages are
obvious. E.g., maybe numpy.distutils should be split into a separately
installable package from numpy -- there's no technical reason that
this should mean we are now forced to move the code for it into its
own VCS repository.
I'm lost here, I'm afraid. Could you rephrase this in terms of the
definitions from the PUG glossary? It sounds to me like the VCS
repository is the project, which contains multiple distributions. I
don't see how that's particularly hard. Each distribution just has its
own subdirectory (and setup.py) in the VCS repository...
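The layout described here, one subdirectory with its own setup.py per distribution inside a single VCS repository, can be sketched as a small directory scan. This is only an illustration under assumptions: the helper name `find_distributions` and the directory names in the usage note are hypothetical, and the scan looks just one level deep.

```python
from pathlib import Path


def find_distributions(repo_root: Path) -> list[str]:
    """Return the names of immediate subdirectories that contain a setup.py.

    Each such subdirectory is treated as one distribution living in the
    shared repository (hypothetical layout, one level deep only).
    """
    return sorted(p.parent.name for p in repo_root.glob("*/setup.py"))
```

For a checkout containing `numpy/setup.py` and `numpy-distutils/setup.py` (hypothetical names), this returns `['numpy', 'numpy-distutils']`; a `docs/` directory without a setup.py is ignored.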
...
(I assume that by "platform tags" you mean what PEP 426 calls
"environment markers".)
Nope, I mean as defined in PEP 425. The platform tag is part of the
compatibility tag. Maybe I meant the ABI tag; I don't really follow
the distinctions.
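For reference, PEP 425 defines a compatibility tag as three parts (a Python tag, an ABI tag and a platform tag), and the wheel filename convention from PEP 427 embeds that triple at the end of the name. A minimal parsing sketch, assuming a well-formed filename; compressed tag sets such as `py2.py3` are returned unsplit, and the names `WheelTags` and `parse_wheel_filename` are illustrative rather than taken from any library:

```python
from typing import NamedTuple, Optional


class WheelTags(NamedTuple):
    name: str
    version: str
    build: Optional[str]
    python: str    # Python tag, e.g. "cp34" or "py2.py3"
    abi: str       # ABI tag, e.g. "cp34m" or "none"
    platform: str  # platform tag, e.g. "win_amd64" or "any"


def parse_wheel_filename(filename: str) -> WheelTags:
    # Layout: {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python, abi, platform = parts
        build = None
    elif len(parts) == 6:
        name, version, build, python, abi, platform = parts
    else:
        raise ValueError(f"unexpected number of components: {filename!r}")
    return WheelTags(name, version, build, python, abi, platform)
```

For `example-1.0-cp34-cp34m-win_amd64.whl`, the platform tag is `'win_amd64'` while the ABI tag is `'cp34m'`, which shows that the two are separate components of the same compatibility tag.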
...
Environment markers are really useful for extending the set of cases
that can be handled by a single architecture-dependent wheel. And
they're a good fit for that environment, given that wheels can't
contain arbitrary code.
But they're certainly never going to be adequate to provide a single
static description of every possible build configuration of every
possible project. And installing an sdist already requires arbitrary
code execution, so it doesn't make sense to try to build some
elaborate system to avoid arbitrary code execution just for the
dependency specification.
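As an illustration of how markers let one wheel's requirement list adapt to the install environment, here is a deliberately tiny evaluator. It is a toy, not the real marker grammar: it handles only a single `name == "value"` or `name != "value"` clause, and all helper names are hypothetical. Real tools (for example the `packaging` library) implement the full syntax.

```python
import platform
import sys


def marker_environment() -> dict[str, str]:
    # A small subset of marker variables, computed from the running interpreter.
    return {
        "sys_platform": sys.platform,
        "platform_machine": platform.machine(),
        "python_version": "%d.%d" % sys.version_info[:2],
    }


def evaluate_simple_marker(marker: str, env: dict[str, str]) -> bool:
    # Toy evaluator: a single `name == "value"` or `name != "value"` clause only.
    for op in ("==", "!="):
        if op in marker:
            name, value = (s.strip() for s in marker.split(op, 1))
            actual = env[name]
            expected = value.strip("'\"")
            return actual == expected if op == "==" else actual != expected
    raise ValueError(f"unsupported marker: {marker!r}")


def active_requirements(requires: list[str], env: dict[str, str]) -> list[str]:
    # Each entry is "name" or "name; marker", in the style of Requires-Dist.
    result = []
    for req in requires:
        name, _, marker = req.partition(";")
        if not marker.strip() or evaluate_simple_marker(marker.strip(), env):
            result.append(name.strip())
    return result
```

With `reqs = ["six", 'pywin32; sys_platform == "win32"']`, `active_requirements(reqs, {"sys_platform": "linux"})` gives `['six']` and the same call with `"win32"` gives `['six', 'pywin32']`. Note that every input the markers can branch on comes from the install environment, never from choices made at build time.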
You're right that in a perfect future world numpy C API-related
dependencies would be handled by some separate ABI-tracking mechanism
similar to how the CPython ABI is tracked, so here are some other
examples of why environment markers are inadequate:
In the future it will almost certainly be possible to build numpy in
two different configurations: one where it expects to find BLAS inside
a wheel distributed for this purpose (e.g. this is necessary to
provide high-quality Windows wheels), and one where it expects to find
BLAS installed on the system. This decision will *not* be tied to the
platform, but will be selectable at build time. E.g., on OS X there is a
system-provided BLAS library, but it has some issues. So the default
wheels on PyPI will probably act like Windows and depend on a
BLAS-package that we control, but there will also be individual users
who prefer to build numpy in the configuration where it uses the
system BLAS, so we definitely need to support both options on OS X.
Now the problem: There will never be a single environment marker that
you can stick into a wheel or sdist that says "we depend on the
'pyblas' package if the system is OS X (ok) and the user set this flag
in this configuration file during the build process (wait wut)".
Similarly, I think someone was saying in a discussion recently that
lxml supports being built either in a mode where it requires libxml to be
available on the system, or in a mode where it is statically linked. Even if
in the future we start having metadata that lets us describe
dependencies on external system libraries, it's never going to be the
case that we can put the *same* dependency metadata into wheels that
are built using these two configurations.
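The numpy and lxml cases share one shape: the requirement list is a function of a build-time choice that no install-time marker variable can observe. A minimal sketch of that argument, assuming a hypothetical `use_system_lib` build option and using `pyblas` as the placeholder BLAS package name from the message; it checks whether a set of builds could be described by markers that can only branch on the platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildConfig:
    sys_platform: str     # visible to environment markers at install time
    use_system_lib: bool  # hypothetical build-time option, invisible to markers


def wheel_requirements(cfg: BuildConfig) -> tuple[str, ...]:
    # Hypothetical numpy-like project: builds against the system BLAS need
    # nothing extra; other builds depend on a separately shipped BLAS wheel.
    return () if cfg.use_system_lib else ("pyblas",)


def describable_by_markers(configs: list[BuildConfig]) -> bool:
    # Markers can only branch on what they can see (here: sys_platform), so
    # two builds with the same platform must have the same requirements.
    seen: dict[str, tuple[str, ...]] = {}
    for cfg in configs:
        reqs = wheel_requirements(cfg)
        if seen.setdefault(cfg.sys_platform, reqs) != reqs:
            return False
    return True
```

`describable_by_markers([BuildConfig("win32", False), BuildConfig("darwin", False)])` is `True`, but adding `BuildConfig("darwin", True)` (an OS X user building against the system BLAS) makes it `False`.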
This is precisely the very complex issue that's being discussed under
the banner of extending compatibility tags to give a viable and
practical way of distinguishing binary wheels. You can
either see that as a discussion about "expanding compatibility tags"
or "finding something better than compatibility tags". I don't have
much of a stake in that discussion since, as a Windows user, I find the
current compatibility tags suit my needs fine. The issues seem to be around
Linux and possibly some of the complexities around binary dependencies
for numerical libraries.

But the key point here is that I see the solution for this as being
about distinguishing the "right" wheel for the target environment.
It's not about anything that should reach back to sdists. Maybe a
solution will involve a PEP 426 metadata enhancement that adds
metadata that's only valid in binary distributions and not in source
distributions, but that's fine by me. But it won't replace the
existing dependency data, which *is* valid at the sdist level.
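One way to picture this split: the sdist carries dependency metadata that holds for every build, and each wheel carries that same data plus, possibly, binary-only information. The sketch below assumes this structure; `binary_requires` is a hypothetical field standing in for whatever PEP 426 enhancement might emerge, not part of any existing standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SdistMetadata:
    name: str
    version: str
    requires_dist: tuple[str, ...]  # valid for every build derived from this sdist


@dataclass(frozen=True)
class WheelMetadata:
    sdist: SdistMetadata
    # Hypothetical binary-only field: requirements that depend on how this
    # particular wheel was built. Not part of any existing metadata standard.
    binary_requires: tuple[str, ...] = ()

    def all_requirements(self) -> tuple[str, ...]:
        # Wheel-level data extends the sdist-level data; it never replaces it.
        return self.sdist.requires_dist + self.binary_requires
```

With `src = SdistMetadata("example", "1.0", ("six",))`, `WheelMetadata(src, ("pyblas",)).all_requirements()` gives `('six', 'pyblas')` and `WheelMetadata(src).all_requirements()` gives `('six',)`. Because each wheel refers to the sdist metadata rather than copying it, the sdist-level dependencies cannot drift apart between wheels in this model.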

At least as far as I can see - I'm willing to be enlightened. But your
argument seems to be that sdist-level dependency information should be
omitted because more detailed ABI compatibility data *might* be needed
at the wheel level for some packages. I don't agree with that - we
still need the existing metadata, even if more might be required in
specialist cases.
...
...
[1] If extras and environment markers don't cover the needs of
scientific modules, we need some input into their design from the
scientific community. But again, let's not throw away the work that's
already done.
As far as sdists go, you can either cover 90% of the cases by building
increasingly elaborate metadata formats, or you can cover 100% of the
cases by keeping things simple...
But your argument seems to be that having metadata generated from
package build code is "simpler". My strong opinion, based on what I've
seen of the problems caused by having metadata in an "executable
setup.py", is that static metadata is far simpler.
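To make the "static metadata" side concrete: sdist metadata files such as PKG-INFO use an RFC 822-style header format, so dependency fields can be read with the standard library without running any project code. This sketch assumes the file's `Requires-Dist` fields (defined in Metadata 1.2, PEP 345) are present and authoritative, which is the property being argued for here rather than something every sdist guarantees; the sample contents are hypothetical.

```python
from email.parser import HeaderParser

# Hypothetical PKG-INFO contents for illustration.
PKG_INFO = """\
Metadata-Version: 1.2
Name: example
Version: 1.0
Requires-Dist: six
Requires-Dist: numpy
"""


def static_requirements(pkg_info_text: str) -> list[str]:
    # Parse the header block only; nothing from the project is imported or executed.
    msg = HeaderParser().parsestr(pkg_info_text)
    return msg.get_all("Requires-Dist") or []
```

`static_requirements(PKG_INFO)` returns `['six', 'numpy']`. Getting the same answer from a setup.py means executing it, with whatever imports and side effects it has.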

I don't believe that the cost of changing to a new system can be
justified *without* getting the benefits of static metadata.

Paul

Paul Moore