On 2 October 2015 at 23:15, Nathaniel Smith <njs@pobox.com> wrote:
"Project" is a pretty messy concept. Obviously in simple cases there's a one-to-one mapping between project <-> wheel <-> importable package, but this breaks down quickly in edge cases.
I mistakenly used "project" in an attempt to avoid confusion resulting from me using the word "distribution" as a more general term than the way you were using "source distribution" or "binary distribution". Clearly I failed and made things more confusing. I use the term "distribution" in the sense used here https://packaging.python.org/en/latest/glossary/#term-distribution-package. Note that this is in contrast to the terms "source distribution" and "binary distribution" or "built distribution" in the same page. Sorry for confusing things. I'll stick to the terminology as in the PUG glossary from now on.
Consider a project that builds multiple wheels out of the same source tree. You obviously can't expect that all of these packages will have the same dependencies.
Correct. But a distribution can and should (I believe) have the same dependencies for all of the source and built distributions derived from it.
This situation is not common today for Python packages, but the only reason for that is that distutils makes it really hard to do -- it's extremely common in other package ecosystems, and the advantages are obvious. E.g., maybe numpy.distutils should be split into a separately installable package from numpy -- there's no technical reason that this should mean we are now forced to move the code for it into its own VCS repository.
I'm lost here, I'm afraid. Could you rephrase this in terms of the definitions from the PUG glossary? It sounds to me like the VCS repository is the project, which contains multiple distributions. I don't see how that's particularly hard. Each distribution just has its own subdirectory (and setup.py) in the VCS repository...
(I assume that by "platform tags" you mean what PEP 426 calls "environment markers".)
Nope, I mean as defined in PEP 425. The platform tag is part of the compatibility tag. Maybe I meant the ABI tag; I don't really follow the distinctions.
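For reference, here is a rough sketch of how the three parts of a PEP 425 compatibility tag (Python tag, ABI tag, platform tag) appear in a wheel filename as laid out in PEP 427. The function name, the NamedTuple and the example filename are made up for illustration, and the sketch ignores compressed tag sets such as "py2.py3".

```python
from typing import NamedTuple, Optional


class WheelTags(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str    # e.g. "cp34" or "py3"
    abi_tag: str       # e.g. "none" or "cp34m"
    platform_tag: str  # e.g. "win_amd64" or "any"


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel filename into its PEP 427 components (hypothetical helper)."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: %r" % filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        # No optional build tag present.
        name, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        name, version, build, py, abi, plat = parts
    else:
        raise ValueError("unexpected number of fields in %r" % filename)
    return WheelTags(name, version, build, py, abi, plat)


# Illustrative filename only; not a claim about any published wheel.
tags = parse_wheel_filename("numpy-1.9.3-cp34-none-win_amd64.whl")
print(tags.python_tag, tags.abi_tag, tags.platform_tag)
```

The last three fields together are the compatibility tag; the ABI tag and the platform tag are separate fields within it.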
Environment markers are really useful for extending the set of cases that can be handled by a single architecture-dependent wheel. And they're a good fit for that environment, given that wheels can't contain arbitrary code.
But they're certainly never going to be adequate to provide a single static description of every possible build configuration of every possible project. And installing an sdist already requires arbitrary code execution, so it doesn't make sense to try to build some elaborate system to avoid arbitrary code execution just for the dependency specification.
You're right that in a perfect future world numpy C API related dependencies would be handled by some separate ABI-tracking mechanism similar to how the CPython ABI is tracked, so here are some other examples of why environment markers are inadequate:
In the future it will almost certainly be possible to build numpy in two different configurations: one where it expects to find BLAS inside a wheel distributed for this purpose (e.g. this is necessary to provide high-quality Windows wheels), and one where it expects to find BLAS installed on the system. This decision will *not* be tied to the platform, but be selectable at build time. E.g., on OS X there is a system-provided BLAS library, but it has some issues. So the default wheels on PyPI will probably act like Windows and depend on a BLAS package that we control, but there will also be individual users who prefer to build numpy in the configuration where it uses the system BLAS, so we definitely need to support both options on OS X. Now the problem: There will never be a single environment marker that you can stick into a wheel or sdist that says "we depend on the 'pyblas' package if the system is OS X (ok) and the user set this flag in this configuration file during the build process (wait wut)".
Similarly, I think someone was saying in a discussion recently that lxml supports being built either in a mode where it requires libxml be available on the system, or else it can be statically linked. Even if in the future we start having metadata that lets us describe dependencies on external system libraries, it's never going to be the case that we can put the *same* dependency metadata into wheels that are built using these two configurations.
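To make the BLAS example concrete, here is a small sketch of the mismatch being described. An environment marker is evaluated against facts about the installing interpreter and platform, while the build-time choice lives somewhere else entirely. The functions marker_environment and built_requirements, and the use_system_blas flag, are hypothetical names invented for this illustration; only a handful of the marker variables are shown.

```python
import os
import platform
import sys


def marker_environment() -> dict:
    """A few of the facts an environment marker can test at install time."""
    return {
        "os_name": os.name,
        "sys_platform": sys.platform,
        "platform_machine": platform.machine(),
        "python_version": ".".join(platform.python_version_tuple()[:2]),
    }


def built_requirements(use_system_blas: bool) -> list:
    """Hypothetical build step: the dependency list depends on a build flag."""
    # Linking against the system BLAS needs no extra package; otherwise the
    # built wheel depends on the separately distributed 'pyblas' package.
    return [] if use_system_blas else ["pyblas"]


env = marker_environment()
bundled = built_requirements(use_system_blas=False)
system = built_requirements(use_system_blas=True)

# Same machine, same marker environment, two different dependency lists.
print(env["sys_platform"], bundled, system)
```

Nothing in the marker environment records which way the flag was set, so a marker expression has nothing to test in order to tell the two builds apart.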
This is precisely the very complex issue that's being discussed under the banner of extending compatibility tags in a way that gives a viable but practical way of distinguishing binary wheels. You can either see that as a discussion about "expanding compatibility tags" or "finding something better than compatibility tags". I don't have much of a stake in that discussion, as the current compatibility tags suit my needs fine, as a Windows user. The issues seem to be around Linux and possibly some of the complexities around binary dependencies for numerical libraries. But the key point here is that I see the solution for this as being about distinguishing the "right" wheel for the target environment. It's not about anything that should reach back to sdists. Maybe a solution will involve a PEP 426 metadata enhancement that adds metadata that's only valid in binary distributions and not in source distributions, but that's fine by me. But it won't replace the existing dependency data, which *is* valid at the sdist level. At least as far as I can see - I'm willing to be enlightened. But your argument seems to be that sdist-level dependency information should be omitted because more detailed ABI compatibility data *might* be needed at the wheel level for some packages. I don't agree with that - we still need the existing metadata, even if more might be required in specialist cases.
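As an illustration of what I mean by dependency data that is valid at the sdist level: static metadata in the key/value header format can be read by a tool without running any project code. The sketch below uses only the standard library email parser. The metadata text, the distribution name and the read_requirements helper are all invented for the example, not taken from a real package.

```python
from email.parser import Parser

# Made-up metadata in the RFC 822 style header format used by PKG-INFO files.
METADATA_TEXT = """\
Name: example-dist
Version: 1.0
Requires-Dist: requests
Requires-Dist: pywin32; sys_platform == "win32"
"""


def read_requirements(metadata_text: str) -> list:
    """Return the Requires-Dist entries without executing any build code."""
    message = Parser().parsestr(metadata_text)
    # get_all returns None when the header is absent, so fall back to [].
    return message.get_all("Requires-Dist") or []


requirements = read_requirements(METADATA_TEXT)
for requirement in requirements:
    print(requirement)
```

The point is only that the dependency list falls out of a parse; anything more specialised that turns out to be needed at the wheel level could sit alongside it rather than replace it.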
[1] If extras and environment markers don't cover the needs of scientific modules, we need some input into their design from the scientific community. But again, let's not throw away the work that's already done.
As far as sdists go, you can either cover 90% of the cases by building increasingly elaborate metadata formats, or you can cover 100% of the cases by keeping things simple...
But your argument seems to be that having metadata generated from package build code is "simpler". My strong opinion, based on what I've seen of the problems caused by having metadata in an "executable setup.py", is that static metadata is far simpler. I don't believe that the cost of changing to a new system can be justified *without* getting the benefits of static metadata.

Paul