[Distutils] Towards a simple and standard sdist format that isn't intertwined with distutils

Nathaniel Smith njs at pobox.com
Sat Oct 3 03:06:32 CEST 2015

On Fri, Oct 2, 2015 at 3:52 PM, Paul Moore <p.f.moore at gmail.com> wrote:
> On 2 October 2015 at 23:15, Nathaniel Smith <njs at pobox.com> wrote:
>> This situation is not common today for Python packages, but the only
>> reason for that is that distutils makes it really hard to do -- it's
>> extremely common in other package ecosystems, and the advantages are
>> obvious. E.g., maybe numpy.distutils should be split into a separately
>> installable package from numpy -- there's no technical reason that
>> this should mean we are now forced to move the code for it into its
>> own VCS repository.
> I'm lost here, I'm afraid. Could you rephrase this in terms of the
> definitions from the PUG glossary? It sounds to me like the VCS
> repository is the project, which contains multiple distributions. I
> don't see how that's particularly hard. Each distribution just has its
> own subdirectory (and setup.py) in the VCS repository...

The problem is that projects tend to release the whole project
together, rather than releasing individual subdirectories, and that
usually you can't just rip those subdirectories out of the parent
project and expect to build them on their own. There's shared
infrastructure for build and configuration, and there are static
libraries with utility code that get built once and then linked into
each distribution. A "one VCS checkout = one wheel" rule blocks a lot
of otherwise sensible project arrangements and forces awkward
technical workarounds. But if you allow one VCS checkout to produce
multiple wheels, then I can't see how you can avoid having one sdist
produce multiple wheels.
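
To make that arrangement concrete, here is a rough sketch of a source
tree whose shared pieces are built once and then reused for every
distribution it produces. All of the names here
(build_shared_infrastructure, build_wheels, libutil.a, the
distribution names) are hypothetical and do not correspond to any real
build tool:

```python
# Hypothetical sketch: one source tree, several wheels.
# None of these names correspond to a real build tool.

def build_shared_infrastructure(source_tree):
    """Stand-in for the configure step and the shared static libraries."""
    return {"source_tree": source_tree, "static_libs": ["libutil.a"]}


def build_wheels(source_tree, distributions, version):
    """Build one wheel description per distribution from a single tree."""
    shared = build_shared_infrastructure(source_tree)  # runs once per tree
    wheels = []
    for name in distributions:
        # Every distribution links against the same shared artifacts.
        wheels.append({
            "distribution": name,
            "version": version,
            "linked_against": list(shared["static_libs"]),
        })
    return wheels


wheels = build_wheels("checkout/", ["numpy", "numpy_distutils"], "1.0")
for wheel in wheels:
    print(wheel["distribution"], wheel["version"], wheel["linked_against"])
```

The point of the sketch is only that the loop sits inside a single
build: splitting it into one build per subdirectory would mean
rebuilding or duplicating the shared step.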

>> (I assume that by "platform tags" you mean what PEP 426 calls
>> "environment markers".)
> Nope, I mean as defined in PEP 425. The platform tag is part of the
> compatibility tag. Maybe I meant the ABI tag, I don't really follow
> the distinctions.
>> Environment markers are really useful for extending the set of cases
>> that can be handled by a single architecture-dependent wheel. And
>> they're a good fit for that environment, given that wheels can't
>> contain arbitrary code.
>> But they're certainly never going to be adequate to provide a single
>> static description of every possible build configuration of every
>> possible project. And installing an sdist already requires arbitrary
>> code execution, so it doesn't make sense to try to build some
>> elaborate system to avoid arbitrary code execution just for the
>> dependency specification.
>> You're right that in a perfect future world numpy C API related
>> dependencies would be handled by some separate ABI-tracking mechanism
>> similar to how the CPython ABI is tracked, so here are some other
>> examples of why environment markers are inadequate:
>> In the future it will almost certainly be possible to build numpy in
>> two different configurations: one where it expects to find BLAS inside
>> a wheel distributed for this purpose (e.g. this is necessary to
>> provide high-quality Windows wheels), and one where it expects to find
>> BLAS installed on the system. This decision will *not* be tied to the
>> platform, but be selectable at build time. E.g., on OS X there is a
>> system-provided BLAS library, but it has some issues. So the default
>> wheels on PyPI will probably act like Windows and depend on a
>> BLAS-package that we control, but there will also be individual users
>> who prefer to build numpy in the configuration where it uses the
>> system BLAS, so we definitely need to support both options on OS X.
>> Now the problem: There will never be a single environment marker that
>> you can stick into a wheel or sdist that says "we depend on the
>> 'pyblas' package if the system is OS X (ok) and the user set this flag
>> in this configuration file during the build process (wait wut)".
>> Similarly, I think someone was saying in a discussion recently that
>> lxml supports being built either in a mode where it requires libxml be
>> available on the system, or else it can be statically linked. Even if
>> in the future we start having metadata that lets us describe
>> dependencies on external system libraries, it's never going to be the
>> case that we can put the *same* dependency metadata into wheels that
>> are built using these two configurations.
> This is precisely the very complex issue that's being discussed under
> the banner of extending compatibility tags in a way that gives a
> viable but practical way of distinguishing binary wheels. You can
> either see that as a discussion about "expanding compatibility tags"
> or "finding something better than compatibility tags". I don't have
> much of a stake in that discussion, as the current compatibility tags
> suit my needs fine, as a Windows user. The issues seem to be around
> Linux and possibly some of the complexities around binary dependencies
> for numerical libraries.
> But the key point here is that I see the solution for this as being
> about distinguishing the "right" wheel for the target environment.
> It's not about anything that should reach back to sdists. Maybe a
> solution will involve a PEP 426 metadata enhancement that adds
> metadata that's only valid in binary distributions and not in source
> distributions, but that's fine by me. But it won't replace the
> existing dependency data, which *is* valid at the sdist level.

Okay. I think this is the key question for me: you want sdists to
contain rich static metadata, presumably because you want to do
something with that metadata -- you say that not having it causes
problems. The obvious thing that pip might want to use that metadata
for is so that it can look at an sdist and know whether the wheel it
builds from that sdist will be useful in solving its current
dependency goals. But to answer this question, you actually do need to
know what the compatibility tag will be.
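
As an illustration of why the tag matters, here is a minimal sketch of
the check an installer has to make before a wheel counts as useful. It
relies only on the wheel filename layout from PEP 427 and the
compressed tag sets from PEP 425. The function names and the
supported_tags set are made up for this example and are not pip's
actual implementation:

```python
# Sketch: whether a wheel helps depends on its compatibility tag,
# which is only known once the wheel has actually been built.

def parse_wheel_filename(filename):
    """Split a PEP 427 wheel filename into (name, version, tag triple)."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: %r" % filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) == 6:
        # An optional build tag sits between the version and the python tag.
        name, version, _build, py_tag, abi_tag, plat_tag = parts
    elif len(parts) == 5:
        name, version, py_tag, abi_tag, plat_tag = parts
    else:
        raise ValueError("unexpected wheel filename: %r" % filename)
    return name, version, (py_tag, abi_tag, plat_tag)


def wheel_is_usable(filename, supported_tags):
    """Return True if any tag the wheel declares is supported."""
    _, _, (py_tag, abi_tag, plat_tag) = parse_wheel_filename(filename)
    # PEP 425 compressed tag sets: each component may list several
    # values separated by ".", e.g. "py2.py3-none-any".
    return any(
        (py, abi, plat) in supported_tags
        for py in py_tag.split(".")
        for abi in abi_tag.split(".")
        for plat in plat_tag.split(".")
    )


# Hypothetical target environment: CPython 3.5 on 64-bit Windows.
supported_tags = {("cp35", "cp35m", "win_amd64"), ("py3", "none", "any")}

print(wheel_is_usable("numpy-1.10.0-cp35-cp35m-win_amd64.whl", supported_tags))
print(wheel_is_usable("numpy-1.10.0-cp35-cp35m-linux_x86_64.whl", supported_tags))
print(wheel_is_usable("six-1.9.0-py2.py3-none-any.whl", supported_tags))
```

Nothing in this check can be answered from an sdist's name, version,
and dependency list alone, because the tag triple is an output of the
build.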

So: what problems does it solve for pip to get access to static
information about some, *but not all* of the eventual wheel's
metadata?

(It's also the case that in the numpy example I gave above, it isn't
just the compatibility tag that can vary between wheels built from the
same sdist, it's really truly the actual runtime dependency metadata
that varies. But even if we ignore that, I think my question still
stands.)


Nathaniel J. Smith -- http://vorpus.org
