[Distutils] distlib and wheel metadata

Nick Coghlan ncoghlan at gmail.com
Wed Feb 15 10:41:48 EST 2017


On 15 February 2017 at 15:11, Nathaniel Smith <njs at pobox.com> wrote:
> On Wed, Feb 15, 2017 at 5:27 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> It's mainly a matter of incorporating
>> https://caremad.io/posts/2013/07/setup-vs-requirement/ into the core
>> data model, as this distinction between abstract development
>> dependencies and concrete deployment dependencies is incredibly
>> important for any scenario that involves
>> publisher-redistributor-consumer chains, but is entirely non-obvious
>> to folks that are only familiar with the publisher-consumer case that
>> comes up during development-for-personal-and-open-source-use.
>
> Maybe I'm just being dense but, umm. I don't know what any of these
> words mean :-). I'm not unfamiliar with redistributors; part of my
> confusion is that this is a concept that AFAIK distro package systems
> don't have. Maybe it would help if you have a concrete example of a
> scenario where they would benefit from having this distinction?

It's about error messages and nudges in the UX: if PyPI rejects
version pinning in "requires" by default, then that creates an
opportunity to nudge people towards using "~=" or ">=" instead (as in
the vast majority of cases, that will be a better option than
pinning-for-publication).

The inclusion of "integrates" then adds back the support for
legitimate version pinning use cases in pydist.json in a way that
makes it clear that it is a conceptually distinct operation from a
normal dependency declaration.
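
As a rough illustration of the split described above, the two kinds of
project might carry metadata along these lines. This is only a sketch:
the project names, version numbers, and exact field layout are
hypothetical, and only the "requires" and "integrates" field names
come from this discussion.

```python
import json

# Hypothetical pydist.json payloads, written as Python dicts.
# All names and versions here are made up for illustration.

# A typical library: abstract dependencies only, no "integrates" section.
library_metadata = {
    "name": "example-library",
    "version": "1.0",
    "requires": ["requests>=2.0", "attrs~=16.3"],
}

# A metapackage that deliberately pins its subprojects to one release.
metapackage_metadata = {
    "name": "example-metapackage",
    "version": "3.2.1",
    "requires": [],
    "integrates": [
        "example-core==3.2.1",
        "example-framework==3.2.1",
    ],
}

print(json.dumps(metapackage_metadata, indent=2))
```

The pins live only in "integrates", so "requires" never contains "=="
or "===" in either case.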

>> pipenv borrows the Ruby solution to modeling this by having Pipfile
>> for abstract dependency declarations and Pipfile.lock for concrete
>> integration testing ones, so the idea here is to propagate that model
>> to pydist.json by separating the "requires" field with abstract
>> development dependencies from the "integrates" field with concrete
>> deployment dependencies.
>
> What's the benefit of putting this in pydist.json? I feel like for the
> usual deployment cases (a) going straight from Pipfile.lock -> venv is
> pretty much sufficient, with no need to put this into a package, but
> (b) if you really do want to put it into a package, then the natural
> approach would be to make an empty wheel like
> "my-django-app-deploy.whl" whose dependencies were the contents of
> Pipfile.lock.

My goal with the split is to get to a state where:

- exactly zero projects on PyPI use "==" or "===" in their requires
metadata (because PyPI explicitly prohibits it)
- the vast majority of projects on PyPI *don't* have an "integrates" section
- those projects that do have an "integrates" section have a valid
reason for it (like PyObjC)

For anyone moving from application and web service development to
library and framework development, the shift from "always pin exact
versions of your dependencies for deployment" to "when publishing a
library or framework, only rule out the combinations that you're
pretty sure *won't* work" is one of the trickiest to deal with, as
current tools *don't alert you to the fact that there's a difference
to be learned*.

Restricting what can go into requires creates an opportunity to ask
users whether they're publishing a pre-integrated project or not: if
yes, then they add the "integrates" field and put their pinned
dependencies there; if not, then they relax the "==" constraints to
"~=" or ">=".

Either way, PyPI will believe your answer; it's just refusing the
temptation to guess that using "==" or "===" in the requires section
is sufficient to indicate that you're deliberately publishing a
pre-integrated project.
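
A minimal sketch of the kind of upload-time check this implies,
assuming metadata arrives as a dict with a "requires" list of
requirement strings. The function names and message wording are
hypothetical, and this is not how PyPI is actually implemented.

```python
import re

# Version comparison operators, longest first so "===" is not read as "==".
_OPERATOR = re.compile(r"===|==|~=|!=|<=|>=|<|>")
_PINNING = {"==", "==="}


def find_pinned(requirements):
    """Return the requirement strings that use "==" or "===" as an operator."""
    pinned = []
    for req in requirements:
        # Drop any environment marker (after ";"), which may itself use "==".
        spec = req.split(";", 1)[0]
        # Note: prefix matches such as "==1.*" are also treated as pins here.
        if any(op in _PINNING for op in _OPERATOR.findall(spec)):
            pinned.append(req)
    return pinned


def check_requires(metadata):
    """Return one nudge message per pinned entry found in "requires"."""
    return [
        f'{req!r}: pinned in "requires"; relax to "~=" or ">=", '
        f'or move it to "integrates" if this is a pre-integrated project'
        for req in find_pinned(metadata.get("requires", []))
    ]
```

Entries under "integrates" are never inspected, so a publisher who
declares a pre-integrated project is taken at their word.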

> There's certainly a distinction to be made between the abstract
> dependencies and the exact locked dependencies, but to me the natural
> way to model that distinction is by re-using the distinction we
> already have between source packages and binary packages. The build
> process for this placeholder wheel is to "compile down" the abstract
> dependencies into concrete dependencies, and the resulting wheel
> encodes the result of this compilation. Again, no new concepts needed.

Source vs binary isn't where the distinction applies, though. For
example, it's legitimate for PyObjC to have pinned dependencies even
when distributed in source form, as it's a metapackage used solely to
integrate the various PyObjC subprojects into a single "release".

>> In the vast majority of publication-to-PyPI cases people won't need
>> the "integrates" field, since what they're publishing on PyPI will
>> just be their abstract dependencies, and any warning against using
>> "==" will recommend using "~=" or ">=" instead. But there *are*
>> legitimate uses of pinning-for-publication (like the PyObjC
>> metapackage bundling all its subcomponents, or when building for
>> private deployment infrastructure), so there needs to be a way to
>> represent "Yes, I'm pinning this dependency for publication, and I'm
>> aware of the significance of doing so"
>
> Why can't PyObjC just use regular dependencies? That's what distro
> metapackages have done for decades, right?

If PyObjC uses regular dependencies then there's no opportunity for
PyPI to ask "Did you really mean that?" when people pin dependencies
in "requires". That makes it likely we'll end up with a lot of
unnecessarily restrictive "==" constraints in PyPI packages ("Works on
my machine!"), which creates problems when attempting to auto-generate
distro packages from upstream ones.

The distro case isn't directly analogous, since there are a few key differences:

- open publication platform rather than a pre-approved set of package
maintainers
- no documented packaging policies with related human review &
approval processes
- a couple of orders of magnitude difference in the number of packages involved
- at least in RPM, you can have a spec file with no source tarball,
which makes it obvious it's a metapackage

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

