humpty in turn uses distlib, which seems to mishandle wheel metadata. (For example, it chokes if there's extra distribution metadata and makes it impossible for buildout to install python-dateutil from a wheel.)
I looked into the "mishandling". It's that the other tools don't adhere to [the current state of] PEP 426 as closely as distlib does. For example, wheel writes JSON metadata to metadata.json in the .dist-info directory, whereas PEP 426 calls for that data to be in pydist.json. The non-JSON metadata in the wheel (the METADATA file) does not strictly adhere to any of the metadata PEPs 241, 314, 345 or 426 (it has a mixture of incompatible fields).

I can change distlib to look for metadata.json, and relax the rules to be more liberal regarding which fields to accept, but adhering to the PEP isn't mishandling things, as I see it. Work on distlib has slowed right down since around the time when PEP 426 was deferred indefinitely, and there seems to be little interest in progressing via metadata or other standardisation - we have to go by what the de facto tools (setuptools, wheel) choose to do. It's not an ideal situation, and incompatibilities can crop up, as you've seen.

Regards,
Vinay Sajip
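A minimal sketch of the kind of fallback described above, assuming nothing about distlib's internals - the helper name and the lookup order are illustrative only:

import json
import os

def read_wheel_json_metadata(dist_info_dir):
    """Load JSON metadata from a .dist-info directory.

    Tries the PEP 426 draft name (pydist.json) first, then falls back
    to the name bdist_wheel actually writes (metadata.json). Returns
    None if neither file is present.
    """
    for name in ("pydist.json", "metadata.json"):
        path = os.path.join(dist_info_dir, name)
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                return json.load(f)
    return None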
On Tue, Feb 14, 2017 at 10:10 AM, Vinay Sajip via Distutils-SIG distutils-sig@python.org wrote:
humpty in turn uses distlib, which seems to mishandle wheel metadata. (For example, it chokes if there's extra distribution metadata and makes it impossible for buildout to install python-dateutil from a wheel.)
I looked into the "mishandling". It's that the other tools don't adhere to [the current state of] PEP 426 as closely as distlib does. For example, wheel writes JSON metadata to metadata.json in the .dist-info directory, whereas PEP 426 calls for that data to be in pydist.json. The non-JSON metadata in the wheel (the METADATA file) does not strictly adhere to any of the metadata PEPs 241, 314, 345 or 426 (it has a mixture of incompatible fields).
I can change distlib to look for metadata.json, and relax the rules to be more liberal regarding which fields to accept, but adhering to the PEP isn't mishandling things, as I see it.
I thought the current status was that it's called metadata.json exactly *because* it's not standardized, and you *shouldn't* look at it?
It's too bad that the JSON thing didn't work out, but I think we're better off working on better specifying the one source of truth everything already uses (METADATA) instead of bringing in *new* partially-incompatible-and-poorly-specified formats.
-n
I would accept a pull request to stop generating metadata.json in bdist_wheel.
On Tue, Feb 14, 2017 at 1:16 PM Nathaniel Smith njs@pobox.com wrote:
On Tue, Feb 14, 2017 at 10:10 AM, Vinay Sajip via Distutils-SIG distutils-sig@python.org wrote:
humpty in turn uses distlib, which seems to mishandle wheel metadata. (For example, it chokes if there's extra distribution metadata and makes it impossible for buildout to install python-dateutil from a wheel.)

I looked into the "mishandling". It's that the other tools don't adhere to [the current state of] PEP 426 as closely as distlib does. For example, wheel writes JSON metadata to metadata.json in the .dist-info directory, whereas PEP 426 calls for that data to be in pydist.json. The non-JSON metadata in the wheel (the METADATA file) does not strictly adhere to any of the metadata PEPs 241, 314, 345 or 426 (it has a mixture of incompatible fields).

I can change distlib to look for metadata.json, and relax the rules to be more liberal regarding which fields to accept, but adhering to the PEP isn't mishandling things, as I see it.
I thought the current status was that it's called metadata.json exactly *because* it's not standardized, and you *shouldn't* look at it?
It's too bad that the JSON thing didn't work out, but I think we're better off working on better specifying the one source of truth everything already uses (METADATA) instead of bringing in *new* partially-incompatible-and-poorly-specified formats.
-n
-- Nathaniel J. Smith -- https://vorpus.org
I thought the current status was that it's called metadata.json exactly *because* it's not standardized, and you *shouldn't* look at it?
Well, it was work-in-progress-standardised according to PEP 426 (since sometimes implementations have to work in parallel with working out the details of specifications). Given that PEP 426 wasn't done and dusted but being progressed, I would have thought it perfectly acceptable to use "pydist.json", as the only things that would be affected would be packaging tools working to the PEP.
It's too bad that the JSON thing didn't work out, but I think we're better off working on better specifying the one source of truth everything already uses (METADATA) instead of bringing in *new* partially-incompatible-and-poorly-specified formats.
When you say "everything already uses", do you mean setuptools and wheel? If nobody else is allowed to play, that's one thing. But otherwise, there need to be standards for interoperability. The METADATA file, now - exactly which standard does it follow? The one in the dateutil wheel that Jim referred to doesn't appear to conform to any of the metadata PEPs. It was rejected by old metadata code in distlib (which came of out the Python 3.3 era "packaging" package - not to be confused with Donald's of the same name - which is strict in its interpretation of those earlier PEPs).
The METADATA format (key-value) is not really flexible enough for certain things which were in PEP 426 (e.g. dependency descriptions), and for these JSON seems a reasonable fit. There's no technical reason why "the JSON thing didn't work out", as far as I can see - it was just given up on for a more incremental approach (which has got no new PEPs other than 440, AFAICT). I understand that social reasons are often more important than technical reasons when it comes to success or failure of an approach; I'm just not sure that in this case, it wasn't given up on too early.
Regards,
Vinay Sajip
On 14 February 2017 at 21:21, Vinay Sajip via Distutils-SIG distutils-sig@python.org wrote:
I thought the current status was that it's called metadata.json exactly *because* it's not standardized, and you *shouldn't* look at it?
Well, it was work-in-progress-standardised according to PEP 426 (since sometimes implementations have to work in parallel with working out the details of specifications). Given that PEP 426 wasn't done and dusted but being progressed, I would have thought it perfectly acceptable to use "pydist.json", as the only things that would be affected would be packaging tools working to the PEP.
I asked Daniel to *stop* using pydist.json, since wheel was emitting a point-in-time snapshot of PEP 426 (which includes a lot of potentially-nice-to-have things that nobody has actually implemented so far, like the semantic dependency declarations and the enhancements to the extras syntax), rather than the final version of the spec.
It's too bad that the JSON thing didn't work out, but I think we're better off working on better specifying the one source of truth everything already uses (METADATA) instead of bringing in *new* partially-incompatible-and-poorly-specified formats.
When you say "everything already uses", do you mean setuptools and wheel? If nobody else is allowed to play, that's one thing. But otherwise, there need to be standards for interoperability. The METADATA file, now - exactly which standard does it follow? The one in the dateutil wheel that Jim referred to doesn't appear to conform to any of the metadata PEPs. It was rejected by old metadata code in distlib (which came of out the Python 3.3 era "packaging" package - not to be confused with Donald's of the same name - which is strict in its interpretation of those earlier PEPs).
The METADATA format (key-value) is not really flexible enough for certain things which were in PEP 426 (e.g. dependency descriptions), and for these JSON seems a reasonable fit.
The current de facto standard set by setuptools and bdist_wheel is:
- dist-info/METADATA as defined at https://packaging.python.org/specifications/#package-distribution-metadata
- dist-info/requires.txt runtime dependencies as defined at http://setuptools.readthedocs.io/en/latest/formats.html#requires-txt
- dist-info/setup_requires.txt build time dependencies as defined at http://setuptools.readthedocs.io/en/latest/formats.html#setup-requires-txt
The dependency fields in METADATA itself unfortunately aren't really useful for anything.
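The METADATA file itself is in the same RFC 822-style key/value format as PKG-INFO, so a minimal sketch of inspecting it needs only the standard library (the wheel path below is just an assumed example):

from email.parser import Parser

# Assumed example path; in a real wheel this file lives inside the
# <name>-<version>.dist-info directory.
with open("python_dateutil-2.6.0.dist-info/METADATA", encoding="utf-8") as f:
    meta = Parser().parse(f)

print(meta["Name"], meta["Version"])
print(meta.get_all("Requires-Dist"))  # PEP 508-ish dependency specifiers, if present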
There's definitely still a place for a pydist.json created by going through PEP 426, comparing it to what bdist_wheel already does to populate metadata.json, and either changing the PEP to match the existing practice, or else agreeing that we prefer what the PEP recommends, that we want to move in that direction, and that there's a definite commitment to implement the changes in at least setuptools and bdist_wheel (plus a migration strategy that allows for reasonably sensible consumption of old metadata).
Such an update would necessarily be a fairly ruthless process, where we defer everything that can possibly be deferred. I already made one pass at that when I split out the metadata extensions into PEP 459, but at least one more such pass is needed before we can sign off on the spec as metadata 2.0 - even beyond any "open for discussion" questions, there are still things in there which were extracted and standardised separately in PEP 508.
There's no technical reason why "the JSON thing didn't work out", as far as I can see - it was just given up on for a more incremental approach (which has got no new PEPs other than 440, AFAICT).
Yep, it's a logistical problem rather than a technical problem per se - new metadata formats need software publisher adoption to ensure the design is sensible before we commit to them long term, but software publishers are understandably reluctant to rely on new formats that limit their target audience to folks running the latest versions of the installation tools (outside constrained cases where the software publisher is also the main consumer of that software).
For PEP 440 (version specifiers) and PEP 508 (dependency specifiers), this was handled by focusing on documenting practices that people already used (and checking existing PyPI projects for compatibility), rather than trying to actively change those practices.
For pyproject.toml (e.g. enscons), the idea is to provide a setup.py shim that can take care of bootstrapping the new approach for the benefit of older tools that assume the use of setup.py (similar to what was done with setup.cfg and d2to1).
The equivalent for PEP 426 would probably be legacy-to-pydist and pydist-to-legacy converters that setuptools, bdist_wheel and other publishing tools can use to ship legacy metadata alongside the standardised format (and I believe Daniel already has at least the former in order to generate metadata.json in bdist_wheel). With PEP 426 as currently written, a pydist-to-legacy converter isn't really feasible, since pydist proposes new concepts that can't be readily represented in the old format.
I understand that social reasons are often more important than technical reasons when it comes to success or failure of an approach; I'm just not sure that in this case, it wasn't given up on too early.
I think of PEP 426 as "deferred indefinitely pending specific practical problems to provide clearer design constraints" rather than abandoned :)
There are two recent developments that I think may provide those missing design constraints and hence motivation to finalise a metadata 2.0 specification:
1. the wheel-to-egg support in humpty (and hence zc.buildout). That makes humpty a concrete non-traditional installer that would benefit from both a modernised standard metadata format and common tools to convert legacy metadata to the agreed modern format and to convert the modern format back to the legacy format for inclusion in the generated egg files (as then humpty could just re-use the shared tools, rather than having to maintain those capabilities itself).
2. the new pipenv project to provide a simpler alternative to the pip+virtualenv+pip-tools combination for environment management in web service development (and similar layered application architectures). As with the "install vs setup" split in setuptools, pipenv settled on an "only two kinds of requirement (deployment and development)" model for usability reasons, but it also distinguishes abstract dependencies stored in Pipfile from pinned concrete dependencies stored in Pipfile.lock.
If we put those together with the existing interest in automating generation of policy compliant operating system distribution packages, it makes it easier to go through the proposed semantic dependency model in PEP 426 and ask "How would we populate these fields based on the metadata that projects *already* publish?".
- "run requires": straightforward, as these are the standard dependencies used in most projects. Not entirely clear how to gently (or strongly!) discourage dependency pinning when publishing to PyPI (although the Pipfile and Pipfile.lock model used in pipenv may help with this) - "meta requires": not clear at all, as this was added to handle cases like PyObjC, where the main package is just a metapackage that makes a particular set of versioned subpackages easy to install. This may be better modeled as a separate "integrates" field, using a declaration syntax more akin to that used for Pipfile.lock rather than that used for normal requirements declarations. - "dev requires": corresponds to "dev-packages" in pipenv - "build requires": corresponds to "setup_requires" in setuptools, "build-system.requires" + any dynamic build dependencies in PEP 518 - "test requires": corresponds to "test" extra in https://packaging.python.org/specifications/#provides-extra-multiple-use
The "doc" extra in https://packaging.python.org/specifications/#provides-extra-multiple-use would map to "build requires", but there's potential benefit to redistributors in separating it out, as we often split the docs out from the built software components (since there's little reason to install documentation on headless servers that are only going to be debugged remotely).
The main argument against "test requires" and "doc requires" is that the extras system already works fine for those - "pip install MyProject[test]" and "pip install MyProject[doc]" are both already supported, so metadata 2.0 just needs to continue to reserve those as semantically significant extras names.
"dev" requires could be handled the same way - anything you actually need to *build* an sdist or wheel archive from a source repository should be in "setup_requires" (setuptools) or "build-system.requires" (pyproject.toml), so "dev" would just be a conventional extra name rather than a top level field.
That just leaves "build_requires", which turns out to interact awkwardly with the "extras" system: if you write "pip install MyProject[test]", does it install all the "test" dependencies, regardless of whether they're listed in run_requires or build_requires?
If yes: then why are run_requires and build_requires separate? If no: then how do you request installation of the "test" build extra? Or are build extras prohibited entirely?
That suggests that perhaps "build" should just be a conventional extra as well, and considered orthogonal to the other conventional extras. (I'm sure this idea has been suggested before, but I don't recall who suggested it or when)
And if build, test, doc, and dev are all handled as extras, then the top level name "run_requires" no longer makes sense, and the field name should go back to just being "requires".
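With today's tools those conventional extras are just setuptools' extras_require mechanism; a sketch for a hypothetical project (all names invented) that consumers would install with "pip install MyProject[test]" and so on:

from setuptools import setup

setup(
    name="MyProject",                       # hypothetical example project
    version="1.0",
    install_requires=["requests >= 2.0"],   # ordinary runtime dependencies
    extras_require={
        # conventional extras as discussed above
        "test": ["pytest"],
        "doc": ["sphinx"],
        "dev": ["tox"],
    },
)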
Under that evaluation, we'd be left with only the following top level fields defined for dependency declarations:
- "requires": list where entries are either a string containing a PEP 508 dependency specifier or else a hash map contain a "requires" key plus "extra" or "environment" fields as qualifiers - "integrates": replacement for "meta_requires" that only allows pinned dependencies (i.e. hash maps with "name" & "version" fields, or direct URL references, rather than a general PEP 508 specifier as a string)
For converting old metadata, any concrete dependencies that are compatible with the "integrates" field format would be mapped that way, while everything else would be converted to "requires" entries. The semantic differences between normal runtime dependencies and "dev", "test", "doc" and "build" requirements would be handled as extras, regardless of whether you were using the old metadata format or the new one.
Going the other direction would be similarly straightforward since (excluding extensions) the set of required conceptual entities has been reduced back to the set that already exists in the current metadata formats. While "requires" and "integrates" would be distinct fields in pydist.json, the decomposed fields in the latter would map back to their string-based counterparts in PEP 508 when converted to the legacy metadata formats.
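A rough sketch of that conversion rule in the legacy-to-pydist direction, using the "packaging" library to parse requirement strings; the output keys mirror the proposed requires/integrates split, which is a draft idea rather than an implemented format:

from packaging.requirements import Requirement

PINNING_OPERATORS = {"==", "==="}

def split_legacy_requirements(requirement_strings):
    """Partition legacy requirement strings into the proposed fields:
    exact pins go to "integrates", everything else to "requires"."""
    result = {"requires": [], "integrates": []}
    for text in requirement_strings:
        req = Requirement(text)
        if {spec.operator for spec in req.specifier} & PINNING_OPERATORS:
            result["integrates"].append(str(req))
        else:
            result["requires"].append(str(req))
    return result

# e.g. split_legacy_requirements(["six >= 1.5", "pyobjc-core == 3.2.1"])
# -> {"requires": ["six>=1.5"], "integrates": ["pyobjc-core==3.2.1"]}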
Cheers, Nick.
P.S. I'm definitely open to a PR that amends the PEP 426 draft along these lines. I'll get to it eventually myself, but there are some other things I see as higher priority for my open source time at the moment (specifically the C locale handling behaviour of Python 3.6 in Fedora 26 and the related upstream proposal for Python 3.7 in PEP 538)
On Wed, Feb 15, 2017 at 3:33 AM, Nick Coghlan ncoghlan@gmail.com wrote:
- "requires": list where entries are either a string containing a PEP
508 dependency specifier or else a hash map contain a "requires" key plus "extra" or "environment" fields as qualifiers
- "integrates": replacement for "meta_requires" that only allows
pinned dependencies (i.e. hash maps with "name" & "version" fields, or direct URL references, rather than a general PEP 508 specifier as a string)
What's accomplished by separating these? I really think we should strive to have fewer more orthogonal concepts whenever possible...
-n
On 15 February 2017 at 12:58, Nathaniel Smith njs@pobox.com wrote:
On Wed, Feb 15, 2017 at 3:33 AM, Nick Coghlan ncoghlan@gmail.com wrote:
- "requires": list where entries are either a string containing a PEP
508 dependency specifier or else a hash map contain a "requires" key plus "extra" or "environment" fields as qualifiers
- "integrates": replacement for "meta_requires" that only allows
pinned dependencies (i.e. hash maps with "name" & "version" fields, or direct URL references, rather than a general PEP 508 specifier as a string)
What's accomplished by separating these? I really think we should strive to have fewer more orthogonal concepts whenever possible...
It's mainly a matter of incorporating https://caremad.io/posts/2013/07/setup-vs-requirement/ into the core data model, as this distinction between abstract development dependencies and concrete deployment dependencies is incredibly important for any scenario that involves publisher-redistributor-consumer chains, but is entirely non-obvious to folks that are only familiar with the publisher-consumer case that comes up during development-for-personal-and-open-source-use.
One particular area where this is problematic is in the widespread advice "always pin your dependencies" which is usually presented without the all important "for application or service deployment" qualifier. As a first approximation: pinning-for-app-or-service-deployment == good, pinning-for-local-testing == good, pinning-for-library-or-framework-publication-to-PyPI == bad.
pipenv borrows the Ruby solution to modeling this by having Pipfile for abstract dependency declarations and Pipfile.lock for concrete integration testing ones, so the idea here is to propagate that model to pydist.json by separating the "requires" field with abstract development dependencies from the "integrates" field with concrete deployment dependencies.
In the vast majority of publication-to-PyPI cases people won't need the "integrates" field, since what they're publishing on PyPI will just be their abstract dependencies, and any warning against using "==" will recommend using "~=" or ">=" instead. But there *are* legitimate uses of pinning-for-publication (like the PyObjC metapackage bundling all its subcomponents, or when building for private deployment infrastructure), so there needs to be a way to represent "Yes, I'm pinning this dependency for publication, and I'm aware of the significance of doing so".
Cheers, Nick.
On Wed, Feb 15, 2017 at 5:27 AM, Nick Coghlan ncoghlan@gmail.com wrote:
On 15 February 2017 at 12:58, Nathaniel Smith njs@pobox.com wrote:
On Wed, Feb 15, 2017 at 3:33 AM, Nick Coghlan ncoghlan@gmail.com wrote:
- "requires": list where entries are either a string containing a PEP
508 dependency specifier or else a hash map contain a "requires" key plus "extra" or "environment" fields as qualifiers
- "integrates": replacement for "meta_requires" that only allows
pinned dependencies (i.e. hash maps with "name" & "version" fields, or direct URL references, rather than a general PEP 508 specifier as a string)
What's accomplished by separating these? I really think we should strive to have fewer more orthogonal concepts whenever possible...
It's mainly a matter of incorporating https://caremad.io/posts/2013/07/setup-vs-requirement/ into the core data model, as this distinction between abstract development dependencies and concrete deployment dependencies is incredibly important for any scenario that involves publisher-redistributor-consumer chains, but is entirely non-obvious to folks that are only familiar with the publisher-consumer case that comes up during development-for-personal-and-open-source-use.
Maybe I'm just being dense but, umm. I don't know what any of these words mean :-). I'm not unfamiliar with redistributors; part of my confusion is that this is a concept that AFAIK distro package systems don't have. Maybe it would help if you have a concrete example of a scenario where they would benefit from having this distinction?
One particular area where this is problematic is in the widespread advice "always pin your dependencies" which is usually presented without the all important "for application or service deployment" qualifier. As a first approximation: pinning-for-app-or-service-deployment == good, pinning-for-local-testing == good, pinning-for-library-or-framework-publication-to-PyPI == bad.
pipenv borrows the Ruby solution to modeling this by having Pipfile for abstract dependency declarations and Pipfile.lock for concrete integration testing ones, so the idea here is to propagate that model to pydist.json by separating the "requires" field with abstract development dependencies from the "integrates" field with concrete deployment dependencies.
What's the benefit of putting this in pydist.json? I feel like for the usual deployment cases (a) going straight from Pipfile.lock -> venv is pretty much sufficient, with no need to put this into a package, but (b) if you really do want to put it into a package, then the natural approach would be to make an empty wheel like "my-django-app-deploy.whl" whose dependencies were the contents of Pipfile.lock.
There's certainly a distinction to be made between the abstract dependencies and the exact locked dependencies, but to me the natural way to model that distinction is by re-using the distinction we already have between source packages and binary packages. The build process for this placeholder wheel is to "compile down" the abstract dependencies into concrete dependencies, and the resulting wheel encodes the result of this compilation. Again, no new concepts needed.
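To make the "compile down" idea concrete, here is a toy sketch that turns an already-locked set of versions into the METADATA-style dependency lines such a placeholder wheel would carry; the project name and locked versions are invented, and a real tool would read them from Pipfile.lock:

# Invented example data standing in for a parsed Pipfile.lock.
LOCKED_VERSIONS = {
    "django": "1.10.5",
    "gunicorn": "19.6.0",
}

def placeholder_wheel_metadata(name, version, locked):
    """Emit METADATA-style lines for an empty deployment wheel whose only
    job is to depend on exactly the locked versions."""
    lines = [
        "Metadata-Version: 1.2",   # Requires-Dist was introduced by PEP 345 (metadata 1.2)
        "Name: {}".format(name),
        "Version: {}".format(version),
    ]
    lines.extend(
        "Requires-Dist: {} (=={})".format(dep, ver)
        for dep, ver in sorted(locked.items())
    )
    return "\n".join(lines)

print(placeholder_wheel_metadata("my-django-app-deploy", "1.0", LOCKED_VERSIONS))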
In the vast majority of publication-to-PyPI cases people won't need the "integrates" field, since what they're publishing on PyPI will just be their abstract dependencies, and any warning against using "==" will recommend using "~=" or ">=" instead. But there *are* legitimate uses of pinning-for-publication (like the PyObjC metapackage bundling all its subcomponents, or when building for private deployment infrastructure), so there needs to be a way to represent "Yes, I'm pinning this dependency for publication, and I'm aware of the significance of doing so".
Why can't PyObjC just use regular dependencies? That's what distro metapackages have done for decades, right?
-n
On 15 February 2017 at 14:11, Nathaniel Smith njs@pobox.com wrote:
It's mainly a matter of incorporating https://caremad.io/posts/2013/07/setup-vs-requirement/ into the core data model, as this distinction between abstract development dependencies and concrete deployment dependencies is incredibly important for any scenario that involves publisher-redistributor-consumer chains, but is entirely non-obvious to folks that are only familiar with the publisher-consumer case that comes up during development-for-personal-and-open-source-use.
Maybe I'm just being dense but, umm. I don't know what any of these words mean :-). I'm not unfamiliar with redistributors; part of my confusion is that this is a concept that AFAIK distro package systems don't have. Maybe it would help if you have a concrete example of a scenario where they would benefit from having this distinction?
I'm also finding this discussion bafflingly complex. I understand that distributions need a way to work with Python packages, but the issues involved seem completely divorced from the basic process of a user using pip to install a package with the dependencies it needs to work in their program.
The package metadata standardisation process seems to be falling foul of a quest for perfection. Is there no 80% solution that covers the bulk of use cases (which, in my mind, are all around some user wanting to say "pip install" to grab some stuff off PyPI to build his project)? Or is the 80% solution precisely what we have at the moment, in which case can't we standardise what we have, and *then* look to extend to cover the additional requirements?
I'm sure I'm missing something - but honestly, I'm not sure what it is.
If I write something to go on PyPI, I assume that makes me a "publisher"? IMO, my audience is people who use my software (the "consumers" in your terms, I guess). While I'd be pleased if a distributor like Ubuntu or Fedora or Anaconda wanted to include my package in their distribution, I wouldn't see them as my end users - so while I'd be OK with tweaking my code/metadata to accommodate their needs, it's not a key goal for me. And learning all the metadata concepts related to packaging my project for distributors wouldn't appeal to me at all. I'd be happy for the distributions to do that and send me PRs, but the burden should be on them to do that. The complexities we're debating here seem to be based on the idea that *I* should understand the distributor's role in order to package my code "correctly". I'm not at all sure I agree with that.
Maybe this is all a consequence of Python now being used in "big business", and the old individual developer scratching his or her own itch model is gone. And maybe that means PyPI is no longer a suitable place for such "use at your own risk" code. But if that's the case, maybe we need to acknowledge that fact, before we end up with people getting the idea that "Python packaging is too complex for the average developer". Because it's starting to feel that way :-(
Paul
On 15 February 2017 at 15:58, Paul Moore p.f.moore@gmail.com wrote:
On 15 February 2017 at 14:11, Nathaniel Smith njs@pobox.com wrote:
It's mainly a matter of incorporating https://caremad.io/posts/2013/07/setup-vs-requirement/ into the core data model, as this distinction between abstract development dependencies and concrete deployment dependencies is incredibly important for any scenario that involves publisher-redistributor-consumer chains, but is entirely non-obvious to folks that are only familiar with the publisher-consumer case that comes up during development-for-personal-and-open-source-use.
Maybe I'm just being dense but, umm. I don't know what any of these words mean :-). I'm not unfamiliar with redistributors; part of my confusion is that this is a concept that AFAIK distro package systems don't have. Maybe it would help if you have a concrete example of a scenario where they would benefit from having this distinction?
I'm also finding this discussion bafflingly complex. I understand that distributions need a way to work with Python packages, but the issues involved seem completely divorced from the basic process of a user using pip to install a package with the dependencies it needs to work in their program.
As simple as I can make it:
* pinning dependencies when publishing to PyPI is presumptively bad
* PyPI itself (not client tools) should warn you that it's a bad idea
* however, there are legitimate use cases for pinning in PyPI packages
* so there should be a way to do it, but it should involve telling PyPI "I am an integration project, this is OK"
Most people should never touch the "integrates" field, they should just change "==" to "~=" or ">=" to allow for future releases of their dependencies.
Cheers, Nick.
On 15 February 2017 at 15:11, Nathaniel Smith njs@pobox.com wrote:
On Wed, Feb 15, 2017 at 5:27 AM, Nick Coghlan ncoghlan@gmail.com wrote:
It's mainly a matter of incorporating https://caremad.io/posts/2013/07/setup-vs-requirement/ into the core data model, as this distinction between abstract development dependencies and concrete deployment dependencies is incredibly important for any scenario that involves publisher-redistributor-consumer chains, but is entirely non-obvious to folks that are only familiar with the publisher-consumer case that comes up during development-for-personal-and-open-source-use.
Maybe I'm just being dense but, umm. I don't know what any of these words mean :-). I'm not unfamiliar with redistributors; part of my confusion is that this is a concept that AFAIK distro package systems don't have. Maybe it would help if you have a concrete example of a scenario where they would benefit from having this distinction?
It's about error messages and nudges in the UX: if PyPI rejects version pinning in "requires" by default, then that creates an opportunity to nudge people towards using "~=" or ">=" instead (as in the vast majority of cases, that will be a better option than pinning-for-publication).
The inclusion of "integrates" then adds back the support for legitimate version pinning use cases in pydist.json in a way that makes it clear that it is a conceptually distinct operation from a normal dependency declaration.
pipenv borrows the Ruby solution to modeling this by having Pipfile for abstract dependency declarations and Pipfile.lock for concrete integration testing ones, so the idea here is to propagate that model to pydist.json by separating the "requires" field with abstract development dependencies from the "integrates" field with concrete deployment dependencies.
What's the benefit of putting this in pydist.json? I feel like for the usual deployment cases (a) going straight from Pipfile.lock -> venv is pretty much sufficient, with no need to put this into a package, but (b) if you really do want to put it into a package, then the natural approach would be to make an empty wheel like "my-django-app-deploy.whl" whose dependencies were the contents of Pipfile.lock.
My goal with the split is to get to a state where:
- exactly zero projects on PyPI use "==" or "===" in their requires metadata (because PyPI explicitly prohibits it)
- the vast majority of projects on PyPI *don't* have an "integrates" section
- those projects that do have an `integrates` section have a valid reason for it (like PyObjC)
For anyone making the transition from application and web service development to library and framework development, the transition from "always pin exact versions of your dependencies for deployment" to "when publishing a library or framework, only rule out the combinations that you're pretty sure *won't* work" is one of the trickiest to deal with as current tools *don't alert you to the fact that there's a difference to be learned*.
Restricting what can go into requires creates an opportunity to ask users whether they're publishing a pre-integrated project or not: if yes, then they add the "integrates" field and put their pinned dependencies there; if not, then they relax the "==" constraints to "~=" or ">=".
Either way, PyPI will believe your answer, it's just refusing the temptation to guess that using "==" or "===" in the requires section is sufficient to indicate that you're deliberately publishing a pre-integrated project.
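As a sketch of the kind of upload-time check being described (neither the function nor the requires/integrates split exists today), an index server might do something like:

from packaging.requirements import Requirement

def check_requires_for_pins(requires):
    """Hypothetical index-side validation: flag "==" / "===" specifiers in
    "requires" and point publishers at "~=" / ">=" or the "integrates" field."""
    problems = []
    for text in requires:
        req = Requirement(text)
        if any(spec.operator in ("==", "===") for spec in req.specifier):
            problems.append(
                "{!r} pins an exact version; relax it to '~=' or '>=', or move it "
                "to 'integrates' if this really is a pre-integrated project".format(text)
            )
    return problems

# check_requires_for_pins(["requests == 2.12.4"]) -> one warning
# check_requires_for_pins(["requests ~= 2.12"])   -> []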
There's certainly a distinction to be made between the abstract dependencies and the exact locked dependencies, but to me the natural way to model that distinction is by re-using the distinction we already have between source packages and binary packages. The build process for this placeholder wheel is to "compile down" the abstract dependencies into concrete dependencies, and the resulting wheel encodes the result of this compilation. Again, no new concepts needed.
Source vs binary isn't where the distinction applies, though. For example, it's legitimate for PyObjC to have pinned dependencies even when distributed in source form, as it's a metapackage used solely to integrate the various PyObjC subprojects into a single "release".
In the vast majority of publication-to-PyPI cases people won't need the "integrates" field, since what they're publishing on PyPI will just be their abstract dependencies, and any warning against using "==" will recommend using "~=" or ">=" instead. But there *are* legitimate uses of pinning-for-publication (like the PyObjC metapackage bundling all its subcomponents, or when building for private deployment infrastructure), so there needs to be a way to represent "Yes, I'm pinning this dependency for publication, and I'm aware of the significance of doing so".
Why can't PyObjC just use regular dependencies? That's what distro metapackages have done for decades, right?
If PyObjC uses regular dependencies then there's no opportunity for PyPI to ask "Did you really mean that?" when people pin dependencies in "requires". That makes it likely we'll end up with a lot of unnecessarily restrictive "==" constraints in PyPI packages ("Works on my machine!"), which creates problems when attempting to auto-generate distro packages from upstream ones.
The distro case isn't directly analogous, since there are a few key differences:

- open publication platform rather than a pre-approved set of package maintainers
- no documented packaging policies with related human review & approval processes
- a couple of orders of magnitude difference in the number of packages involved
- at least in RPM, you can have a spec file with no source tarball, which makes it obvious it's a metapackage
Cheers, Nick.
On 15 February 2017 at 15:41, Nick Coghlan ncoghlan@gmail.com wrote:
My goal with the split is to get to a state where:
- exactly zero projects on PyPI use "==" or "===" in their requires metadata (because PyPI explicitly prohibits it)
- the vast majority of projects on PyPI *don't* have an "integrates" section
- those projects that do have an `integrates` section have a valid reason for it (like PyObjC)
So how many projects on PyPI currently have == or === in their requires? I've never seen one (although my sample size isn't large - but it does cover major packages in a large-ish range of application areas).
I'm curious as to how major this problem is in practice. I (now) understand the theoretical argument for the proposal.
Paul
On Feb 15, 2017 07:41, "Nick Coghlan" ncoghlan@gmail.com wrote:
pipenv borrows the Ruby solution to modeling this by having Pipfile for abstract dependency declarations and Pipfile.lock for concrete integration testing ones, so the idea here is to propagate that model to pydist.json by separating the "requires" field with abstract development dependencies from the "integrates" field with concrete deployment dependencies.
What's the benefit of putting this in pydist.json? I feel like for the usual deployment cases (a) going straight from Pipfile.lock -> venv is pretty much sufficient, with no need to put this into a package, but (b) if you really do want to put it into a package, then the natural approach would be to make an empty wheel like "my-django-app-deploy.whl" whose dependencies were the contents of Pipfile.lock.
My goal with the split is to get to a state where:
- exactly zero projects on PyPI use "==" or "===" in their requires metadata (because PyPI explicitly prohibits it)
- the vast majority of projects on PyPI *don't* have an "integrates" section
- those projects that do have an `integrates` section have a valid reason for it (like PyObjC)
For anyone making the transition from application and web service development to library and framework development, the transition from "always pin exact versions of your dependencies for deployment" to "when publishing a library or framework, only rule out the combinations that you're pretty sure *won't* work" is one of the trickiest to deal with as current tools *don't alert you to the fact that there's a difference to be learned*.
Restricting what can go into requires creates an opportunity to ask users whether they're publishing a pre-integrated project or not: if yes, then they add the "integrates" field and put their pinned dependencies there; if not, then they relax the "==" constraints to "~=" or ">=".
Ah-hah, this does make sense as a problem, thanks!
However, your solution seems very odd to me :-).
If the goal is to put an "are you sure/yes I'm sure" UX barrier between users and certain version settings, then why make a distinction that every piece of downstream software has to be aware of and ignore? PyPI seems like a funny place in the stack to be implementing this. It would be much simpler to implement this feature at the build system level, like e.g. setuptools could require that dependencies that you think are over strict be specified in an install_requires_yes_i_really_mean_it= field, without requiring any metadata changes.
Basically it sounds like you're saying you want to extend the metadata so that it can represent both broken and non-broken packages, so that both can be created, passed around, and checked for. And I'm saying, how about instead we do that checking when creating the package in the first place.
(Of course I can't see any way to do any of this that won't break existing sdists, but I guess you've already decided you're OK with that. I guess I should say that I'm a bit dubious that this is so important in the first place; I feel like there are lots of legitimate use cases for == dependencies and lots of kinds of linting we might want to apply to try and improve the level of packaging quality.)
Either way, PyPI will believe your answer, it's just refusing the temptation to guess that using "==" or "===" in the requires section is sufficient to indicate that you're deliberately publishing a pre-integrated project.
There's certainly a distinction to be made between the abstract dependencies and the exact locked dependencies, but to me the natural way to model that distinction is by re-using the distinction we already have between source packages and binary packages. The build process for this placeholder wheel is to "compile down" the abstract dependencies into concrete dependencies, and the resulting wheel encodes the result of this compilation. Again, no new concepts needed.
Source vs binary isn't where the distinction applies, though. For example, it's legitimate for PyObjC to have pinned dependencies even when distributed in source form, as it's a metapackage used solely to integrate the various PyObjC subprojects into a single "release".
?? So that means that some packages have a loosely specified source that compiles down to a more strictly specified binary, and some have a more strictly specified source that compiles down to an equally strictly specified binary. That's... an argument in favor of my way of thinking about it, isn't it? That it can naturally express both situations?
My point is that *for the cases where there's an important distinction between Pipfile and Pipfile.lock*, we already have a way to think about that distinction without introducing new concepts.
-n
On 15 Feb 2017 23:40, "Nathaniel Smith" njs@pobox.com wrote:
On Feb 15, 2017 07:41, "Nick Coghlan" ncoghlan@gmail.com wrote:
Ah-hah, this does make sense as a problem, thanks!
However, your solution seems very odd to me :-).
If the goal is to put an "are you sure/yes I'm sure" UX barrier between users and certain version settings, then why make a distinction that every piece of downstream software has to be aware of and ignore? PyPI seems like a funny place in the stack to be implementing this. It would be much simpler to implement this feature at the build system level, like e.g. setuptools could require that dependencies that you think are over strict be specified in an install_requires_yes_i_really_mean_it= field, without requiring any metadata changes.
If you're publishing to a *private* index server then version pinning should be allowed by default and you shouldn't get a warning.
It's only when publishing to PyPI as a *public* index server that overly restrictive dependencies become a UX problem.
The simplest way of modelling this that I've come up with is a boolean "allow pinned dependencies" flag - without the flag, "==" and "===" would emit warnings or errors when releasing to a public index server, with it they wouldn't trigger any complaints.
Basically it sounds like you're saying you want to extend the metadata so that it can represent both broken and non-broken packages, so that both can be created, passed around, and checked for. And I'm saying, how about instead we do that checking when creating the package in the first place.
Build time isn't right, due to this being a perfectly acceptable thing to do when building solely for private use. It's only when you make the "I'm going to publish this for the entire community to use" decision that the intent needs to be clarified (as at that point you're switching from "I'm solving my own problems" to "My problems may be shared by other people, and I'd like to help them out if I can").
(Of course I can't see any way to do any of this that won't break existing sdists, but I guess you've already decided you're OK with that. I guess I should say that I'm a bit dubious that this is so important in the first place; I feel like there are lots of legitimate use cases for == dependencies and lots of kinds of linting we might want to apply to try and improve the level of packaging quality.)
Either way, PyPI will believe your answer, it's just refusing the temptation to guess that using "==" or "===" in the requires section is sufficient to indicate that you're deliberately publishing a pre-integrated project.
There's certainly a distinction to be made between the abstract dependencies and the exact locked dependencies, but to me the natural way to model that distinction is by re-using the distinction we already have between source packages and binary packages. The build process for this placeholder wheel is to "compile down" the abstract dependencies into concrete dependencies, and the resulting wheel encodes the result of this compilation. Again, no new concepts needed.
Source vs binary isn't where the distinction applies, though. For example, it's legitimate for PyObjC to have pinned dependencies even when distributed in source form, as it's a metapackage used solely to integrate the various PyObjC subprojects into a single "release".
?? So that means that some packages have a loosely specified source that compiles down to a more strictly specified binary, and some have a more strictly specified source that compiles down to an equally strictly specified binary. That's... an argument in favor of my way of thinking about it, isn't it? That it can naturally express both situations?
My point is that *for the cases where there's an important distinction between Pipfile and Pipfile.lock*, we already have a way to think about that distinction without introducing new concepts.
-n
On 23 February 2017 at 18:03, Nick Coghlan ncoghlan@gmail.com wrote:
On 15 Feb 2017 23:40, "Nathaniel Smith" njs@pobox.com wrote:
On Feb 15, 2017 07:41, "Nick Coghlan" ncoghlan@gmail.com wrote:
Ah-hah, this does make sense as a problem, thanks!
However, your solution seems very odd to me :-).
If the goal is to put an "are you sure/yes I'm sure" UX barrier between users and certain version settings, then why make a distinction that every piece of downstream software has to be aware of and ignore? PyPI seems like a funny place in the stack to be implementing this. It would be much simpler to implement this feature at the build system level, like e.g. setuptools could require that dependencies that you think are over strict be specified in an install_requires_yes_i_really_mean_it= field, without requiring any metadata changes.
If you're publishing to a *private* index server then version pinning should be allowed by default and you shouldn't get a warning.
It's only when publishing to PyPI as a *public* index server that overly restrictive dependencies become a UX problem.
The simplest way of modelling this that I've come up with is a boolean "allow pinned dependencies" flag - without the flag, "==" and "===" would emit warnings or errors when releasing to a public index server, with it they wouldn't trigger any complaints.
Basically it sounds like you're saying you want to extend the metadata so that it can represent both broken and non-broken packages, so that both can be created, passed around, and checked for. And I'm saying, how about instead we do that checking when creating the package in the first place.
Build time isn't right, due to this being a perfectly acceptable thing to do when building solely for private use. It's only when you make the "I'm going to publish this for the entire community to use" decision that the intent needs to be clarified (as at that point you're switching from "I'm solving my own problems" to "My problems may be shared by other people, and I'd like to help them out if I can").
And TIL that Ctrl-Enter is Gmail's keyboard shortcut for sending an email :)
(Of course I can't see any way to do any of this that won't break existing sdists, but I guess you've already decided you're OK with that. I guess I should say that I'm a bit dubious that this is so important in the first place; I feel like there are lots of legitimate use cases for == dependencies and lots of kinds of linting we might want to apply to try and improve the level of packaging quality.)
Existing sdists won't have pydist.json, so none of this will apply.
Either way, PyPI will believe your answer, it's just refusing the temptation to guess that using "==" or "===" in the requires section is sufficient to indicate that you're deliberately publishing a pre-integrated project.
There's certainly a distinction to be made between the abstract dependencies and the exact locked dependencies, but to me the natural way to model that distinction is by re-using the distinction we already have between source packages and binary packages. The build process for this placeholder wheel is to "compile down" the abstract dependencies into concrete dependencies, and the resulting wheel encodes the result of this compilation. Again, no new concepts needed.
Source vs binary isn't where the distinction applies, though. For example, it's legitimate for PyObjC to have pinned dependencies even when distributed in source form, as it's a metapackage used solely to integrate the various PyObjC subprojects into a single "release".
?? So that means that some packages have a loosely specified source that compiles down to a more strictly specified binary, and some have a more strictly specified source that compiles down to an equally strictly specified binary. That's... an argument in favor of my way of thinking about it, isn't it? That it can naturally express both situations?
Why are you bringing source vs binary into this? That has *nothing* to do with the problem space, which is about the large grey area between "definitely doesn't work" (aka "we tested this combination and it failed") and "will almost certainly work" (aka "we tested this specific combination of dependencies and it passed").
When publishing a software *component* (such as a library or application), the most important information to communicate to users is the former (i.e. the combinations you know *don't* work), while for applications & services you typically want to communicate *both* (i.e. the combinations you know definitively *don't* work, *and* the specific combinations you tested).
While you do need to do at least one build to actually run the tests, once you have those results, the metadata is just as applicable to the original source artifact as it is to the specific built binary.
My point is that *for the cases where there's an important distinction between Pipfile and Pipfile.lock*, we already have a way to think about that distinction without introducing new concepts.
Most software components won't have a Pipfile or Pipfile.lock, as that's an application & service oriented way of framing the distinction.
However, as Daniel said in his reply, we *do* want people to be able to publish applications and services like sentry or supervisord to PyPI, and we also want to allow people to publish metapackages like PyObjC.
The problem I'm trying to address is that we *don't* currently give publishers a machine readable way to say definitively "This is a pre-integrated application, service or metapackage" rather than "This is a component intended for integration into a larger application, service or metapackage".
I'm not a huge fan of having simple boolean toggles in metadata definitions (hence the more elaborate idea of two different kinds of dependency declaration), but this may be a case where that's a good way to go, since it would mean that services and tools that care can check it (with a recommendation in the spec saying that public index servers SHOULD check it), while those that don't care would continue to have a single unified set of dependency declarations to work with.
Cheers, Nick.
On 23 February 2017 at 08:18, Nick Coghlan ncoghlan@gmail.com wrote:
I'm not a huge fan of having simple boolean toggles in metadata definitions (hence the more elaborate idea of two different kinds of dependency declaration), but this may be a case where that's a good way to go, since it would mean that services and tools that care can check it (with a recommendation in the spec saying that public index servers SHOULD check it), while those that don't care would continue to have a single unified set of dependency declarations to work with.
While boolean metadata may not be ideal in the general case, I think it makes sense here. If you want to make it more acceptable, maybe make it Package-Type, with values "application" or "library".
On a related but tangential point, can I make a plea for using simpler language when documenting this (and even when discussing it)? The term "pre-integrated application" means very little to me in any practical sense beyond "application", and it brings a whole load of negative connotations - I deal with Java build processes on occasion, and the whole terminology there ("artifacts", "deployment units", ...) makes for a pretty hostile experience for the newcomer. I'd like to avoid Python packaging going down that route - even if the cost is a little vagueness in terms.
Paul
On 23 February 2017 at 18:37, Paul Moore p.f.moore@gmail.com wrote:
On 23 February 2017 at 08:18, Nick Coghlan ncoghlan@gmail.com wrote:
I'm not a huge fan of having simple boolean toggles in metadata definitions (hence the more elaborate idea of two different kinds of dependency declaration), but this may be a case where that's a good way to go, since it would mean that services and tools that care can check it (with a recommendation in the spec saying that public index servers SHOULD check it), while those that don't care would continue to have a single unified set of dependency declarations to work with.
While boolean metadata may not be ideal in the general case, I think it makes sense here. If you want to make it more acceptable, maybe make it Package-Type, with values "application" or "library".
That gets us back into the world of defining what the various package types mean, and I really don't want to go there :)
Instead, I'm thinking in terms of a purely capability based field: "allow_pinned_dependencies", with the default being "False", but actually checking the field also only being a SHOULD for public index servers and a MAY for everything else.
That would be enough for downstream tooling to pick up and say "I should treat this as a multi-component module rather than as an individual standalone component", *without* having to inflict the task of understanding the complexities of multi-tier distribution systems onto all component publishers :)
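Under the boolean-flag variant, the kind of index-side check sketched earlier collapses to an opt-in test; the flag name below follows the suggestion above and, like the check itself, is purely hypothetical:

from packaging.requirements import Requirement

def warn_about_pins(requires, allow_pinned_dependencies=False):
    """Return warnings for "==" / "===" specifiers unless the publisher has
    explicitly opted in via the proposed boolean flag."""
    if allow_pinned_dependencies:
        return []
    return [
        "{!r} pins an exact version; set allow_pinned_dependencies to true or "
        "relax the specifier to '~=' / '>='".format(text)
        for text in requires
        if any(spec.operator in ("==", "===") for spec in Requirement(text).specifier)
    ]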
Cheers, Nick.
On 23 February 2017 at 08:44, Nick Coghlan ncoghlan@gmail.com wrote:
That gets us back into the world of defining what the various package types mean, and I really don't want to go there :)
And yet I still don't understand what's wrong with "application", "library", and "metapackage" (the latter saying to me "complex thing that I don't need to understand"). Those terms are clear enough - after all, they are precisely the ones we've always used when debating "should you pin or not"?
Sure, there's a level of judgement involved - but it's precisely the *same* judgement as we're asking authors to make when asking "should I pin", just using the underlying distinction directly.
Instead, I'm thinking in terms of a purely capability based field: "allow_pinned_dependencies", with the default being "False", but actually checking the field also only being a SHOULD for public index servers and a MAY for everything else.
How would the user see this? As a magic flag they have to set to "yes" so that they can pin dependencies? Because if that's the situation, I'd imagine a lot of authors just cargo-culting "add this flag to get my package to upload" without actually thinking about the implications. (They'll search Stack Overflow for the error message, so putting what it's for in the docs won't help...)
Paul
On 23 February 2017 at 18:53, Paul Moore p.f.moore@gmail.com wrote:
On 23 February 2017 at 08:44, Nick Coghlan ncoghlan@gmail.com wrote:
That gets us back into the world of defining what the various package types mean, and I really don't want to go there :)
And yet I still don't understand what's wrong with "application", "library", and "metapackage" (the latter saying to me "complex thing that I don't need to understand"). Those terms are clear enough - after all, they are precisely the ones we've always used when debating "should you pin or not"?
Sure, there's a level of judgement involved - but it's precisely the *same* judgement as we're asking authors to make when asking "should I pin", just using the underlying distinction directly.
Thinking about it further, I may be OK with that, especially since we can point to concrete examples.
component: a library or framework used to build Python applications. Users will mainly interact with the component via a Python API. Examples: requests, numpy, pytz

application: an installable client application or web service. Users will mainly interact with the service via either the command line, a GUI, or a network interface. Examples: ckan (network), ansible (cli), spyder (GUI)

metapackage: a package that collects specific versions of other components into a single installable group. Example: PyObjC
And then we'd note in the spec that public index servers SHOULD warn when components use pinned dependencies, while other tools MAY warn about that case.
Going down that path would also end up addressing this old RFE for the packaging user guide: https://github.com/pypa/python-packaging-user-guide/issues/100
Instead, I'm thinking in terms of a purely capability based field: "allow_pinned_dependencies", with the default being "False", but actually checking the field also only being a SHOULD for public index servers and a MAY for everything else.
How would the user see this? As a magic flag they have to set to "yes" so that they can pin dependencies? Because if that's the situation, I'd imagine a lot of authors just cargo-culting "add this flag to get my package to upload" without actually thinking about the implications. (They'll search Stack Overflow for the error message, so putting what it's for in the docs won't help...)
Pre-answering questions on SO can work incredibly well, though :)
Cheers, Nick.
On 23 February 2017 at 12:32, Nick Coghlan ncoghlan@gmail.com wrote:
component: a library or framework used to build Python applications. Users will mainly interact with the component via a Python API. Examples: requests, numpy, pytz
Sorry to nitpick, but why is "component" better than "library"? People typically understand that "library" includes "framework" in this context. OTOH someone who's written a new library won't necessarily know that in this context (and *only* this context) we want them to describe it as a "component". (As far as I know, we don't use the term "component" anywhere else in the Python ecosystem currently). This feels to me somewhat like the failed attempts to force a distinction between "package" and "distribution". In the end, people use the terms they are comfortable with, and work with a certain level of context-dependent ambiguity.
Of course, if the goal here is to raise the barrier for entry to PyPI by expecting people to have to understand this type of concept and the implications before uploading, then that's fair. It's not something I think we should be aiming for personally, but I can see that organisations who want to be able to rely on the quality of what's available on PyPI would be in favour of a certain level of self-selection being applied. Personally I view PyPI as more of a public resource, like github, where it's up to the consumer to assess quality - so to me this is a change of focus. But YMMV.
Paul
On 23 February 2017 at 22:32, Nick Coghlan ncoghlan@gmail.com wrote:
On 23 February 2017 at 18:53, Paul Moore p.f.moore@gmail.com wrote:
On 23 February 2017 at 08:44, Nick Coghlan ncoghlan@gmail.com wrote:
That gets us back into the world of defining what the various package types mean, and I really don't want to go there :)
And yet I still don't understand what's wrong with "application", "library", and "metapackage" (the latter saying to me "complex thing that I don't need to understand"). Those terms are clear enough - after all, they are precisely the ones we've always used when debating "should you pin or not"?
Sure, there's a level of judgement involved - but it's precisely the *same* judgement as we're asking authors to make when asking "should I pin", just using the underlying distinction directly.
Thinking about it further, I may be OK with that, especially since we can point to concrete examples.
component: a library or framework used to build Python applications. Users will mainly interact with the component via a Python API. Examples: requests, numpy, pytz
Slight amendment here to use the term "library" rather than the generic component (freeing up the latter for its usual meaning in referring to arbitrary software components). I also realised that we need a separate category to cover packages like "pip" itself, and I chose "tool" based on the name of the field in pyproject.toml:
============
library: a software component used to build Python applications. Users will mainly interact with the component via a Python API. Libraries are essentially dynamic plugins for a Python runtime. Examples: requests, numpy, pytz
tool: a software utility used to develop and deploy Python libraries, applications, and scripts. Users will mainly interact with the component via the command line, or a GUI. Examples: pip, pycodestyle, gunicorn, jupyter
application: an installable client application or web service. Users will mainly interact with the service via either the command line, a GUI, or a network interface. While they may expose Python APIs to end users, the fact they're written in Python themselves is technically an implementation detail, making it possible to use them without even being aware that Python exists. Examples: ckan (network), ansible (cli), spyder (GUI)
metapackage: a package that collects specific versions of other components into a single installable group. Example: PyObjC
============
I think a package_type field with those possible values would cover everything I was worried about when I came up with the idea of the separate "integrates" field, and it seems like it would be relatively straightforward to explain to newcomers.
Cheers, Nick.
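To make the idea concrete, here is a minimal sketch of what a hypothetical package_type field could look like next to existing metadata, shown as a plain Python dict to stay format-neutral. The field name, the allowed values, and the example project are assumptions taken from the proposal in this thread, not from any accepted PEP.

# Hypothetical only: "package_type" is the field being proposed in this
# thread and is not part of any accepted metadata standard.
proposed_metadata = {
    "name": "examplecli",          # made-up project name
    "version": "1.0",
    "package_type": "application", # one of: library, tool, application, metapackage
    "requires": ["requests ~= 2.13", "click >= 6.0"],
}

print(proposed_metadata["package_type"])  # -> application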
On 02/23/2017 02:47 PM, Nick Coghlan wrote:
============
library: a software component used to build Python applications. Users will mainly interact with the component via a Python API. Libraries are essentially dynamic plugins for a Python runtime. Examples: requests, numpy, pytz
Assuming frameworks are included, it would be useful to add e.g. "django" to the examples.
tool: a software utility used to develop and deploy Python libraries, applications, and scripts. Users will mainly interact with the component via the command line, or a GUI. Examples: pip, pycodestyle, gunicorn, jupyter
application: an installable client application or web service. Users will mainly interact with the service via either the command line, a GUI, or a network interface. While they may expose Python APIs to end users, the fact they're written in Python themselves is technically an implementation detail, making it possible to use them without even being aware that Python exists. Examples: ckan (network), ansible (cli), spyder (GUI)
metapackage: a package that collects specific versions of other components into a single installable group. Example: PyObjC
============
On 23 February 2017 at 14:24, Petr Viktorin encukou@gmail.com wrote:
On 02/23/2017 02:47 PM, Nick Coghlan wrote:
============
library: a software component used to build Python applications. Users will mainly interact with the component via a Python API. Libraries are essentially dynamic plugins for a Python runtime. Examples: requests, numpy, pytz
Assuming frameworks are included, it would be useful to add e.g. "django" to the examples.
+1
On 23 February 2017 at 13:47, Nick Coghlan ncoghlan@gmail.com wrote:
Slight amendment here to use the term "library" rather than the generic component (freeing up the latter for its usual meaning in referring to arbitrary software components). I also realised that we need a separate category to cover packages like "pip" itself, and I chose "tool" based on the name of the field in pyproject.toml:
============
library: a software component used to build Python applications. Users will mainly interact with the component via a Python API. Libraries are essentially dynamic plugins for a Python runtime. Examples: requests, numpy, pytz
tool: a software utility used to develop and deploy Python libraries, applications, and scripts. Users will mainly interact with the component via the command line, or a GUI. Examples: pip, pycodestyle, gunicorn, jupyter
application: an installable client application or web service. Users will mainly interact with the service via either the command line, a GUI, or a network interface. While they may expose Python APIs to end users, the fact they're written in Python themselves is technically an implementation detail, making it possible to use them without even being aware that Python exists. Examples: ckan (network), ansible (cli), spyder (GUI)
metapackage: a package that collects specific versions of other components into a single installable group. Example: PyObjC
============
I think a package_type field with those possible values would cover everything I was worried about when I came up with the idea of the separate "integrates" field, and it seems like it would be relatively straightforward to explain to newcomers.
Yeah, that looks good. I'd assume that:
(1) The field is optional. (2) The field is 99% for information only, with the only imposed semantics being that PyPI can reject use of == constraints in install_requires unless the type is explicitly "application" or "metapackage".
Specifically, I doubt people will make a firm distinction between "tool" and "library". In many cases it'll be a matter of opinion. Is py.test a tool or a library? It has a command line interface after all. I'd also drop "used to develop and deploy Python libraries, applications, and scripts" - why does what it's used for affect its category? I can think of examples I think of as "tools" that are general purpose (e.g. youtube-dl) but I'd expect you to claim they are "applications". But unless they pin their dependencies (which youtube-dl doesn't AFAIK) the distinction is irrelevant. So I prefer to leave it to the author to decide, rather than force an artificial split.
Thanks for taking the time to address my concerns!
Paul
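As a purely illustrative sketch of the check Paul outlines in point (2) above, here is how an index server might flag exact pins when the declared type is not "application" or "metapackage". It assumes the packaging library for parsing requirement strings; the package_type value itself remains hypothetical.

# Sketch of a SHOULD-level index check: flag "=="/"===" pins unless the
# (hypothetical) package_type explicitly allows them.
from packaging.requirements import Requirement

PIN_OPERATORS = {"==", "==="}

def disallowed_pins(package_type, requires):
    """Return requirement strings that pin exactly when pinning is not allowed."""
    if package_type in ("application", "metapackage"):
        return []
    return [
        raw for raw in requires
        if any(spec.operator in PIN_OPERATORS for spec in Requirement(raw).specifier)
    ]

print(disallowed_pins("library", ["requests==2.13.0", "pytz>=2016.10"]))
# -> ['requests==2.13.0']  (warn, or reject, per the SHOULD/MAY wording above)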
On Thu, Feb 23, 2017, at 02:28 PM, Paul Moore wrote:
I'd also drop "used to develop and deploy Python libraries, applications, and scripts" - why does what it's used for affect its category?
Things for working on & with Python code often have installation requirements a bit different from other applications. E.g. pip installs (or used to) with aliases specific to the Python version it runs on, so pip, pip3 and pip-3.5 could all point to the same command. Clearly it wouldn't make sense to do that for youtube-dl.
I'm not sure about 'tool' as a name for this category, but they often do require different handling to general applications.
Thomas
On 23 February 2017 at 14:49, Thomas Kluyver thomas@kluyver.me.uk wrote:
On Thu, Feb 23, 2017, at 02:28 PM, Paul Moore wrote:
I'd also drop "used to develop and deploy Python libraries, applications, and scripts" - why does what it's used for affect its category?
Things for working on & with Python code often have installation requirements a bit different from other applications. E.g. pip installs (or used to) with aliases specific to the Python version it runs on, so pip, pip3 and pip-3.5 could all point to the same command. Clearly it wouldn't make sense to do that for youtube-dl.
I'm not sure about 'tool' as a name for this category, but they often do require different handling to general applications.
Point taken, but in the absence of a behavioural difference, why not let the author decide?
If I wrote "grep in Python", I'd call it a tool, not an application. The author of pyline (https://pypi.python.org/pypi/pyline) describes it as a "tool". For me, command line utilities are typically called tools. Applications tend to have (G)UIs. I don't think we should repurpose existing terms. And unless we're planning on enforcing different behaviour, I don't think we need to try to dictate at all.
If we were to add a facility to create versioned names (rather than just having a special-case hack for pip) then I could imagine restricting it to certain package types - although I can't imagine why we would bother doing so - but let's not worry about that until it happens.
Or maybe we'd want to insist that pip only allow build tools to have a certain package type (setuptools, flit, ...) but again, why bother? What's the gain?
Paul
On 24 February 2017 at 00:28, Paul Moore p.f.moore@gmail.com wrote:
Specifically, I doubt people will make a firm distinction between "tool" and "library". In many cases it'll be a matter of opinion. Is py.test a tool or a library? It has a command line interface after all. I'd also drop "used to develop and deploy Python libraries, applications, and scripts" - why does what it's used for affect its category? I can think of examples I think of as "tools" that are general purpose (e.g. youtube-dl) but I'd expect you to claim they are "applications". But unless they pin their dependencies (which youtube-dl doesn't AFAIK) the distinction is irrelevant. So I prefer to leave it to the author to decide, rather than force an artificial split.
The difference is that:
* tool = you typically want at least one copy per Python interpreter (like a library)
* application = you typically only want one copy per system
It may be clearer to make the former category "devtool", since it really is specific to tools that are coupled to the task of Python development.
Cheers, Nick.
On 23 February 2017 at 15:09, Nick Coghlan ncoghlan@gmail.com wrote:
The difference is that:
- tool = you typically want at least one copy per Python interpreter (like a library)
- application = you typically only want one copy per system
It may be clearer to make the former category "devtool", since it really is specific to tools that are coupled to the task of Python development.
Ah, OK. That's a good distinction, but I'd avoid linking it to "used for developing Python code". I wouldn't call pyline something used for developing Python code, although you'd want to install it to the (possibly multiple) Python versions you want to use in your one-liners. OTOH, I'd agree you want copies of Jupyter per interpreter, although I'd call Jupyter an application, not a development tool. There's a lot of people who would view Jupyter as an application with a built in Python interpreter rather than the other way around. And do you want to say that Jupyter cannot pin dependencies because it's a "tool" rather than an "application"?
Maybe we should keep the package type neutral on this question, and add a separate field to denote one per system vs one per interpreter? But again, without proposed behaviour tied to the value, I'm inclined not to care. (And not to add metadata that no-one will bother using).
Paul
On 24 February 2017 at 01:27, Paul Moore p.f.moore@gmail.com wrote:
On 23 February 2017 at 15:09, Nick Coghlan ncoghlan@gmail.com wrote:
The difference is that:
- tool = you typically want at least one copy per Python interpreter (like a library)
- application = you typically only want one copy per system
It may be clearer to make the former category "devtool", since it really is specific to tools that are coupled to the task of Python development.
Ah, OK. That's a good distinction, but I'd avoid linking it to "used for developing Python code". I wouldn't call pyline something used for developing Python code, although you'd want to install it to the (possibly multiple) Python versions you want to use in your one-liners. OTOH, I'd agree you want copies of Jupyter per interpreter, although I'd call Jupyter an application, not a development tool. There's a lot of people who would view Jupyter as an application with a built in Python interpreter rather than the other way around. And do you want to say that Jupyter cannot pin dependencies because it's a "tool" rather than an "application"?
It provides a frame for a discussion between publishers and redistributors on how publishers would like their software to be treated.
Marking it as an application is saying "Treat it as a standalone application, and don't try to integrate it with anything else"
Marking it as a library is saying "Treat it as a Python component that expects to be integrated into a larger application"
Marking it as a metapackage is saying "Treat this particular set of libraries as a coherent whole, and don't try to mix-and-match other versions"
Marking it as a devtool is saying "This doesn't export a stable Python API (except maybe to plugins), but you should treat it as a library anyway"
Redistributors may *ask* a publisher to reclassify their project as a library or a devtool (and hence also avoid pinning their dependencies in order to make integration easier), but publishers will always have the option of saying "No, we want you to treat it as an application, and we won't help your end users if we know you're overriding our pinned dependencies and the issue can't be reproduced outside your custom configuration".
Cheers, Nick.
On Feb 23, 2017, at 11:04 AM, Nick Coghlan ncoghlan@gmail.com wrote:
Redistributors may *ask* a publisher to reclassify their project as a library or a devtool (and hence also avoid pinning their dependencies in order to make integration easier), but publishers will always have the option of saying "No, we want you to treat it as an application, and we won't help your end users if we know you're overriding our pinned dependencies and the issue can't be reproduced outside your custom configuration".
This whole discussion feels like trying to overcomplicate something that’s already not simple in order to solve a problem that I don’t think is really that widespread. My estimation is that 99% of people who are currently using ``==`` will just immediately switch over to using whatever flag we provide that allows them to still do that. Adding “do the thing I asked for” detritus to the project seems like a bad idea.
It’s not really any different than if a project, say, only released wheels. While we want to encourage projects to release sdists (and to not pin versions), trying to enforce that isn’t worth the cost. Like most packaging issues, I think that it’s best solved by opening up issues on the offending project’s issue tracker.
— Donald Stufft
Another way to look at the problem is that it is just too hard to override what the package says. For example in buildout you can provide a patch for any package that does not do exactly what you want, and it is applied during installation. This could include patching the dependencies.
On Thu, Feb 23, 2017 at 12:15 PM Donald Stufft donald@stufft.io wrote:
On Feb 23, 2017, at 11:04 AM, Nick Coghlan ncoghlan@gmail.com wrote:
Redistributors may *ask* a publisher to reclassify their project as a library or a devtool (and hence also avoid pinning their dependencies in order to make integration easier), but publishers will always have the option of saying "No, we want you to treat it as an application, and we won't help your end users if we know you're overriding our pinned dependencies and the issue can't be reproduced outside your custom configuration".
This whole discussion feels like trying to overcomplicate something that’s already not simple in order to solve a problem that I don’t think is really that widespread. My estimation is that 99% of people who are currently using ``==`` will just immediately switch over to using whatever flag we provide that allows them to still do that. Adding “do the thing I asked for” detritus to the project seems like a bad idea.
It’s not really any different than if a project, say, only released wheels. While we want to encourage projects to release sdists (and to not pin versions), trying to enforce that isn’t worth the cost. Like most packaging issues, I think that it’s best solved by opening up issues on the offending project’s issue tracker.
—
Donald Stufft
On 23Feb2017 0914, Donald Stufft wrote:
On Feb 23, 2017, at 11:04 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Redistributors may *ask* a publisher to reclassify their project as a library or a devtool (and hence also avoid pinning their dependencies in order to make integration easier), but publishers will always have the option of saying "No, we want you to treat it as an application, and we won't help your end users if we know you're overriding our pinned dependencies and the issue can't be reproduced outside your custom configuration".
This whole discussion feels like trying to overcomplicate something that’s already not simple in order to solve a problem that I don’t think is really that widespread. My estimation is that 99% of people who are currently using ``==`` will just immediately switch over to using whatever flag we provide that allows them to still do that. Adding “do the thing I asked for” detritus to the project seems like a bad idea.
It’s not really any different than if a project, say, only released wheels. While we want to encourage projects to release sdists (and to not pin versions), trying to enforce that isn’t worth the cost. Like most packaging issues, I think that it’s best solved by opening up issues on the offending project’s issue tracker.
+1. This has been my feeling the entire time I spent catching up on the thread just now.
As soon as "user education" becomes a requirement, we may as well do the simplest and least restrictive metadata possible and use the education to help people understand the impact of their decisions.
Cheers, Steve
+1 also. This whole double requirement feels over-complicated for what seems like a rather small use case: it would be interesting to have a few stats on the number of packages concerned by this pinning (maybe just scan the wheels of the latest release of each package?).
And if one needs to classify package types, why not add a new high-level trove classifier?
On 23 Feb 2017 at 19:19, "Steve Dower" steve.dower@python.org wrote:
On 23Feb2017 0914, Donald Stufft wrote:
On Feb 23, 2017, at 11:04 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Redistributors may *ask* a publisher to reclassify their project as a library or a devtool (and hence also avoid pinning their dependencies in order to make integration easier), but publishers will always have the option of saying "No, we want you to treat it as an application, and we won't help your end users if we know you're overriding our pinned dependencies and the issue can't be reproduced outside your custom configuration".
This whole discussion feels like trying to overcomplicate something that’s already not simple in order to solve a problem that I don’t think is really that widespread. My estimation is that 99% of people who are currently using ``==`` will just immediately switch over to using whatever flag we provide that allows them to still do that. Adding “do the thing I asked for” detritus to the project seems like a bad idea.
It’s not really any different than if a project, say, only released wheels. While we want to encourage projects to release sdists (and to not pin versions), trying to enforce that isn’t worth the cost. Like most packaging issues, I think that it’s best solved by opening up issues on the offending project’s issue tracker.
+1. This has been my feeling the entire time I spent catching up on the thread just now.
As soon as "user education" becomes a requirement, we may as well do the simplest and least restrictive metadata possible and use the education to help people understand the impact of their decisions.
Cheers, Steve
On Thu, Feb 23, 2017 at 1:56 PM, Xavier Fernandez xav.fernandez@gmail.com wrote:
+1 also. This whole double requirement feels over-complicated for what seems like a rather small use case: it would be interesting to have a few stats on the number of packages concerned by this pinning (maybe just scan the wheels of the latest release of each package?).
FWIW, an application packaging tool I wrote several years ago used to run into dependency solver problems quite often. The tool leaned on distlib (which is a pretty nice library, but strict, as noted in the OP) because there is/was no interface to pip. IIRC we upstreamed a few patches related to this and for sure carried some local patches.
The distlib solver would bind up from impossible constraints, yet every time, pip found a way to "power through" the exact same configuration despite blatantly incompatible metadata at times. I never looked into it further on pip's side (though probably someone here can confirm/deny this) but I suspect poor metadata is more widespread than pip makes visible.
I had a dump from 2014 of the distlib data at red-dove.com and I ran a quick script against it:
https://gist.github.com/anthonyrisinger/f9140191009fb1ec1434cb0585a4a75c
total_projects:     41228
total_projects_eq:    182   (% affected: 0.44%)
total_files:       285248
total_files_eq:      1276   (% affected: 0.45%)
total_reqs:        642447
total_reqs_bare:   460080   (% affected: 71.61%)
I know the distlib data (from 2014 at least) is imperfect, but this would suggest not many projects use "==" explicitly. Maybe the bigger problem is that roughly 72% of requirements have no version specifier at all. I know for us this specifically contributed to our solver problems because distlib was eager about choosing a version for such requirements, even though a later package might fulfill the requirement. Maybe this has since changed, but we needed to patch it at the time [1].
We really have to figure out this distribution stuff, friends. Existing files, new metadata files, old PEPs, new PEPs... it's looking a bit "broken windows theory" for the principal method used to share Python with the world, and the outsized lens through which Python is perceived. Maybe this means hard or opinionated decisions, but I really can't stress enough how much of a drag it is on an otherwise reasonably solid Python experience. There is a real perception that it's more trouble than it's worth, especially with many other good options at the table.
[1] https://github.com/anthonyrisinger/zippy/commit/1c5d34d89805c47188a18cfbe17c...
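For anyone who wants to reproduce a similar tally on other data, the following is a rough sketch of the counting logic, assuming requirement strings in PEP 508 form; the linked gist works directly against a distlib metadata dump, which is not reproduced here.

# Illustrative tally of exactly-pinned vs. completely unversioned requirements.
from packaging.requirements import Requirement

def tally(requirement_strings):
    counts = {"total": 0, "pinned": 0, "bare": 0}
    for raw in requirement_strings:
        req = Requirement(raw)
        counts["total"] += 1
        if not str(req.specifier):
            counts["bare"] += 1    # no version constraint at all
        elif any(s.operator in ("==", "===") for s in req.specifier):
            counts["pinned"] += 1  # exact pin
    return counts

print(tally(["requests", "numpy >= 1.11", "pytz == 2016.10"]))
# -> {'total': 3, 'pinned': 1, 'bare': 1}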
On Feb 23, 2017, at 4:51 PM, C Anthony Risinger anthony@xtfx.me wrote:
The distlib solver would bind up from impossible constraints, yet every time, pip found a way to "power through" the exact same configuration despite blatantly incompatible metadata at times. I never looked into it further on pip's side (though probably someone here can confirm/deny this) but I suspect poor metadata is more widespread than pip makes visible.
<1% of projects or files using == suggests to me that there are very few people using == incorrectly.
— Donald Stufft
On Thu, Feb 23, 2017 at 4:21 PM, Donald Stufft donald@stufft.io wrote:
On Feb 23, 2017, at 4:51 PM, C Anthony Risinger anthony@xtfx.me wrote:
The distlib solver would bind up from impossible constraints, yet every time, pip found a way to "power through" the exact same configuration despite blatantly incompatible metadata at times. I never looked into it further on pip's side (though probably someone here can confirm/deny this) but I suspect poor metadata is more widespread than pip makes visible.
<1% of projects or files using == suggests to me that there are very few people using == incorrectly.
Yeah, I'm pretty sure the bigger problem was version-less reqs eagerly selecting a version (e.g. latest) incompatible with later requirements provided by a different package, but then treating them as hard reqs by that point. I'll defer to you on how pip deals with things today.
I'll try to resurface a concrete example. I know for certain pip at that time (circa 2015) was capable of installing a set of packages where the dependency information was not satisfiable, because I pointed it out to my team (I actually think python-dateutil was involved for that one, mentioned in another post).
I would agree though, "==" is way way less widespread than no version at all.
On Feb 23, 2017, at 5:31 PM, C Anthony Risinger anthony@xtfx.me wrote:
Yeah, I'm pretty sure the bigger problem was version-less reqs eagerly selecting a version (e.g. latest) incompatible with later requirements provided by a different package, but then treating them as hard reqs by that point. I'll defer to you on how pip deals with things today.
I'll try to resurface a concrete example. I know for certain pip at that time (circa 2015) was capable of installing a set of packages where the dependency information was not satisfiable, because I pointed it out to my team (I actually think python-dateutil was involved for that one, mentioned in another post).
Yea, pip doesn’t really have a dep solver. Its mechanism for selecting which version to install is… not smart.
— Donald Stufft
On Thursday, February 23, 2017, Donald Stufft donald@stufft.io wrote:
On Feb 23, 2017, at 5:31 PM, C Anthony Risinger <anthony@xtfx.me> wrote:
Yeah, I'm pretty sure the bigger problem was version-less reqs eagerly selecting a version (e.g. latest) incompatible with later requirements provided by a different package, but then treating them as hard reqs by that point. I'll defer to you on how pip deals with things today.
I'll try to resurface a concrete example. I know for certain pip at that time (circa 2015) was capable of installing a set of packages where the dependency information was not satisfiable, because I pointed it out to my team (I actually think python-dateutil was involved for that one, mentioned in another post).
Yea, pip doesn’t really have a dep solver. Its mechanism for selecting which version to install is… not smart.
"Pip needs a dependency resolver" https://github.com/pypa/pip/issues/988
- {Enthought, Conda,}: SAT solver (there are many solutions)
- easy_install:
- pip:
— Donald Stufft
On 24 February 2017 at 08:21, Donald Stufft donald@stufft.io wrote:
On Feb 23, 2017, at 4:51 PM, C Anthony Risinger anthony@xtfx.me wrote:
The distlib solver would bind up from impossible constraints, yet every time, pip found a way to "power through" the exact same configuration despite blatantly incompatible metadata at times. I never looked into it further on pip's side (though probably someone here can confirm/deny this) but I suspect poor metadata is more widespread than pip makes visible.
<1% of projects or files using == suggests to me that there are very few people using == incorrectly.
And if it does become a more notable problem in the future, then a metadata-independent way of dealing with it would be to add a warning to twine suggesting replacing "==" with "~=" (as well as an off switch to say "Don't bug me about that").
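For reference, "~=" is PEP 440's compatible-release operator: "~=2.13.0" accepts later bug-fix releases in the 2.13 series but excludes 2.14. A quick check using the packaging library (the versions shown are arbitrary):

# Compatible release ("~=") vs. exact pin ("=="), per PEP 440.
from packaging.specifiers import SpecifierSet
from packaging.version import Version

compatible = SpecifierSet("~=2.13.0")   # equivalent to ">=2.13.0, ==2.13.*"
print(Version("2.13.5") in compatible)  # True  - later bug-fix releases allowed
print(Version("2.14.0") in compatible)  # False - next minor release excluded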
So I think the upshot of all this is that the entire semantic dependency structure in PEP 426 should be simplified to:
1. A single "dependencies" list that allows any PEP 508 dependency specifier 2. A MAY clause permitting tools to warn about the use of `==` and `===` 3. A MAY clause permitting tools to prohibit the use of direct references 4. A conventional set of "extras" with pre-defined semantics ("build", "dev", "doc", "test")
That gives us an approach that's entirely compatible with the current 1.x metadata formats (so existing tools will still work), while also moving us closer to a point where PEP 426 could actually be accepted.
Cheers, Nick.
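A minimal sketch of what that simplified shape might look like, again as a plain Python dict; the field names follow the four points above but are not an accepted PEP 426 layout, and the project and dependencies are invented for illustration.

# Hypothetical pydist.json-style fragment matching the simplified proposal:
# one "dependencies" list of PEP 508 specifiers, plus conventional extras.
# Tools MAY warn on "=="/"===" and MAY refuse direct URL references.
simplified = {
    "name": "examplelib",                  # made-up project
    "version": "1.0",
    "dependencies": [
        "requests ~= 2.13",
        "enum34; python_version < '3.4'",  # environment marker example
    ],
    "extras": {
        "build": ["setuptools", "wheel"],
        "dev": ["tox"],
        "doc": ["sphinx"],
        "test": ["pytest"],
    },
}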
On Thursday, February 23, 2017, Xavier Fernandez xav.fernandez@gmail.com wrote:
+1 also. This whole double requirement feels over-complicated for what seems like a rather small use case: it would be interesting to have a few stats on the number of packages concerned by this pinning (maybe just scan the wheels of the latest release of each package?).
And if one needs to classify package types, why not add a new high-level trove classifier?
+1 This could be accomplished with a trove classifier (because Entity Attribute boolean-Value)
The component/library, application, metapackage categorization would require far more docs than:
pip install --ignore-versions metapkgname
Which is effectively (probably, maybe?) the same as:
pip install metapkg
pip install --upgrade __ALL__
... say, given that metapkgname requires (install_requires) ipython, and the requirements.txt is:
metapkgname  # ipython==4.2
ipython
If pip freeze returns:
ipython
metapkgname
And I then:
pip freeze -- | xargs pip install --upgrade
Haven't I then upgraded ipython past the metapackage's pinned version, anyway?
http://stackoverflow.com/questions/2720014/upgrading-all-packages-with-pip
The best workaround that I'm aware of:
- Create integration test and then build scripts
- Run test/build script in a container
- Change dependencies, commit, create a PR (e.g. Travis CI runs the test/build/report/post script), review integration test output
What integration tests do the RPM/DNF package maintainers run for, say, django, simplejson, [and psycopg2, for django-ca]? If others have already integration-tested even a partially overlapping set, that's great and it would be great to be able to store, share, and search those build artifacts (logs, pass/fail).
Additionally, practically, could we add metadata pointing to zero or more OS packages, per-distribution? How do I know that there's probably a somewhat-delayed repackaging named "python-ipython" which *might* work with the rest of the bleeding edge trunk builds I consider as stable as yesterday, given which tests?
On 23 Feb 2017 at 19:19, "Steve Dower" <steve.dower@python.org> wrote:
On 23Feb2017 0914, Donald Stufft wrote:
On Feb 23, 2017, at 11:04 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Redistributors may *ask* a publisher to reclassify their project as a library or a devtool (and hence also avoid pinning their dependencies in order to make integration easier), but publishers will always have the option of saying "No, we want you to treat it as an application, and we won't help your end users if we know you're overriding our pinned dependencies and the issue can't be reproduced outside your custom configuration".
This whole discussion feels like trying to overcomplicate something that’s already not simple in order to solve a problem that I don’t think is really that widespread. My estimation is that 99% of people who are currently using ``==`` will just immediately switch over to using whatever flag we provide that allows them to still do that. Adding “do the thing I asked for” detritus to the project seems like a bad idea.
It’s not really any different than if a project, say, only released wheels. While we want to encourage projects to release sdists (and to not pin versions), trying to enforce that isn’t worth the cost. Like most packaging issues, I think that it’s best solved by opening up issues on the offending project’s issue tracker.
+1. This has been my feeling the entire time I spent catching up on the thread just now.
As soon as "user education" becomes a requirement, we may as well do the simplest and least restrictive metadata possible and use the education to help people understand the impact of their decisions.
Cheers, Steve
On Thu, Feb 23, 2017 at 12:44 AM, Nick Coghlan ncoghlan@gmail.com wrote:
On 23 February 2017 at 18:37, Paul Moore p.f.moore@gmail.com wrote:
On 23 February 2017 at 08:18, Nick Coghlan ncoghlan@gmail.com wrote:
I'm not a huge fan of having simple boolean toggles in metadata definitions (hence the more elaborate idea of two different kinds of dependency declaration), but this may be a case where that's a good way to go, since it would mean that services and tools that care can check it (with a recommendation in the spec saying that public index servers SHOULD check it), while those that don't care would continue to have a single unified set of dependency declarations to work with.
While boolean metadata may not be ideal in the general case, I think it makes sense here. If you want to make it more acceptable, maybe make it Package-Type, with values "application" or "library".
That gets us back into the world of defining what the various package types mean, and I really don't want to go there :)
Instead, I'm thinking in terms of a purely capability based field: "allow_pinned_dependencies", with the default being "False", but actually checking the field also only being a SHOULD for public index servers and a MAY for everything else.
That would be enough for downstream tooling to pick up and say "I should treat this as a multi-component module rather than as an individual standalone component", *without* having to inflict the task of understanding the complexities of multi-tier distribution systems onto all component publishers :)
I'm still not sure I understand what you're trying to do, but this feels like you're trying to have it both ways... if you don't want to define what the different package types mean, and it's purely a capability-based field, then surely that means that downstream tooling *can't* make assumptions about what kind of package type it is based on the field? ISTM that from the point of view of downstream tooling, "allow_pinned_dependencies" carries literally no information, because all it means is "this package is on a public server and its Requires-Dist field has an == in it", which are things we already know. I can see how this would help your goal of educating uploaders about good package hygiene, but not how it helps downstream distributors.
(Here's an example I've just run into that involves a == dependency on a public package: I have a library that needs to access some C API calls on Windows, but not on other platforms. The natural way to do this is to split out the CFFI code into its own package, _mylib_windows_helper or whatever, that has zero public interface, and have mylib v1.2.3 require "_mylib_windows_helper==1.2.3; os_name == 'nt'". That way I can distribute one pure-Python wheel + one binary wheel and everything just works. But there's no sense in which this is an "integrated application" or anything, it's just a single library that usually ships in one .whl but sometimes ships in 2 .whls.)
((In actual fact I'm currently not building the package this way because setuptools makes it extremely painful to actually maintain that setup. Really I need the ability to build two wheels out of a single source package. Since we don't have that, I'm instead using CFFI's slow and semi-deprecated ABI mode, which lets me call C functions from a pure Python package. But what I described above is really the "right" solution, it's just tooling limitations that make it painful.))
-n
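For readers unfamiliar with the marker syntax in that example, a setup.py along these lines would express the Windows-only dependency. This is only a sketch: the names come from Nathaniel's hypothetical example (minus the leading underscore, since distribution names must start with a letter or digit), and it relies on setuptools accepting PEP 508 environment markers in install_requires.

# Sketch of the split-package layout described above: the helper wheel is
# only pulled in on Windows thanks to the environment marker.
from setuptools import setup, find_packages

setup(
    name="mylib",                    # hypothetical, per the example above
    version="1.2.3",
    packages=find_packages(),
    install_requires=[
        "mylib-windows-helper==1.2.3; os_name == 'nt'",
    ],
)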
Here's an example I've just run into that involves a == dependency on a public package: I have a library that needs to access some C API calls on Windows, but not on other platforms. The natural way to do this is to split out the CFFI code into its own package, _mylib_windows_helper or whatever, that has zero public interface, and have mylib v1.2.3 require "_mylib_windows_helper==1.2.3; os_name == 'nt'".
You have a public library that, depending on the platform, depends on a public (helper) library that has no public interface? That doesn't sound good to me. If you don't want to implement a public interface then it should just be included in the main library, because it is in the end a requirement of the library. It's a pity you can't have a universal wheel, but so be it. Choosing to depend on an exact version of a package that has no public interface is in my opinion the wrong solution.
As I stated before, though perhaps not explicitly, I cannot think of *any* good reason that one uses == in `install_requires`. Something like `>= 1.7, < 1.8` should be sufficient. In the CFFI case that should be sufficient unless you change your function signatures in a maintenance release (which is bad). And in case of a metapackage like PyObjC this should also be sufficient because it will downgrade dependencies when downgrading the metapackage while still giving you the latest maintenance releases of the dependencies.
Regarding 'application', 'library', and 'metapackage'. In Nixpkgs we distinguish Python libraries and applications. Applications are available for 1 version of the interpreter, whereas libraries are available for all (supported) interpreter versions. It would be nice if it were more explicit on, say, PyPI whether a package is a library or an application. There are difficult cases though, e.g., `ipython`. Is that an application or a library? As a user I would argue that it is an application; however, it should be available for each version of the interpreter and that's why we branded it a library.
Metapackages. `jupyter` is a metapackage. We had to put it with the rest of the Python libraries for the same reason as we put `ipython` there.
From a distribution's point of view I don't see why you would want to have them mentioned separately.
On Thu, Feb 23, 2017 at 12:49 PM, Nathaniel Smith njs@pobox.com wrote:
On Thu, Feb 23, 2017 at 12:44 AM, Nick Coghlan ncoghlan@gmail.com wrote:
On 23 February 2017 at 18:37, Paul Moore p.f.moore@gmail.com wrote:
On 23 February 2017 at 08:18, Nick Coghlan ncoghlan@gmail.com wrote:
I'm not a huge fan of having simple boolean toggles in metadata definitions (hence the more elaborate idea of two different kinds of dependency declaration), but this may be a case where that's a good way to go, since it would mean that services and tools that care can check it (with a recommendation in the spec saying that public index servers SHOULD check it), while those that don't care would continue to have a single unified set of dependency declarations to work with.
While boolean metadata may not be ideal in the general case, I think it makes sense here. If you want to make it more acceptable, maybe make it Package-Type, with values "application" or "library".
That gets us back into the world of defining what the various package types mean, and I really don't want to go there :)
Instead, I'm thinking in terms of a purely capability based field: "allow_pinned_dependencies", with the default being "False", but actually checking the field also only being a SHOULD for public index servers and a MAY for everything else.
That would be enough for downstream tooling to pick up and say "I should treat this as a multi-component module rather than as an individual standalone component", *without* having to inflict the task of understanding the complexities of multi-tier distribution systems onto all component publishers :)
I'm still not sure I understand what you're trying to do, but this feels like you're trying to have it both ways... if you don't want to define what the different package types mean, and it's purely a capability-based field, then surely that means that downstream tooling *can't* make assumptions about what kind of package type it is based on the field? ISTM that from the point of view of downstream tooling, "allow_pinned_dependencies" carries literally no information, because all it means is "this package is on a public server and its Requires-Dist field has an == in it", which are things we already know. I can see how this would help your goal of educating uploaders about good package hygiene, but not how it helps downstream distributors.
(Here's an example I've just run into that involves a == dependency on a public package: I have a library that needs to access some C API calls on Windows, but not on other platforms. The natural way to do this is to split out the CFFI code into its own package, _mylib_windows_helper or whatever, that has zero public interface, and have mylib v1.2.3 require "_mylib_windows_helper==1.2.3; os_name == 'nt'". That way I can distribute one pure-Python wheel + one binary wheel and everything just works. But there's no sense in which this is an "integrated application" or anything, it's just a single library that usually ships in one .whl but sometimes ships in 2 .whls.)
((In actual fact I'm currently not building the package this way because setuptools makes it extremely painful to actually maintain that setup. Really I need the ability to build two wheels out of a single source package. Since we don't have that, I'm instead using CFFI's slow and semi-deprecated ABI mode, which lets me call C functions from a pure Python package. But what I described above is really the "right" solution, it's just tooling limitations that make it painful.))
-n
-- Nathaniel J. Smith -- https://vorpus.org
On 23 February 2017 at 23:04, Freddy Rietdijk freddyrietdijk@fridh.nl wrote:
Here's an example I've just run into that involves a == dependency on a public package: I have a library that needs to access some C API calls on Windows, but not on other platforms. The natural way to do this is to split out the CFFI code into its own package, _mylib_windows_helper or whatever, that has zero public interface, and have mylib v1.2.3 require "_mylib_windows_helper==1.2.3; os_name == 'nt'".
You have a public library that, depending on the platform, depends on a public (helper) library that has no public interface? That doesn't sound good to me. If you don't want to implement a public interface then it should just be included in the main library, because it is in the end a requirement of the library. It's a pity you can't have a universal wheel, but so be it. Choosing to depend on an exact version of a package that has no public interface is in my opinion the wrong solution.
As I stated before, though perhaps not explicitly, I cannot think of *any* good reason that one uses == in `install_requires`. Something like `>= 1.7, < 1.8` should be sufficient. In the CFFI case that should be sufficient unless you change your function signatures in a maintenance release (which is bad). And in case of a metapackage like PyObjC this should also be sufficient because it will downgrade dependencies when downgrading the metapackage while still giving you the latest maintenance releases of the dependencies.
Regarding 'application', 'library', and 'metapackage'. In Nixpkgs we distinguish Python libraries and applications. Applications are available for 1 version of the interpreter, whereas libraries are available for all (supported) interpreter versions. It would be nice if it were more explicit on, say, PyPI whether a package is a library or an application. There are difficult cases though, e.g., `ipython`. Is that an application or a library? As a user I would argue that it is an application; however, it should be available for each version of the interpreter and that's why we branded it a library.
That sounds pretty similar to the distinction in Fedora as well, which has been highlighted by the Python 3 migration effort: libraries emit both Python 2 & 3 RPMs from their source RPM (and will for as long as Fedora and the library both support Python 2), while applications just switch from depending on Python 2 to depending on Python 3 instead.
Metapackages. `jupyter` is a metapackage. We had to put it with the rest of the Python libraries for the same reason as we put `ipython` there. From a distribution's point of view I don't see why you would want to have them mentioned separately.
From a distro point of view, explicit upstream metapackages would provide a hint saying "these projects should be upgraded as a unit rather than independently". We're free to ignore that hint if we want to, but doing so means we get to keep the pieces if they break rather than just being able to report the problem back upstream :)
Cheers, Nick.
On 23 February 2017 at 13:04, Freddy Rietdijk freddyrietdijk@fridh.nl wrote:
Here's an example I've just run into that involves a == dependency on a public package: I have a library that needs to access some C API calls on Windows, but not on other platforms. The natural way to do this is to split out the CFFI code into its own package, _mylib_windows_helper or whatever, that has zero public interface, and have mylib v1.2.3 require "_mylib_windows_helper==1.2.3; os_name == 'nt'".
You have a public library that, depending on the platform, depends on a public (helper) library that has no public interface? That doesn't sound good to me. If you don't want to implement a public interface then it should just be included in the main library, because it is in the end a requirement of the library. It's a pity you can't have a universal wheel, but so be it. Choosing to depend on an exact version of a package that has no public interface is in my opinion the wrong solution.
The helper library is only public in the sense that it's published on PyPI. I'd describe it as an optional helper. If PyPI had a way of marking such libraries as "only allow downloading to satisfy a dependency" then I'd say mark it that way - but we don't.
Requiring non-universal (and consequently version-dependent) wheels for platforms that don't need them seems like a cure that's worse than the disease.
Personally, I find Nathaniel's example to be a compelling reason for wanting to specify exact dependencies for something that's not an "application". As an author, it's how I'd prefer to bundle a package like this. And IMO, if distributions prefer that I don't do that, I'd say it's up to them to explain what they want me to do, and how it'll benefit me and my direct users. At the moment all I'm seeing is "you should" and "it's the wrong solution" - you may be right, but surely it's obvious that you need to explain *why* your view is correct? Or at a minimum, if there is no direct benefit to me, why I, as an author, should modify my preferred development model to make things easier for you.
Not all packages published on PyPI need or want to be bundled into OS distributions[1]. Paul
[1] OTOH, the bulk of this discussion is currently about theoretical cases anyway. Maybe it would be worth everyone (myself included) taking a deep breath, and refocusing on actual cases where there is a problem right now (I don't know if anyone can identify such cases - I know I can't). Asking directly of the authors of such packages "would you be OK with the following proposal" would likely be very enlightening.
On Feb 23, 2017, at 6:49 AM, Nathaniel Smith njs@pobox.com wrote:
(Here's an example I've just run into that involves a == dependency on a public package: I have a library that needs to access some C API calls on Windows, but not on other platforms. The natural way to do this is to split out the CFFI code into its own package, _mylib_windows_helper or whatever, that has zero public interface, and have mylib v1.2.3 require "_mylib_windows_helper==1.2.3; os_name == 'nt'". That way I can distribute one pure-Python wheel + one binary wheel and everything just works. But there's no sense in which this is an "integrated application" or anything, it's just a single library that usually ships in one .whl but sometimes ships in 2 .whls.)
((In actual fact I'm currently not building the package this way because setuptools makes it extremely painful to actually maintain that setup. Really I need the ability to build two wheels out of a single source package. Since we don't have that, I'm instead using CFFI's slow and semi-deprecated ABI mode, which lets me call C functions from a pure Python package. But what I described above is really the "right" solution, it's just tooling limitations that make it painful.))
Another way of handling this is to just publish a universal wheel and a Windows binary wheel. Pip will select the more specific one (the binary one) over the universal wheel when it is available.
— Donald Stufft
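A much-simplified illustration of that preference (real pip ranks full compatibility tags; this only distinguishes a platform-specific wheel from a universal one, and the filenames are hypothetical):

# Toy wheel selection: prefer a wheel tagged for the current platform over a
# universal ("any") wheel of the same project and version. Not pip's real logic.
import sysconfig

def pick_wheel(candidates):
    platform_tag = sysconfig.get_platform().replace("-", "_").replace(".", "_")
    platform_specific = [c for c in candidates if c.endswith(platform_tag + ".whl")]
    universal = [c for c in candidates if c.endswith("-none-any.whl")]
    return (platform_specific or universal or [None])[0]

print(pick_wheel([
    "mylib-1.2.3-py2.py3-none-any.whl",      # pure-Python fallback
    "mylib-1.2.3-cp36-cp36m-win_amd64.whl",  # hypothetical Windows binary wheel
]))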
On Feb 23, 2017 7:46 AM, "Donald Stufft" donald@stufft.io wrote:
On Feb 23, 2017, at 6:49 AM, Nathaniel Smith njs@pobox.com wrote:
(Here's an example I've just run into that involves a == dependency on a public package: I have a library that needs to access some C API calls on Windows, but not on other platforms. The natural way to do this is to split out the CFFI code into its own package, _mylib_windows_helper or whatever, that has zero public interface, and have mylib v1.2.3 require "_mylib_windows_helper==1.2.3; os_name == 'nt'". That way I can distribute one pure-Python wheel + one binary wheel and everything just works. But there's no sense in which this is an "integrated application" or anything, it's just a single library that usually ships in one .whl but sometimes ships in 2 .whls.)
((In actual fact I'm currently not building the package this way because setuptools makes it extremely painful to actually maintain that setup. Really I need the ability to build two wheels out of a single source package. Since we don't have that, I'm instead using CFFI's slow and semi-deprecated ABI mode, which lets me call C functions from a pure Python package. But what I described above is really the "right" solution, it's just tooling limitations that make it painful.))
Another way of handling this is to just publish a universal wheel and a Windows binary wheel. Pip will select the more specific one (the binary one) over the universal wheel when it is available.
Thanks, I was wondering about that :-).
Still, I don't really like this solution in this case, because if someone did install the universal wheel on Windows it would be totally broken, yet there'd be no metadata to warn them. (This is a case where the binary isn't just an acceleration module, but is providing crucial functionality.) Even if pip wouldn't do this automatically, it's easy to imagine cases where it would happen.
-n
Maybe it would help if you have a concrete example of a scenario where they would benefit from having this distinction?
In the Nix package manager (source distribution with binary substitutes) and Nixpkgs package set we typically require the filename and hash of a package. In our expressions we typically pass a URL (that includes the name) and the hash. The URL is only needed when the file isn't in our store. This is convenient, because making the URL optional allows you to pre-fetch or work with mirrors. All we care about is that we get the file, not how it is provided. This applies to source archives, but behind the scenes also to binary substitutes. With Nix, functions build a package, and dependencies are passed as function arguments with names that typically, but not necessarily, resemble the dependency name.
Now, a function that builds a package, a package builder, only needs to be provided with abstract dependencies; it just needs to know what it should look for: "we need 'a' numpy, 'a' scipy, 'a compiler that has a certain interface and can do this job'", etc. Version numbers can help in order to fail early, but generally only bounds, not a pinned value. It's up to another tool to provide the builder with the actual packages, the concrete dependencies. And this tool might fetch them from PyPI, or from GitHub, or...
The same goes for building, distributing and installing Python packages. Setuptools shouldn't bother with versions (except the constraints, in the case of libraries) or with where a source comes from, but should just build or fail. Pip should just fetch/resolve and pass concrete dependencies to whatever builder (Setuptools, Flit) or whatever environment (virtualenv) needs them.
It's quite frustrating as a downstream having to deal with packages where versions are pinned unnecessarily and therefore I've also requested on the Setuptools tracker a flag that ignores constraints [1] (though I fear I would have to pull up my sleeves for this one :) ) .
[1] https://github.com/pypa/setuptools/issues/894
On Wed, Feb 15, 2017 at 3:11 PM, Nathaniel Smith njs@pobox.com wrote:
On Wed, Feb 15, 2017 at 5:27 AM, Nick Coghlan ncoghlan@gmail.com wrote:
On 15 February 2017 at 12:58, Nathaniel Smith njs@pobox.com wrote:
On Wed, Feb 15, 2017 at 3:33 AM, Nick Coghlan ncoghlan@gmail.com wrote:
- "requires": list where entries are either a string containing a PEP
508 dependency specifier or else a hash map contain a "requires" key plus "extra" or "environment" fields as qualifiers
- "integrates": replacement for "meta_requires" that only allows
pinned dependencies (i.e. hash maps with "name" & "version" fields, or direct URL references, rather than a general PEP 508 specifier as a string)
What's accomplished by separating these? I really think we should strive to have fewer more orthogonal concepts whenever possible...
It's mainly a matter of incorporating https://caremad.io/posts/2013/07/setup-vs-requirement/ into the core data model, as this distinction between abstract development dependencies and concrete deployment dependencies is incredibly important for any scenario that involves publisher-redistributor-consumer chains, but is entirely non-obvious to folks that are only familiar with the publisher-consumer case that comes up during development-for-personal-and-open-source-use.
Maybe I'm just being dense but, umm. I don't know what any of these words mean :-). I'm not unfamiliar with redistributors; part of my confusion is that this is a concept that AFAIK distro package systems don't have. Maybe it would help if you have a concrete example of a scenario where they would benefit from having this distinction?
One particular area where this is problematic is in the widespread advice "always pin your dependencies" which is usually presented without the all important "for application or service deployment" qualifier. As a first approximation: pinning-for-app-or-service-deployment == good, pinning-for-local-testing == good, pinning-for-library-or-framework-publication-to-PyPI == bad.
pipenv borrows the Ruby solution to modeling this by having Pipfile for abstract dependency declarations and Pipfile.lock for concrete integration testing ones, so the idea here is to propagate that model to pydist.json by separating the "requires" field with abstract development dependencies from the "integrates" field with concrete deployment dependencies.
What's the benefit of putting this in pydist.json? I feel like for the usual deployment cases (a) going straight from Pipfile.lock -> venv is pretty much sufficient, with no need to put this into a package, but (b) if you really do want to put it into a package, then the natural approach would be to make an empty wheel like "my-django-app-deploy.whl" whose dependencies were the contents of Pipfile.lock.
There's certainly a distinction to be made between the abstract dependencies and the exact locked dependencies, but to me the natural way to model that distinction is by re-using the distinction we already have between source packages and binary packages. The build process for this placeholder wheel is to "compile down" the abstract dependencies into concrete dependencies, and the resulting wheel encodes the result of this compilation. Again, no new concepts needed.
In the vast majority of publication-to-PyPI cases people won't need the "integrates" field, since what they're publishing on PyPI will just be their abstract dependencies, and any warning against using "==" will recommend using "~=" or ">=" instead. But there *are* legitimate uses of pinning-for-publication (like the PyObjC metapackage bundling all its subcomponents, or when building for private deployment infrastructure), so there needs to be a way to represent "Yes, I'm pinning this dependency for publication, and I'm aware of the significance of doing so"
Why can't PyObjC just use regular dependencies? That's what distro metapackages have done for decades, right?
-n
-- Nathaniel J. Smith -- https://vorpus.org
On 15 February 2017 at 15:50, Freddy Rietdijk freddyrietdijk@fridh.nl wrote:
It's quite frustrating as a downstream having to deal with packages where versions are pinned unnecessarily and therefore I've also requested on the Setuptools tracker a flag that ignores constraints [1] (though I fear I would have to pull up my sleeves for this one :) ) .
Sort of repeating my earlier question, but how often does this happen in reality? (As a proportion of the packages you deal with). And how often is it that a simple request/PR to the package author to remove the explicit version requirements is rejected? (I assume your first response is to file an issue with upstream?)
If you *do* get in a situation where the package explicitly requires certain versions of its dependencies, and you ignore those requirements, then presumably you're taking responsibility for supporting a combination that the upstream author doesn't support. How do you handle that?
I'm not trying to pick on your process here, or claim that distributions are somehow doing things wrongly. But I am trying to understand what redistributors' expectations are of package authors. Nick said he wants to guide authors away from explicit version pinning. That's fine, but is the problem so big that the occasional bug report to offending projects saying "please don't pin exact versions" is insufficient guidance?
Paul
Sort of repeating my earlier question, but how often does this happen
in reality?
From a quick check in our repo we have patched about 1% of our packages to
remove the constraints. We have close to 2000 Python packages. We don't necessarily patch all the constraints, only when they collide with the version we would like the package to use so the actual percentage is likely higher.
Larger applications that have many dependencies that are fixed have been kept out of Nixpkgs for now. Their fixed dependencies means we likely need multiple versions of packages. While Nix can handle that, it means more maintenance. We have a tool that can take e.g. a requirements.txt file and generate expressions, but it won't help you much with bug-fix releases when maintainers don't update their pinned requirements.
And how often is it that a simple request/PR to the package author to
remove the explicit version requirements is rejected?
That's hard to say. If I look at what packages I've contributed to Nixpkgs, then in my experience this is something that is typically dealt with by upstream when asked.
If you *do* get in a situation where the package explicitly requires
certain versions of its dependencies, and you ignore those requirements, then presumably you're taking responsibility for supporting a combination that the upstream author doesn't support. How do you handle that?
Typical situations are bug-fix releases. So far I haven't encountered any issues with using other versions, but like I said, larger applications that pin their dependencies have been mostly kept out of Nixpkgs. If we do encounter issues, then we have to find a solution. The likeliest situation is that an application requires a different version, and in that case we would have an expression/package of that version specifically for that application. We don't have a global site-packages so we can do that.
Nick said he wants to guide authors away from explicit version
pinning. That's fine, but is the problem so big that the occasional bug report to offending projects saying "please don't pin exact versions" is insufficient guidance?
The main problem I see is that it limits in how far you can automatically update to newer versions and thus release bug/security fixes. Just one inappropriate pin is sufficient to break dependency solving.
On Wed, Feb 15, 2017 at 5:14 PM, Paul Moore p.f.moore@gmail.com wrote:
On 15 February 2017 at 15:50, Freddy Rietdijk freddyrietdijk@fridh.nl wrote:
It's quite frustrating as a downstream having to deal with packages where versions are pinned unnecessarily and therefore I've also requested on
the
Setuptools tracker a flag that ignores constraints [1] (though I fear I would have to pull up my sleeves for this one :) ) .
Sort of repeating my earlier question, but how often does this happen in reality? (As a proportion of the packages you deal with). And how often is it that a simple request/PR to the package author to remove the explicit version requirements is rejected? (I assume your first response is to file an issue with upstream?)
If you *do* get in a situation where the package explicitly requires certain versions of its dependencies, and you ignore those requirements, then presumably you're taking responsibility for supporting a combination that the upstream author doesn't support. How do you handle that?
I'm not trying to pick on your process here, or claim that distributions are somehow doing things wrongly. But I am trying to understand what redistributors' expectations are of package authors. Nick said he wants to guide authors away from explicit version pinning. That's fine, but is the problem so big that the occasional bug report to offending projects saying "please don't pin exact versions" is insufficient guidance?
Paul
On Wed, Feb 15, 2017 at 11:55 AM, Freddy Rietdijk freddyrietdijk@fridh.nl wrote:
Sort of repeating my earlier question, but how often does this happen
in reality?
From a quick check in our repo we have patched about 1% of our packages to remove the constraints. We have close to 2000 Python packages. We don't necessarily patch all the constraints, only when they collide with the version we would like the package to use so the actual percentage is likely higher.
Larger applications that have many dependencies that are fixed have been kept out of Nixpkgs for now. Their fixed dependencies means we likely need multiple versions of packages. While Nix can handle that, it means more maintenance. We have a tool that can take e.g. a requirements.txt file and generate expressions, but it won't help you much with bug-fix releases when maintainers don't update their pinned requirements.
I suppose this isn't a problem for Java applications, which use jar files and per-application class paths.
Jim
Thanks for your reply, it was very helpful.
On 15 February 2017 at 16:55, Freddy Rietdijk freddyrietdijk@fridh.nl wrote:
Larger applications that have many dependencies that are fixed have been kept out of Nixpkgs for now.
I notice here (and in a few other places) you talk about "Applications". From what I understand of Nick's position, applications absolutely should pin their dependencies - so if I'm understanding correctly, those applications will (and should) continue to pin exact versions.
As regards automatic packaging of new upstream versions (of libraries rather than applications), I guess if you get upstream to remove the pinned versions, this problem goes away.
The main problem I see is that it limits in how far you can automatically update to newer versions and thus release bug/security fixes. Just one inappropriate pin is sufficient to break dependency solving.
I'm not sure I follow this. Suppose we have foo 1.0 depending on bar. If foo 1.0 doesn't pin bar (possibly because you reported to them that they shouldn't) then foo 1.1 isn't going to suddenly add the pin back. So you can update foo fine. And you can update bar because there's no pin. So yes, while "one inappropriate pin" can cause a problem, getting upstream to fix that is a one-off cost, not an ongoing issue.
So, in summary,
* I agree that libraries pinning dependencies too tightly is bad.
* Distributions can easily enough report such pins upstream when the library is initially packaged, so there's no ongoing cost here (just possibly a delay before the library can be packaged).
* Libraries can legitimately have appropriate pins (typically to ranges of versions). So distributions have to be able to deal with that.
* Applications *should* pin precise versions. Distributions have to decide whether to respect those pins or remove them and then take on support of the combination that upstream doesn't support.
* But application pins should be in a requirements.txt file, so ignoring version specs is pretty simple (just a script to run against the requirements file).
* Because Python doesn't support multiple installed versions of packages, conflicting requirements *will* be a problem that distros have to solve themselves (the language response is "use a venv").
Nick is suggesting that the requirement metadata be prohibited from using exact pins, but there's alternative metadata for "yes, I really mean an exact pin". To me:
1. This doesn't have any bearing on *application* pins, as they aren't in metadata.
2. Distributions still have to be able to deal with libraries having exact pins, as it's an explicitly supported possibility.
3. You can still manage (effectively) exact pins without being explicit - foo >1.6,<1.8 pretty much does it. And that doesn't even have to be a deliberate attempt to break the system, it could be a genuine attempt to avoid known issues, that just got too aggressive.
So we're left with additional complexity for library authors to understand, for what seems like no benefit in practice to distribution builders. The only stated benefit of the 2 types of metadata is to educate library authors of the benefits of not pinning versions - and it seems like a very sweeping measure, where bug reports from distributions seem like they would be a much more focused and just as effective approach.
Paul
I also get a little frustrated with this kind of proposal "no pins" which I read as "annoy the publisher to try to prevent them from annoying the consumer". As a free software publisher I feel entitled to annoy the consumer, an activity I will indulge in inversely proportional to my desire for users. Who is the star?
It should be possible to publish applications to pypi. Much of the packaging we have is completely web application focused, these applications are not usually published at all.
On Wed, Feb 15, 2017 at 12:58 PM Paul Moore p.f.moore@gmail.com wrote:
Thanks for your reply, it was very helpful.
On 15 February 2017 at 16:55, Freddy Rietdijk freddyrietdijk@fridh.nl wrote:
Larger applications that have many dependencies that are fixed have been kept out of Nixpkgs for now.
I notice here (and in a few other places) you talk about "Applications". From what I understand of Nick's position, applications absolutely should pin their dependencies - so if I'm understanding correctly, those applications will (and should) continue to pin exact versions.
As regards automatic packaging of new upstream versions (of libraries rather than applications), I guess if you get upstream to remove the pinned versions, this problem goes away.
The main problem I see is that it limits in how far you can
automatically update to newer versions and thus release bug/security fixes. Just one inappropriate pin is sufficient to break dependency solving.
I'm not sure I follow this. Suppose we have foo 1.0 depending on bar. If foo 1.0 doesn't pin bar (possibly because you reported to them that they shouldn't) then foo 1.1 isn't going to suddenly add the pin back. So you can update foo fine. And you can update bar because there's no pin. So yes, while "one inappropriate pin" can cause a problem, getting upstream to fix that is a one-off cost, not an ongoing issue.
So, in summary,
- I agree that libraries pinning dependencies too tightly is bad.
- Distributions can easily enough report such pins upstream when the
library is initially packaged, so there's no ongoing cost here (just possibly a delay before the library can be packaged).
- Libraries can legitimately have appropriate pins (typically to
ranges of versions). So distributions have to be able to deal with that.
- Applications *should* pin precise versions. Distributions have to
decide whether to respect those pins or remove them and then take on support of the combination that upstream doesn't support.
- But application pins should be in a requirements.txt file, so
ignoring version specs is pretty simple (just a script to run against the requirements file).
- Because Python doesn't support multiple installed versions of
packages, conflicting requirements *will* be a problem that distros have to solve themselves (the language response is "use a venv").
Nick is suggesting that the requirement metadata be prohibited from using exact pins, but there's alternative metadata for "yes, I really mean an exact pin". To me:
1. This doesn't have any bearing on *application* pins, as they aren't in metadata.
2. Distributions still have to be able to deal with libraries having exact pins, as it's an explicitly supported possibility.
3. You can still manage (effectively) exact pins without being explicit - foo >1.6,<1.8 pretty much does it. And that doesn't even have to be a deliberate attempt to break the system, it could be a genuine attempt to avoid known issues, that just got too aggressive.
So we're left with additional complexity for library authors to understand, for what seems like no benefit in practice to distribution builders. The only stated benefit of the 2 types of metadata is to educate library authors of the benefits of not pinning versions - and it seems like a very sweeping measure, where bug reports from distributions seem like they would be a much more focused and just as effective approach.
Paul
On Feb 15, 2017, at 1:15 PM, Daniel Holth dholth@gmail.com wrote:
I also get a little frustrated with this kind of proposal "no pins" which I read as "annoy the publisher to try to prevent them from annoying the consumer". As a free software publisher I feel entitled to annoy the consumer, an activity I will indulge in inversely proportional to my desire for users. Who is the star?
It should be possible to publish applications to pypi. Much of the packaging we have is completely web application focused, these applications are not usually published at all.
I haven’t fully followed this thread, and while the recommendation is and will always be to use the least strict version specifier that will work for your application, I am pretty heavily -1 on mandating that people do not use ``==``. I am also fairly heavily -1 on confusing the data model even more by making two sets of dependencies, one that allows == and one that doesn’t. I don’t think that overly restrictive pins are that common a problem (if anything, we’re more likely to have too loose pins, due to the always-upgrade nature of pip and the difficulty of exhaustively testing every possible version combination).
In cases where this actively harms the end user (effectively when there is a security issue or a conflict) we can tell the user about it (theoretically, not in practice yet) but beyond that, this is best handled by opening individual issues up on each individual repository, just like any other packaging issue with that project.
— Donald Stufft
On Feb 15, 2017, at 11:44 AM, Donald Stufft donald@stufft.io wrote:
On Feb 15, 2017, at 1:15 PM, Daniel Holth dholth@gmail.com wrote:
I also get a little frustrated with this kind of proposal "no pins" which I read as "annoy the publisher to try to prevent them from annoying the consumer". As a free software publisher I feel entitled to annoy the consumer, an activity I will indulge in inversely proportional to my desire for users. Who is the star?
It should be possible to publish applications to pypi. Much of the packaging we have is completely web application focused, these applications are not usually published at all.
I haven’t fully followed this thread, and while the recommendation is and will always be to use the least strict version specifier that will work for your application, I am pretty heavily -1 on mandating that people do not use ``==``. I am also fairly heavily -1 on confusing the data model even more by making two sets of dependencies, one that allows == and one that doesn’t.
I hope I'm not repeating a suggestion that appears up-thread, but, if you want to distribute an application with pinned dependencies, you could always release 'foo-lib' with a lenient set of dependencies, and 'foo-app' which depends on 'foo-lib' but pins the transitive closure of all dependencies with '=='. Your CI system could automatically release a new 'foo-app' every time any dependency has a new release and a build against the last release of 'foo-app' passes.
-glyph
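[Editorial sketch] A setup.py for such a CI-generated metapackage could look like the following; every project name and version here is hypothetical:

    from setuptools import setup

    # Hypothetical "foo-app" release: no code of its own, just the pinned
    # transitive closure that passed CI against the latest foo-lib.
    setup(
        name="foo-app",
        version="2017.2.15",
        install_requires=[
            "foo-lib==1.4.2",
            "requests==2.13.0",
            "six==1.10.0",
        ],
    )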
I notice here (and in a few other places) you talk about
"Applications". From what I understand of Nick's position, applications absolutely should pin their dependencies - so if I'm understanding correctly, those applications will (and should) continue to pin exact versions.
Application developers typically don't test against all combinations of dependency versions, and it also doesn't really make sense for them to do so. It is therefore understandable from their point of view to pin their dependencies. However, should they pin to a certain major/minor version, or to a patch version? In my opinion they are best off pinning to minor versions. That should be sufficient to guarantee the app works. Let the distributions take care of providing the latest patch version so that it remains safe. And that means indeed specifying >1.6,<1.8 (or actually >=1.7,<1.8), and not ==1.7 or ==1.7.3. The same goes for the meta-packages.
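[Editorial sketch] The practical difference between those specifiers is easy to check with the "packaging" library; the versions here are invented for illustration:

    from packaging.specifiers import SpecifierSet
    from packaging.version import Version

    minor_series = SpecifierSet(">=1.7,<1.8")   # pin to a minor series
    exact_pin = SpecifierSet("==1.7.3")         # pin to a single patch release

    print(Version("1.7.4") in minor_series)     # True  - the bug-fix release is accepted
    print(Version("1.7.4") in exact_pin)        # False - the bug-fix release is blocked

    # PEP 440's compatible-release operator expresses the same intent more tersely:
    print(Version("1.7.4") in SpecifierSet("~=1.7.0"))   # True (>=1.7.0, <1.8)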
On Wed, Feb 15, 2017 at 6:57 PM, Paul Moore p.f.moore@gmail.com wrote:
Thanks for your reply, it was very helpful.
On 15 February 2017 at 16:55, Freddy Rietdijk freddyrietdijk@fridh.nl wrote:
Larger applications that have many dependencies that are fixed have been kept out of Nixpkgs for now.
I notice here (and in a few other places) you talk about "Applications". From what I understand of Nick's position, applications absolutely should pin their dependencies - so if I'm understanding correctly, those applications will (and should) continue to pin exact versions.
As regards automatic packaging of new upstream versions (of libraries rather than applications), I guess if you get upstream to remove the pinned versions, this problem goes away.
The main problem I see is that it limits in how far you can
automatically update to newer versions and thus release bug/security fixes. Just one inappropriate pin is sufficient to break dependency solving.
I'm not sure I follow this. Suppose we have foo 1.0 depending on bar. If foo 1.0 doesn't pin bar (possibly because you reported to them that they shouldn't) then foo 1.1 isn't going to suddenly add the pin back. So you can update foo fine. And you can update bar because there's no pin. So yes, while "one inappropriate pin" can cause a problem, getting upstream to fix that is a one-off cost, not an ongoing issue.
So, in summary,
- I agree that libraries pinning dependencies too tightly is bad.
- Distributions can easily enough report such pins upstream when the
library is initially packaged, so there's no ongoing cost here (just possibly a delay before the library can be packaged).
- Libraries can legitimately have appropriate pins (typically to
ranges of versions). So distributions have to be able to deal with that.
- Applications *should* pin precise versions. Distributions have to
decide whether to respect those pins or remove them and then take on support of the combination that upstream doesn't support.
- But application pins should be in a requirements.txt file, so
ignoring version specs is pretty simple (just a script to run against the requirements file).
- Because Python doesn't support multiple installed versions of
packages, conflicting requirements *will* be a problem that distros have to solve themselves (the language response is "use a venv").
Nick is suggesting that the requirement metadata be prohibited from using exact pins, but there's alternative metadata for "yes, I really mean an exact pin". To me:
1. This doesn't have any bearing on *application* pins, as they aren't in metadata.
2. Distributions still have to be able to deal with libraries having exact pins, as it's an explicitly supported possibility.
3. You can still manage (effectively) exact pins without being explicit - foo >1.6,<1.8 pretty much does it. And that doesn't even have to be a deliberate attempt to break the system, it could be a genuine attempt to avoid known issues, that just got too aggressive.
So we're left with additional complexity for library authors to understand, for what seems like no benefit in practice to distribution builders. The only stated benefit of the 2 types of metadata is to educate library authors of the benefits of not pinning versions - and it seems like a very sweeping measure, where bug reports from distributions seem like they would be a much more focused and just as effective approach.
Paul
On 15 Feb 2017 23:28, "Paul Moore" p.f.moore@gmail.com wrote:
So, in summary,
* I agree that libraries pinning dependencies too tightly is bad.
* Distributions can easily enough report such pins upstream when the library is initially packaged, so there's no ongoing cost here (just possibly a delay before the library can be packaged).
No, we can't easily do this. libraries.io tracks more than *two million* open source projects. Debian is the largest Linux distribution, and only tracks 50k packages.
That means it is typically going to be *app* developers that run into the problem of inappropriately pinned dependencies and
So if we rely on a manual "publish with pinned dependencies", "get bug report from redistributor or app developer", "republish with unpinned dependencies", we'll be in a situation where:
- the affected app developer or redistributor is going to have a negative experience with the project
- the responsible publisher is either going to have a negative interaction with an end user or redistributor, or else they'll just silently move on to find an alternative library
- we relinquish any control of the tone used when the publisher is alerted to the problem
By contrast, if we design the metadata format such that *PyPI* can provide a suitable error message, then:
- publishers get alerted to the problem *prior* to publication
- end users and redistributors are unlikely to encounter the problem directly
- we retain full control over the tone of the error notification
* Libraries can legitimately have appropriate pins (typically to ranges of versions). So distributions have to be able to deal with that.
* Applications *should* pin precise versions. Distributions have to decide whether to respect those pins or remove them and then take on support of the combination that upstream doesn't support.
* But application pins should be in a requirements.txt file, so ignoring version specs is pretty simple (just a script to run against the requirements file).
Applications also get packaged as sdists and wheel files, so pydist.json does need to handle that case.
Nick is suggesting that the requirement metadata be prohibited from using exact pins, but there's alternative metadata for "yes, I really mean an exact pin". To me:
1. This doesn't have any bearing on *application* pins, as they aren't in metadata.
2. Distributions still have to be able to deal with libraries having exact pins, as it's an explicitly supported possibility.
3. You can still manage (effectively) exact pins without being explicit - foo >1.6,<1.8 pretty much does it. And that doesn't even have to be a deliberate attempt to break the system, it could be a genuine attempt to avoid known issues, that just got too aggressive.
People aren't going to do the last one accidentally, but they *will* use "==" when transferring app development practices to library development.
So we're left with additional complexity for library authors to understand, for what seems like no benefit in practice to distribution builders.
- We'll get more automated conversions with pyp2rpm and similar tools that "just work" without human intervention
- We'll get fewer negative interpersonal interactions between upstream publishers and downstream redistributors
It won't magically make everything all sunshine and roses, but we're currently at a point where about 70% of pyp2rpm conversions fail for various reasons, so every little bit helps :)
The only stated benefit of the 2 types of metadata is to educate library authors of the benefits of not pinning versions - and it seems like a very sweeping measure, where bug reports from distributions seem like they would be a much more focused and just as effective approach.
We've been playing that whack-a-mole game for years, and it sucks enormously for both publishers and redistributors from a user experience perspective.
More importantly though, it's already failing to scale adequately, hence the rise of technologies like Docker, Flatpak, and Snappy that push more integration and update responsibilities back to application and service developers. The growth rates on PyPI mean we can expect those scalability challenges to get *worse* rather than better in the coming years.
By pushing this check down into the tooling infrastructure, the aim would be to make the automated systems take on the task of being the "bad guy", rather than asking humans to do it later.
Cheers, Nick.
On Fri, Feb 17, 2017 at 12:56 AM, Nick Coghlan ncoghlan@gmail.com wrote:
By contrast, if we design the metadata format such that *PyPI* can provide a suitable error message, then:
But all these benefits you're describing also work if you s/PyPI/setuptools/, no? And that doesn't require any metadata PEPs or global coordination, you could send them a PR this afternoon if you want.
-n
On 17 February 2017 at 08:56, Nick Coghlan ncoghlan@gmail.com wrote:
- we retain full control over the tone of the error notification
I tried to formulate a long response to this email, and got completely bogged down. So I'm going to give a brief[1] response for now and duck out until the dust settles.
By "we" above, I assume you mean distutils-sig/PyPA. As part of that group, I find the complexities of how distributions package stuff up, and the expected interactions of the multitude of parties involved in the model you describe, completely baffling. That's fine normally (as a Windows developer, I don't typically interact with Linux distributions) but when it comes to being part of distutils-sig/PyPA in terms of how we present things like this, I feel a responsibility to understand (and by proxy, represent users who are similarly unaware of distro practices, etc).
I understand (somewhat) the motivations behind this distinction between "requires" and "integrates"[2] but I think we need to come up with a much more straightforward explanation - geared towards library authors who don't understand (and probably aren't that interested in) the underlying issues - before we standardise anything. Because otherwise, we'll be rehashing this debate over and over as library authors get errors they don't understand, and come asking.
Paul
[1] Yes, this was as brief as I could manage :-(
[2] As a data point, I couldn't even think of the right terms to use here without scanning back over the email thread to look for them. That indicates to me that the concepts are anything but intuitive :-(
On 2017-02-17 09:56:04 +0100 (+0100), Nick Coghlan wrote: [...]
So if we rely on a manual "publish with pinned dependencies", "get bug report from redistributor or app developer", "republish with unpinned dependencies", we'll be in a situation where:
- the affected app developer or redistributor is going to have a negative
experience with the project
- the responsible publisher is either going to have a negative interaction
with an end user or redistributor, or else they'll just silently move on to find an alternative library
- we relinquish any control of the tone used when the publisher is alerted
to the problem
By contrast, if we design the metadata format such that *PyPI* can provide a suitable error message, then:
- publishers get alerted to the problem *prior* to publication
- end users and redistributors are unlikely to encounter the problem
directly
- we retain full control over the tone of the error notification
[...]
It seems like the same could be said of many common mistakes which can be identified with some degree of certainty through analysis of the contents being uploaded. Why not also scan for likely security vulnerabilities with a static analyzer and refuse offending uploads unless the uploader toggles the magic "yes I really mean it" switch? Surely security issues are even greater downstream risks than simple dependency problems. (NB: I'm not in favor of that either, just nudging an example in the reductio ad absurdum direction.)
On 17 February 2017 at 23:18, Jeremy Stanley fungi@yuggoth.org wrote:
On 2017-02-17 09:56:04 +0100 (+0100), Nick Coghlan wrote: [...]
So if we rely on a manual "publish with pinned dependencies", "get bug report from redistributor or app developer", "republish with unpinned dependencies", we'll be in a situation where:
- the affected app developer or redistributor is going to have a negative
experience with the project
- the responsible publisher is either going to have a negative
interaction
with an end user or redistributor, or else they'll just silently move on
to
find an alternative library
- we relinquish any control of the tone used when the publisher is
alerted
to the problem
By contrast, if we design the metadata format such that *PyPI* can
provide
a suitable error message, then:
- publishers get alerted to the problem *prior* to publication
- end users and redistributors are unlikely to encounter the problem
directly
- we retain full control over the tone of the error notification
[...]
It seems like the same could be said of many common mistakes which can be identified with some degree of certainty through analysis of the contents being uploaded. Why not also scan for likely security vulnerabilities with a static analyzer and refuse offending uploads unless the uploader toggles the magic "yes I really mean it" switch? Surely security issues are even greater downstream risks than simple dependency problems. (NB: I'm not in favor of that either, just nudging an example in the reductio ad absurdum direction.)
Most of the other potential checks are about forming an opinion about software quality, rather than attempting to discern publisher intent.
Now, we could ask all package developers "Is this an application, service, or metapackage?", but then we'd have to get into a detailed discussion of what those terms mean, and help them decide whether or not any of them apply to what they're doing. It would also be a complete waste of their time if they're not attempting to pin any dependencies in the first place, or if they're not publishing the component to a public index server.
Alternatively, we can defer asking any question at all until they do something where the difference matters: attempting to pin a dependency to a specific version when publishing to a public index server. At that point, there is an ambiguity in intent as there are multiple reasons somebody could be doing that:
- they're actually publishing an application, service, or metapackage, so dependency pinning is entirely reasonable
- they've carried over habits learned in application and service development into component and framework publishing
- they've carried over habits learned in other ecosystems that encourage runtime version mixing (e.g. npm/js) into their Python publishing
So the discussion in this thread has convinced me that a separate "allow_pinned_dependencies" flag is a much better way to model this than attempting to define different dependency types, but I still want to include it in the metadata model :)
Cheers, Nick.
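[Editorial sketch] One possible shape for the index-side check Nick describes; the flag name follows his suggestion above, but everything else here is an assumption rather than anything PyPI actually implements:

    from packaging.requirements import Requirement

    def check_for_pins(requires, allow_pinned_dependencies=False):
        """Flag exact pins in an upload's dependency list unless explicitly allowed."""
        pinned = []
        for entry in requires:
            req = Requirement(entry)
            if any(spec.operator in ("==", "===") for spec in req.specifier):
                pinned.append(entry)
        if pinned and not allow_pinned_dependencies:
            raise ValueError(
                "Exact pins found (%s); set allow_pinned_dependencies if this is "
                "an application, service or metapackage." % ", ".join(pinned))

    check_for_pins(["requests>=2.0", "six>=1.9"])    # fine
    # check_for_pins(["requests==2.13.0"])           # would raise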
On 15 Feb 2017, at 15:11, Nathaniel Smith njs@pobox.com wrote:
In the vast majority of publication-to-PyPI cases people won't need the "integrates" field, since what they're publishing on PyPI will just be their abstract dependencies, and any warning against using "==" will recommend using "~=" or ">=" instead. But there *are* legitimate uses of pinning-for-publication (like the PyObjC metapackage bundling all its subcomponents, or when building for private deployment infrastructure), so there needs to be a way to represent "Yes, I'm pinning this dependency for publication, and I'm aware of the significance of doing so"
Why can't PyObjC just use regular dependencies? That's what distro metapackages have done for decades, right?
PyObjC is conceptually a single project that is split into multiple PyPI distributions to make it easier to install only the parts you need (and can install: PyObjC wraps macOS frameworks, including some that may not be available on the OS version that you’re running).
The project is managed as a single entity and updates will always release new versions of all PyPI packages for the project.
“pip install pyobjc==3.1” should install that version, and should not result in a mix of versions if you use this to downgrade (which could happen if the metapackage used “>=“ to specify the version of the concrete packages).
BTW, I’m not sure if my choice to split PyObjC into a large collection of PyPI packages is still the right one given the current state of the packaging landscape.
Ronald (the PyObjC maintainer)
-n
-- Nathaniel J. Smith -- https://vorpus.org
Wheel puts everything important in METADATA, except entry_points.txt. The requirements expressed there under 'Requires-Dist' are reliable, and the full METADATA format is documented in the pre-JSON revision of PEP 426. At runtime, once pkg_resources parses it, *.egg-info and *.dist-info look identical, because it's just a different way to represent the same data. Wheel's version of METADATA exists as the simplest way to add the critical 'extras' feature to distutils2-era *.dist-info/METADATA, necessary to losslessly represent setuptools packages in a more PEP-standard way. I could have completely redesigned the METADATA format instead of extending it, but then I would have run out of time and wheel would not exist.
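[Editorial sketch] For anyone wanting to inspect that in practice: METADATA is an RFC 822-style file and can be read with the standard library alone. The wheel and dist-info path names below are just examples:

    from email.parser import Parser
    from zipfile import ZipFile

    # Example wheel; the .dist-info directory name tracks the project and version.
    with ZipFile("python_dateutil-2.6.0-py2.py3-none-any.whl") as whl:
        raw = whl.read("python_dateutil-2.6.0.dist-info/METADATA").decode("utf-8")

    meta = Parser().parsestr(raw)
    print(meta["Metadata-Version"])
    print(meta.get_all("Requires-Dist"))   # one entry per declared dependency (may be None)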
This function converts egg-info metadata to METADATA https://bitbucket.org/pypa/wheel/src/54ddbcc9cec25e1f4d111a142b8bfaa163130a6...
This one converts to the JSON format. It looks like it might work with PKG-INFO or METADATA. https://bitbucket.org/pypa/wheel/src/54ddbcc9cec25e1f4d111a142b8bfaa163130a6...
On Wed, Feb 15, 2017 at 8:27 AM Nick Coghlan ncoghlan@gmail.com wrote:
On 15 February 2017 at 12:58, Nathaniel Smith njs@pobox.com wrote:
On Wed, Feb 15, 2017 at 3:33 AM, Nick Coghlan ncoghlan@gmail.com
wrote:
- "requires": list where entries are either a string containing a PEP
508 dependency specifier or else a hash map contain a "requires" key plus "extra" or "environment" fields as qualifiers
- "integrates": replacement for "meta_requires" that only allows
pinned dependencies (i.e. hash maps with "name" & "version" fields, or direct URL references, rather than a general PEP 508 specifier as a string)
What's accomplished by separating these? I really think we should strive to have fewer more orthogonal concepts whenever possible...
It's mainly a matter of incorporating https://caremad.io/posts/2013/07/setup-vs-requirement/ into the core data model, as this distinction between abstract development dependencies and concrete deployment dependencies is incredibly important for any scenario that involves publisher-redistributor-consumer chains, but is entirely non-obvious to folks that are only familiar with the publisher-consumer case that comes up during development-for-personal-and-open-source-use.
One particular area where this is problematic is in the widespread advice "always pin your dependencies" which is usually presented without the all important "for application or service deployment" qualifier. As a first approximation: pinning-for-app-or-service-deployment == good, pinning-for-local-testing == good, pinning-for-library-or-framework-publication-to-PyPI == bad.
pipenv borrows the Ruby solution to modeling this by having Pipfile for abstract dependency declarations and Pipfile.lock for concrete integration testing ones, so the idea here is to propagate that model to pydist.json by separating the "requires" field with abstract development dependencies from the "integrates" field with concrete deployment dependencies.
In the vast majority of publication-to-PyPI cases people won't need the "integrates" field, since what they're publishing on PyPI will just be their abstract dependencies, and any warning against using "==" will recommend using "~=" or ">=" instead. But there *are* legitimate uses of pinning-for-publication (like the PyObjC metapackage bundling all its subcomponents, or when building for private deployment infrastructure), so there needs to be a way to represent "Yes, I'm pinning this dependency for publication, and I'm aware of the significance of doing so"
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
the full METADATA format is documented in the pre-JSON revision of PEP 426.
Can you confirm which exact revision in the PEPs repo you mean? I could guess at 0451397. That version does not refer to a field "Requires" (rather, the more recent "Requires-Dist"). Your conversion function reads the existing PKG-INFO, updates the Metadata-Version, and adds "Provides-Dist" and "Requires-Dist". It does not check whether the result conforms to that version of the PEP. For example, in the presence of "Requires" in PKG-INFO, you add "Requires-Dist", possibly leading to an ambiguity, because they sort of mean the same thing but could contain conflicting information (for example, different version constraints). The python-dateutil wheel which Jim referred to contained both "Requires" and "Requires-Dist" fields in its METADATA file, and, faced with a metadata set with both fields, the old packaging code used by distlib to handle the different metadata versions raised an "Unknown metadata set" error. In the face of ambiguity, it's refusing the temptation to guess :-)
If the conversion function adds "Requires-Dist" but doesn't remove "Requires", I'm not sure it conforms to that version of the PEP.
Regards,
Vinay Sajip
IIUC, PEP 345 (the predecessor of PEP 426) replaced Requires with Requires-Dist because the former was never very well specified; it was easier to rename the field than to redefine it. bdist_wheel's egg-info conversion assumes the only useful requirements are in the setuptools requires.txt. It would make sense to go ahead and delete the obsolete fields; I'm sure they were overlooked because they are not common in the wild.
From PEP 345:
- Deprecated fields:
  - Requires (in favor of Requires-Dist)
  - Provides (in favor of Provides-Dist)
  - Obsoletes (in favor of Obsoletes-Dist)
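[Editorial sketch] "Delete the obsolete fields" could look something like the following when rewriting METADATA via the stdlib email representation that tools commonly use to parse it; illustrative only:

    DEPRECATED_FIELDS = ("Requires", "Provides", "Obsoletes")

    def drop_deprecated_fields(msg):
        """Strip PEP 345-deprecated fields so only the *-Dist variants remain.

        `msg` is an email.message.Message parsed from PKG-INFO or METADATA;
        __delitem__ removes every occurrence and is a no-op if the field is absent.
        """
        for field in DEPRECATED_FIELDS:
            del msg[field]
        return msg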
On Wed, Feb 15, 2017 at 10:31 AM Vinay Sajip vinay_sajip@yahoo.co.uk wrote:
the full METADATA format is documented in the pre-JSON revision of PEP
Can you confirm which exact revision in the PEPs repo you mean? I could guess at 0451397. That version does not refer to a field "Requires" (rather, the more recent "Requires-Dist"). Your conversion function reads the existing PKG-INFO, updates the Metadata-Version, and adds "Provides-Dist" and "Requires-Dist". It does not check whether the result conforms to that version of the PEP. For example, in the presence of "Requires" in PKG-INFO, you add "Requires-Dist", possibly leading to an ambiguity, because they sort of mean the same thing but could contain conflicting information (for example, different version constraints). The python-dateutils wheel which Jim referred to contained both "Requires" and "Requires-Dist" fields in its METADATA file, and, faced with a metadata set with both fields, the old packaging code used by distlib to handle the different metadata versions raised a "Unknown metadata set" error. In the face of ambiguity, it's refusing the temptation to guess :-)
If the conversion function adds "Requires-Dist" but doesn't remove "Requires", I'm not sure it conforms to that version of the PEP.
Regards,
Vinay Sajip
On Wed, Feb 15, 2017, at 03:40 PM, Daniel Holth wrote:
It would make sense to go ahead and delete the obsolete fields, I'm sure they were overlooked because they are not common in the wild.
From PEP 345:
- Deprecated fields:
- Requires (in favor of Requires-Dist)
- Provides (in favor of Provides-Dist)
For reference, packages made with flit do use 'Provides' to indicate the name of the importable module or package that the distribution installs. This seems to me to be something worth exposing - in another thread, we're discussing downloading and scanning packages to get this information. But I accept that it's not very useful while only a tiny minority of packages do it.
Thomas
On Wed, Feb 15, 2017 at 5:33 AM, Nick Coghlan ncoghlan@gmail.com wrote:
On 14 February 2017 at 21:21, Vinay Sajip via Distutils-SIG distutils-sig@python.org wrote:
I thought the current status was that it's called metadata.json exactly *because* it's not standardized, and you *shouldn't* look at it?
Well, it was work-in-progress-standardised according to PEP 426 (since sometimes implementations have to work in parallel with working out the details of specifications). Given that PEP 426 wasn't done and dusted but being progressed, I would have thought it perfectly acceptable to use "pydist.json", as the only things that would be affected would be packaging tools working to the PEP.
I asked Daniel to *stop* using pydist.json, since wheel was emitting a point-in-time snapshot of PEP 426 (which includes a lot of potentially-nice-to-have things that nobody has actually implemented so far, like the semantic dependency declarations and the enhancements to the extras syntax), rather than the final version of the spec.
Would you send a link to the source for this?
It's too bad that the JSON thing didn't work out, but I think we're better off working on better specifying the one source of truth everything already uses (METADATA) instead of bringing in *new* partially-incompatible-and-poorly-specified formats.
When you say "everything already uses", do you mean setuptools and wheel? If nobody else is allowed to play, that's one thing. But otherwise, there need to be standards for interoperability. The METADATA file, now -
exactly
which standard does it follow? The one in the dateutil wheel that Jim referred to doesn't appear to conform to any of the metadata PEPs. It was rejected by old metadata code in distlib (which came of out the Python
3.3
era "packaging" package - not to be confused with Donald's of the same
name -
which is strict in its interpretation of those earlier PEPs).
The METADATA format (key-value) is not really flexible enough for certain things which were in PEP 426 (e.g. dependency descriptions), and for
these
JSON seems a reasonable fit.
The current de facto standard set by setuptools and bdist_wheel is:
- dist-info/METADATA as defined at https://packaging.python.org/specifications/#package-distribution-metadata
- dist-info/requires.txt runtime dependencies as defined at http://setuptools.readthedocs.io/en/latest/formats.html#requires-txt
- dist-info/setup_requires.txt build time dependencies as defined at http://setuptools.readthedocs.io/en/latest/formats.html#setup-requires-txt
The dependency fields in METADATA itself unfortunately aren't really useful for anything.
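[Editorial sketch] For reference, requires.txt is just requirement lines grouped under optional "[extra]" or "[extra:marker]" section headers, so extracting the dependency edges from it is straightforward; a minimal reader might look like this:

    def read_requires_txt(text):
        """Split a setuptools requires.txt into {section: [requirement, ...]}."""
        sections = {"": []}            # "" holds the unconditional requirements
        current = ""
        for line in text.splitlines():
            line = line.strip()
            if not line:
                continue
            if line.startswith("[") and line.endswith("]"):
                current = line[1:-1]   # e.g. "test" or ":python_version < '3'"
                sections.setdefault(current, [])
            else:
                sections[current].append(line)
        return sections

    print(read_requires_txt("six\n\n[test]\npytest\n"))
    # {'': ['six'], 'test': ['pytest']}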
Graph: Nodes and edges.
There's definitely still a place for a pydist.json created by going through PEP 426, comparing it to what bdist_wheel already does to populate metadata.json, and either changing the PEP to match the existing practice, or else agreeing that we prefer what the PEP recommends, that we want to move in that direction, and that there's a definite commitment to implement the changes in at least setuptools and bdist_wheel (plus a migration strategy that allows for reasonably sensible consumption of old metadata).
Which function reads metadata.json? Which function reads pydist.json?
Such an update would necessarily be a fairly ruthless process, where we defer everything that can possibly be deferred. I already made one pass at that when I split out the metadata extensions into PEP 459, but at least one more such pass is needed before we can sign off on the spec as metadata 2.0 - even beyond any "open for discussion" questions, there are still things in there which were extracted and standardised separately in PEP 508.
There's no technical reason why "the JSON thing didn't work out", as far as I can see - it was just given up on for a more incremental approach (which has got no new PEPs other than 440, AFAICT).
Yep, it's a logistical problem rather than a technical problem per se - new metadata formats need software publisher adoption to ensure the design is sensible before we commit to them long term, but software publishers are understandably reluctant to rely on new formats that limit their target audience to folks running the latest versions of the installation tools (outside constrained cases where the software publisher is also the main consumer of that software).
An RDFS Vocabulary contains Classes and Properties with rdfs:ranges and rdfs:domains.
There are many representations for RDF: RDF/XML, Turtle/N3, JSONLD.
RDF is implementation-neutral. JSONLD is implementation-neutral.
For PEP 440 (version specifiers) and PEP 508 (dependency specifiers), this was handled by focusing on documenting practices that people already used (and checking existing PyPI projects for compatibility), rather than trying to actively change those practices.
For pyproject.toml (e.g. enscons), the idea is to provide a setup.py shim that can take care of bootstrapping the new approach for the benefit of older tools that assume the use of setup.py (similar to what was done with setup.cfg and d2to1).
The equivalent for PEP 426 would probably be legacy-to-pydist and pydist-to-legacy converters that setuptools, bdist_wheel and other publishing tools can use to ship legacy metadata alongside the standardised format (and I believe Daniel already has at least the former in order to generate metadata.json in bdist_wheel). With PEP 426 as currently written, a pydist-to-legacy converter isn't really feasible, since pydist proposes new concepts that can't be readily represented in the old format.
pydist-to-legacy would be a lossy transformation.
I understand that social reasons are often more important than technical reasons when it comes to success or failure of an approach; I'm just not sure that in this case, it wasn't given up on too early.
I think of PEP 426 as "deferred indefinitely pending specific practical problems to provide clearer design constraints" rather than abandoned :)
Is it too late to request lowercased property names without dashes? If we're (I'm?) going to create @context URIs, compare:
https://schema.python.org/v1#Provides-Extra
{ "@context": { "default": "https://schema.python.org/#", "schema": "http://schema.org/", # "name": "http://schema.org/name", # "url": "http://schema.org/url", # "verstr": # "extra": # "requirements" # "requirementstr" }, "@typeof": [ "py:PythonPackage"], "name": "IPython", "url": ["https://pypi.python.org/pypi/IPython", "https://pypi.org/project/ IPython"], "Provides-Extra": [ {"@typeof": "Requirement", "name": "notebook", "extra": ["notebook"], "requirements": [], #TODO "requirementstr": "extra == 'notebook'" }, {"name": "numpy", "extra": ["test"], "requirements": #TODO, "requirementstr": "python_version >= "3.4" and extra == 'test'" }, ... ] }
There are two recent developments that I think may provide those missing design constraints and hence motivation to finalise a metadata 2.0 specification:
- the wheel-to-egg support in humpty (and hence zc.buildout). That makes humpty a concrete non-traditional installer that would benefit from both a modernised standard metadata format, as well as common tools both to convert legacy metadata to the agreed modern format and to convert the modern format back to the legacy format for inclusion in the generated egg files (as then humpty could just re-use the shared tools, rather than having to maintain those capabilities itself).
class PackageMetadata:
    def __init__(self):
        self.data = collections.OrderedDict()

    # Readers for each supported format
    def read_legacy(self): ...
    def read_metadata_json(self): ...
    def read_pydist_json(self): ...
    def read_pyproject_toml(self): ...
    def read_jsonld(self): ...

    # Writers for each supported format
    def to_legacy(self): ...
    def to_metadata_json(self): ...
    def to_pydist_json(self): ...
    def to_pyproject_toml(self): ...
    def to_jsonld(self): ...

    # Alternate constructors: Legacy(), MetadataJson(), PydistJson(), PyprojectToml(), ...
    @classmethod
    def Jsonld(cls, *args, **kwargs):
        obj = cls(*args, **kwargs)
        obj.read_jsonld(*args, **kwargs)
        return obj

    # or this (spelled from_path because "from" is a keyword):
    @classmethod
    def from_path(cls, path, format='legacy|metadatajson|pydistjson|pyprojecttoml|jsonld'): ...
... for maximum reusability, we really shouldn't need an adapter registry here;
- the new pipenv project to provide a simpler alternative to the pip+virtualenv+pip-tools combination for environment management in web service development (and similar layered application architectures). As with the "install vs setup" split in setuptools, pipenv settled on an "only two kinds of requirement (deployment and development)" model for usability reasons, but it also distinguishes abstract dependencies stored in Pipfile from pinned concrete dependencies stored in Pipfile.lock.
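[Editorial sketch] As a rough illustration of that split: Pipfile holds the abstract declarations in TOML, while Pipfile.lock holds the resolved pins in JSON. The sketch below assumes the third-party "toml" module and both files sitting in the current directory:

    import json
    import toml   # third-party; newer Pythons also ship tomllib for the read side

    pipfile = toml.load("Pipfile")
    with open("Pipfile.lock") as f:
        lockfile = json.load(f)

    # Abstract: what the developer asked for, e.g. {"requests": "*"}
    print(pipfile.get("packages", {}))

    # Concrete: what the resolver pinned, e.g. {"requests": "==2.13.0", ...}
    print({name: info.get("version")
           for name, info in lockfile.get("default", {}).items()})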
Does the Pipfile/Pipfile.lock distinction overlap with 'integrates' as a replacement for meta_requires?
If we put those together with the existing interest in automating generation of policy compliant operating system distribution packages,
Downstream OS packaging could easily (and without permission) include extra attributes (properties specified with full URIS) in JSONLD metadata.
it makes it easier to go through the proposed semantic dependency model in PEP 426 and ask "How would we populate these fields based on the metadata that projects *already* publish?".
See 'class PackageMetadata'
- "run requires": straightforward, as these are the standard
dependencies used in most projects. Not entirely clear how to gently (or strongly!) discourage dependency pinning when publishing to PyPI (although the Pipfile and Pipfile.lock model used in pipenv may help with this)
- "meta requires": not clear at all, as this was added to handle cases
like PyObjC, where the main package is just a metapackage that makes a particular set of versioned subpackages easy to install. This may be better modeled as a separate "integrates" field, using a declaration syntax more akin to that used for Pipfile.lock rather than that used for normal requirements declarations.
- "dev requires": corresponds to "dev-packages" in pipenv
- "build requires": corresponds to "setup_requires" in setuptools,
"build-system.requires" + any dynamic build dependencies in PEP 518
- "test requires": corresponds to "test" extra in
https://packaging.python.org/specifications/#provides-extra-multiple-use
The "doc" extra in https://packaging.python.org/specifications/#provides-extra-multiple-use would map to "build requires", but there's potential benefit to redistributors in separating it out, as we often split the docs out from the built software components (since there's little reason to install documentation on headless servers that are only going to be debugged remotely).
The main argument against "test requires" and "doc requires" is that the extras system already works fine for those - "pip install MyProject[test]" and "pip install MyProject[doc]" are both already supported, so metadata 2.0 just needs to continue to reserve those as semantically significant extras names.
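[Editorial sketch] That already works because an extra-qualified dependency is just a requirement whose environment marker mentions "extra"; a quick check with the "packaging" library:

    from packaging.requirements import Requirement

    req = Requirement('pytest>=3.0; extra == "test"')

    # A plain "pip install MyProject" evaluates the marker with no extra active...
    print(req.marker.evaluate({"extra": ""}))       # False - skipped

    # ...while "pip install MyProject[test]" activates the "test" extra.
    print(req.marker.evaluate({"extra": "test"}))   # True  - installed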
"dev" requires could be handled the same way - anything you actually need to *build* an sdist or wheel archive from a source repository should be in "setup_requires" (setuptools) or "build-system.requires" (pyproject.toml), so "dev" would just be a conventional extra name rather than a top level field.
That just leaves "build_requires", which turns out to interact awkwardly with the "extras" system: if you write "pip install MyProject[test]", does it install all the "test" dependencies, regardless of whether they're listed in run_requires or build_requires?
If yes: then why are run_requires and build_requires separate? If no: then how do you request installation of the "test" build extra? Or are build extras prohibited entirely?
That suggests that perhaps "build" should just be a conventional extra as well, and considered orthogonal to the other conventional extras. (I'm sure this idea has been suggested before, but I don't recall who suggested it or when)
And if build, test, doc, and dev are all handled as extras, then the top level name "run_requires" no longer makes sense, and the field name should go back to just being "requires".
Under that evaluation, we'd be left with only the following top level fields defined for dependency declarations:
- "requires": list where entries are either a string containing a PEP
508 dependency specifier or else a hash map contain a "requires" key plus "extra" or "environment" fields as qualifiers
+1
- "integrates": replacement for "meta_requires" that only allows
pinned dependencies (i.e. hash maps with "name" & "version" fields, or direct URL references, rather than a general PEP 508 specifier as a string)
Pipfile.lock?
What happens here when something is listed in both requires and integrates?
Where/do these get merged on the "name" attr as a key, given a presumed namespace URI prefix (https://pypi.org/project/)?
For converting old metadata, any concrete dependencies that are compatible with the "integrates" field format would be mapped that way, while everything else would be converted to "requires" entries.
What heuristic would help identify compatibility with the integrates field?
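[Editorial sketch] Purely as an illustration of one possible answer (not something any PEP specifies): an entry could be treated as "integrates"-compatible when it is a direct URL reference or a single exact pin, and mapped to "requires" otherwise.

    from packaging.requirements import Requirement

    def classify(entry):
        """Sketch: route concrete entries to "integrates", the rest to "requires"."""
        req = Requirement(entry)
        if req.url:                                 # direct URL reference
            return "integrates"
        specs = list(req.specifier)
        if len(specs) == 1 and specs[0].operator in ("==", "==="):
            return "integrates"                     # single exact pin (wildcards like ==1.7.* would need more care)
        return "requires"

    print(classify("pyobjc-core==3.1"))     # integrates
    print(classify("requests>=2.0"))        # requires
    print(classify("mylib @ https://example.com/mylib-1.0-py3-none-any.whl"))  # integrates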
The semantic differences between normal runtime dependencies and "dev", "test", "doc" and "build" requirements would be handled as extras, regardless of whether you were using the old metadata format or the new one.
+1 from me.
I can't recall whether I've used {"dev", "test", "doc", and "build"} as extras names in the past; though I can remember thinking "wouldn't it be more intuitive to do it [that way]"
Is this backward compatible? Extras still work as extras?
Going the other direction would be similarly straightforward since (excluding extensions) the set of required conceptual entities has been reduced back to the set that already exists in the current metadata formats. While "requires" and "integrates" would be distinct fields in pydist.json, the decomposed fields in the latter would map back to their string-based counterparts in PEP 508 when converted to the legacy metadata formats.
Cheers, Nick.
P.S. I'm definitely open to a PR that amends the PEP 426 draft along these lines. I'll get to it eventually myself, but there are some other things I see as higher priority for my open source time at the moment (specifically the C locale handling behaviour of Python 3.6 in Fedora 26 and the related upstream proposal for Python 3.7 in PEP 538)
I need to find a job; my time commitment here is inconsistent. I'm working on a project (nbmeta) for generating, displaying, and embedding RDFa and JSONLD in Jupyter notebooks (w/ _repr_html_() and an OrderedDict) which should refresh the JSONLD @context-writing skills necessary to define the RDFS vocabulary we could/should have at https://schema.python.org/ .
- [ ] JSONLD PEP (<- PEP426)
- [ ] examples / test cases
  - I've referenced IPython as an example package; are there other hard test cases for python packaging metadata conversion? (i.e. one that uses every feature of each metadata format)?
- [ ] JSONLD @context
- [ ] class PackageMetadata
- [ ] wheel: (additionally) generate JSONLD metadata
- [ ] schema.python.org: master, gh-pages (or e.g. "https://www.pypa.io/ns#")
- [ ] warehouse: add a ./jsonld view (to legacy?)
https://github.com/pypa/interoperability-peps/issues/31
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 15 February 2017 at 14:00, Wes Turner wes.turner@gmail.com wrote:
On Wed, Feb 15, 2017 at 5:33 AM, Nick Coghlan ncoghlan@gmail.com wrote:
I asked Daniel to *stop* using pydist.json, since wheel was emitting a point-in-time snapshot of PEP 426 (which includes a lot of potentially-nice-to-have things that nobody has actually implemented so far, like the semantic dependency declarations and the enhancements to the extras syntax), rather than the final version of the spec.
Would you send a link to the source for this?
It came up when Vinay reported a problem with the way bdist_wheel was handling combined extras and environment marker definitions: https://bitbucket.org/pypa/wheel/issues/103/problem-with-currently-generated
- dist-info/METADATA as defined at https://packaging.python.org/specifications/#package-distribution-metadata
- dist-info/requires.txt runtime dependencies as defined at http://setuptools.readthedocs.io/en/latest/formats.html#requires-txt
- dist-info/setup_requires.txt build time dependencies as defined at http://setuptools.readthedocs.io/en/latest/formats.html#setup-requires-txt
The dependency fields in METADATA itself unfortunately aren't really useful for anything.
Graph: Nodes and edges.
Unfortunately, it's not that simple, since:
- dependency declarations refer to time dependent node *sets*, not to specific edges
- node resolution is not only time dependent, but also DNS and client configuration dependent
- this is true even for "pinned" dependencies due to the way "==" handles post-releases and local build IDs
- the legacy module based declarations are inconsistently populated and don't refer to nodes by a useful name
- the new distribution package based declarations refer to nodes by a useful name, but largely aren't populated
By contrast, METADATA *does* usefully define nodes in the graph, while requires.txt and setup_requires.txt can be used to extract edges when combined with suitable additional data sources (primarily a nominated index server or set of index servers to use for dependency specifier resolution).
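A rough sketch of that split, assuming a local dist-info directory laid out as described above: METADATA supplies the node, while requires.txt supplies dependency specifiers that still have to be resolved against a particular index server before they become edges.

```python
# Sketch only: extract a graph node and its (unresolved) dependency
# specifiers from a dist-info directory.
from email.parser import Parser
from pathlib import Path

def node_and_edge_specs(dist_info: Path):
    # METADATA names the node (project name + version).
    meta = Parser().parsestr((dist_info / "METADATA").read_text())
    node = (meta["Name"], meta["Version"])

    # requires.txt holds the dependency specifiers; [section] headers mark
    # extras / environment markers in the setuptools format.
    specs = []
    requires = dist_info / "requires.txt"
    if requires.exists():
        for line in requires.read_text().splitlines():
            line = line.strip()
            if line and not line.startswith("["):
                specs.append(line)

    # These are still abstract specifiers, not concrete edges: turning them
    # into edges depends on an index server and a point in time.
    return node, specs
```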
There's definitely still a place for a pydist.json created by going through PEP 426, comparing it to what bdist_wheel already does to populate metadata.json, and either changing the PEP to match the existing practice, or else agreeing that we prefer what the PEP recommends, that we want to move in that direction, and that there's a definite commitment to implement the changes in at least setuptools and bdist_wheel (plus a migration strategy that allows for reasonably sensible consumption of old metadata).
Which function reads metadata.json?
Likely eventually nothing, since anything important that it contains will be readable from either pydist.json or from the other legacy metadata files.
Which function reads pydist.json?
Eventually everything, with tools falling back to dynamically generating it from legacy metadata formats as a transition plan to handle component releases made with older toolchains.
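A minimal sketch of that consumption-side fallback; the file names follow this thread's discussion, and convert_legacy_metadata is a hypothetical helper rather than an existing API:

```python
# Sketch: prefer pydist.json when present, otherwise synthesize the same
# structure from the legacy metadata files shipped by older toolchains.
import json
from pathlib import Path

def convert_legacy_metadata(dist_info: Path) -> dict:
    # Hypothetical helper: derive the pydist structure from METADATA /
    # requires.txt for releases made with older tools (not shown here).
    raise NotImplementedError

def load_pydist(dist_info: Path) -> dict:
    pydist = dist_info / "pydist.json"
    if pydist.exists():
        return json.loads(pydist.read_text())
    # Fall back to dynamically generating it from the legacy metadata.
    return convert_legacy_metadata(dist_info)
```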
An RDFS Vocabulary contains Classes and Properties with rdfs:ranges and rdfs:domains.
There are many representations for RDF: RDF/XML, Turtle/N3, JSONLD.
RDF is implementation-neutral. JSONLD is implementation-neutral.
While true, both of these are still oriented towards working with a *resolved* graph snapshot, rather than a deliberately underspecified graph description that requires subsequent resolution within the node set of a particular index server (or set of index servers).
Just incorporating the time dimension is already messy, even before accounting for the fact that the metadata carried along with the artifacts is designed to be independent of the particular server that happens to be hosting it.
Tangent: if anyone is looking for an open source stack for working with distributed graph storage manipulation from Python, the combination of http://janusgraph.org/ and https://pypi.org/project/gremlinpython/ is well worth a look ;)
The equivalent for PEP 426 would probably be legacy-to-pydist and pydist-to-legacy converters that setuptools, bdist_wheel and other publishing tools can use to ship legacy metadata alongside the standardised format (and I believe Daniel already has at least the former in order to generate metadata.json in bdist_wheel). With PEP 426 as currently written, a pydist-to-legacy converter isn't really feasible, since pydist proposes new concepts that can't be readily represented in the old format.
pydist-to-legacy would be a lossy transformation.
Given appropriate use of the "extras" system and a couple of new METADATA fields, it doesn't have to be, at least for the initial version - that's the new design constraint I'm proposing for everything that isn't defined as a metadata extension.
The rationale being that if legacy dependency metadata can be reliably generated from the new format, that creates an incentive for *new* tools to adopt it ("generate the new format, get the legacy formats for free"), while also offering a clear migration path for existing publishing tools (refactor their metadata generation to produce the new format only, then derive the legacy metadata files from that) and consumption tools (consume the new fields immediately, look at consuming the new files later).
I understand that social reasons are often more important than technical reasons when it comes to success or failure of an approach; I'm just not sure that in this case, it wasn't given up on too early.
I think of PEP 426 as "deferred indefinitely pending specific practical problems to provide clearer design constraints" rather than abandoned :)
Is it too late to request lowercased property names without dashes?
That's already the case in PEP 426 as far as I know.
import collections

class PackageMetadata:
    def __init__(self):
        self.data = collections.OrderedDict()

    # readers, one per metadata format
    def read_legacy(self, path): ...
    def read_metadata_json(self, path): ...
    def read_pydist_json(self, path): ...
    def read_pyproject_toml(self, path): ...
    def read_jsonld(self, path): ...

    # writers, one per metadata format
    def to_legacy(self): ...
    def to_metadata_json(self): ...
    def to_pydist_json(self): ...
    def to_pyproject_toml(self): ...
    def to_jsonld(self): ...

    # alternate constructors: Legacy(), MetadataJson(), PydistJson(),
    # PyprojectToml(), Jsonld(); e.g.:
    @classmethod
    def Jsonld(cls, *args, **kwargs):
        obj = cls()
        obj.read_jsonld(*args, **kwargs)
        return obj

    # or a single dispatching constructor ("from" is a keyword, so from_path):
    @classmethod
    def from_path(cls, path,
                  format='legacy|metadatajson|pydistjson|pyprojecttoml|jsonld'):  # or this
        ...
... for maximum reusability, we really shouldn't need an adapter registry here;
I'm not really worried about the Python API at this point, I'm interested in the isomorphism of the data formats to help streamline the migration (as that's the current main problem with PEP 426).
But yes, just as packaging grew "LegacyVersion" *after* PEP 440 defined the strict forward looking semantics, it will likely grow some additional tools for reading and converting the legacy formats once there's a clear pydist.json specification to document the semantics of the translated fields.
- the new pipenv project to provide a simpler alternative to the pip+virtualenv+pip-tools combination for environment management in web service development (and similar layered application architectures). As with the "install vs setup" split in setuptools, pipenv settled on an "only two kinds of requirement (deployment and development)" model for usability reasons, but it also distinguishes abstract dependencies stored in Pipfile from pinned concrete dependencies stored in Pipfile.lock.
Does the Pipfile/Pipfile.lock distinction overlap with 'integrates' as a replacement for meta_requires?
Somewhat - the difference is that where the concrete dependencies in Pipfile.lock are derived from the abstract dependencies in Pipfile, the separation in pydist.json would be a declaration of "Yes, I really did mean to publish this with a concrete dependency, it's not an accident".
If we put those together with the existing interest in automating generation of policy compliant operating system distribution packages,
Downstream OS packaging could easily (and without permission) include extra attributes (properties specified with full URIS) in JSONLD metadata.
We can already drop arbitrary files into dist-info directories if we really want to, but in practice that extra metadata tends to end up in the system level package database rather than in the Python metadata.
- "integrates": replacement for "meta_requires" that only allows
pinned dependencies (i.e. hash maps with "name" & "version" fields, or direct URL references, rather than a general PEP 508 specifier as a string)
Pipfile.lock?
What happens here when something is listed in both requires and integrates?
Simplest would be to treat it the same way that tools treat mentioning the same component in multiple requirements entries (since that's really what you'd be doing).
Where/do these get merged on the "name" attr as a key, given a presumed namespace URI prefix (https://pypi.org/project/)?
For installation purposes, they'd be combined into a single requirements set.
For converting old metadata, any concrete dependencies that are compatible with the "integrates" field format would be mapped that way, while everything else would be converted to "requires" entries.
What heuristic would help identify compatibility with the integrates field?
PEP 440 version matching (==), arbitrary equality (===), and direct references (@...), with the latter being disallowed on PyPI (but fine when using a private index server).
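That heuristic is straightforward to sketch with the packaging library; treat this as illustrative rather than normative:

```python
# Classify a requirement string: pinned / direct-reference entries are
# candidates for "integrates", everything else stays in "requires".
from packaging.requirements import Requirement

PINNING_OPERATORS = {"==", "==="}

def is_integrates_candidate(req_string: str) -> bool:
    req = Requirement(req_string)
    if req.url:                                  # direct reference: "name @ https://..."
        return True
    operators = {spec.operator for spec in req.specifier}
    return bool(operators) and operators <= PINNING_OPERATORS

# is_integrates_candidate("pyobjc-core == 3.2.1")   -> True
# is_integrates_candidate("requests >= 2.0")        -> False
# is_integrates_candidate("pkg @ https://example.com/pkg-1.0-py3-none-any.whl")  -> True
```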
The semantic differences between normal runtime dependencies and "dev", "test", "doc" and "build" requirements would be handled as extras, regardless of whether you were using the old metadata format or the new one.
+1 from me.
I can't recall whether I've used {"dev", "test", "doc", and "build"} as extras names in the past; though I can remember thinking "wouldn't it be more intuitive to do it [that way]"
Is this backward compatible? Extras still work as extras?
Yeah, this is essentially the way Provide-Extra ended up being documented in https://packaging.python.org/specifications/#provides-extra-multiple-use
That already specifies the expected semantics for "test" and "doc", so it would be a matter of adding "dev" and "build" (as well as surveying PyPI for components that already defined those extras)
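On the survey side, a quick way to see which extras an installed distribution already declares (importlib.metadata is a later convenience, available from Python 3.8 onwards; any metadata reader would do the same against PyPI's files):

```python
# List the Provides-Extra names declared by an installed distribution.
from importlib.metadata import metadata

def declared_extras(dist_name: str) -> set:
    return set(metadata(dist_name).get_all("Provides-Extra") or [])

# e.g. declared_extras("ipython") might include names like "test" or "doc".
```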
P.S. I'm definitely open to a PR that amends the PEP 426 draft along these lines. I'll get to it eventually myself, but there are some other things I see as higher priority for my open source time at the moment (specifically the C locale handling behaviour of Python 3.6 in Fedora 26 and the related upstream proposal for Python 3.7 in PEP 538)
I need to find a job; my time commitment here is inconsistent.
Yeah, I assume work takes precedence for everyone, which is why I spend time needling redistributors and major end users about the disparity between "level of use" and "level of investment" when it comes to the upstream Python packaging ecosystem. While progress on that front isn't particularly visible yet, the nature of the conversations is changing in a good direction.
I'm working on a project (nbmeta) for generating, displaying, and embedding RDFa and JSONLD in Jupyter notebooks (w/ _repr_html_() and an OrderedDict) which should refresh the JSONLD @context-writing skills necessary to define the RDFS vocabulary we could/should have at https://schema.python.org/ .
I'm definitely open to ensuring the specs are RDF/JSONLD friendly, especially as some of the characteristics of that are beneficial in other kinds of mappings as well (e.g. lists-of-hash-maps-with-fixed-key-names are easier to work with than hash-maps-with-data-dependent-key-names for a whole lot of reasons).
- [ ] JSONLD PEP (<- PEP426)
- [ ] examples / test cases
- I've referenced IPython as an example package; are there other hard test cases for python packaging metadata conversion? (i.e. one that uses every feature of each metadata format)?
PyObjC is my standard example for legitimate version pinning in a public project (it's a metapackage where each release just depends on particular versions of the individual components). django-mezzanine is one I like as a decent example of a reasonably large dependency tree for something that still falls short of a complete application. setuptools is a decent example for basic use of environment markers.
I haven't found great examples for defining lots of extras or using complex environment marker options (but I also haven't really gone looking)
- [ ] JSONLD @context
- [ ] class PackageMetadata
- [ ] wheel: (additionally) generate JSONLD metadata
- [ ] schema.python.org: master, gh-pages (or e.g. "https://www.pypa.io/ns#")
- [ ] warehouse: add a ./jsonld view (to legacy?)
This definitely won't be an option for the legacy service, but it could be an interesting addition to Warehouse.
Cheers, Nick.
On Tue, Feb 14, 2017 at 12:15 PM, Nathaniel Smith njs@pobox.com wrote:
On Tue, Feb 14, 2017 at 10:10 AM, Vinay Sajip via Distutils-SIG distutils-sig@python.org wrote:
humpty in turn uses distlib, which seems to mishandle wheel metadata. (For example, it chokes if there's extra distribution meta and makes it impossible for buildout to install python-dateutil from a wheel.)

I looked into the "mishandling". It's that the other tools don't adhere to [the current state of] PEP 426 as closely as distlib does. For example, wheel writes JSON metadata to metadata.json in the .dist-info directory, whereas PEP 426 calls for that data to be in pydist.json. The non-JSON metadata in the wheel (the METADATA file) does not strictly adhere to any of the metadata PEPs 241, 314, 345 or 426 (it has a mixture of incompatible fields).

I can change distlib to look for metadata.json, and relax the rules to be more liberal regarding which fields to accept, but adhering to the PEP isn't mishandling things, as I see it.
I thought the current status was that it's called metadata.json exactly *because* it's not standardized, and you *shouldn't* look at it?
It's too bad that the JSON thing didn't work out, but I think we're better off working on better specifying the one source of truth everything already uses (METADATA) instead of bringing in *new* partially-incompatible-and-poorly-specified formats.
JSON-LD
https://www.google.com/search?q=python+package+metadata+jsonld https://www.google.com/search?q=%22pep426jsonld"
PEP426 (Deferred)
Switching to a JSON compatible format https://www.python.org/dev/peps/pep-0426/#switching-to-a-json-compatible-for...
PEP 426: Define a JSON-LD context as part of the proposal https://github.com/pypa/interoperability-peps/issues/31
This doesn't work with JSON-LD 1.0:

```json
releases = {
    "v0.0.1": {"url": ...},
    "v1.0.0": {"url": ...},
}
```

This does work with JSON-LD 1.0:

```json
releases = [
    {"version": "v0.0.1", "url": ...},
    {"version": "v1.0.0", "url": ...},
]
```
... Then adding custom attributes could be as easy as defining a URI namespace and additional attribute names; because {distutils, setuptools, pip, pipenv(?)} only need to validate the properties necessary for the relevant packaging operation.
Without any JSON-LD normalization, these aren't equal:

{"url": "#here"}
{"schema:url": "#here"}
{"http://schema.org/url": "#here"}
This is the JSON downstream tools currently have/want to consume (en masse, for SAT solving, etc): https://pypi.python.org/pypi/ipython/json
- It's a graph.
- JSON-LD is for graphs.
- There are normalizations and signatures for JSON-LD (ld-signatures != JWS)
- Downstream tools need not do anything with the @context. ("JSON-LD unaware")
- Downstream tools which generate pydist.jsonld should validate schema in tests
Downstream tools:
- https://github.com/pypa/pip/issues/988 "Pip needs a dependency resolver" (-> JSON)
- https://github.com/pypa/warehouse/issues/1638 "API to get checksums" (-> JSON)
Q: How do we get this (platform and architecture-specific) metadata to warehouse, where it can be hosted?
A JSONLD entrypoint in warehouse (for each project, for every project, for {my_subset}): https://pypi.python.org/pypi/ipython/jsonld
I would accept a pull request to stop generating metadata.json in bdist_wheel.
What about a pull request to start generating metadata.jsonld or pydist.jsonld instead?
- [ ] "@context": { }, - [ ] "@graph": { }, # optional
#PEP426JSONLD
-n
-- Nathaniel J. Smith -- https://vorpus.org
On Feb 14, 2017, at 1:15 PM, Nathaniel Smith njs@pobox.com wrote:
It's too bad that the JSON thing didn't work out, but I think we're better off working on better specifying the one source of truth everything already uses (METADATA) instead of bringing in *new* partially-incompatible-and-poorly-specified formats.
TBH I don’t think we’re going to stick with METADATA forever, and it’s likely we will, at some point, get to a JSON representation for this information, but that is not today. We have far more pressing issues to deal with besides whether things are in one format or another.
Yes, we still have a fair amount of behavior that is defined as “whatever setuptools/distutils does”, but we’re slowly trying to break away from that.
WRT to “standard implementations” versus “standards”, the idea of a “standard implementation” being the source of truth and no longer needing to do all the work to define standards is a nice idea, but I think it is an idea that is never actually going to work out as well as real standardization. There is *always* going to be a need for tools that aren’t the blessed tools to interact with these items. Even if you can authoritatively say that this one Python implementation is the only implementation that any Python program will ever need, there is still the problem that people need to consume this information in languages that aren’t Python. Another problem is that it becomes incredibly difficult to know what is supported as an actual feature and what just sort of works because of the way something happened to be implemented.
My goal with the packaging library is to more or less strictly implement accepted PEPs (and while I will make in progress PRs for PEPs that are currently being worked on, I won’t actually land a PR until the PEP is accepted). The only other real code there is extra utilities that make the realities of working with the specified PEPs easier (for example, we have a Version object which implements PEP 440 versions, but we also have a LegacyVersion object that implements what setuptools used to do).
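As a small illustration of that Version/LegacyVersion split, as it behaved in the packaging releases of that era (LegacyVersion has since been deprecated and removed in later releases):

```python
# packaging.version.parse returns a strict PEP 440 Version when it can,
# and fell back to the permissive LegacyVersion for anything else.
from packaging.version import parse

parse("1.0.post2")        # -> Version('1.0.post2'), strict PEP 440 parsing
parse("1.0-custom-patch")  # -> LegacyVersion('1.0-custom-patch'), the setuptools-style fallback
```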
This not only gives us the benefit of a single implementation for people who just want to use that single blessed implementation, but it gives us the benefit of standards. This has already been useful in the packaging library where an implementation defect caused versions to get parsed slightly wrong, and we had the extensively documented PEP 440 to declare what the expected behavior was.
I do not think the problem is "We've gotten so used to how pip and setuptools work, and because they are 'good enough', there is a real failure of imagination to see how things might be done better." The hard work of doing this isn’t in writing an implementation that achieves it for 80% of projects, it’s in doing it in a way that achieves it for 95% of projects. Managing backwards compatibility is probably the single most important thing we can do here. There are almost 800,000 files on PyPI that someone can download and install; telling all of them they need to switch to some new system or things are going to break for them is simply not tenable. That being said, I don’t think there is anything stopping us from getting to a better point besides time and effort.
— Donald Stufft
On Tue, Feb 14, 2017 at 1:36 PM, Donald Stufft donald@stufft.io wrote:
WRT to “standard implementations” versus “standards”, the idea of a “standard implementation” being the source of truth and no longer needing to do all the work to define standards is a nice idea, but I think it is an idea that is never actually going to work out as well as real standardization. There is *always* going to be a need for tools that aren’t the blessed tools to interact with these items. Even if you can authoritatively say that this one Python implementation is the only implementation that any Python program will ever need, there is still the problem that people need to consume this information in languages that aren’t Python.
Another even more fundamental reason that standards are important is to document semantics. Like, distlib or packaging or whatever can expose the "provides" field, but what does that actually mean? As a user of distlib/packaging, how should I change what I'm doing when I see that field? As a package author when should I set it? (I'm intentionally picking an example where the answer is "well the PEP says something about this but in reality it was never implemented and maybe has some security issues and no-one really knows" :-).) A "standard implementation" can abstract away some things, but by definition these are mostly the boring bits...
-n
Managing backwards compatibility is probably the single most important thing we can do here. There are almost 800,000 files on PyPI that someone can download and install, telling all of them they need to switch to some new system or things are going to break for them is simply not tenable.
I agree. But if packaging is going at some point to break out of allowing completely bespoke code to run at installation time (i.e. executable code like a free-for-all setup.py, vs. something declarative and thus more restrictive), then IMO you have to sacrifice 100% backwards compatibility. See my comment in my other post about the ability to install old releases - I made it a goal of my experiments with the parallel metadata not to require anything other than a declarative setup() in order to be able to install stuff using just the metadata, so that nobody has to switch anything in a big-bang style, but could transition over to a newer system at their leisure.
Regards,
Vinay Sajip
On Tue, Feb 14, 2017 at 1:10 PM, Vinay Sajip vinay_sajip@yahoo.co.uk wrote:
humpty in turn uses distlib, which seems to mishandle wheel metadata. (For example, it chokes if there's extra distribution meta and makes it impossible for buildout to install python-dateutil from a wheel.)
I looked into the "mishandling". It's that the other tools don't adhere to [the current state of] PEP 426 as closely as distlib does. For example, wheel writes JSON metadata to metadata.json in the .dist-info directory, whereas PEP 426 calls for that data to be in pydist.json. The non-JSON metadata in the wheel (the METADATA file) does not strictly adhere to any of the metadata PEPs 241, 314, 345 or 426 (it has a mixture of incompatible fields).
I can change distlib to look for metadata.json, and relax the rules to be more liberal regarding which fields to accept, but adhering to the PEP isn't mishandling things, as I see it.
Fair enough. Notice that I said "seems to". :-]
I suppose whether to be strict or not depends on the use case. In my case, I was just trying to install a wheel as an egg, so permissive is definitely what *I* want. Other use cases might want to be more strict.
Work on distlib has slowed right down since around the time when PEP 426 was deferred indefinitely, and there seems to be little interest in progressing via metadata or other standardisation - we have to go by what the de facto tools (setuptools, wheel) choose to do. It's not an ideal situation, and incompatibilities can crop up, as you've seen.
Nope. Honestly, though, I wish there was *one* *library* that defined the standard, which was the case for setuptools for a while (yeah, I know, the warts, really, I know) because I really don't think there's a desire to innovate or a reason for competition at this level. In the case of wheel, perhaps it makes sense for that implementation to be authoritative.
Thanks.
Jim
On 14 February 2017 at 18:36, Jim Fulton jim@jimfulton.info wrote:
I wish there was *one* *library* that defined the standard
packaging should be that library, but it doesn't cover metadata, precisely because PEP 426 hasn't been accepted (it doesn't try to cover the historical metadata 1.x standards, or "de facto" standards that aren't backed by a PEP, AIUI).
Paul
Nope. Honestly, though, I wish there was *one* *library* that defined the standard, which was the case for setuptools for a while (yeah, I know, the warts, really, I know) because I really don't think there's a desire to innovate or a reason for competition at this level. In the case of wheel, perhaps it makes sense for that implementation to be authoritative.
The problem, to me, is not whether it is authoritative - it's more that it's ad hoc, just like setuptools in some areas. For example, the decision to use "metadata.json" rather than "pydist.json" is arbitrary, and could change in the future, and anyone who relies on how things work now will have to play catch-up when that happens. That's sometimes just too much work for volunteer activity - dig into what the problem is, put through a fix (for now), rinse and repeat - all the while, little or no value is really added.
In theory this is an "infrastructure" area where a single blessed implementation might be OK, but these de facto tools don't do everything one wants, so interoperability remains important. There's no reason why we shouldn't look to innovate even in this area - there's some talk of a GSoC project now to look at dependency resolution for pip - something that I had sort-of working in the distil tool long ago (as a proof of concept) [1]. We've gotten so used to how pip and setuptools work, and because they are "good enough", there is a real failure of imagination to see how things might be done better.
Regards,
Vinay Sajip
[1] https://distil.readthedocs.io/en/0.1.0/overview.html#actual-improvements
On Tue, Feb 14, 2017 at 2:40 PM, Vinay Sajip vinay_sajip@yahoo.co.uk wrote:
Nope. Honestly, though, I wish there was *one* *library* that defined the standard, which was the case for setuptools for a while (yeah, I know, the warts, really, I know) because I really don't think there's a desire to innovate or a reason for competition at this level. In the case of wheel, perhaps it makes sense for that implementation to be authoritative.
The problem, to me, is not whether it is authoritative - it's more that it's ad hoc, just like setuptools in some areas. For example, the decision to use "metadata.json" rather than "pydist.json" is arbitrary, and could change in the future, and anyone who relies on how things work now will have to play catch-up when that happens.
Unless they depend on a public API provided by the wheel package. Of course, you could argue that the name of a file could be part of the API.
In many ways, depending and building on a working implementation is better than drafting a standard from scratch.
Packaging has moved forward largely by people who built things pragmatically that worked and solved every-day problems: setuptools/easy_install, buildout, pip, wheel...
That's sometimes just too much work for volunteer activity - dig into what the problem is, put through a fix (for now), rinse and repeat - all the while, little or no value is really added.
In theory this is an "infrastructure" area where a single blessed implementation might be OK,
I think so.
but these de facto tools don't do everything one wants, so interoperability remains important.
Or collaboration to improve the tool. That *should* have worked for setuptools, but sadly didn't, for various reasons.
There's no reason why we shouldn't look to innovate even in this area - there's some talk of a GSoC project now to look at dependency resolution
Yay! (I saw that.)
for pip
Gaaaa. Why can't this be in a library? (Hopefully it will be.)
- something that I had sort-of working in the distil tool long ago (as a proof of concept) [1].
"Almost" is a hard sell. If this was usable as a library, I'd be interested in trying to integrate it with buildout. If it worked, many buildout users would be grateful. Perhaps the GSoC project could use it as a reference or starting point.
We've gotten so used to how pip and setuptools work, and because they are "good enough", there is a real failure of imagination to see how things might be done better.
I think there is a failure of energy. Packaging should largely be boring and most people don't want to work on it. I certainly don't, even though I have.
But you picked a good example.
There are major differences (I almost said competition) between pip and buildout. They provide two different models (traditional Python system installs vs Java-like component/path installs) that address different use cases. IMO, these systems should complement each other and build on common foundations.
Maybe there are more cases for innovation at lower levels than I'm aware of.
Jim