Hello! I hope everyone's had a great start to 2018! :)

A few months back, while working on pip, I had noticed an oddity about extras.

Installing a package with extras would not store information about the fact that the extras were requested. This means, later, it is not possible to know which extra-based optional dependencies of a package have to be considered when verifying that the packages are compatible with each other. This information is relevant for resolution/validation since, without it, it is not possible to know which extra-requirements to care about.

As an example, installing ``requests[security]`` and then uninstalling ``PyOpenSSL`` leaves you in a state where you don't really satisfy what was asked for but there's no way to detect that either.

Thus, obviously, I'm interested in making pip able to store this information. As I understand it, this needs to be specified in a PEP and/or on PyPUG's specification page.

To that end, here's a seed proposal for the discussion: a new `extras-requested.txt` file in the .dist-info directory, storing the extra names in a one-per-line format.

Cheers!
Pradyun
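To make the proposal concrete: after `pip install 'requests[security,socks]'`, the new file would record the requested extra names one per line. The filename and format are as proposed above; the directory name and version here are illustrative. In `site-packages/requests-2.18.4.dist-info/extras-requested.txt`:

```
security
socks
```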
Pradyun Gedam wrote on 26.01.2018 at 17:11:
Hello! I hope everyone's had a great start to 2018! :)
A few months back, while working on pip, I had noticed an oddity about extras.
Installing a package with extras would not store information about the fact that the extras were requested. This means, later, it is not possible to know which extra-based optional dependencies of a package have to be considered when verifying that the packages are compatible with each other. This information is relevant for resolution/validation since, without it, it is not possible to know which extra-requirements to care about.
As an example, installing ``requests[security]`` and then uninstalling ``PyOpenSSL`` leaves you in a state where you don't really satisfy what was asked for but there's no way to detect that either.

What here is specific to extras really? "pip uninstall" does not consider dependencies anyway and will happily let you uninstall whatever you want, even if it's a dependency of some still installed distribution.
Thus, obviously, I'm interested in making pip able to store this information. As I understand it, this needs to be specified in a PEP and/or on PyPUG's specification page.
To that end, here's a seed proposal for the discussion: a new `extras-requested.txt` file in the .dist-info directory, storing the extra names in a one-per-line format.
Cheers! Pradyun
On 26 January 2018 at 15:11, Pradyun Gedam
Installing a package with extras would not store information about the fact that the extras were requested. This means, later, it is not possible to know which extra-based optional dependencies of a package have to be considered when verifying that the packages are compatible with each other. This information is relevant for resolution/validation since, without it, it is not possible to know which extra-requirements to care about.
As an example, installing ``requests[security]`` and then uninstalling ``PyOpenSSL`` leaves you in a state where you don't really satisfy what was asked for but there's no way to detect that either.
1. pip uninstall doesn't check validity, so there's no issue for the uninstall
2. pip check needs this information if it's to complain, and I believe that's the key point in your question

I think that if we want pip check to validate this situation, we need to store the data when we install. Where we store it needs to be decided - I'd assume it would go in the dist-info directory for requests somewhere, and that's the bit of metadata that needs agreeing.

Is there any other place where current functionality needs this information, *apart* from pip check? Are there any proposed additional features that might need it?
Thus, obviously, I'm interested in making pip able to store this information. As I understand it, this needs to be specified in a PEP and/or on PyPUG's specification page.
To that end, here's a seed proposal for the discussion: a new `extras-requested.txt` file in the .dist-info directory, storing the extra names in a one-per-line format.
Looks OK to me.

But I don't know how important it is to satisfy this use case. I've never needed this feature (I don't think I've ever used pip check, and I've very rarely used extras) so I won't comment on that.

Paul

PS I know we talked a bit off-list about this and I said I didn't have any opinion. I'd misunderstood what you were suggesting a bit, mostly because the conversation veered off into uninstalling requests[security] and what that means... So now that you've restated the issue, I have a bit of an opinion (although still not much :-))
On Fri, 26 Jan 2018, 21:21 Paul Moore,
On 26 January 2018 at 15:11, Pradyun Gedam
wrote:

Installing a package with extras would not store information about the fact that the extras were requested. This means, later, it is not possible to know which extra-based optional dependencies of a package have to be considered when verifying that the packages are compatible with each other. This information is relevant for resolution/validation since, without it, it is not possible to know which extra-requirements to care about.
As an example, installing ``requests[security]`` and then uninstalling ``PyOpenSSL`` leaves you in a state where you don't really satisfy what was asked for but there's no way to detect that either.
1. pip uninstall doesn't check validity, so there's no issue for the uninstall
2. pip check needs this information if it's to complain, and I believe that's the key point in your question
Indeed. Thanks for adding this clarification. I guess this answers Alex's question as well.
I think that if we want pip check to validate this situation, we need to store the data when we install. Where we store it needs to be decided - I'd assume it would go in the dist-info directory for requests somewhere, and that's the bit of metadata that needs agreeing.
Is there any other place where current functionality needs this information, *apart* from pip check? Are there any proposed additional features that might need it?
Yes. I currently have a branch where during pip install, after the packages to be installed are selected, it does a pip check on what would be the installed packages if the installation is successful. This would mean that if we do select an incompatible version for installation, pip install would be able to print a warning. In other words, a pip check would be done during every pip install run.
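A minimal sketch of what such a post-resolution check could look like, under a deliberately simplified model where dependencies are exact version pins. Everything here (the function name, the dict-based data model) is hypothetical, not pip's internal API:

```python
# A sketch of a post-resolution consistency check, in the spirit of
# running "pip check" on the would-be result of an install. The
# exact-pin requirement format is a simplification -- pip's real
# requirement handling is far richer.

def find_conflicts(selected, requirements):
    """Report unsatisfied deps among the packages selected for install.

    selected:     {package_name: version} that would end up installed
    requirements: {package_name: [(dep_name, pinned_version), ...]}
    """
    conflicts = []
    for pkg, deps in requirements.items():
        for dep, wanted in deps:
            have = selected.get(dep)
            if have is None:
                conflicts.append((pkg, dep, "missing"))
            elif have != wanted:
                conflicts.append((pkg, dep, "version mismatch"))
    return conflicts

# requests[security]-style scenario: certifi required but never selected
selected = {"requests": "2.18.4", "pyopenssl": "17.5.0"}
requirements = {"requests": [("pyopenssl", "17.5.0"), ("certifi", "2018.1.18")]}
print(find_conflicts(selected, requirements))
# → [('requests', 'certifi', 'missing')]
```

If the returned list is non-empty, pip install could print a warning before proceeding, which is exactly the behaviour described above.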
Thus, obviously, I'm interested in making pip able to store this information. As I understand it, this needs to be specified in a PEP and/or on PyPUG's specification page.
To that end, here's a seed proposal for the discussion: a new `extras-requested.txt` file in the .dist-info directory, storing the extra names in a one-per-line format.
Looks OK to me.
But I don't know how important it is to satisfy this use case. I've never needed this feature (I don't think I've ever used pip check, and I've very rarely used extras) so I won't comment on that.
Paul
PS I know we talked a bit off-list about this and I said I didn't have any opinion. I'd misunderstood what you were suggesting a bit, mostly because the conversation veered off into uninstalling requests[security] and what that means... So now that you've restated the issue, I have a bit of an opinion (although still not much :-))
Hehe. No problem. :)
On 26 January 2018 at 16:07, Pradyun Gedam
Is there any other place where current functionality needs this information, *apart* from pip check? Are there any proposed additional features that might need it?
Yes. I currently have a branch where during pip install, after the packages to be installed are selected, it does a pip check on what would be the installed packages if the installation is successful. This would mean that if we do select an incompatible version for installation, pip install would be able to print a warning.
I'd rather see us get a proper solver that would mean this should never happen :-)

Paul
Resending coz I clicked the wrong button.

On Fri, 26 Jan 2018 at 22:16 Paul Moore
Is there any other place where current functionality needs this information, *apart* from pip check? Are there any proposed additional features that might need it?
On 26 January 2018 at 16:07, Pradyun Gedam wrote:

Yes. I currently have a branch where during pip install, after the packages to be installed are selected, it does a pip check on what would be the installed packages if the installation is successful. This would mean that if we do select an incompatible version for installation, pip install would be able to print a warning.
I'd rather see us get a proper solver that would mean this should never happen :-)
Yep, I am working on that. This is more of a short-term thing that I'm thinking of, just in case the proper solver isn't ready in time for pip 10. :)

Aside, I've just posted an update over at #988 -- https://github.com/pypa/pip/issues/988#issuecomment-360846457 -- about the state of the resolver work and the aforementioned branch.
Paul
-- Pradyun
On Fri, 26 Jan 2018 at 23:10 Pradyun Gedam
Resending coz I clicked the wrong button.
On Fri, 26 Jan 2018 at 22:16 Paul Moore
wrote:

Is there any other place where current functionality needs this information, *apart* from pip check? Are there any proposed additional features that might need it?
On 26 January 2018 at 16:07, Pradyun Gedam wrote:

Yes. I currently have a branch where during pip install, after the packages to be installed are selected, it does a pip check on what would be the installed packages if the installation is successful. This would mean that if we do select an incompatible version for installation, pip install would be able to print a warning.
I'd rather see us get a proper solver that would mean this should never happen :-)
Yep, I am working on that. This is more of a short-term thing that I'm thinking of, just in case the proper solver isn't ready in time for pip 10. :)
And the resolver will need this extras-installed metadata as well.
Aside, I've just posted an update over at #988 -- https://github.com/pypa/pip/issues/988#issuecomment-360846457 -- about the state of the resolver work and the aforementioned branch.
Paul
--
Pradyun
On 27 January 2018 at 03:44, Pradyun Gedam
On Fri, 26 Jan 2018 at 23:10 Pradyun Gedam
wrote:

Resending coz I clicked the wrong button.
On Fri, 26 Jan 2018 at 22:16 Paul Moore
wrote: Is there any other place where current functionality needs this information, *apart* from pip check? Are there any proposed additional features that might need it?
On 26 January 2018 at 16:07, Pradyun Gedam wrote:

Yes. I currently have a branch where during pip install, after the packages to be installed are selected, it does a pip check on what would be the installed packages if the installation is successful. This would mean that if we do select an incompatible version for installation, pip install would be able to print a warning.
I'd rather see us get a proper solver that would mean this should never happen :-)
Yep, I am working on that. This is more of a short-term thing that I'm thinking of, just in case the proper solver isn't ready in time for pip 10. :)
And the resolver will need this extras-installed metadata as well.
`pipenv lock` could likely also benefit from this (although in our case we can also check `Pipfile` to see if any extras have been explicitly requested).
From the point of view of post-install-metadata-readability, something else we don't currently record is the *result* of environment marker expressions at the time of installation.
This means that there's no particularly straightforward way to check the portability of an entire virtual environment, whereas if we recorded the environment marker expressions and their outcomes somewhere, then we'd be able to extract a consolidated list of the environmental expectations for that venv.

Ideally, we'd also be able to record somewhere which dependencies had been satisfied from *outside* the venv, again with the goal of extracting a consolidated list of the expectations the venv is placing on the host Python environment.

Cheers, Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Fri, Jan 26, 2018 at 7:11 AM, Pradyun Gedam
Hello! I hope everyone's had a great start to 2018! :)
A few months back, while working on pip, I had noticed an oddity about extras.
Installing a package with extras would not store information about the fact that the extras were requested. This means, later, it is not possible to know which extra-based optional dependencies of a package have to be considered when verifying that the packages are compatible with each other. This information is relevant for resolution/validation since, without it, it is not possible to know which extra-requirements to care about.
As an example, installing ``requests[security]`` and then uninstalling ``PyOpenSSL`` leaves you in a state where you don't really satisfy what was asked for but there's no way to detect that either.
Another important use case is upgrades: if requests[security] v1 just depends on pyopenssl, and then requests[security] v2 adds a dependency on certifi, and I do

    pip install requests[security] == 1
    pip upgrade

then upgrade should give me requests[security] == 2, and thus install certifi. But this doesn't work if you don't have any record that 'requests[security]' is even installed :-).
Thus, obviously, I'm interested in making pip able to store this information. As I understand it, this needs to be specified in a PEP and/or on PyPUG's specification page.
To that end, here's a seed proposal for the discussion: a new `extras-requested.txt` file in the .dist-info directory, storing the extra names in a one-per-line format.
I'm going to put in another plug here for my "reified extras" idea: https://mail.python.org/pipermail/distutils-sig/2015-October/027364.html

Essentially, the idea is to promote extras to full packages -- normally ones that contain no files, just metadata like dependencies, though that's not a necessary requirement, it's just how we'd interpret existing extras specifications. Then installing 'requests[security]' would install the 'requests[security]' package, which depends on both 'requests' and 'pyopenssl', and we have a 'requests[security]-$VERSION.dist-info' directory recording that we installed it.

The advantages are:

- it's a simpler way to record the information you want here, without adding more special cases to dist-info: most code doesn't even have to know what 'extras' are, just what packages are

- it opens the door to lots of more advanced features, like 'foo[test]' being a package that actually contains foo's tests, or build variants like 'numpy[mkl]' being numpy built against the MKL library, or maybe making it possible to track which version of numpy's ABI different packages use. (The latter two cases need some kind of provides: support, which is impossible right now because we don't want to allow random-other-package to say 'provides-dist: cryptography'; but, it would be okay if 'numpy[mkl]' said 'provides-dist: numpy', because we know 'numpy[mkl]' and 'numpy' are maintained by the same people.)

I know there's a lot of precedent for this kind of clever use of metadata-only packages in Debian (e.g. search for "metapackages"), and I guess the RPM world probably has similar tricks.

-n

--
Nathaniel J. Smith -- https://vorpus.org
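As a sketch of what the synthesized metadata for such a reified extra could look like, a `requests[security]-2.18.4.dist-info/METADATA` along these lines. The field names are standard core metadata; the version and the dependency pins are illustrative, not requests' actual published metadata:

```
Metadata-Version: 2.1
Name: requests[security]
Version: 2.18.4
Requires-Dist: requests (==2.18.4)
Requires-Dist: pyOpenSSL (>=0.14)
Requires-Dist: cryptography (>=1.3.4)
Requires-Dist: idna (>=2.5)
```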
On 27 January 2018 at 13:46, Nathaniel Smith
The advantages are:
- it's a simpler way to record the information you want here, without adding more special cases to dist-info: most code doesn't even have to know what 'extras' are, just what packages are
- it opens the door to lots of more advanced features, like 'foo[test]' being a package that actually contains foo's tests, or build variants like 'numpy[mkl]' being numpy built against the MKL library, or maybe making it possible to track which version of numpy's ABI different packages use. (The latter two cases need some kind of provides: support, which is impossible right now because we don't want to allow random-other-package to say 'provides-dist: cryptography'; but, it would be okay if 'numpy[mkl]' said 'provides-dist: numpy', because we know 'numpy[mkl]' and 'numpy' are maintained by the same people.)
I know there's a lot of precedent for this kind of clever use of metadata-only packages in Debian (e.g. search for "metapackages"), and I guess the RPM world probably has similar tricks.
While I agree with this idea in principle, I'll note that RPM makes it relatively straightforward to have a single SRPM emit multiple RPMs, so defining a metapackage is just a few extra lines in a spec file. (I'm not sure how Debian's metapackages work, but I believe they're similarly simple on the publisher's side.)

We don't currently have a comparable mechanism to readily allow a single source project to expand to multiple package index entries that all share a common sdist, but include different subsets in their respective wheel files (defining one would definitely be possible, it's just a tricky migration problem to work out).

Cheers, Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Fri, Jan 26, 2018 at 8:37 PM, Nick Coghlan
On 27 January 2018 at 13:46, Nathaniel Smith
wrote:

The advantages are:
- it's a simpler way to record the information you want here, without adding more special cases to dist-info: most code doesn't even have to know what 'extras' are, just what packages are
- it opens the door to lots of more advanced features, like 'foo[test]' being a package that actually contains foo's tests, or build variants like 'numpy[mkl]' being numpy built against the MKL library, or maybe making it possible to track which version of numpy's ABI different packages use. (The latter two cases need some kind of provides: support, which is impossible right now because we don't want to allow random-other-package to say 'provides-dist: cryptography'; but, it would be okay if 'numpy[mkl]' said 'provides-dist: numpy', because we know 'numpy[mkl]' and 'numpy' are maintained by the same people.)
I know there's a lot of precedent for this kind of clever use of metadata-only packages in Debian (e.g. search for "metapackages"), and I guess the RPM world probably has similar tricks.
While I agree with this idea in principle, I'll note that RPM makes it relatively straightforward to have a single SRPM emit multiple RPMs, so defining a metapackage is just a few extra lines in a spec file. (I'm not sure how Debian's metapackages work, but I believe they're similarly simple on the publisher's side).
We don't currently have a comparable mechanism to readily allow a single source project to expand to multiple package index entries that all share a common sdist, but include different subsets in their respective wheel files (defining one would definitely be possible, it's just a tricky migration problem to work out).
Yeah, the migration is indeed the tricky part. Here's one possible approach.

First, figure out what exactly an "extra" should become in the new system. I think it's: if package $PACKAGE version $VERSION defines an extra $EXTRA, then that corresponds to a wheel named "$PACKAGE[$EXTRA]" (the brackets become part of the package name), version $VERSION, and it has Requires-Dist: $PACKAGE = $VERSION, as well as whatever requirements were originally part of the extra.

Now, if we didn't have to worry about migrations, we'd extend setuptools/bdist_wheel so that when they see the current syntax for defining an extra, they generate extra wheels following the formula above. (So 'setup.py bdist_wheel' generates N+1 wheels for a package with N extras.) And we'd teach PyPI that packages named like "$PACKAGE[$EXTRA]" should be collected together with packages named "$PACKAGE" (e.g. the same access controls apply to both, and probably you want to display them together in the UI when their versions match). And we'd teach pip that square brackets are legal in package names. And that'd be about it.

Of course, we do have to worry about migration, and in the first instance, what we care about is making pip's database of installed packages properly record these new wheels. So my proposal is:

- Requirements like 'requests[security,socks]' need to be expanded to 'requests[security], requests[socks]'. More specifically, when pip processes a requirement like '$PACKAGE[$EXTRA1,$EXTRA2,...] $OP1 $VERSION1, $OP2 $VERSION2, ...', it expands it to multiple packages and then applies the constraints to each of them: ['$PACKAGE[$EXTRA1] $OP1 $VERSION1, $OP2 $VERSION2 ...', '$PACKAGE[$EXTRA2] $OP1 $VERSION1, $OP2 $VERSION2 ...', ...].

- When pip needs to find a wheel like 'requests[security]', then it first checks to see if this exact wheel (with the brackets) is available on PyPI (or whatever package sources it has available). If so, it uses that. If not, then it falls back to looking for a 'requests' wheel, and if it finds one, and that wheel has 'extra' metadata, then it *uses that metadata to generate a wheel on the spot*, and then carries on as if it had found it on PyPI.

- Special case: when hash-checking mode is enabled and pip ends up doing this fallback, then pip always checks the hash against the wheel it found on PyPI -- so 'requests[security] --hash=...' checks the hash of requests.whl, not the auto-generated requests[security].whl.

(There is some question to discuss here about how sdists should be handled: in many cases, possibly all of them, it doesn't really make sense to have separate sdists for different square-bracket packages. 'requests[security]' will probably always be generated from the requests source tree, and for build variants like 'numpy[mkl]' you definitely want to build that from numpy.tar.gz, with some special flag telling it to use the "mkl" configuration. So maybe when pip fails to find a wheel it should always go straight to the un-adorned sdist like requests.tar.gz, instead of checking for requests[security].tar.gz. But this isn't going to make-or-break a v1 implementation.)

If we implement just these two things, then I think that's enough for pip to start immediately tracking all the proper metadata for existing extras packages, and also provide a smooth onramp to the eventual features.

Once this was working, we could enable uploading square-bracket packages to PyPI, and pip would start automatically picking them up where present. Then we could flip the switch for setuptools to start generating such packages. We'd probably also want to tweak PEP 517 so that the build backend is informed of exactly which package pip is looking for, to handle the case where numpy.tar.gz is expected to produce numpy[mkl].whl. And after that we could enable real Provides-Dist: metadata...

-n

--
Nathaniel J. Smith -- https://vorpus.org
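The expansion rule described above (one requirement per extra, with the constraints applied to each) can be sketched as follows. This is a deliberately simplified parser and `expand_extras` is a hypothetical name; real PEP 508 requirement strings also carry environment markers, URLs, etc.:

```python
import re

def expand_extras(requirement):
    """Expand '$PACKAGE[$EXTRA1,$EXTRA2] constraints' into one
    requirement per extra, applying the constraints to each.

    A simplified illustration of the proposal above, not pip's
    actual requirement parsing.
    """
    m = re.match(r"^\s*([A-Za-z0-9._-]+)\[([^\]]+)\]\s*(.*)$", requirement)
    if not m:
        return [requirement]  # no extras: pass through unchanged
    name, extras, constraints = m.groups()
    suffix = " " + constraints if constraints else ""
    return ["%s[%s]%s" % (name, extra.strip(), suffix)
            for extra in extras.split(",")]

print(expand_extras("requests[security,socks] >= 2.0, < 3.0"))
# → ['requests[security] >= 2.0, < 3.0', 'requests[socks] >= 2.0, < 3.0']
```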
On 11 February 2018 at 19:17, Nathaniel Smith
- When pip needs to find a wheel like 'requests[security]', then it first checks to see if this exact wheel (with the brackets) is available on PyPI (or whatever package sources it has available). If so, it uses that. If not, then it falls back to looking for a 'requests' wheel, and if it finds one, and that wheel has 'extra' metadata, then it *uses that metadata to generate a wheel on the spot*, and then carries on as if it had found it on PyPI.
It occurs to me that this synthetic metadata generation aspect could be pursued first, since these "installed extras" pseudo-packages should be relatively straightforward to create:

- a dist-info directory derived from the corresponding installed package name plus the extra name
- a generated METADATA file using the "name[extra]" pseudo-name and amended dependency spec
- an appropriate RECORD file
- INSTALLER and REQUESTED files as defined in PEP 376

While we could potentially use a different suffix for the synthetic pseudo-packages (e.g. "{name}[{extra}]-{version}.extra-info"), such that anything looking only for "*.dist-info" packages would ignore them, we could also decide that the non-standard name structure was enough of a distinguishing marker.

Once we had install-time generation of those, then we could *separately* consider whether we wanted to allow extras to expand beyond their current "optional dependencies" capabilities to also cover the installation of additional files that aren't in the base distribution, without requiring those files to be moved out to a separate dependency.

The benefit of that split approach is that the "Database of installed extras" aspect would only require enhancements to pip and other installers, and to tools that read the installation metadata, while the full proposal would also require changes to PyPI and to publishing tools.

Cheers, Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
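A rough sketch of that generation step. The function name is hypothetical, the METADATA written is reduced to the bare minimum, and RECORD handling is elided for brevity:

```python
import os

def write_extra_dist_info(site_packages, name, extra, version, extra_deps):
    """Write a synthetic '{name}[{extra}]-{version}.dist-info' directory.

    A minimal sketch of the generation step described above; RECORD
    handling is elided and the METADATA fields are the bare minimum.
    """
    dirname = "%s[%s]-%s.dist-info" % (name, extra, version)
    path = os.path.join(site_packages, dirname)
    os.makedirs(path)
    # METADATA: the "name[extra]" pseudo-name plus amended dependency spec
    lines = [
        "Metadata-Version: 2.1",
        "Name: %s[%s]" % (name, extra),
        "Version: %s" % version,
        "Requires-Dist: %s (==%s)" % (name, version),
    ]
    lines += ["Requires-Dist: %s" % dep for dep in extra_deps]
    with open(os.path.join(path, "METADATA"), "w") as f:
        f.write("\n".join(lines) + "\n")
    # PEP 376: record who installed it and that it was directly requested
    with open(os.path.join(path, "INSTALLER"), "w") as f:
        f.write("pip\n")
    open(os.path.join(path, "REQUESTED"), "w").close()
    return path
```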
On Sat, 27 Jan 2018 at 09:16 Nathaniel Smith
On Fri, Jan 26, 2018 at 7:11 AM, Pradyun Gedam
wrote:

Hello! I hope everyone's had a great start to 2018! :)
A few months back, while working on pip, I had noticed an oddity about extras.
Installing a package with extras would not store information about the fact that the extras were requested. This means, later, it is not possible to know which extra-based optional dependencies of a package have to be considered when verifying that the packages are compatible with each other. This information is relevant for resolution/validation since, without it, it is not possible to know which extra-requirements to care about.
As an example, installing ``requests[security]`` and then uninstalling ``PyOpenSSL`` leaves you in a state where you don't really satisfy what was asked for but there's no way to detect that either.
Another important use case is upgrades: if requests[security] v1 just depends on pyopenssl, and then requests[security] v2 adds a dependency on certifi, and I do
pip install requests[security] == 1 pip upgrade
then upgrade should give me requests[security] == 2, and thus install certifi. But this doesn't work if you don't have any record that 'requests[security]' is even installed :-).
Yes! Essentially, if there's a situation where a package may be modified, we should care about having this information, to ensure it still satisfies the extra's requirements -- which may themselves change when the base package changes.
Thus, obviously, I'm interested in making pip able to store this information. As I understand it, this needs to be specified in a PEP and/or on PyPUG's specification page.
To that end, here's a seed proposal for the discussion: a new `extras-requested.txt` file in the .dist-info directory, storing the extra names in a one-per-line format.
I'm going to put in another plug here for my "reified extras" idea: https://mail.python.org/pipermail/distutils-sig/2015-October/027364.html
Essentially, the idea is to promote extras to full packages -- normally ones that contain no files, just metadata like dependencies, though that's not a necessary requirement, it's just how we'd interpret existing extras specifications.
Then installing 'requests[security]' would install the 'requests[security]' package, which depends on both 'requests' and 'pyopenssl', and we have a 'requests[security]-$VERSION.dist-info' directory recording that we installed it.
I like this. This is how I'm modelling extras within the resolver currently, by considering extras as just-another-requirement and having them depend on the base package and the extra dependencies. Prof. Justin Cappos had suggested this to me. I imagine this'll result in simplification somewhere due to this consistency between what the resolver consumes and what's on the disk.

I think if we go this way, we should probably aim for just something equivalent to Debian's metapackages for now. The rest of the advanced features can be brought in at a later stage.

The advantages are:
- it's a simpler way to record the information you want here, without adding more special cases to dist-info: most code doesn't even have to know what 'extras' are, just what packages are
- it opens the door to lots of more advanced features, like 'foo[test]' being a package that actually contains foo's tests, or build variants like 'numpy[mkl]' being numpy built against the MKL library, or maybe making it possible to track which version of numpy's ABI different packages use. (The latter two cases need some kind of provides: support, which is impossible right now because we don't want to allow random-other-package to say 'provides-dist: cryptography'; but, it would be okay if 'numpy[mkl]' said 'provides-dist: numpy', because we know 'numpy[mkl]' and 'numpy' are maintained by the same people.)
I know there's a lot of precedent for this kind of clever use of metadata-only packages in Debian (e.g. search for "metapackages"), and I guess the RPM world probably has similar tricks.
-n
-- Nathaniel J. Smith -- https://vorpus.org
participants (5)

- Alex Grönholm
- Nathaniel Smith
- Nick Coghlan
- Paul Moore
- Pradyun Gedam