Update PEP 508 to allow version specifiers

Hello! I hope I'm addressing this issue to the right place. Pradyun Gedam suggested [1] to open the discussion here.

The PEP 508 dependency specification [2] is generally considered to be a replacement for dependency links. However, version specifiers are missing from PEP 508, so I can't specify:

    package >= 10.0 @ https://github.com/owner/package.git

I can specify the exact version with a tag:

    package @ https://github.com/owner/package.git@10.0

However, this is more of a hack than a real replacement. So I'm proposing to update PEP 508 to allow version specifiers. The url_req rule [3] could be updated as follows:

    url_req = name wsp* extras? wsp* versionspec? wsp* urlspec wsp+ quoted_marker?

However, there is an issue with overspecifying. What if I do this?

    package > 10.0 @ https://github.com/owner/package.git@10.0

Should this be considered an error ("no such version available")? I think so, but maybe it should be specified in the PEP as well? What do you think?

I've never proposed a PEP update before, so there may be some procedures I'm missing. If so, please point me in the right direction.

Kind regards,
Jan Musílek

[1] https://github.com/pypa/pip/issues/6162#issuecomment-458280052
[2] https://www.python.org/dev/peps/pep-0508/
[3] https://www.python.org/dev/peps/pep-0508/#grammar
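As a rough illustration of the proposed rule (this is not the real PEP 508 grammar, which handles extras, markers, and URL syntax far more carefully), the shape of the change could be sketched as a regex with an optional versionspec group between the name/extras and the URL:

```python
import re

# Sketch of the proposed url_req rule. Deliberately simplified:
# extras, markers, and URL validation are only approximated here.
URL_REQ = re.compile(
    r"^\s*(?P<name>[A-Za-z0-9._-]+)"       # name
    r"\s*(?P<extras>\[[^\]]*\])?"          # extras?
    r"\s*(?P<versionspec>[<>=!~][^@]*?)?"  # versionspec? (the new part)
    r"\s*@\s*(?P<url>\S+)"                 # urlspec
    r"\s*(?:;\s*(?P<marker>.+))?$"         # quoted_marker?
)

m = URL_REQ.match("package >= 10.0 @ https://github.com/owner/package.git")
```

With this sketch, the example from the proposal parses into name `package`, versionspec `>= 10.0`, and the URL, while requirements without a versionspec still match as before.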

On Tue, Jan 29, 2019 at 12:30 AM Jan Musílek <jan.musilek@nic.cz> wrote:
Hello! I hope I'm addressing this issue to the right place. Pradyun Gedam suggested [1] to open the discussion here.
PEP 508 dependency specification [2] is generally considered to be replacement for dependency links. However, version specifiers are missing from PEP 508, so I can't specify: package >= 10.0 @ https://github.com/owner/package.git
What would this do? I don't think there's any way for pip (or any program) to look at a git repo and extract a list of which revisions correspond to which Python package versions. I guess you could have some hack where you guess that certain tag strings correspond to certain versions, but it seems pretty messy to me...

-n

--
Nathaniel J. Smith -- https://vorpus.org

Nathaniel Smith wrote:
What would this do? I don't think there's any way for pip (or any program) to look at a git repo and extract a list of which revisions correspond to which python package versions. I guess you could have some hack where you guess that certain tag strings correspond to certain versions, but it seems pretty messy to me...
Well, I'd like it to do the exact same thing as dependency_links did before they were removed from pip. AFAIK it was possible to specify `package >= 10.0` in `install_requires` and then `https://github.com/owner/package.git#egg=package-0` in `dependency_links`. I don't really have insight into pip's internals, so I'm not sure how pip did it in the past. But it's a real issue [1].

[1] https://github.com/pypa/pip/issues/5898

JM

On Tue, 29 Jan 2019 at 08:50, Jan Musílek <jan.musilek@nic.cz> wrote:
Nathaniel Smith wrote:
What would this do? I don't think there's any way for pip (or any program) to look at a git repo and extract a list of which revisions correspond to which python package versions. I guess you could have some hack where you guess that certain tag strings correspond to certain versions, but it seems pretty messy to me...
Well, I'd like it to do the exact same thing as dependency_links did before they were removed from pip. AFAIK it was possible to specify `package >= 10.0` in `install_requires` and then `https://github.com/owner/package.git#egg=package-0` in `dependency_links`. I don't really have insight into pip's internals, so I'm not sure how pip did it in the past. But it's a real issue [1].
If this is to be an extension to the current PEP, then *someone* is going to need to specify the semantics precisely. That's part of the problem - existing behaviours are implementation defined, and not specified clearly anywhere. The first step in working out a PEP update is to define those semantics so they can be discussed without vague statements like "needs to work like dependency_links does".

I don't know how dependency_links worked, but URL links as defined in PEP 508 give "the URL to a specific artifact to install". So they link to one precise file, i.e. one precise version. So the only plausible semantics I can see for something like "package >= 10.0 @ https://github.com/owner/package.git" would be "Get the package at that URL. If it doesn't satisfy the version restriction 'package >= 10.0', discard it. If it does satisfy that restriction, install it." Which doesn't seem that useful, IMO.

Maybe the way to define useful semantics here would be to articulate the actual problem you're trying to solve (*without* referring to how dependency_links works) and propose a semantics that solves that problem?

Paul
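The "fetch, check, discard or install" semantics Paul describes could be sketched as follows (helper names are hypothetical; a real installer would use full PEP 440 version comparison, e.g. via the `packaging` library, rather than this simplified numeric-tuple check):

```python
# A sketch of the only semantics described as plausible: fetch the
# artifact behind the URL, read its actual version, and reject it if
# it fails the specifier. Only simple dotted-numeric versions and a
# few operators are handled -- this is an illustration, not PEP 440.
def _key(version):
    return tuple(int(part) for part in version.split("."))

def satisfies(version, op, wanted):
    ops = {
        ">=": lambda a, b: a >= b,
        ">":  lambda a, b: a > b,
        "==": lambda a, b: a == b,
    }
    return ops[op](_key(version), _key(wanted))

def check_direct_reference(artifact_version, op, wanted):
    """Return True if the artifact behind the URL may be installed."""
    if not satisfies(artifact_version, op, wanted):
        raise ValueError(
            "artifact is version %s, which does not satisfy %s %s"
            % (artifact_version, op, wanted)
        )
    return True
```

Note that under these semantics the version specifier can only ever veto the single artifact the URL already points at, which is exactly why it adds so little.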

I’m wondering, why is it needed to specify both a version and a link? I assume the version specifier would be redundant when a link is provided as the source, since the link can only point to one possible package version.

--
Tzu-ping Chung (@uranusjr)
uranusjr@gmail.com

Sent from my iPhone
On 29 Jan 2019, at 17:07, Paul Moore <p.f.moore@gmail.com> wrote:
On Tue, 29 Jan 2019 at 08:50, Jan Musílek <jan.musilek@nic.cz> wrote:
Nathaniel Smith wrote:
What would this do? I don't think there's any way for pip (or any program) to look at a git repo and extract a list of which revisions correspond to which python package versions. I guess you could have some hack where you guess that certain tag strings correspond to certain versions, but it seems pretty messy to me...
Well, I'd like it to do the exact same thing as dependency_links did before they were removed from pip. AFAIK it was possible to specify `package >= 10.0` in `install_requires` and then `https://github.com/owner/package.git#egg=package-0` in `dependency_links`. I don't really have insight into pip's internals, so I'm not sure how pip did it in the past. But it's a real issue [1].
If this is to be an extension to the current PEP, then *someone* is going to need to specify the semantics precisely. That's part of the problem - existing behaviours are implementation defined, and not specified clearly anywhere. The first step in working out a PEP update is to define those semantics so they can be discussed without vague statements like "needs to work like dependency_links does".
I don't know how dependency_links worked, but URL links as defined in PEP 508 give "the URL to a specific artifact to install". So they link to one precise file, i.e. one precise version. So the only plausible semantics I can see for something like "package >= 10.0 @ https://github.com/owner/package.git" would be "Get the package at that URL. If it doesn't satisfy the version restriction 'package >= 10.0', discard it. If it does satisfy that restriction, install it." Which doesn't seem that useful, IMO.
Maybe the way to define useful semantics here would be to articulate the actual problem you're trying to solve (*without* referring to how dependency_links works) and propose a semantics that solves that problem?
Paul

--
Distutils-SIG mailing list -- distutils-sig@python.org
To unsubscribe send an email to distutils-sig-leave@python.org
https://mail.python.org/mailman3/lists/distutils-sig.python.org/
Message archived at https://mail.python.org/archives/list/distutils-sig@python.org/message/B7MZO...

On Tue, 29 Jan 2019 at 09:20, Tzu-ping Chung <uranusjr@gmail.com> wrote:
I’m wondering, why is it needed to specify both a version and a link? I assume the version specifier would be redundant when a link is provided as the source, since the link can only point to one possible package version.
From what I understand of the underlying use case, it's something along the lines of "we updated our git repository and pip didn't upgrade the package to the new version". So the problem here is likely that people are using URL references to point to something that changes over time, rather than to a "specific artifact" as PEP 508 specifies.

It's quite possible that URL specifiers aren't the right solution here, but without a clear and fully specified requirement, it's nearly impossible to tell :-(

Paul

Paul Moore wrote:
On Tue, 29 Jan 2019 at 09:20, Tzu-ping Chung <uranusjr(a)gmail.com> wrote:
I’m wondering, why is it needed to specify both a version and a link? I assume the version specifier would be redundant when a link is provided as the source, since the link can only point to one possible package version.
From what I understand of the underlying use case, it's something along the lines of "we updated our git repository and pip didn't upgrade the package to the new version". So the problem here is likely that people are using URL references to point to something that changes over time, rather than to a "specific artifact" as PEP 508 specifies. It's quite possible that URL specifiers aren't the right solution here, but without a clear and fully specified requirement, it's nearly impossible to tell :-(
Well, yes, that's basically it. I don't think that there is anything wrong with PEP 508 pointing only at specific versions. BUT, it's widely proposed as a replacement for dependency links, which it clearly isn't because of this issue. If you think that PEP 508 should not be considered a dependency links replacement, just say so and I can take this issue somewhere else (probably back to the pip maintainers, asking them to revive dependency links until a suitable replacement is proposed).

On Tue, 29 Jan 2019 at 09:51, Jan Musílek <jan.musilek@nic.cz> wrote:
Well, yes, that's basically it. I don't think that there is anything wrong with PEP 508 pointing only at specific versions. BUT, it's widely proposed as a replacement for dependency links, which it clearly isn't because of this issue.
OK, in that case it may well be that URL specifiers don't satisfy that specific use case that dependency_links did[1]. But URL specifiers were *intended* to replace dependency_links, so if they don't do so then it's likely because users of dependency_links didn't successfully explain their requirements, and something got missed as a result.
If you think that PEP 508 should not be considered to be dependency links replacement, just say so and I can take this issue somewhere else.
It would be more accurate to say that PEP 508 is considered to be a replacement for dependency links in all of the use cases *that we were made aware of at the time*. Now that new use cases are being raised, we'll need to look at them.

So I'd encourage you to continue this discussion here, where we can find a resolution that properly meets people's requirements. That may be (an extension of) URL specifiers, or it may be that something else is needed. But the first step has to be a well-defined standard that satisfies people's requirements. Dependency links have multiple problems, so simply continuing to support them, or trying to write a PEP that just says "do exactly what dependency_links do", isn't going to resolve this issue.
(probably back to the pip maintainers, asking them to revive dependency links until a suitable replacement is proposed).
I'm a pip maintainer, and the other maintainers monitor this list, so your comments are being heard here - no particular need to split the conversation over 2 different places :-)

Paul

[1] It's quite conceivable that there are other cases like this, too. But until people hitting problems are willing to discuss requirements and potential approaches to a solution, in the same way that you're doing here, we can't really say anything more.

On Tue, 29 Jan 2019 at 20:09, Paul Moore <p.f.moore@gmail.com> wrote:
On Tue, 29 Jan 2019 at 09:51, Jan Musílek <jan.musilek@nic.cz> wrote:
Well, yes, that's basically it. I don't think that there is anything wrong with PEP 508 pointing only at specific versions. BUT, it's widely proposed as a replacement for dependency links, which it clearly isn't because of this issue.
OK, in that case it may well be that URL specifiers don't satisfy that specific use case that dependency_links did[1]. But URL specifiers were *intended* to replace dependency_links, so if they don't do so then it's likely because users of dependency_links didn't successfully explain their requirements, and something got missed as a result.
It wasn't an accident - the design of dependency links lets arbitrary packages in your dependency tree send your installer off to spider random sites on the internet for packages, and then when those sites break, your installation breaks.

As a package consumer, when dependency links are enabled, you have no idea what servers your install process is actually going to go off and talk to, even if you specify `--only-binary :all:` to prevent local execution of setup.py scripts. It's essentially the same problem that https://www.python.org/dev/peps/pep-0470/ eliminated at the PyPI level.

So URL specifiers replaced the part of dependency links that we actually wanted to keep: letting projects *temporarily* depend on VCS repos and other URLs while waiting for a release containing the feature that they needed, while focusing on abstract dependencies outside those cases (and deliberately eliminating the ability to add arbitrary new repositories to the dependency resolution process).

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Nick Coghlan wrote:
So URL specifiers replaced the part of dependency links that we actually wanted to keep: letting projects *temporarily* depend on VCS repos and other URLs while waiting for a release containing the feature that they needed, while focusing on abstract dependencies outside those cases (and deliberately eliminating the ability to add arbitrary new repositories to the dependency resolution process).
Thank you for the explanation! This makes a lot of sense for public packages. However, when you work with private packages, it's not as straightforward. You can set up your own warehouse, but that's a lot of overhead for small personal projects. What would you recommend to use in this case?

Cheers,
Jan

I'm not sure I agree that setting up a private repository is that hard. Devpi does an excellent job for open source stuff, or otherwise, if you use Artifactory inside your company, that also has a similar plugin.

On Tue, Jan 29, 2019 at 1:30 PM Jan Musílek <jan.musilek@nic.cz> wrote:
Nick Coghlan wrote:
So URL specifiers replaced the part of dependency links that we actually wanted to keep: letting projects *temporarily* depend on VCS repos and other URLs while waiting for a release containing the feature that they needed, while focusing on abstract dependencies outside those cases (and deliberately eliminating the ability to add arbitrary new repositories to the dependency resolution process).
Thank you for the explanation! This makes a lot of sense for public packages.
However, when you work with private packages, it's not as straightforward. You can set up your own warehouse, but that's a lot of overhead for small personal projects. What would you recommend to use in this case?
Cheers,
Jan

You can also just stick files in a directory and use pretty much any web server that generates an automatic index from it.
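For a sense of how little machinery this needs, here is a minimal, stdlib-only sketch of the kind of page such an auto-indexing server produces. The filename filter and flat single-page layout are simplifying assumptions, not a full PEP 503 implementation (no name normalization, no per-project pages):

```python
import html

# Turn a list of distribution filenames (as found in a directory of
# wheels/sdists) into a single "simple index" style page of links.
# Any web server that auto-indexes a directory gives you roughly
# this for free, which is Donald's point.
def simple_index(filenames):
    lines = ["<!DOCTYPE html>", "<html><body>"]
    for name in sorted(filenames):
        if name.endswith((".whl", ".tar.gz", ".zip")):
            escaped = html.escape(name)
            lines.append('<a href="%s">%s</a><br>' % (escaped, escaped))
    lines.append("</body></html>")
    return "\n".join(lines)
```

pip can then consume the served directory via `--find-links <URL>` (or `--index-url` against a real PEP 503 layout).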
On Jan 29, 2019, at 8:35 AM, Bernat Gabor <gaborjbernat@gmail.com> wrote:
I'm not sure I agree that setting up a private repository is that hard. Devpi does an excellent job for open source stuff, or otherwise if you use Artifactory inside your company that also has similar plugin.
On Tue, Jan 29, 2019 at 1:30 PM Jan Musílek <jan.musilek@nic.cz> wrote:
Nick Coghlan wrote:
So URL specifiers replaced the part of dependency links that we actually wanted to keep: letting projects *temporarily* depend on VCS repos and other URLs while waiting for a release containing the feature that they needed, while focusing on abstract dependencies outside those cases (and deliberately eliminating the ability to add arbitrary new repositories to the dependency resolution process).
Thank you for the explanation! This makes a lot of sense for public packages.
However, when you work with private packages, it's not as straightforward. You can set up your own warehouse, but that's a lot of overhead for small personal projects. What would you recommend to use in this case?
Cheers,
Jan

On Tue, 29 Jan 2019 at 23:40, Donald Stufft <donald@stufft.io> wrote:
You can also just stick files in a directory and use pretty much any web server that generates an automatic index from it.
Including python3 -m http.server! (I wouldn't recommend using that on the public internet, but it can be very handy on a private intranet.)

Going back to the specific case of: "So, let's say that I have dependency "C @ https://github.com/owner/C.git@3" in project B and "C @ https://github.com/owner/C.git@1" in project A."

The recommended way of handling that would be to:

1. change the dependency declarations to be "C >= 3" and "C >= 1"
2. use `pip wheel` to create a wheel file for project "C" yourself
3. drop that wheel file into a web directory or shared filesystem
4. add `-f <URL-for-the-location-in-step-3>` to your install commands

If using `pipenv`, then step 4 can be automated by defining an additional `source` location in each project's Pipfile. Tools like devpi, pinrepo, Artifactory, etc. can also be considered as potentially nicer ways of managing steps 2 & 3.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
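Concretely, steps 2-4 might look something like this. The project name "C" and the paths are placeholders taken from the thread's hypothetical projects; `pip wheel --wheel-dir` and `pip install -f/--find-links` are standard pip options:

```shell
# Step 2: build a wheel for project C yourself (source path is hypothetical)
pip wheel --wheel-dir ./wheelhouse ./path/to/C

# Step 3: expose ./wheelhouse -- a shared filesystem path is enough, or
# serve it over HTTP on a private intranet, e.g.:
#   python3 -m http.server --directory ./wheelhouse

# Step 4: point pip at that location when installing the dependent projects
pip install -f ./wheelhouse "C >= 3"
```

The key property is that the *installer invocation* opts in to the extra package location, rather than a package's own metadata sending the installer to arbitrary URLs.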

On Tue, 29 Jan 2019 at 23:29, Jan Musílek <jan.musilek@nic.cz> wrote:
Nick Coghlan wrote:
So URL specifiers replaced the part of dependency links that we actually wanted to keep: letting projects *temporarily* depend on VCS repos and other URLs while waiting for a release containing the feature that they needed, while focusing on abstract dependencies outside those cases (and deliberately eliminating the ability to add arbitrary new repositories to the dependency resolution process).
Thank you for the explanation! This makes a lot of sense for public packages.
However, when you work with private packages, it's not as straightforward. You can set up your own warehouse, but that's a lot of overhead for small personal projects. What would you recommend to use in this case?
I mostly just use "-f /some/filesystem/path" to add a directory containing built packages to the normal dependency resolution process. (It does mean the install process has to be explicit about where the extra packages are going to be allowed to come from, but that's kinda the point of the change.)

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Tue, 29 Jan 2019 at 13:16, Nick Coghlan <ncoghlan@gmail.com> wrote:
On Tue, 29 Jan 2019 at 20:09, Paul Moore <p.f.moore@gmail.com> wrote:
OK, I think that it may well be in that case that URL specifiers don't satisfy that specific use case that dependency_links did[1]. But URL specifiers were *intended* to replace dependency_links, so if they don't do so then it's likely because users of dependency_links didn't successfully explain their requirements, and something got missed as a result.
It wasn't an accident - the design of dependency links lets arbitrary packages in your dependency tree send your installer off to spider random sites on the internet for packages, and then when those sites break, your installation breaks.
Thanks for the clarification Nick. I couldn't remember the details here, so wasn't able to be definite.
So URL specifiers replaced the part of dependency links that we actually wanted to keep:
And in the context of pip's deprecation process, that's quite important. By deprecating dependency links, we're not saying "... and here's a complete replacement". Rather, it's a genuine deprecation - we're *removing* functionality. At some point, that message got lost.

There's a lesson here for pip, in that we've not communicated that intention very well (users clearly expect a replacement for *all* of the functionality of dependency links, and that URL specifiers are that replacement). And we've been extremely hesitant (to the detriment of the message) about clearly stating the "bad news". Hopefully we can learn from that mistake (but whether our users will be keen on us learning to "be firmer about sticking to our guns when removing functionality" is not obvious ;-))

Also, I don't think that users simply waiting for the pip developers to tell them how to modify their workflows to cater for the removal of dependency-links functionality is particularly helpful :-( Again, we could probably have been better at saying "this is your problem to solve, not ours", but nevertheless, the pip developers are responsible for providing a secure, stable tool, not for designing project workflows that use that tool.

Paul

Thanks Jan for raising this issue. On Tue, Jan 29, 2019 at 10:21 AM Tzu-ping Chung <uranusjr@gmail.com> wrote:
I’m wondering, why is it needed to specify both a version and a link? I assume the version specifier would be redundant when a link is provided as the source, since the link can only point to one possible package version.
The same could be said of the package name: when a link is provided, the name is redundant since the link can only point to one possible package name.

If a version specifier were allowed for direct references, this would be the same thing: it would be the installer's job to check that the provided link matches the provided version specifier, just like it should be checking that the package name matches ;). If the direct reference were inconsistent, the installer could print a warning or abort the installation.

Currently with pip 19, an inconsistent name in a direct reference only produces a warning. With a setup.py containing

    from setuptools import setup
    setup(
        name='foo',
        version='1',
        install_requires=['toto @ https://files.pythonhosted.org/packages/06/18/fa675aa501e11d6d6ca0ae73a101b2...'],
    )

"pip install . --no-cache-dir" ends up (unhappily & with a few warnings) with mccabe 0.6.1 installed.

On Jan 29, 2019, at 9:02 AM, Xavier Fernandez <xav.fernandez@gmail.com> wrote:
Thanks Jan for raising this issue.
On Tue, Jan 29, 2019 at 10:21 AM Tzu-ping Chung <uranusjr@gmail.com> wrote:
I’m wondering, why is it needed to specify both a version and a link? I assume the version specifier would be redundant when a link is provided as the source, since the link can only point to one possible package version.
The same could be said of the package name: when a link is provided, the name is redundant since the link can only point to one possible package name.
If a version specifier were allowed for direct references, this would be the same thing: it would be the installer's job to check that the provided link matches the provided version specifier, just like it should be checking that the package name matches ;). If the direct reference were inconsistent, the installer could print a warning or abort the installation.
I don’t think that’s accurate - it’s not really the redundancy that’s the issue here (although it is silly). Conceptually a dependency declaration is 2 things: the name of the dependency, and a range of acceptable versions.

In the old dependency_links days you declared your dependency as normal (foo >= 1.0), but if that didn’t exist on PyPI (or wherever your installer was configured to look), you could add a dependency link that acted as a hint to some other locations where this dependency might exist. Thus having 3 pieces of the puzzle made sense here: you had the name and version specifier that indicated what dependencies you were looking for, and the dependency links that hinted where they might be found.

In the new PEP 508 style, we’ve removed dependency links, and allowed you to say that the only acceptable version is the one located at some specific URL. Thus the URL itself is the version specifier, because conceptually, the only legal version is the one referred to at that URL. It still declares a name in the specifier because when doing the dependency resolution algorithm, it needs to know the name and the version, and conceptually the name and version of “foo @ https://example.com/foo.tar.gz” is (“foo”, “https://example.com/foo.tar.gz”).

The two features are basically not 1:1, given that a PEP 508 URL specifier short circuits the URL discovery phase completely, whereas dependency_links only provide hints for additional places to look. Trying to cram version specifiers into the PEP 508 style syntax is not going to work as people expect, because it is conceptually nonsensical.

However, one thing we *could* maybe do is allow including a specific version. So instead of treating the URL as the version, you would provide the name, version, and URL. So you’d have something like “foo-1.0 @ https://example.com/foo.tar.gz”.
I don’t personally see much of a point to that - the version number doesn’t tell us any new information here. The only thing I think it *could* be useful for is if you had something depending on “foo >= 1” and something else depending on “foo-1.0 @ …”; then it would make it easier to determine there isn’t a dependency conflict.

On Tue, 29 Jan 2019 at 14:23, Donald Stufft <donald@stufft.io> wrote:
However, one thing we *could* maybe do is allow including a specific version. So instead of treating the URL as the version, you would provide the name, version, and URL. So you’d have something like “foo-1.0 @ https://example.com/foo.tar.gz”. I don’t personally see much of a point to that - the version number doesn’t tell us any new information here. The only thing I think it *could* be useful for is if you had something depending on “foo >= 1” and something else depending on “foo-1.0 @ …”; then it would make it easier to determine there isn’t a dependency conflict.
It could also error if, when it downloaded and checked https://example.com/foo.tar.gz, it found that it didn't have the name foo and the version 1.0. That would act as a sanity check that the file behind the URL hadn't changed. But it's still just a sanity check based on the user specifying redundant information, and nothing more.

But direct URLs to github repos are a different matter, and are frankly just wrong - by their nature a github repo is a changing object, and so will never map to a "specific artifact to install". People using github URLs in URL specifiers seem to think that it means "check github every time and get me the latest version", which is not even close to what actually happens. And when they start adding version numbers to that (like the OP's package >= 10.0 @ https://github.com/owner/package.git) I can't even begin to understand what they think it might mean. (Hence my original request for a better explanation of the expected semantics!)

Paul
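The sanity check described above could be sketched roughly like this. The helper is hypothetical, and the wheel-filename parsing is deliberately simplified (it ignores build tags and name normalization):

```python
import re

# After downloading the artifact behind a direct reference, compare its
# actual name/version (here, recovered from a wheel filename) against
# the redundant name and version the requirement declared.
WHEEL_RE = re.compile(r"^(?P<name>[^-]+)-(?P<version>[^-]+)-.+\.whl$")

def sanity_check(declared_name, declared_version, downloaded_filename):
    m = WHEEL_RE.match(downloaded_filename)
    if m is None:
        raise ValueError("not a wheel filename: %s" % downloaded_filename)
    return (m.group("name") == declared_name
            and m.group("version") == declared_version)
```

A failing check tells you only that the file behind the URL is not what the requirement claimed - exactly the "sanity check on redundant information, and nothing more" role.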

On Jan 29, 2019, at 9:43 AM, Paul Moore <p.f.moore@gmail.com> wrote:
But direct URLs to github repos are a different matter, and are frankly just wrong - by their nature a github repo is a changing object, and so will never map to a "specific artifact to install".
FWIW, Paul’s statement is supported by PEP 440 itself. PEP 440 states:

-----
All direct references that do not refer to a local file URL SHOULD specify a secure transport mechanism (such as https) AND include an expected hash value in the URL for verification purposes. If a direct reference is specified without any hash information, with hash information that the tool doesn't understand, or with a selected hash algorithm that the tool considers too weak to trust, automated tools SHOULD at least emit a warning and MAY refuse to rely on the URL.
-----

Which clearly suggests that the URLs are expected to be immutable (given that tooling should at least emit a warning if a hash isn’t included, and is permitted to error completely, and you can’t have a hash unless the target URL is immutable).

On Tue, Jan 29, 2019, at 2:43 PM, Paul Moore wrote:
And when they start adding version numbers to that (like the OP's package >= 10.0 @ https://github.com/owner/package.git) I can't even begin to understand what they think it might mean.
I presume the idea is that it uses that repository like an extra index for that specific package, as if you had done:

    pip install --find-links https://github.com/owner/package.git package

That doesn't work because pip doesn't have the machinery to identify Python package releases in a GitHub repository, but it's not hard to imagine how it could work if you assume tags follow a standard pattern (I'm not saying it should do that).

If I've understood this correctly, the idea can be broken down into two parts:

1. Should it be possible to specify an extra index URL for a specific package?

If a package itself can specify an index URL for its dependencies, you're back to the problems of crawling random sites. If this is only usable at the top level of packages you're asking pip to install, it's probably of limited use. And there are obvious complications: do you use the same index for dependencies of that package? What about if it shares dependencies with other top-level packages?

2. Should tools like pip be able to find a list of releases in a git (or specifically GitHub) repository?

I think this would be too complicated to make a good standard - how do you distinguish 'release of a Python package' tags from any other tags that might be in the repository? Someone could make an installer tool that works with git repositories like this, assuming some convention of tags and file layout. But I'd vote against pip doing that, because anything that pip implements, every other installer tool will inevitably be pushed to implement for compatibility.

Thomas
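Thomas's "standard pattern" caveat is easy to make concrete: any such tool would be baking in an assumed tag convention. A sketch, with a hypothetical pattern that treats tags like "v1.2" or "10.0" as releases:

```python
import re

# An assumed (NOT standardized) tag convention: optional "v" prefix
# followed by dotted numbers. Anything else in the repository -- branch
# tags, pre-release tags, unrelated tags -- simply isn't recognized,
# which is exactly the ambiguity that makes this hard to standardize.
RELEASE_TAG = re.compile(r"^v?(\d+(?:\.\d+)*)$")

def release_versions(tags):
    """Return the versions recoverable from a repo's tags under this convention."""
    versions = []
    for tag in tags:
        m = RELEASE_TAG.match(tag)
        if m:
            versions.append(m.group(1))
    return versions
```

A tag like "v2-beta" silently falls through, and a repo whose tags mean something else entirely would be misread - the pattern is a guess, not metadata.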

On Tue, 29 Jan 2019 at 15:07, Thomas Kluyver <thomas@kluyver.me.uk> wrote:
On Tue, Jan 29, 2019, at 2:43 PM, Paul Moore wrote:
And when they start adding version numbers to that (like the OP's package >= 10.0 @ https://github.com/owner/package.git) I can't even begin to understand what they think it might mean.
I presume the idea is that it uses that repository like an extra index for that specific package, as if you had done:
pip install --find-links https://github.com/owner/package.git package
That doesn't work because pip doesn't have the machinery to identify Python package releases in a GitHub repository, but it's not hard to imagine how it could work if you assume tags follow a standard pattern (I'm not saying it should do that).
Precisely.
If I've understood this correctly, the idea can be broken down into two parts:
1. Should it be possible to specify an extra index URL for a specific package?
If a package itself can specify an index URL for its dependencies, you're back to the problems of crawling random sites. If this is only usable at the top level of packages you're asking pip to install, it's probably of limited use. And there are obvious complications: do you use the same index for dependencies of that package? What about if it shares dependencies with other top-level packages?
Again, precisely. Pip already has mechanisms (find-links and extra index URL) to do this at the top level, and letting projects do it in their config is basically a remote code exploit (admittedly install from source is already that).
2. Should tools like pip be able to find a list of releases in a git (or specifically GitHub) repository?
I think this would be too complicated to make a good standard - how do you distinguish 'release of a Python package' tags from any other tags that might be in the repository?
Someone could make an installer tool that works with git repositories like this, assuming some convention of tags and file layout. But I'd vote against pip doing that, because anything that pip implements, every other installer tool will inevitably be pushed to implement for compatibility.
Someone could make a front-end application that, given a github URL, presented the content of that repository as a PEP 503 index. That's pretty much the purpose of standardising the repository API in a PEP. And that application could include as much configurability and intelligence as its users desired. It would also work identically in any standards-conforming installer.

Paul
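The core of such a front-end could be quite small. As a rough sketch (hypothetical code, glossing over name normalization, file hashes, and everything else PEP 503 actually requires of a real index):

```python
# Render a guessed version -> archive-URL mapping as a minimal PEP 503
# "simple" project page that a standards-conforming installer could read.
def simple_project_page(project, releases):
    links = "\n".join(
        f'<a href="{url}">{project}-{version}.tar.gz</a>'
        for version, url in sorted(releases.items())
    )
    return f"<!DOCTYPE html>\n<html><body>\n{links}\n</body></html>"

page = simple_project_page(
    "package",
    {"10.0": "https://github.com/owner/package/archive/10.0.tar.gz"},
)
print(page)
```

The configurability Paul mentions (which tags count as releases, how to name the archives) would live in this layer, keeping the installer side entirely standard.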

Thank you all for very helpful comments! I've carefully considered everything you said and I'm now mostly convinced that PEP 508 actually doesn't need to be expanded to include version specifiers. Sure, if it were, some workflows would continue to work more or less as they do now. But that's not a good excuse to introduce something as hacky as going through tags in a cloned git repository and trying to match them against some expected patterns.

Cheers,
Jan

On Tue, 29 Jan 2019 at 15:49, Jan Musílek <jan.musilek@nic.cz> wrote:
Thank you all for very helpful comments!
Thank you for bringing this up.
I've carefully considered everything you said and I'm now mostly convinced that PEP 508 actually doesn't need to be expanded to include version specifiers. Sure, if it were, some workflows would continue to work more or less as they do now. But that's not a good excuse to introduce something as hacky as going through tags in a cloned git repository and trying to match them against some expected patterns.
Do you think you'd be able to summarise the conclusions from this thread on the pip issue that prompted it? It would be nice if we could record it over there for people who are following the tracker issue, and an explanation from an "affected user" perspective may be more helpful to others following the issue than a(nother) statement from one of the pip devs. Paul

Do you think you'd be able to summarise the conclusions from this thread on the pip issue that prompted it? It would be nice if we could record it over there for people who are following the tracker issue, and an explanation from an "affected user" perspective may be more helpful to others following the issue than a(nother) statement from one of the pip devs.
Sure, I'll get to it. Jan

On 1/29/19 9:43 AM, Paul Moore wrote:
And when they start adding version numbers to that (like the OP's package >= 10.0 @ https://github.com/owner/package.git) I can't even begin to understand what they think it might mean. (Hence my original request for a better explanation of the expected semantics!)
One might construe the request as a requirement to search the 'releases' or 'tags' page of a GitHub repository, using some to-be-defined heuristic to map the release / tag names onto versioned project names. It wouldn't work for the general case (tags / releases can have arbitrary names). Maybe *worse* is that the tags / releases can be removed / replaced.

Tres.

--
Tres Seaver +1 540-429-0999 tseaver@palladion.com
Palladion Software "Excellence by Design" http://palladion.com

On Tue, Jan 29, 2019 at 3:22 PM Donald Stufft <donald@stufft.io> wrote:
It still declares a name in the specifier because when doing the dependency resolution algorithm, it needs to know the name and the version, and conceptually the name and version of “foo @ https://example.com/foo.tar.gz” is (“foo”, “https://example.com/foo.tar.gz”).
I disagree that it *needs* the name: since the link is declared as a dependency, the installer will necessarily need to check/download it at some point to install it and could discover the package name at that point, just like it will discover the version at the same point. Providing the name in the direct reference is an optimization that eases the work of the installer, and allowing a version specifier could be another one.

The two features are basically not 1:1, given that the PEP 508 url specifier short circuits the URL discovery phase completely, whereas dependency_links only provide hints for additional places to look. Trying to cram version specifiers into the PEP 508 style syntax is not going to work as people expect, because it is conceptually nonsensical.
This, I totally agree.
However, one thing we *could* maybe do is allow including a specific version. So instead of treating the URL as the version you would provide the name, version, and url. So you’d have something like “foo-1.0 @ https://example.com/foo.tar.gz”. I don’t personally see much of a point to that; the version number doesn’t tell us any new information here. The only thing I think it *could* be useful for is if you had something depending on “foo >= 1” and something else depending on “foo-1.0 @ …”, then it would make it easier to determine there isn’t a dependency conflict.
Yes, or the other way round, directly spot a dependency conflict without downloading the link.

On Jan 29, 2019, at 9:48 AM, Xavier Fernandez <xav.fernandez@gmail.com> wrote:
I disagree that it *needs* the name: since the link is declared as a dependency, the installer will necessarily need to check/download it at some point to install it and could discover the package name at that point, just like it will discover the version at the same point. Providing the name in the direct reference is an optimization that eases the work of the installer, and allowing a version specifier could be another one.
It needs the name to do that without downloading, which is ideally the direction we’re heading towards: that we can do as much work prior to downloading files as possible.

A version specifier is something like “>=10”. It doesn’t make any sense to say “I depend explicitly on the version of foo that exists at this URL, and also that URL has to be >= 10”. You’re already telling us that you depend on that URL; the version specifier is completely nonsensical. But lifting the *version* (as opposed to a version specifier) that is expected to be at that URL into the dependency declaration is fine; it’s just baking more information into the specifier to make it easier to handle.

On Tue, Jan 29, 2019 at 3:58 PM Donald Stufft <donald@stufft.io> wrote:
A version specifier is something like “>=10”. It doesn’t make any sense to say “I depend explicitly on the version of foo that exists at this URL, and also that URL has to be >= 10”. You’re already telling us that you depend on that URL, the version specifier is completely nonsensical. But lifting the *version* (as opposed to a version specifier) that is expected to be at that URL into the dependency declaration is fine, it’s just baking more information into the specifier to make it easier to handle.
I agree that such a specifier would make little sense, but why add a new syntax "foo-1 @ url" when "foo==1 @ url" (where ==1 is a version specifier as defined in PEP 508) would perfectly fit the bill?

On Jan 29, 2019, at 10:15 AM, Xavier Fernandez <xav.fernandez@gmail.com> wrote:
I agree that such a specifier would make little sense, but why add a new syntax "foo-1 @ url" when "foo==1 @ url" (where ==1 is a version specifier as defined in PEP 508) would perfectly fit the bill?
Well foo-1 wouldn’t work great because you can’t disambiguate between (“foo”, “1”) and (“foo-1”, None). But the reason for using a different syntax would be so people didn’t confuse the concept with the idea of version specifiers: since >=1.0 doesn’t make sense, if we allow ==10, then people will assume they can add >=10.0 and will get confused when it doesn’t work. A different syntax side steps that issue.
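The ambiguity Donald describes is easy to demonstrate with a rough approximation of the PEP 508 name grammar. The regexes here are simplified sketches, not the spec's exact rules:

```python
import re

NAME = r"[A-Za-z0-9](?:[A-Za-z0-9._-]*[A-Za-z0-9])?"  # roughly PEP 508's name rule
VERSION = r"\d+(?:\.\d+)*"                             # deliberately simplified

def possible_parses(spec):
    """Every (name, version) reading of a hypothetical 'name-version' syntax."""
    parses = []
    if re.fullmatch(NAME, spec):
        parses.append((spec, None))  # the whole string is itself a valid name
    for i, ch in enumerate(spec):
        if ch == "-":
            name, version = spec[:i], spec[i + 1:]
            if re.fullmatch(NAME, name) and re.fullmatch(VERSION, version):
                parses.append((name, version))
    return parses

print(possible_parses("foo-1"))  # [('foo-1', None), ('foo', '1')]
```

Because hyphens are legal inside distribution names, "foo-1" genuinely has two valid readings, so the grammar alone cannot decide between them.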

On 29 Jan 2019, at 23:19, Donald Stufft <donald@stufft.io> wrote:
On Jan 29, 2019, at 10:15 AM, Xavier Fernandez <xav.fernandez@gmail.com> wrote:
I agree that such specifier would make little sense but why add a new syntax "foo-1 @ url" when "foo==1 @ url" (where ==1 is a version specifier as defined in PEP 508) would perfectly fit the bill ?
Well foo-1 wouldn’t work great because you can’t disambiguate between (“foo”, “1”) and (“foo-1”, None). But the reason for using a different syntax would be so people didn’t confuse the concept with the idea of version specifiers: since >=1.0 doesn’t make sense, if we allow ==10, then people will assume they can add >=10.0 and will get confused when it doesn’t work. A different syntax side steps that issue.
A >=10.0 specifier could still work, I think. Resolvers are implemented to calculate the union of specifiers, so any specifiers would do. Of course it does not make perfect sense, but I guess it could be interpreted as “you can find the package here, and I promise it satisfies the given specifiers.”

TP
-- Distutils-SIG mailing list -- distutils-sig@python.org To unsubscribe send an email to distutils-sig-leave@python.org https://mail.python.org/mailman3/lists/distutils-sig.python.org/ Message archived at https://mail.python.org/archives/list/distutils-sig@python.org/message/CTQH6...

but I guess it could be interpreted as “you can find the package here, and I promise it satisfies the given specifiers
Promising to satisfy a vague constraint isn’t very useful though, is it? Since it’s not providing a strict pin you aren’t even getting enough information to leverage the local cache until you actually download the contents, even in the best case. The only useful case I can think of is this one:

- I make a new package which declares two dependencies, for example, and lets say both are on PyPI – ‘Dep A’ and ‘Dep B’
  - ‘Dep A’ has direct dependency “foo>=1.0 @ git+https://github.com/bar/foo.git@some-ref”
  - ‘Dep B’ has direct dependency “foo>=1.5 @ git+https://github.com/bar/foo.git@some-other-ref”
- A resolver _could_ use this information to determine that actually, it’s correct to use the version declared by ‘Dep B’
- This highlights the important distinction here: just like with normal packages, the specifier is still being used to dictate which versions of foo would satisfy the requirement, _not the version that is currently found at the given url_
- Similarly when I specify a constraint such as “setuptools>=38.2” I don’t know which version I will get when I perform the installation or resolution, but I will be able to step through the resolution and compare other appearances of “setuptools”
- This starts to feel like the question I asked in the other thread -- what happens if you supply a specifier and we find a better match on PyPI?
  - I feel like this complication starts to unravel the purpose of a direct url; it should not be compared – I’m guessing we’d want to either fail resolution or privilege the URL if the downstream constraint isn’t satisfied by the provided direct url
- So that leaves me with -- specifiers may be useful for comparison but _only_ for comparing direct url subdependencies

Dan Ryan
gh: @techalchemy // e: dan@danryan.co

On Tue, Jan 29, 2019, 06:59 Donald Stufft <donald@stufft.io wrote:
On Jan 29, 2019, at 9:48 AM, Xavier Fernandez <xav.fernandez@gmail.com> wrote:
I disagree that it *needs* the name: since the link is declared as a dependency, the installer will necessarily need to check/download it at some point to install it and could discover the package name at that point, just like it will discover the version at the same point. Providing the name in the direct reference is an optimization that eases the work of the installer, and allowing a version specifier could be another one.
It needs the name to do that without downloading, which is ideally the direction we’re heading towards: that we can do as much work prior to downloading files as possible.
This confused me too, but after thinking about it I think maybe I get it.

The thing is, there actually is no way for a resolver to know for certain whether it wants to download a direct URL like this until after it has downloaded it. Because you could have a situation where, after you download it, you discover that it has a version, or its own requirements, that are inconsistent with the other requirements we've gathered, and then the resolver has to backtrack. And in backtracking it might decide not to use that exact URL after all.

But we can imagine a resolver that works in phases: first, it uses all the information it can get without downloading/building anything to construct an optimistic candidate solution: "assuming everything I haven't downloaded yet cooperates, this will work". And then we download/build until either everything works out, or else we discover some conflict and go back to phase 1 with more information in our database.

If package A depends on B and C, and if package B depends on "C @ some-url"... well, we don't know until we download it whether the different C requirements are going to match up, it's true. But because the package name is present in the URL requirement, we at least know that we don't need to go find C somewhere *else*. If all we had was the URL without the name, then in this case we might end up downloading C from PyPI, spending time building it, and only after that download the URL and discover that it was also package C.

-n
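Nathaniel's point about the name can be made concrete with a toy requirement set. The data shapes here are hypothetical illustrations, nothing like pip's internals: if a direct URL carries its project name, the resolver knows up front which names it still has to look up on an index.

```python
# Each requirement is (name, direct_url_or_None).
def needs_index_lookup(requirements):
    """Names that must come from an index: those with no URL pinned anywhere."""
    url_pinned = {name for name, url in requirements if url is not None}
    return {name for name, url in requirements if url is None} - url_pinned

reqs = [
    ("B", None),                                # A depends on B...
    ("C", None),                                # ...and on C
    ("C", "https://example.com/C-1.0.tar.gz"),  # B pins C to a direct URL
]
print(sorted(needs_index_lookup(reqs)))  # ['B']
```

If the URL form carried no name, the third entry would be opaque until downloaded, and C might be fetched and built from PyPI only to be discarded later.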

In the ideal future would we avoid the build step by having PyPI host primarily wheels? In which case anything available on PyPI would hopefully have its metadata exposed on the JSON endpoint and we could sidestep that.

Either way we will ultimately have to download whatever is specified as a direct url dependency because even if it has a version that aligns we will need to figure out what dependencies it demands, etc etc. And while an optimistic candidate is a neat idea it’s not clear to me at least whether it’s a good idea to expect this of users. What happens when they get it wrong? Do you trust the version in the package or do you allow an override? Conflict resolution is possible either way but the desired behavior there would seem to favor the former imo.

Dan Ryan // pipenv maintainer
gh: @techalchemy
On Jan 29, 2019, at 4:15 PM, Nathaniel Smith <njs@pobox.com> wrote:
On Tue, Jan 29, 2019, 06:59 Donald Stufft <donald@stufft.io wrote:
On Jan 29, 2019, at 9:48 AM, Xavier Fernandez <xav.fernandez@gmail.com> wrote:
I disagree that it *needs* the name: since the link is declared as a dependency, the installer will necessarily need to check/download it at some point to install it and could discover the package name at that point, just like it will discover the version at the same point. Providing the name in the direct reference is an optimization that eases the work of the installer, and allowing a version specifier could be another one.
It needs the name to do that without downloading, which is ideally the direction we’re heading towards: that we can do as much work prior to downloading files as possible.
This confused me too, but after thinking about it I think maybe I get it.
The thing is, there actually is no way for a resolver to know for certain whether it wants to download a direct URL like this until after it has downloaded it. Because, you could have a situation where after you download it, you discover that it has a version, or its own requirements, that are inconsistent with the other requirements we've gathered, and then the resolver has to backtrack. And in backtracking it might decide not to use that exact URL after all.
But we can imagine a resolver that works in phases: first, it uses all the information it can get without downloading/building anything to construct an optimistic candidate solution: "assuming everything I haven't downloaded yet cooperates, this will work". And then we download/build until either everything works out, or else we discover some conflict and go back to phase 1 with more information in our database.
If package A depends on B and C, and if package B depends on "C @ some-url"... well, we don't know until we download it whether the different C requirements are going to match up, it's true. But because the package name is present in the URL requirement, we at least know that we don't need to go find C somewhere *else*. If all we had was the URL without the name, then in this case we might end up downloading C from PyPI, spending time building it, and only after that download the URL and discover that it was also package C.
-n

On Tue, Jan 29, 2019 at 4:11 PM Dan Ryan <dan@danryan.co> wrote:
In the ideal future would we avoid the build step by having PyPI host primarily wheels? In which case anything available on PyPI would hopefully have its metadata exposed on the JSON endpoint and we could sidestep that.
Either way we will ultimately have to download whatever is specified as a direct url dependency because even if it has a version that aligns we will need to figure out what dependencies it demands, etc etc. And while an optimistic candidate is a neat idea it’s not clear to me at least whether it’s a good idea to expect this of users. What happens when they get it wrong? Do you trust the version in the package or do you allow an override? Conflict resolution is possible either way but the desired behavior there would seem to favor the former imo
Everything I was talking about was stuff that would be happening inside the resolver loop, no users involved. The "optimistic candidate" would just be an initial tentative solution that the resolver uses internally to decide which things to try downloading first. You know more about how resolvers actually work than I do though, so I might have gotten it wrong :-). -n -- Nathaniel J. Smith -- https://vorpus.org

Ah sorry! I was under the impression the “optimistic candidate” was being proposed as one derived from a specifier supplied in the direct dependency line by the user. If it’s essentially a first guess, I think that’s how we operate anyway for the most part (at least in the future resolver) while we step through the dependency graph.

My main question is whether you’d consider a dependency specified in PEP 508 form as satisfying other things that demand the given name even if, upon download, the version doesn’t indicate that. If not, it still feels like we need to download it first. Just because someone put a ref doesn’t mean they don’t update that ref’s content :/

Dan Ryan // pipenv maintainer
gh: @techalchemy
On Jan 29, 2019, at 7:15 PM, Nathaniel Smith <njs@pobox.com> wrote:
On Tue, Jan 29, 2019 at 4:11 PM Dan Ryan <dan@danryan.co> wrote:
In the ideal future would we avoid the build step by having PyPI host primarily wheels? In which case anything available on PyPI would hopefully have its metadata exposed on the JSON endpoint and we could sidestep that.
Either way we will ultimately have to download whatever is specified as a direct url dependency because even if it has a version that aligns we will need to figure out what dependencies it demands, etc etc. And while an optimistic candidate is a neat idea it’s not clear to me at least whether it’s a good idea to expect this of users. What happens when they get it wrong? Do you trust the version in the package or do you allow an override? Conflict resolution is possible either way but the desired behavior there would seem to favor the former imo
Everything I was talking about was stuff that would be happening inside the resolver loop, no users involved. The "optimistic candidate" would just be an initial tentative solution that the resolver uses internally to decide which things to try downloading first.
You know more about how resolvers actually work than I do though, so I might have gotten it wrong :-).
-n
-- Nathaniel J. Smith -- https://vorpus.org

Given the confusion / lack of awareness around these issues, would it make sense for pip to point users towards an explanation and/or available work-arounds if pip sees that a user is trying to use the feature (e.g. if a user tries adding a version specifier or tries to use dependency links in general)?

--Chris

On Tue, Jan 29, 2019 at 5:08 PM Dan Ryan <dan@danryan.co> wrote:
Ah sorry! I was under the impression the “optimistic candidate” was being proposed as one derived from a specifier supplied in the direct dependency line by the user. If it’s essentially a first guess, I think that’s how we operate anyway for the most part (at least in the future resolver) while we step through the dependency graph.
My main question is whether you’d consider a dependency specified in PEP 508 form as satisfying other things that demand the given name even if, upon download, the version doesn’t indicate that. If not, it still feels like we need to download it first. Just because someone put a ref doesn’t mean they don’t update that ref’s content :/
Dan Ryan // pipenv maintainer gh: @techalchemy
On Jan 29, 2019, at 7:15 PM, Nathaniel Smith <njs@pobox.com> wrote:
On Tue, Jan 29, 2019 at 4:11 PM Dan Ryan <dan@danryan.co> wrote:
In the ideal future would we avoid the build step by having PyPI host primarily wheels? In which case anything available on PyPI would hopefully have its metadata exposed on the JSON endpoint and we could sidestep that.
Either way we will ultimately have to download whatever is specified as a direct url dependency because even if it has a version that aligns we will need to figure out what dependencies it demands, etc etc. And while an optimistic candidate is a neat idea it’s not clear to me at least whether it’s a good idea to expect this of users. What happens when they get it wrong? Do you trust the version in the package or do you allow an override? Conflict resolution is possible either way but the desired behavior there would seem to favor the former imo
Everything I was talking about was stuff that would be happening inside the resolver loop, no users involved. The "optimistic candidate" would just be an initial tentative solution that the resolver uses internally to decide which things to try downloading first.
You know more about how resolvers actually work than I do though, so I might have gotten it wrong :-).
-n
-- Nathaniel J. Smith -- https://vorpus.org

Paul Moore wrote:
Maybe the way to define useful semantics here would be to articulate the actual problem you're trying to solve (*without* referring to how dependency_links works) and propose a semantics that solves that problem?
Let's say I have project A, which has requirements B and C < 2. Project B has requirement C >= 1. So any version of C between 1 and 2 satisfies A's (deep) requirements.

With url specifiers, I can't specify the version; I need to provide a specific version in the link. So let's say that I have dependency "C @ https://github.com/owner/C.git@3" in project B and "C @ https://github.com/owner/C.git@1" in project A. These are different links, and so there is no way to find the version that satisfies both project A and project B requirements. Without version specifiers, I'd need to specify the exact same version in all projects (and if I'm using third party packages, I'm out of luck).

I recognize that there is a problem that a repository tag may not be the same as the python package version. I don't know what to do about it yet.
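Jan's scenario is exactly what specifier intersection handles. A stdlib-only sketch (real tools would use the `packaging` library's `SpecifierSet`; PEP 440 ordering is more subtle than the tuple comparison used here):

```python
import operator

OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq,
       ">": operator.gt, "<": operator.lt}

def parse_clause(clause):
    for op in (">=", "<=", "==", ">", "<"):  # two-char operators checked first
        if clause.startswith(op):
            return OPS[op], tuple(int(p) for p in clause[len(op):].split("."))
    raise ValueError(f"unsupported clause: {clause!r}")

def satisfies(version, clauses):
    """True if a version meets every clause, i.e. lies in their intersection."""
    v = tuple(int(p) for p in version.split("."))
    return all(op(v, bound) for op, bound in map(parse_clause, clauses))

# A requires C < 2, B requires C >= 1: any C in [1, 2) satisfies both,
# and a resolver can determine that without downloading anything.
clauses = ["<2", ">=1"]
print([v for v in ["0.9", "1.0", "1.5", "2.0"] if satisfies(v, clauses)])
# ['1.0', '1.5']
```

With only the two conflicting direct URLs (@3 and @1) there is nothing comparable to intersect, which is the dead end described above.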
participants (12)
- Bernat Gabor
- Chris Jerdonek
- Dan Ryan
- Donald Stufft
- Jan Musílek
- Nathaniel Smith
- Nick Coghlan
- Paul Moore
- Thomas Kluyver
- Tres Seaver
- Tzu-ping Chung
- Xavier Fernandez