GSoC 2017 - Plan of Action for dependency resolver
Hello Everyone!

Google released the list of accepted organizations for GSoC 2017 and PSF is one of them. I guess this would be a good time for me to seek feedback on the approach I'm planning to take for my potential GSoC project. I hope this mailing list is the right place to do so.

---

Here's my current plan of action along with reasoning for the choices made. A separate PR will be made for each of these stages. Every stage does depend on the previous ones being completed.

1. Refactor all dependency resolution responsibility in pip into a new, separate module.
   This would allow any future changes/improvements in the dependency resolution to be added without major changes in the rest of the code-base. As of today, the RequirementSet class within pip seems to be doing a lot of work, and dependency resolution is a responsibility that doesn't need to be given to it, especially when it's avoidable.

2. Implement dependency information caching.
   This would let the resolver avoid re-computing the dependencies of a package if they have already been computed, speeding up the resolution.

3. Implement a backtracking resolver.
   A backtracking solver would be appropriate given that we don't have a way to pre-compute the dependencies for *all* the packages or statically determine the dependencies, so a SAT solver would not be feasible.

4. (if time permits) Move any dependency resolution code out into a separate library.
   This would make it possible for other projects (like buildout or a future pip replacement) to reuse the dependency resolver.

By making each of the stages separate PRs, incremental improvements would be made so that even if I leave this project midway, there will be some work merged already if someone comes back to this problem later. That said, I don't intend to leave this project midway.

I do intend to reuse some of the work done by Robert Collins in PR #2716 on pip's GitHub repository.
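To make stages 2 and 3 of the plan concrete, here is a minimal Python sketch. It is not pip's code and not the proposed implementation: the toy `INDEX`, the `DependencyCache` class and the `resolve` function are hypothetical names invented for illustration, versions are plain integers, and a requirement is a (name, allowed versions) pair. In real pip, a cache miss would stand for downloading a candidate and inspecting it.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

Requirement = Tuple[str, FrozenSet[int]]  # (project name, acceptable versions)
Index = Dict[str, Dict[int, List[Requirement]]]

# Hypothetical toy index standing in for PyPI: name -> version -> dependencies.
INDEX: Index = {
    "app": {1: [("lib", frozenset({1, 2})), ("util", frozenset({2}))]},
    "lib": {1: [("util", frozenset({1, 2}))], 2: [("util", frozenset({1}))]},
    "util": {1: [], 2: []},
}


class DependencyCache:
    """Stage 2: remember dependencies so each candidate is inspected once."""

    def __init__(self, index: Index) -> None:
        self._index = index
        self._cache: Dict[Tuple[str, int], List[Requirement]] = {}
        self.lookups = 0  # number of expensive (uncached) inspections

    def get(self, name: str, version: int) -> List[Requirement]:
        key = (name, version)
        if key not in self._cache:
            self.lookups += 1  # stands in for "download and inspect"
            self._cache[key] = self._index[name][version]
        return self._cache[key]


def resolve(
    requirements: List[Requirement],
    cache: DependencyCache,
    index: Index,
    pinned: Optional[Dict[str, int]] = None,
) -> Optional[Dict[str, int]]:
    """Stage 3: depth-first search that backtracks when a choice conflicts."""
    pinned = dict(pinned or {})
    if not requirements:
        return pinned  # every requirement is satisfied
    (name, allowed), rest = requirements[0], requirements[1:]
    if name in pinned:
        # Already chosen: continue if compatible, otherwise report a conflict.
        if pinned[name] in allowed:
            return resolve(rest, cache, index, pinned)
        return None
    for version in sorted(index.get(name, {}), reverse=True):  # newest first
        if version not in allowed:
            continue
        trial = dict(pinned, **{name: version})
        result = resolve(rest + cache.get(name, version), cache, index, trial)
        if result is not None:
            return result
    return None  # no candidate worked; the caller tries its next option


cache = DependencyCache(INDEX)
solution = resolve([("app", frozenset({1}))], cache, INDEX)
print(solution)       # {'app': 1, 'lib': 1, 'util': 2}
print(cache.lookups)  # 4
```

With this index, the newest `lib` (2) conflicts with `app`'s need for `util` 2, so the search backtracks to `lib` 1. `util` 2 is visited on both branches but inspected only once, which is the saving stage 2 is after.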
Stages 2 and 3 are separate because I see them as distinctly different tasks which touch very different portions of the code-base. There is strong coupling between them though.

I'm looking forward to the feedback. :)

Regards,
Pradyun
On Tue, Feb 28, 2017 at 10:14 AM, Pradyun Gedam wrote:
> 4. (if time permits) Move any dependency resolution code out into a separate library.
> This would make it possible for other projects (like buildout or a future pip replacement) to reuse the dependency resolver.

Thank you!

> ...
> I do intend to reuse some of the work done by Robert Collins in PR #2716 on pip's GitHub repository.

Are you aware of the proof of concept in distlib?

https://distil.readthedocs.io/en/0.1.0/overview.html#actual-improvements

Jim

--
Jim Fulton
http://jimfulton.info
On 1 March 2017 at 10:28, Pradyun Gedam wrote:
> I haven't really understood how it gets the information about dependencies without downloading the packages... I'll give it another pass this weekend.

If I recall, it reads static dependency data held on the red-dove site and maintained by downloading and running egg-info on the packages as changes occur. I don't think it's a sustainable approach for pip at the moment (my understanding is that it was a proof of concept for what having static metadata on PyPI would gain us).

Paul
On Wed, 1 Mar 2017 at 15:58 Pradyun Gedam wrote:
> On Tue, Feb 28, 2017, 21:18 Jim Fulton wrote:
>> On Tue, Feb 28, 2017 at 10:14 AM, Pradyun Gedam wrote:
>>> ...
>>> 4. (if time permits) Move any dependency resolution code out into a separate library.
>>> This would make it possible for other projects (like buildout or a future pip replacement) to reuse the dependency resolver.
>>
>> Thank you!
>
> Welcome!
>
>>> ...
>>> I do intend to reuse some of the work done by Robert Collins in PR #2716 on pip's GitHub repository.
>>
>> Are you aware of the proof of concept in distlib?
>
> I am. I had looked at it a few weeks back. IIRC it makes a dependency graph using distlib and operates with that.
>
> I haven't really understood how it gets the information about dependencies without downloading the packages... I'll give it another pass this weekend.

I went through it. As Paul Moore said, it is hitting http://www.red-dove.com/pypi/ which has metadata on what the requirements of a package are (saying this on the basis of [1]).

Since PyPI does not have such information in a static declarative format, that approach is not feasible. pip will have to download packages and execute setup.py to know what the dependencies are.

[1]: https://www.red-dove.com/pypi/projects/S/Sphinx/package-1.3.json

>> https://distil.readthedocs.io/en/0.1.0/overview.html#actual-improvements
>>
>> Jim
>>
>> --
>> Jim Fulton
>> http://jimfulton.info
On Mar 4, 2017, at 12:25 PM, Pradyun Gedam wrote:
> Since PyPI does not have such information in a static declarative format, that approach is not feasible. pip will have to download packages and execute setup.py to know what the dependencies are.

I will note that we can expose that information in PyPI for *wheels*, but not for sdists currently. It would be a lot more work though, because it’d essentially require a whole new repository API and I doubt Pradyun wants to tackle that right now :)

Keeping a future in mind where we can get at least some of that information without downloading would be good though, at least to keep in mind when structuring code.

--
Donald Stufft
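As a hedged illustration of why wheels are the easy case: a wheel is a zip archive containing a `*.dist-info/METADATA` file whose `Requires-Dist` headers list the dependencies statically, so nothing has to be executed to read them. The sketch below does that with the standard library only. The function name `requires_dist_from_wheel` and the in-memory demo wheel are made up for illustration; this is not how pip or PyPI handle it, and it still needs the wheel file in hand, which is the step a repository API serving the same data would remove.

```python
import io
import zipfile
from email.parser import Parser


def requires_dist_from_wheel(wheel_file):
    """Return the Requires-Dist entries of a wheel without running any code."""
    with zipfile.ZipFile(wheel_file) as archive:
        # The metadata lives at <name>-<version>.dist-info/METADATA.
        candidates = [
            name for name in archive.namelist()
            if name.count("/") == 1 and name.endswith(".dist-info/METADATA")
        ]
        if len(candidates) != 1:
            raise ValueError("expected exactly one .dist-info/METADATA file")
        text = archive.read(candidates[0]).decode("utf-8")
    # METADATA uses email-style headers, so the stdlib parser can read it.
    return Parser().parsestr(text).get_all("Requires-Dist") or []


def build_demo_wheel():
    """Build a tiny in-memory stand-in for a wheel (illustration only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            "demo-1.0.dist-info/METADATA",
            "Metadata-Version: 2.0\n"
            "Name: demo\n"
            "Version: 1.0\n"
            "Requires-Dist: requests (>=2.0)\n"
            "Requires-Dist: six\n",
        )
    buffer.seek(0)
    return buffer


print(requires_dist_from_wheel(build_demo_wheel()))
# ['requests (>=2.0)', 'six']
```

An sdist has no equivalent file that can be trusted in general, which is why the download-and-execute path remains necessary there.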
On Sat, 4 Mar 2017 at 22:58 Donald Stufft wrote:
> On Mar 4, 2017, at 12:25 PM, Pradyun Gedam wrote:
>> Since PyPI does not have such information in a static declarative format, that approach is not feasible. pip will have to download packages and execute setup.py to know what the dependencies are.
>
> I will note that we can expose that information in PyPI for *wheels*, but not for sdists currently. It would be a lot more work though, because it’d essentially require a whole new repository API and I doubt Pradyun wants to tackle that right now :)

Yeah... For now, it's just dependency resolution in pip.

> Keeping a future in mind where we can get at least some of that information without downloading would be good though, at least to keep in mind when structuring code.

Duly noted.

> --
> Donald Stufft
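One possible way to keep that future in mind when structuring the code, sketched under stated assumptions: hide "where dependency data comes from" behind a small interface, so that a static source can be consulted first and the expensive download-and-build path is only a fallback. Every class name below is hypothetical; none of them exist in pip, and the build step is faked with a plain callable.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Optional, Tuple


class MetadataSource(ABC):
    """Hypothetical interface: answer 'what does name==version depend on?'."""

    @abstractmethod
    def dependencies(self, name: str, version: str) -> Optional[List[str]]:
        """Return requirement strings, or None if this source cannot tell."""


class StaticSource(MetadataSource):
    """Stands in for metadata served by an index without any download."""

    def __init__(self, table: Dict[Tuple[str, str], List[str]]) -> None:
        self._table = table

    def dependencies(self, name, version):
        return self._table.get((name, version))


class BuildSource(MetadataSource):
    """Stands in for 'download the package and execute setup.py'."""

    def __init__(self, build: Callable[[str, str], List[str]]) -> None:
        self._build = build
        self.calls = 0  # how often the expensive path was taken

    def dependencies(self, name, version):
        self.calls += 1
        return self._build(name, version)


class ChainedSource(MetadataSource):
    """Try each source in order and return the first answer."""

    def __init__(self, *sources: MetadataSource) -> None:
        self._sources = sources

    def dependencies(self, name, version):
        for source in self._sources:
            found = source.dependencies(name, version)
            if found is not None:
                return found
        return None


static = StaticSource({("demo", "1.0"): ["six"]})
build = BuildSource(lambda name, version: ["legacy-dep"])
source = ChainedSource(static, build)

print(source.dependencies("demo", "1.0"), build.calls)  # ['six'] 0
print(source.dependencies("old", "0.1"), build.calls)   # ['legacy-dep'] 1
```

A resolver written against `MetadataSource` would not need to change if a static source became available later; only the chain handed to it would.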
Great news!

Your plan seems reasonable. The first stage (RequirementSet refactor) seems to me to be the trickiest.

Anyway, I'm looking forward to your PRs :)

Xavier
On Wed, Mar 1, 2017 at 4:14 AM, Pradyun Gedam wrote:
> Hello Everyone!
>
> Google released the list of accepted organizations for GSoC 2017 and PSF is one of them.

I see pip is not yet listed as a PSF sub-org on http://python-gsoc.org/. This is pretty urgent to arrange:

"*March 3* - Last day for Python sub-orgs to apply to participate with the PSF. (Assuming we get accepted by Google and can support sub-orgs, of course!) This deadline is for orgs who applies on their own and didn't make it, but still wish to participate under the umbrella."

The original deadline was Feb 7. There's a good chance that Pip will still be accepted after March 3, but I wouldn't gamble on it.

There are instructions under "Project Ideas" on http://python-gsoc.org/ on how to get accepted as a sub-org.

Cheers,
Ralf
On Mar 1, 2017, at 3:02 PM, Ralf Gommers wrote:
> On Wed, Mar 1, 2017 at 4:14 AM, Pradyun Gedam wrote:
>> Hello Everyone!
>>
>> Google released the list of accepted organizations for GSoC 2017 and PSF is one of them.
>
> I see pip is not yet listed as a PSF sub-org on http://python-gsoc.org/. This is pretty urgent to arrange:
>
> "March 3 - Last day for Python sub-orgs to apply to participate with the PSF. (Assuming we get accepted by Google and can support sub-orgs, of course!) This deadline is for orgs who applies on their own and didn't make it, but still wish to participate under the umbrella."
>
> The original deadline was Feb 7. There's a good chance that Pip will still be accepted after March 3, but I wouldn't gamble on it.
>
> There are instructions under "Project Ideas" on http://python-gsoc.org/ on how to get accepted as a sub-org.

Oh. I’ve never done this before and Pradyun reached out so I had no idea I had to do this. I’ll go ahead and do this.

--
Donald Stufft
Thanks for the pointer Ralf! :)

I was actually drafting a mail to send to Donald directly, to thank him for being willing to mentor me as well as to point this out to him. I guess I can discard that draft now...
On Thu, Mar 2, 2017 at 9:07 AM, Donald Stufft wrote:
> Oh. I’ve never done this before and Pradyun reached out so I had no idea I had to do this. I’ll go ahead and do this.

I'm the GSoC admin for SciPy, so need to keep track of the various deadlines/todos. I'd be happy to ping you each time one approaches if that helps.

There's a PSF GSoC mentors list that's not noisy and useful to join. You'll be added to the Google GSoC-mentors list automatically if you start mentoring in the program, but you may want to mute it or not use your primary email address for it (it's high-traffic, very low signal to noise and you can't unsubscribe).

Ralf
Ok, so it appears that besides me we need another one or two mentors to act as backup mentors, I guess in the event I’m not available or so. Ideally the backup mentor would either be familiar with pip’s codebase or else familiar with the ideas behind a backtracking resolver. I do have someone who can do it if needed, but I figured I’d poke distutils-sig first to see if anyone else wanted to do it as well.

They suggest that at least one mentor be exclusive to the student but that the other mentors can work with multiple students. For pip we only have the one student (yay Pradyun) and I’m not mentoring anyone else, so we should be good on the exclusive front (of course, if someone is interested in helping with this, they can also be exclusive).

On Mar 1, 2017, at 4:31 PM, Ralf Gommers wrote:
> I'm the GSoC admin for SciPy, so need to keep track of the various deadlines/todos. I'd be happy to ping you each time one approaches if that helps.

That would be awesome. I’m poking at the sites now to figure out everything I need to do to make sure all the administration bits are done properly, but having a double check that I don’t miss something would be great.

> There's a PSF GSoC mentors list that's not noisy and useful to join. You'll be added to the Google GSoC-mentors list automatically if you start mentoring in the program, but you may want to mute it or not use your primary email address for it (it's high-traffic, very low signal to noise and you can't unsubscribe).

Ok cool.

--
Donald Stufft
I'd be happy to help to provide mentorship for the backtracking dependency resolver aspect. I don't know pip's code well though.

Thanks,
Justin
_______________________________________________
Distutils-SIG maillist - Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig
On Mar 2, 2017, at 11:31 AM, Justin Cappos wrote:
> I'd be happy to help to provide mentorship for the backtracking dependency resolver aspect. I don't know pip's code well though.

Awesome, that would work out well actually, I think, because while I know pip’s code base, the actual resolver bits are not my strong suit (one of the main reasons I hadn’t done this work already is the research needed to figure out the right resolver tech and how it functions).

--
Donald Stufft
Participants (7):
- Donald Stufft
- Jim Fulton
- Justin Cappos
- Paul Moore
- Pradyun Gedam
- Ralf Gommers
- Xavier Fernandez