What would it take to split the stdlib out into its own git repo?
What would be involved in making the stdlib its own repo, separate from CPython itself? Now I'm not suggesting making sure it fully functions on its own, but more of what would need to happen if we decided that the stdlib should be its own git repo so that any Python implementation -- including CPython -- would include the stdlib as e.g. a git submodule? For instance, would we be able to split the history, or would the original history stay in the CPython repo and we would start from scratch in the stdlib repo and `git log` would hopefully be smart enough to merge the two histories? How bad is it to work in a repo with a submodule where you will be making changes to submodules regularly? And are there benefits? My hope/hunch is that if we make the stdlib its own repo then other implementations could include the stdlib as a submodule or something, making it easier for them to not only keep up-to-date with fixes to the stdlib, but also make it easier for them to push changes upstream that everyone would benefit from instead of having any changes silo-ed off in their own repo. Am I nuts, or is this something reasonable to consider doing as part of the GitHub migration?
On Sun, Jul 17, 2016 at 10:46 AM, Brett Cannon <brett@python.org> wrote:
For instance, would we be able to split the history, or would the original history stay in the CPython repo and we would start from scratch in the stdlib repo and `git log` would hopefully be smart enough to merge the two histories? How bad is it to work in a repo with a submodule where you will be making changes to submodules regularly?
I detest working with git submodules; if the repositories get split, I'd much rather have ./python look for ../python-stdlib as a parallel repo. They stand entirely separately; you simply clone both repos into the same directory. (For example, the editor SciTE and its component Scintilla work this way. I have /home/rosuav/scintilla and /home/rosuav/scite, and build Scintilla first, then build SciTE. The building part wouldn't be an issue with the stdlib, so it'd be easier.) Splitting out the history can certainly be done. You simply clone the main CPython git repo, then tell git to throw away everything that isn't in the Lib/ directory. Not all commits will read sensibly like that (maybe there's a trivial edit to the stdlib, associated with a core interpreter edit, and the commit message mentions only the core), but it's faithful and reliable, and you get the full history, going back deep into the Mercurial days. (And earlier, if the hg repo imported other data.) ChrisA
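A rough sketch of the history split described above, for illustration only. The message does not say which tool is meant; `git filter-branch --subdirectory-filter` is one stock git way to keep only the history under a directory, and that choice is an assumption here. The helper names and the repository locations in the usage note are hypothetical.

```python
import subprocess
from typing import List


def split_commands(source_repo: str, dest: str, subdir: str = "Lib") -> List[List[str]]:
    """Return the git invocations that extract `subdir` into its own history."""
    return [
        # Work on a fresh clone so the original repository is never rewritten.
        ["git", "clone", source_repo, dest],
        # Rewrite every ref so that `subdir` becomes the repository root;
        # only history touching `subdir` is kept.
        ["git", "-C", dest, "filter-branch", "--prune-empty",
         "--subdirectory-filter", subdir, "--", "--all"],
    ]


def run(commands: List[List[str]]) -> None:
    """Execute the commands in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

Calling `run(split_commands("/path/to/cpython", "stdlib-only"))` would leave a `stdlib-only` clone whose root is the old `Lib/` directory. As the message notes, some commit messages will then describe changes that are no longer visible in the rewritten tree.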
On Sun, Jul 17, 2016 at 10:54 AM, Chris Angelico <rosuav@gmail.com> wrote:
if the repositories get split, I'd much rather have ./python look for ../python-stdlib as a parallel repo. They stand entirely separately; you simply clone both repos into the same directory.
Oh, and if you tell people to do it this way, you can "ln -s ../python-stdlib Lib" and commit that symlink into the repo. Existing code needn't even be changed. When you clone a repo that has submodules, you have to know to run another command to update the submodules. Likewise when you pull changes. Same diff as having two stand-alone repos. Splitting the stdlib is going to make things a little harder no matter how it's done, so it wants to be worthwhile. ChrisA
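To make the parallel-checkout idea concrete, here is a minimal Python sketch of how a build could locate the library directory. The function name `find_stdlib` and the lookup order are assumptions for illustration; CPython's real path logic lives in getpath.c and is considerably more involved.

```python
from pathlib import Path
from typing import Optional


def find_stdlib(checkout: Path, sibling_name: str = "python-stdlib") -> Optional[Path]:
    """Return the directory holding the stdlib for an interpreter checkout.

    Hypothetical lookup order:
    1. <checkout>/Lib, which may be a real directory or a committed symlink;
    2. <checkout>/../<sibling_name>, the parallel clone.
    """
    in_tree = checkout / "Lib"
    if in_tree.is_dir():  # is_dir() follows symlinks
        return in_tree.resolve()
    sibling = checkout.parent / sibling_name
    if sibling.is_dir():
        return sibling.resolve()
    return None
```

With both repos cloned side by side, `find_stdlib(Path("python"))` would return the `python-stdlib` directory even when no `Lib` symlink exists, so the symlink becomes a convenience rather than a requirement.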
On Sun, Jul 17, 2016 at 10:59:57AM +1000, Chris Angelico <rosuav@gmail.com> wrote:
Oh, and if you tell people to do it this way, you can "ln -s ../python-stdlib Lib" and commit that symlink into the repo.
Doesn't work on w32 AFAIK. Oleg. -- Oleg Broytman http://phdru.name/ phd@phdru.name Programmers don't die, they just GOSUB without RETURN.
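One way to sidestep the portability concern is to create the link from a setup script instead of committing it, and to treat failure as a normal outcome. In the sketch below, `link_stdlib` is a hypothetical helper, not anything in CPython. On Windows, `os.symlink` can raise `OSError` when the process lacks the privilege to create symlinks, so the caller needs some other fallback in that case.

```python
import os
from pathlib import Path


def link_stdlib(checkout: Path, stdlib: Path) -> bool:
    """Try to make <checkout>/Lib a symlink to `stdlib`.

    Returns True if Lib is a symlink afterwards, False otherwise.
    """
    link = checkout / "Lib"
    if link.is_symlink():
        return True  # already linked
    if link.exists():
        return False  # a real Lib/ is in the way; leave it alone
    try:
        # target_is_directory only matters on Windows; it is ignored elsewhere.
        os.symlink(stdlib, link, target_is_directory=True)
    except (OSError, NotImplementedError):
        return False  # platform or permissions refused the link
    return True
```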
More importantly, you'd get into lots of situations where the heads of the two trees don't work together. And separate versioning is just not realistic for the stdlib.
On Saturday, July 16, 2016, Oleg Broytman <phd@phdru.name> wrote:
On Sun, Jul 17, 2016 at 10:59:57AM +1000, Chris Angelico <rosuav@gmail.com> wrote:
Oh, and if you tell people to do it this way, you can "ln -s ../python-stdlib Lib" and commit that symlink into the repo.
Doesn't work on w32 AFAIK.
Oleg. -- Oleg Broytman http://phdru.name/ phd@phdru.name Programmers don't die, they just GOSUB without RETURN. _______________________________________________ core-workflow mailing list core-workflow@python.org https://mail.python.org/mailman/listinfo/core-workflow This list is governed by the PSF Code of Conduct: https://www.python.org/psf/codeofconduct
-- --Guido (mobile)
On Jul 16, 2016 7:54 PM, "Chris Angelico" <rosuav@gmail.com> wrote:
On Sun, Jul 17, 2016 at 10:46 AM, Brett Cannon <brett@python.org> wrote:
For instance, would we be able to split the history, or would the original history stay in the CPython repo and we would start from scratch in the stdlib repo and `git log` would hopefully be smart enough to merge the two histories? How bad is it to work in a repo with a submodule where you will be making changes to submodules regularly?
I detest working with git submodules; if the repositories get split, I'd much rather have ./python look for ../python-stdlib as a parallel repo. They stand entirely separately; you simply clone both repos into the same directory. (For example, the editor SciTE and its component Scintilla work this way. I have /home/rosuav/scintilla and /home/rosuav/scite, and build Scintilla first, then build SciTE. The building part wouldn't be an issue with the stdlib, so it'd be easier.)
What about subtrees: https://medium.com/@porteneuve/mastering-git-subtrees-943d29a798ec#.6bbjxspc...
Splitting out the history can certainly be done. You simply clone the main CPython git repo, then tell git to throw away everything that isn't in the Lib/ directory. Not all commits will read sensibly like that (maybe there's a trivial edit to the stdlib, associated with a core interpreter edit, and the commit message mentions only the core), but it's faithful and reliable, and you get the full history, going back deep into the Mercurial days. (And earlier, if the hg repo imported other data.)
ChrisA
-- Ryan [ERROR]: Your autotools build scripts are 200 lines longer than your program. Something’s wrong. http://kirbyfan64.github.io/
On Sun, Jul 17, 2016 at 10:59 AM, Ryan Gonzalez <rymg19@gmail.com> wrote:
I detest working with git submodules; if the repositories get split, I'd much rather have ./python look for ../python-stdlib as a parallel repo. They stand entirely separately; you simply clone both repos into the same directory. (For example, the editor SciTE and its component Scintilla work this way. I have /home/rosuav/scintilla and /home/rosuav/scite, and build Scintilla first, then build SciTE. The building part wouldn't be an issue with the stdlib, so it'd be easier.)
What about subtrees:
https://medium.com/@porteneuve/mastering-git-subtrees-943d29a798ec#.6bbjxspc...
Never used them; from a look at the article, that would include the whole stdlib history still in the main repo, right? I'm not sure how well this would scale to lots of people with lots of repos, or how you'd clone appropriately. There would be duplicate commits (one in stdlib, one in main) with different hashes, or else a really messy graph of merges. Not sure this is an improvement. ChrisA
On Jul 16, 2016, at 8:46 PM, Brett Cannon <brett@python.org> wrote:
What would be involved in making the stdlib its own repo, separate from CPython itself? Now I'm not suggesting making sure it fully functions on its own, but more of what would need to happen if we decided that the stdlib should be its own git repo so that any Python implementation -- including CPython -- would include the stdlib as e.g. a git submodule? For instance, would we be able to split the history, or would the original history stay in the CPython repo and we would start from scratch in the stdlib repo and `git log` would hopefully be smart enough to merge the two histories? How bad is it to work in a repo with a submodule where you will be making changes to submodules regularly?
It’s kind of miserable in my experience TBH.
And are there benefits? My hope/hunch is that if we make the stdlib its own repo then other implementations could include the stdlib as a submodule or something, making it easier for them to not only keep up-to-date with fixes to the stdlib, but also make it easier for them to push changes upstream that everyone would benefit from instead of having any changes silo-ed off in their own repo.
The C-extensions part might be hard for this, since that’s an implementation detail of CPython. This is probably best asked of PyPy, Python, etc though. One thing though, you don’t need to split out into a separate repo to allow people to do this, they can do git sub-tree merges from CPython into their own git repos (https://jrsmith3.github.io/merging-a-subdirectory-from-another-repo-via-git-subtree.html).
Am I nuts, or is this something reasonable to consider doing as part of the GitHub migration?
I’m not sure I see a whole lot of value here unless we make it possible to release the stdlib separately tbh.
— Donald Stufft
On Jul 16, 2016, at 20:46, Brett Cannon <brett@python.org> wrote:
And are there benefits? My hope/hunch is that if we make the stdlib its own repo then other implementations could include the stdlib as a submodule or something, making it easier for them to not only keep up-to-date with fixes to the stdlib, but also make it easier for them to push changes upstream that everyone would benefit from instead of having any changes silo-ed off in their own repo.
Without giving it a lot of thought, this strikes me as an unnecessary complication. It will undoubtedly make CPython development more complicated, both initially in separating the stdlib out and, in the long run, figuring out how to keep them in sync all the time. Other Python implementations already have to deal with this, if they want to, and they can very easily ignore the CPython-specific parts of the repo without having to add all this additional work on everyone. Let's not make this migration harder than it has to be. We've got enough to do already. -- Ned Deily nad@python.org
I think Ned strikes the nail on the head. It's a fair burden to keep the repos in sync. I propose that instead we have some script that can make a release of just the stdlib for the benefits of other Python implementations.

FWIW I've got a fair bit of experience using a subrepo for a similar purpose, using the proposed separation: mypy depends on typeshed and includes it as a subrepo; but there are other users of typeshed as well (Google's pytype, as well as PyCharm). I think it's nice that the mypy history and the typeshed history are separate. Keeping typeshed up to date when you switch mypy branches is easy enough using a git hook.

The two places where it breaks down:

First, there's an endless series of "sync typeshed" commits to the mypy repo. We do these frequently because we have users of mypy (e.g. Dropbox) who sync with the development head of mypy regularly --much more often than mypy releases go out-- and who often want improvements to typeshed as well as improvements to mypy. When using mypy we don't point it to a separate copy of typeshed --it would too often be out of sync-- instead we install from mypy's master branch which also syncs to a recent version of typeshed.

Second, tests. Mypy's tests depend on typeshed, and typeshed's tests depend on mypy. It's a never-ending nightmare (or rather, death by a thousand cuts).

In the mypy world we live with this because we really want to encourage shared ownership of typeshed (and Google's pytype team reviews and merges a significant number of typeshed PRs and sometimes brings up concerns when we forget that typeshed doesn't exist just to support mypy). But for Python I think the situation is just asymmetric, and separating out the stdlib isn't going to suddenly level the playing field. So the burden for the core Python devs is not warranted, IMO.

--Guido
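A minimal sketch of what such a release script might look like, assuming the stdlib sits under `Lib/` in a CPython checkout and that a gzip-compressed tarball is an acceptable release format. The function name, the archive layout, and the decision to leave out `__pycache__` directories are illustrative assumptions, not an agreed design.

```python
import tarfile
from pathlib import Path


def build_stdlib_tarball(checkout: Path, output: Path, prefix: str = "stdlib") -> Path:
    """Pack <checkout>/Lib into a .tar.gz whose members live under `prefix`/."""
    lib = checkout / "Lib"
    if not lib.is_dir():
        raise FileNotFoundError(f"no Lib/ directory in {checkout}")

    def skip_caches(member: tarfile.TarInfo):
        # Returning None drops the member; for a directory this also
        # skips everything beneath it.
        return None if "__pycache__" in Path(member.name).parts else member

    with tarfile.open(str(output), "w:gz") as archive:
        archive.add(str(lib), arcname=prefix, filter=skip_caches)
    return output
```

Another implementation could then unpack the archive wherever it keeps its library directory. Which modules actually work without CPython's C extensions is a separate question that this sketch does not address.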
On 17 July 2016 at 11:27, Ned Deily <nad@python.org> wrote:
On Jul 16, 2016, at 20:46, Brett Cannon <brett@python.org> wrote:
And are there benefits? My hope/hunch is that if we make the stdlib its own repo then other implementations could include the stdlib as a submodule or something, making it easier for them to not only keep up-to-date with fixes to the stdlib, but also make it easier for them to push changes upstream that everyone would benefit from instead of having any changes silo-ed off in their own repo.
Without giving it a lot of thought, this strikes me as an unnecessary complication. It will undoubtedly make CPython development more complicated, both initially in separating the stdlib out and, in the long run, figuring out how to keep them in sync all the time. Other Python implementations already have to deal with this, if they want to, and they can very easily ignore the CPython-specific parts of the repo without having to add all this additional work on everyone. Let's not make this migration harder than it has to be. We've got enough to do already.
+1. Aside from the practical problems with the git separation itself, we also have the problem of getpath.c already being a tremendously complicated piece of imperative logic (combining both compile time and runtime checks, with an entirely separate implementation for Windows), which would need updating to cope with separation of the standard library into "the cross-implementation bits" and "the CPython bits". That said, I'd actually be in favour of doing such a structural split *within* the repo (similar to the way we separated out the Programs directory a while back), but even that would be quite a lot of tedious work for minimal real gain. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
participants (8)
- Brett Cannon
- Chris Angelico
- Donald Stufft
- Guido van Rossum
- Ned Deily
- Nick Coghlan
- Oleg Broytman
- Ryan Gonzalez