
Hi All,
I'd like to open a discussion about the steps to be followed in merging the numpy refactor. I have two concerns about this. First, the refactor repository branched off some time ago and I'm concerned about code divergence, not just in the refactoring, but in fixes going into the master branch on github. Second, it is likely that a flag day will look like the easiest solution and I think we should avoid that. At the moment it seems to me that the changes can be broken up into three categories:
1) Movement of files and resulting changes to the build process.
2) Refactoring of the files for CPython.
3) Addition of an IronPython interface.
I'd like to see 1) go into the master branch as soon as possible, followed by 2) so that the changes can be tested and fixes will go into a common repository. The main github repository can then be branched for adding the IronPython stuff. In short, I think it would be useful to abandon the teoliphant fork at some point and let the work continue in a fork of the numpy repository.
I'm not intimately familiar with details of the changes that have been made in the refactor, so I welcome any thoughts by those folks involved in the work. And of course by the usual numpy people who will need to adjust to the changes.
Chuck

On Thu, 11 Nov 2010 12:38:53 -0700, Charles R Harris wrote:
I'd like to open a discussion about the steps to be followed in merging the numpy refactor. I have two concerns about this. First, the refactor repository branched off some time ago and I'm concerned about code divergence, not just in the refactoring, but in fixes going into the master branch on github. Second, it is likely that a flag day will look like the easiest solution and I think we should avoid that.
What is a "flag day"?
At the moment it seems to me that the changes can be broken up into three categories:
- Movement of files and resulting changes to the build process.
- Refactoring of the files for CPython.
- Addition of an IronPython interface.
I'd like to see 1) go into the master branch as soon as possible, followed by 2) so that the changes can be tested and fixes will go into a common repository. The main github repository can then be branched for adding the IronPython stuff. In short, I think it would be useful to abandon the teoliphant fork at some point and let the work continue in a fork of the numpy repository.
The first step I would like to see is to re-graft the teoliphant branch onto the current Git history -- currently, it's still based on Git-SVN. Re-grafting would make incremental merging and tracking easier. Luckily, this is easy to do thanks to Git's data model (I have a script for it), and I believe it could be useful to do it ASAP.

On Thu, Nov 11, 2010 at 2:08 PM, Pauli Virtanen pav@iki.fi wrote:
On Thu, 11 Nov 2010 12:38:53 -0700, Charles R Harris wrote:
I'd like to open a discussion about the steps to be followed in merging the numpy refactor. I have two concerns about this. First, the refactor repository branched off some time ago and I'm concerned about code divergence, not just in the refactoring, but in fixes going into the master branch on github. Second, it is likely that a flag day will look like the easiest solution and I think we should avoid that.
What is a "flag day"?
It all goes in as one big commit.
At the moment it seems to me that the changes can be broken up into three categories:
- Movement of files and resulting changes to the build process.
- Refactoring of the files for CPython.
- Addition of an IronPython interface.
I'd like to see 1) go into the master branch as soon as possible, followed by 2) so that the changes can be tested and fixes will go into a common repository. The main github repository can then be branched for adding the IronPython stuff. In short, I think it would be useful to abandon the teoliphant fork at some point and let the work continue in a fork of the numpy repository.
The first step I would like to see is to re-graft the teoliphant branch onto the current Git history -- currently, it's still based on Git-SVN. Re-grafting would make incremental merging and tracking easier. Luckily, this is easy to do thanks to Git's data model (I have a script for it), and I believe it could be useful to do it ASAP.
I agree that would be an excellent start. Speaking of repo surgery, you might find esr's latest project http://esr.ibiblio.org/?p=2727 of interest.
<snip>
Chuck

Thanks for starting the discussion, Charles.
Merging of the re-factor is a priority for me once I get back from the last 9 weeks of travel I have been on (I have been travelling for business 7 of the last 9 weeks).
Ilan Schnell has already been looking at how to accomplish the merge (and I have been reading up on Git so that I understand the commit model better and can possibly help without being a complete neophyte with git). Pauli's script will be very helpful.
I'm very enthused about the bug-fixes, memory-leak closures, and new tests that have been added on the re-factor branch. I'm also interested in getting more community feedback on the ndarray library C-API, and the other changes that have been made. This feedback will be essential before the changes can become NumPy 2.0. I would also like to see a few more NEPs become part of NumPy 2.0 over the next several months. I have a long wish list of additional NEPS that I've only sketched in quick drafts at this point as well --- datetime finishes, geometry-information, i.e. dimension and index labels, reduce-by implementation, indirect arrays, and generator array objects.
My initial guess as to how quickly we would have a NumPy 2.0 was ambitious partly because I have had almost zero time personally to work on it, and partly because we have been resource constrained which has pushed us to draw out the project a bit. But, I've come up with a long list of new features for NumPy 2.0 that I would like to hash out on the mailing lists over the next months as well. My hope is for NumPy 2.0 to come out by the end of Q1 sometime next year. My hopes may have to be tempered by limited time resources, of course.
At the same time, the work on the .NET framework has pushed us to move more of SciPy to a Cython-generated set. There are additional things I would like to see SciPy improve on as well, but I am not sure who is going to work on them. If I had my dream, there would be more modularity to the packages, and an improved packaging system --- and of course, porting to Python 3k. I would like to see core SciPy be a smaller set containing a few core packages. (linear algebra, statistics, optimization, interpolation, signal processing, and image processing). Then, I would like to see scipy.<module> packages which are released and packaged separately with the whole system available on github.
The past couple of years have been very busy for me (and continue to be busy), but I am hoping that next year will allow me more time to spend on promoting sprints, and participating more in the community. I will not have the time I used to have when I was a full-time academic, but I plan to be more involved in helping promote SciPy development. With SciPy moved over to github, I think that will even be possible without my stepping on everybody else's hard work.
-Travis
On Nov 11, 2010, at 9:30 PM, Charles R Harris wrote:
On Thu, Nov 11, 2010 at 2:08 PM, Pauli Virtanen pav@iki.fi wrote:
On Thu, 11 Nov 2010 12:38:53 -0700, Charles R Harris wrote:
I'd like to open a discussion about the steps to be followed in merging the numpy refactor. I have two concerns about this. First, the refactor repository branched off some time ago and I'm concerned about code divergence, not just in the refactoring, but in fixes going into the master branch on github. Second, it is likely that a flag day will look like the easiest solution and I think we should avoid that.
What is a "flag day"?
It all goes in as one big commit.
At the moment it seems to me that the changes can be broken up into three categories:
- Movement of files and resulting changes to the build process.
- Refactoring of the files for CPython.
- Addition of an IronPython interface.
I'd like to see 1) go into the master branch as soon as possible, followed by 2) so that the changes can be tested and fixes will go into a common repository. The main github repository can then be branched for adding the IronPython stuff. In short, I think it would be useful to abandon the teoliphant fork at some point and let the work continue in a fork of the numpy repository.
The first step I would like to see is to re-graft the teoliphant branch onto the current Git history -- currently, it's still based on Git-SVN. Re-grafting would make incremental merging and tracking easier. Luckily, this is easy to do thanks to Git's data model (I have a script for it), and I believe it could be useful to do it ASAP.
I agree that would be an excellent start. Speaking of repo surgery, you might find esr's latest project of interest.
<snip>
Chuck
--- Travis Oliphant, Enthought, Inc. | oliphant@enthought.com | 1-512-536-1057 | http://www.enthought.com

On Thu, Nov 11, 2010 at 3:15 PM, Travis Oliphant oliphant@enthought.com wrote:
Thanks for starting the discussion, Charles.
Merging of the re-factor is a priority for me once I get back from the last 9 weeks of travel I have been on (I have been travelling for business 7 of the last 9 weeks).
Ilan Schnell has already been looking at how to accomplish the merge (and I have been reading up on Git so that I understand the commit model better and can possibly help without being a complete neophyte with git). Pauli's script will be very helpful.
I'm very enthused about the bug-fixes, memory-leak closures, and new tests that have been added on the re-factor branch. I'm also interested in getting more community feedback on the ndarray library C-API, and the other changes that have been made. This feedback will be essential before the changes can become NumPy 2.0. I would also like to see a few more NEPs become part of NumPy 2.0 over the next several months. I have a long wish list of additional NEPS that I've only sketched in quick drafts at this point as well --- datetime finishes, geometry-information, i.e. dimension and index labels, reduce-by implementation, indirect arrays, and generator array objects.
Let's not go overboard here. I think it would be a good idea to keep the numpy core as unencumbered as possible. Adding things that let other people build stuff is great, but putting too much at the core will likely make maintenance more difficult and the barrier to entry higher. IMHO, the core of numpy functionality is access to strided memory, topped with ufuncs. Linear algebra, random numbers, etc. are add-ons, but useful ones to combine with the core package. I think that index labels are already pushing the limits.
What do you want to do with datetime? We could remove it from the current trunk and leave it to come in with the refactoring when it is ready.
My initial guess as to how quickly we would have a NumPy 2.0 was ambitious partly because I have had almost zero time personally to work on it, and partly because we have been resource constrained which has pushed us to draw out the project a bit. But, I've come up with a long list of new features for NumPy 2.0 that I would like to hash out on the mailing lists over the next months as well. My hope is for NumPy 2.0 to come out by the end of Q1 sometime next year. My hopes may have to be tempered by limited time resources, of course.
The rule of thumb is to multiply software time estimates by four. The multiplication needs to be done by someone uninvolved because programmers usually think they have already accounted for the unexpected time requirements.
At the same time, the work on the .NET framework has pushed us to move more of SciPy to a Cython-generated set. There are additional things I would like to see SciPy improve on as well, but I am not sure who is going to work on them. If I had my dream, there would be more modularity to the packages, and an improved packaging system --- and of course, porting to Python 3k. I would like to see core SciPy be a smaller set containing a few core packages. (linear algebra, statistics, optimization, interpolation, signal processing, and image processing). Then, I would like to see scipy.<module> packages which are released and packaged separately with the whole system available on github.
The past couple of years have been very busy for me (and continue to be busy), but I am hoping that next year will allow me more time to spend on promoting sprints, and participating more in the community. I will not have the time I used to have when I was a full-time academic, but I plan to be more involved in helping promote SciPy development. With SciPy moved over to github, I think that will even be possible without my stepping on everybody else's hard work.
Oh, the guys in the corner offices always say that. Somehow it doesn't work out that way, someone has to keep the business going. The best way to keep working at the bench, so to speak, is to avoid promotion in the first place. I'm afraid it may be too late for you ;)
Chuck

On Fri, Nov 12, 2010 at 7:15 AM, Travis Oliphant oliphant@enthought.com wrote:
At the same time, the work on the .NET framework has pushed us to move more of SciPy to a Cython-generated set. There are additional things I would like to see SciPy improve on as well, but I am not sure who is going to work on them. If I had my dream, there would be more modularity to the packages, and an improved packaging system --- and of course, porting to Python 3k.
I don't know exactly where we are there, but Pauli and I took a look at scipy for python 3 at EuroSciPy in Paris, and I think it is mostly a matter of low-hanging fruit. Most (all?) changes are in the trunk already.
I would like to see core SciPy be a smaller set containing a few core packages. (linear algebra, statistics, optimization, interpolation, signal processing, and image processing). Then, I would like to see scipy.<module> packages which are released and packaged separately with the whole system available on github.
While I agree with the sentiment, I think it would be a mistake to do so before we have the infrastructure to actually deliver packages and so on. I understand there is a bit of a chicken-and-egg issue as well. I spent most if not all of my free time in 2010 working on that issue, and I will summarize the current status in a separate email to the ML to avoid disrupting the main discussion on the refactoring.
cheers,
David

Fri, 12 Nov 2010 12:02:55 +0900, David Cournapeau wrote: [clip: Python 3 on Scipy]
I don't know exactly where we are there, but Pauli and I took a look at scipy for python 3 at EuroSciPy in Paris, and I think it is mostly a matter of low-hanging fruit. Most (all?) changes are in the trunk already.
Only scipy.weave is left to do. Otherwise, the test suite passes on Python 3.

On 11/11/2010 09:02 PM, David Cournapeau wrote:
On Fri, Nov 12, 2010 at 7:15 AM, Travis Oliphant oliphant@enthought.com wrote:
At the same time, the work on the .NET framework has pushed us to move more of SciPy to a Cython-generated set. There are additional things I would like to see SciPy improve on as well, but I am not sure who is going to work on them. If I had my dream, there would be more modularity to the packages, and an improved packaging system --- and of course, porting to Python 3k.
I don't know exactly where we are there, but Pauli and I took a look at scipy for python 3 at EuroSciPy in Paris, and I think it is mostly a matter of low-hanging fruit. Most (all?) changes are in the trunk already.
I would like to see core SciPy be a smaller set containing a few core packages. (linear algebra, statistics, optimization, interpolation, signal processing, and image processing). Then, I would like to see scipy.<module> packages which are released and packaged separately with the whole system available on github.
While I agree with the sentiment, I think it would be a mistake to do so before we have the infrastructure to actually deliver packages and so on. I understand there is a bit of a chicken-and-egg issue as well. I spent most if not all of my free time in 2010 working on that issue, and I will summarize the current status in a separate email to the ML to avoid disrupting the main discussion on the refactoring.
cheers,
David
I agree with David's comment because splitting requires effective package management to handle all these splits. Also, there seems to be little point in splitting if users tend to require more than just the core. The problem with splitting things too finely is that it creates more problems than it is worth. We have already experienced incompatibility problems in numpy's short history with at least masked arrays and the datetime addition.
Related to this, can the refactoring be used to make future development of numpy and scipy, especially in terms of packaging, easier? I can see that moving or renaming directories and files to more convenient places or names could be easily done at this time.
Bruce

On 11/11/2010 11:15 PM, Travis Oliphant wrote:
Thanks for starting the discussion, Charles.
Merging of the re-factor is a priority for me once I get back from the last 9 weeks of travel I have been on (I have been travelling for business 7 of the last 9 weeks).
Ilan Schnell has already been looking at how to accomplish the merge (and I have been reading up on Git so that I understand the commit model better and can possibly help without being a complete neophyte with git). Pauli's script will be very helpful.
I'm very enthused about the bug-fixes, memory-leak closures, and new tests that have been added on the re-factor branch. I'm also interested in getting more community feedback on the ndarray library C-API, and the other changes that have been made. This feedback will be essential before the changes can become NumPy 2.0. I would also like to see a few more NEPs become part of NumPy 2.0 over the next several months. I have a long wish list of additional NEPS that I've only sketched in quick drafts at this point as well --- datetime finishes, geometry-information, i.e. dimension and index labels, reduce-by implementation, indirect arrays, and generator array objects.
My initial guess as to how quickly we would have a NumPy 2.0 was ambitious partly because I have had almost zero time personally to work on it, and partly because we have been resource constrained which has pushed us to draw out the project a bit. But, I've come up with a long list of new features for NumPy 2.0 that I would like to hash out on the mailing lists over the next months as well. My hope is for NumPy 2.0 to come out by the end of Q1 sometime next year. My hopes may have to be tempered by limited time resources, of course.
Conventionally, 2.0 would be the preferred point to break backwards compatibility (and to make changes that could break stability), while simply adding new backwards-compatible features can just as well be done in 2.1.
IMO the crucial question is: Would it be possible to split this long list you have in mind in this fashion? And how much remains that will break backwards compatibility or cause instability?
Dag Sverre
At the same time, the work on the .NET framework has pushed us to move more of SciPy to a Cython-generated set. There are additional things I would like to see SciPy improve on as well, but I am not sure who is going to work on them. If I had my dream, there would be more modularity to the packages, and an improved packaging system --- and of course, porting to Python 3k. I would like to see core SciPy be a smaller set containing a few core packages. (linear algebra, statistics, optimization, interpolation, signal processing, and image processing). Then, I would like to see scipy.<module> packages which are released and packaged separately with the whole system available on github.
The past couple of years have been very busy for me (and continue to be busy), but I am hoping that next year will allow me more time to spend on promoting sprints, and participating more in the community. I will not have the time I used to have when I was a full-time academic, but I plan to be more involved in helping promote SciPy development. With SciPy moved over to github, I think that will even be possible without my stepping on everybody else's hard work.
-Travis
On Nov 11, 2010, at 9:30 PM, Charles R Harris wrote:
On Thu, Nov 11, 2010 at 2:08 PM, Pauli Virtanen <pav@iki.fi> wrote:
On Thu, 11 Nov 2010 12:38:53 -0700, Charles R Harris wrote:
I'd like to open a discussion about the steps to be followed in merging the numpy refactor. I have two concerns about this. First, the refactor repository branched off some time ago and I'm concerned about code divergence, not just in the refactoring, but in fixes going into the master branch on github. Second, it is likely that a flag day will look like the easiest solution and I think we should avoid that.
What is a "flag day"?
It all goes in as one big commit.
At the moment it seems to me that the changes can be broken up into three categories:
1) Movement of files and resulting changes to the build process.
2) Refactoring of the files for CPython.
3) Addition of an IronPython interface.
I'd like to see 1) go into the master branch as soon as possible, followed by 2) so that the changes can be tested and fixes will go into a common repository. The main github repository can then be branched for adding the IronPython stuff. In short, I think it would be useful to abandon the teoliphant fork at some point and let the work continue in a fork of the numpy repository.
The first step I would like to see is to re-graft the teoliphant branch onto the current Git history -- currently, it's still based on Git-SVN. Re-grafting would make incremental merging and tracking easier. Luckily, this is easy to do thanks to Git's data model (I have a script for it), and I believe it could be useful to do it ASAP.
I agree that would be an excellent start. Speaking of repo surgery, you might find esr's latest project http://esr.ibiblio.org/?p=2727 of interest.
<snip>
Chuck
--- Travis Oliphant, Enthought, Inc. | oliphant@enthought.com | 1-512-536-1057 | http://www.enthought.com

Hi Chuck, Pauli,
This is indeed a good time to bring this up, as we are in the process of fixing Python 3 issues and then merging changes from the master tree, in preparation for being able to consider merging the work. More specific comments inline below.
Regards, Jason
On Thu, Nov 11, 2010 at 3:30 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
On Thu, Nov 11, 2010 at 2:08 PM, Pauli Virtanen pav@iki.fi wrote:
On Thu, 11 Nov 2010 12:38:53 -0700, Charles R Harris wrote:
I'd like to open a discussion about the steps to be followed in merging the numpy refactor. I have two concerns about this. First, the refactor repository branched off some time ago and I'm concerned about code divergence, not just in the refactoring, but in fixes going into the master branch on github. Second, it is likely that a flag day will look like the easiest solution and I think we should avoid that.
What is a "flag day"?
It all goes in as one big commit.
At the moment it seems to me that the changes can be broken up into three categories:
- Movement of files and resulting changes to the build process.
- Refactoring of the files for CPython.
- Addition of an IronPython interface.
1) and 2) are really the same step as we haven't moved/renamed existing files but instead moved content from the CPython interface files into new, platform-independent files. Specifically, there is a new top-level directory 'libndarray' that contains the platform-independent core. The existing CPython interface files remain in place, but much of the functionality is now implemented by calling into this core.
Unfortunately this makes merging difficult because some changes need to be manually applied to a different file. Once all regression tests are passing on the refactor branch for both Python 2.x and 3.x (3.x is in progress) Ilan is going to start working on applying all accumulated changes. The good news is that 95% of our changes are to core/multiarray and core/umath and there are relatively few changes to these modules in the master repository.
The IronPython interface lives in its own directory and is quite standalone. It just links to the .so from libndarray and just has a Visual Studio solution -- it is not part of the main build for now to avoid breaking all of the people who don't care about it.
I'd like to see 1) go into the master branch as soon as possible, followed by 2) so that the changes can be tested and fixes will go into a common repository. The main github repository can then be branched for adding the IronPython stuff. In short, I think it would be useful to abandon the teoliphant fork at some point and let the work continue in a fork of the numpy repository.
The first step I would like to see is to re-graft the teoliphant branch onto the current Git history -- currently, it's still based on Git-SVN. Re-grafting would make incremental merging and tracking easier. Luckily, this is easy to do thanks to Git's data model (I have a script for it), and I believe it could be useful to do it ASAP.
I agree that would be an excellent start. Speaking of repo surgery, you might find esr's latest project http://esr.ibiblio.org/?p=2727 of interest.
We will take a look at this and the script. There is also a feature in git that allows two trees to be grafted together, so the refactoring will end up as a branch on the main repository with all edits. My hope is that we can roll all of our changes into the main repository as a branch and then selectively merge to the main branch as desired. For example, as you said, the IronPython changes don't need to be merged immediately.
Either way, I fully agree that we want to abandon our fork as soon as possible. If anything, it will go a long way towards easing the merge and getting more eyeballs on the changes we have made so far.
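As a minimal illustration of the graft mechanism mentioned above (the SHAs below are placeholders, and this only sketches git's behaviour, not the exact procedure we will use):

$ # pretend, locally, that the first refactor commit has the listed commit from
$ # the new numpy history as its parent
$ echo "<first-refactor-commit> <new-numpy-parent-commit>" >> .git/info/grafts
$ # grafts are local-only; git filter-branch can later rewrite the history to
$ # make the new ancestry permanent and pushable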

On Thu, 11 Nov 2010 16:37:53 -0600, Jason McCampbell wrote: [clip]
We will take a look at this and the script. There is also a feature in git that allows two trees to be grafted together so the refactoring will end up as a branch on the main repository with all edits.
Yes, this is pretty much what the script does -- it detaches the commits in the refactor branch from the Git-SVN history, and reattaches them to the new Git history. This changes only the DAG of the commits, and not the tree and file contents corresponding to each commit.
(Git's graft feature can only add new parents, so filter-branch is needed.)
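For concreteness, a rough sketch of what such a re-graft can look like with filter-branch (the SHAs and branch name below are placeholders; the actual script maps every git-svn parent to its counterpart in the new history):

$ git filter-branch \
      --parent-filter 'sed -e "s/-p <old-git-svn-parent>/-p <new-git-parent>/"' \
      -- <branch-point>..refactor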
My hope is that we can roll all of our changes into the main repository as a branch and then selectively merge to the main branch as desired. For example, as you said, the IronPython changes don't need to be merged immediately.
I'm not sure if we should put development branches at all in the main repository.
A repository like
github.com/numpy/numpy-refactor
might be a better solution, and also give visibility.

On Thu, Nov 11, 2010 at 5:48 PM, Pauli Virtanen pav@iki.fi wrote:
On Thu, 11 Nov 2010 16:37:53 -0600, Jason McCampbell wrote: [clip]
We will take a look at this and the script. There is also a feature in git that allows two trees to be grafted together so the refactoring will end up as a branch on the main repository with all edits.
Yes, this is pretty much what the script does -- it detaches the commits in the refactor branch from the Git-SVN history, and reattaches them to the new Git history. This changes only the DAG of the commits, and not the tree and file contents corresponding to each commit.
(Git's graft feature can only add new parents, so filter-branch is needed.)
My hope is that we can roll all of our changes into the main repository as a branch and then selectively merge to the main branch as desired. For example, as you said, the IronPython changes don't need to be merged immediately.
I'm not sure if we should put development branches at all in the main repository.
A repository like
github.com/numpy/numpy-refactor
might be a better solution, and also give visibility.
I think that is right. The problem in merging stuff back into numpy from there will be tracking what has been merged and what hasn't, and consolidating things into logical chunks. I'm not sure what the best workflow for that process will be. As to the first bits to merge, I would suggest the tests.
Chuck

On Thu, Nov 11, 2010 at 5:48 PM, Pauli Virtanen pav@iki.fi wrote:
On Thu, 11 Nov 2010 16:37:53 -0600, Jason McCampbell wrote: [clip]
We will take a look at this and the script. There is also a feature in git that allows two trees to be grafted together so the refactoring will end up as a branch on the main repository with all edits.
Yes, this is pretty much what the script does -- it detaches the commits in the refactor branch from the Git-SVN history, and reattaches them to the new Git history. This changes only the DAG of the commits, and not the tree and file contents corresponding to each commit.
(Git's graft feature can only add new parents, so filter-branch is needed.)
My hope is that we can roll all of our changes into the main repository as a branch and then selectively merge to the main branch as desired. For example, as you said, the IronPython changes don't need to be merged immediately.
I'm not sure if we should put development branches at all in the main repository.
A repository like
github.com/numpy/numpy-refactor
might be a better solution, and also give visibility.
The teoliphant repository is usually quiet on the weekends. Would it be reasonable to make github.com/numpy/numpy-refactor this weekend and ask the refactor folks to start their work there next Monday?
Chuck

Fri, 12 Nov 2010 09:24:56 -0700, Charles R Harris wrote: [clip]
The teoliphant repository is usually quiet on the weekends. Would it be reasonable to make github.com/numpy/numpy-refactor this weekend and ask the refactor folks to start their work there next Monday?
Sure:
https://github.com/numpy/numpy-refactor
I can re-sync/scrap it later on if needed, depending on what the refactoring team wants to do with it.

On Fri, Nov 12, 2010 at 10:56 AM, Pauli Virtanen pav@iki.fi wrote:
Fri, 12 Nov 2010 09:24:56 -0700, Charles R Harris wrote: [clip]
The teoliphant repository is usually quiet on the weekends. Would it be reasonable to make github.com/numpy/numpy-refactor this weekend and ask the refactor folks to start their work there next Monday?
Sure:
https://github.com/numpy/numpy-refactor
I can re-sync/scrap it later on if needed, depending on what the refactoring team wants to do with it.
I think it's even easier than that. If someone creates an empty repository and adds me (user: jasonmccampbell) as a contributor I should be able to add it as a remote for my current repository and push it any time.
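Concretely, that would just be something along these lines (assuming the refactor work sits on a local branch called 'refactor'; the SSH URL form is only one option):

$ git remote add numpy-refactor git@github.com:numpy/numpy-refactor.git
$ git push numpy-refactor refactor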
That said, it might make sense to wait a week as Ilan is working on the merge now. Our plan is to create a clone of the master repository and create a refactoring branch off the trunk. We can then graft on our current branch (which is not connected to the master trunk), do the merge, then push this new refactor branch. This keeps us from having a repo with both an old, un-rooted branch and the new, correct refactor branch.
I'm open either way, just wanted to throw this out there.
Jason

On Fri, Nov 12, 2010 at 1:37 PM, Jason McCampbell <jmccampbell@enthought.com> wrote:
On Fri, Nov 12, 2010 at 10:56 AM, Pauli Virtanen pav@iki.fi wrote:
Fri, 12 Nov 2010 09:24:56 -0700, Charles R Harris wrote: [clip]
The teoliphant repository is usually quiet on the weekends. Would it be reasonable to make github.com/numpy/numpy-refactor this weekend and ask the refactor folks to start their work there next Monday?
Sure:
https://github.com/numpy/numpy-refactor
I can re-sync/scrap it later on if needed, depending on what the refactoring team wants to do with it.
I think it's even easier than that. If someone creates an empty repository and adds me (user: jasonmccampbell) as a contributor I should be able to add it as a remote for my current repository and push it any time.
Well, Pauli already has your stuff in the new repository. Why not just clone it and continue your work there?
That said, it might make sense to wait a week as Ilan is working on the merge now. Our plan is to create a clone of the master repository and create a refactoring branch off the trunk.
But that is already done. Although I don't think doing it again will be problem.
We can then graft on our current branch (which is not connected to the master trunk), do the merge, then push this new refactor branch. This keeps us from having a repo with both an old, un-rooted branch and the new, correct refactor branch.
But it is already grafted. Unless you are thinking of making a branch in numpy/numpy, which might be a bad idea.
I'm open either way, just wanted to throw this out there.
Chuck

On Fri, 12 Nov 2010 14:37:19 -0600, Jason McCampbell wrote:
Sure:
https://github.com/numpy/numpy-refactor
I can re-sync/scrap it later on if needed, depending on what the refactoring team wants to do with it.
Ok, maybe to clarify:
- That repo is already created,
- It contains your refactoring work, grafted on the current Git history, so you can either start merging using it, or first re-do the graft if you want to do it yourselves,
- You (and also the rest of the team) have push permissions there.
Cheers, Pauli
PS.
You can verify that the contents of the trees are exactly what you had before the grafting:
$ git cat-file commit origin/refactor
tree 85170987b6d3582b7928d46eda98bdfb394e0ea7
parent fec0175e306016d0eff688f63912ecd30946dcbb
parent 7383a3bbed494aa92be61faeac2054fb609a1ab1
author Ilan Schnell <ischnell@enthought.com> 1289517493 -0600
committer Ilan Schnell <ischnell@enthought.com> 1289517493 -0600
...

$ git cat-file commit new-rebased
tree 85170987b6d3582b7928d46eda98bdfb394e0ea7
parent 5e24bd3a9c2bdbd3bb5e92b03997831f15c22e4b
parent e7caa5d73912a04ade9b4a327f58788ab5d9d585
author Ilan Schnell <ischnell@enthought.com> 1289517493 -0600
committer Ilan Schnell <ischnell@enthought.com> 1289517493 -0600
The tree hashes coincide, which means that the state of the tree at the two commits is exactly identical.
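For reference, an equivalent and shorter check is to compare the peeled tree ids directly:

$ git rev-parse origin/refactor^{tree} new-rebased^{tree}
$ # the two hashes printed should be identical; similarly,
$ git diff --stat origin/refactor new-rebased
$ # should produce no output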

Pauli,
Thanks a lot for doing this; it helps a lot. Ilan was on another project this morning, so this helps get the merge process started faster. It looks like it is auto-merging changes from Travis's repository, because several recent changes have been moved over. I will double check, but we should be able to switch to using this repository now.
Thanks, Jason
On Fri, Nov 12, 2010 at 3:31 PM, Pauli Virtanen pav@iki.fi wrote:
On Fri, 12 Nov 2010 14:37:19 -0600, Jason McCampbell wrote:
Sure:
https://github.com/numpy/numpy-refactor
I can re-sync/scrap it later on if needed, depending on what the refactoring team wants to do with it.
Ok, maybe to clarify:
- That repo is already created,
- It contains your refactoring work, grafted on the current Git history, so you can either start merging using it, or first re-do the graft if you want to do it yourselves,
- You (and also the rest of the team) have push permissions there.
Cheers, Pauli
PS.
You can verify that the contents of the trees are exactly what you had before the grafting:
$ git cat-file commit origin/refactor
tree 85170987b6d3582b7928d46eda98bdfb394e0ea7
parent fec0175e306016d0eff688f63912ecd30946dcbb
parent 7383a3bbed494aa92be61faeac2054fb609a1ab1
author Ilan Schnell <ischnell@enthought.com> 1289517493 -0600
committer Ilan Schnell <ischnell@enthought.com> 1289517493 -0600
...

$ git cat-file commit new-rebased
tree 85170987b6d3582b7928d46eda98bdfb394e0ea7
parent 5e24bd3a9c2bdbd3bb5e92b03997831f15c22e4b
parent e7caa5d73912a04ade9b4a327f58788ab5d9d585
author Ilan Schnell <ischnell@enthought.com> 1289517493 -0600
committer Ilan Schnell <ischnell@enthought.com> 1289517493 -0600
The tree hashes coincide, which means that the state of the tree at the two commits is exactly identical.

On Thu, 11 Nov 2010 21:08:55 +0000, Pauli Virtanen wrote: [clip]
The first step I would like to see is to re-graft the teoliphant branch onto the current Git history -- currently, it's still based on Git-SVN. Re-grafting would make incremental merging and tracking easier. Luckily, this is easy to do thanks to Git's data model (I have a script for it), and I believe it could be useful to do it ASAP.
This needs to be added to the --parent-filter in the script, though:
s/-p b629b740c9fb4685c5fd3d822efec8250d556ad4/-p 9ea50db4b5ca3a26c05cf4df364fa40f873da545/;
so that it attaches the dangling datetime commits to the correct place.
After this, "git rebase" of the changes in master onto the refactor branch seems to proceed reasonably. Based on a quick try, I got to 24 of 214 commits in ~ two hours, so I'd guess it'd take a few days at most to merge the changes to this direction. The conflicts don't seem too bad. The main annoyance is that in some cases (mainly the *.src files) Git fails to notice partial content moves, and generates big conflicts that need to be resolved by applying patches manually to libndarray/src/* instead of numpy/core/src/*.
We probably won't want to do the merge by rebasing like this, though.
The main technical question in the merging then seems to be whether it's possible to rewrite the refactoring history to group changes into bigger logical chunks that would be easier to eyeball and nicer to present to posterity. Anyway, things are looking good :)
Participants (7): Bruce Southey, Charles R Harris, Dag Sverre Seljebotn, David Cournapeau, Jason McCampbell, Pauli Virtanen, Travis Oliphant