datarray repositories have diverged
There's some DataArray code that I've had for a while, but I just finished it up and tested it today. Most notably, it includes a new __str__ for DataArrays that does some nice layout and includes ticks:
    In [9]: print d_arr
    country   year
    --------- -------------------------------------------------
              1994      1998      2002      2006      2010
    Netherlan -0.505758 0.096597  1.083148  -0.450156 0.172754
    Uruguay   1.772182  -0.113394 -0.781307 1.002416  -0.64925
    Germany   -2.013874 0.283947  1.170848  -0.504823 0.448497
    Spain     -0.725844 0.909713  -1.191371 -0.465167 -1.518764
(The layout functions in datarray.print_grid actually work with any ndarray, so you can use it as an alternative to the __str__ in NumPy.)
However, I notice that all the new development on datarray is happening on Fernando Perez's branch, which mine diverged from long ago. I forked from Lluis (jesusabdullah)'s branch, which was the most active at the time, and I got all but the most recent changes merged back in. But that branch in turn was never merged back into fperez's. The divergence point is even before I added the syntax for named axes and ticks by name (such as arr.named['Spain', 1994:2010] or arr.year.named[2010]).
Does it make sense to merge my branch into the main one?
http://github.com/rspeer/datarray
-- Rob
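To make the layout idea concrete, here is a minimal sketch of a fixed-width grid printer with row and column ticks. It is not the code in datarray.print_grid: the function name `format_grid`, its parameters, and the choice to truncate every cell to nine characters are assumptions made for illustration, and it works on plain nested lists rather than ndarrays.

```python
def format_grid(row_ticks, col_ticks, rows, width=9):
    """Lay out a 2-D table of values with tick labels in fixed-width cells.

    Hypothetical helper for illustration; not part of datarray.
    """
    def cell(value):
        # Floats are shown with six decimals; anything else goes through str().
        # Text longer than the cell is cut off so the columns stay aligned.
        text = "%f" % value if isinstance(value, float) else str(value)
        return text[:width].ljust(width)

    # Header row: an empty corner cell followed by the column ticks.
    lines = [" ".join([cell("")] + [cell(t) for t in col_ticks]).rstrip()]
    # One line per data row, led by its row tick.
    for tick, row in zip(row_ticks, rows):
        lines.append(" ".join([cell(tick)] + [cell(v) for v in row]).rstrip())
    return "\n".join(lines)


print(format_grid(["Netherlands", "Uruguay"],
                  [1994, 1998],
                  [[-0.505758, 0.096597], [1.772182, -0.113394]]))
```

Truncating long labels is what turns "Netherlands" into "Netherlan" in a nine-character column; a real implementation would probably size the columns from the data instead of using a fixed width.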
Rob Speer writes:
However, I notice that all the new development on datarray is happening on Fernando Perez's branch, which mine diverged from long ago. I forked from Lluis (jesusabdullah)'s branch, which was the most active at the time, and I got all but the most recent changes merged back in. But that branch in turn was never merged back into fperez's.
Oops! I thought my master branch was obsolete after the first sprint, so I deleted it and re-branched from fperez's. Thus, I suppose that comparing against my current master won't be useful to you.
BTW, my fix branches are incomplete (no tests or docs have been updated), but in the future, how should they be merged (if they should be)? I mean, should datarray fork from the new github numpy into a new repository owned by a "datarray" user? I don't know much about how these kinds of things are managed on github, but I remember some comments about that.
apa!
-- "And it's much the same thing with knowledge, for whenever you learn something new, the whole world becomes that much richer." -- The Princess of Pure Reason, as told by Norton Juster in The Phantom Tollbooth
The fact that I wasn't around for the sprint probably has a lot to do
with how much the code had diverged. But it's not too bad -- I merged
Fernando's branch into mine and only had to change a couple of things
to make the tests pass.
There seem to be two general patterns for decentralized projects on
GitHub: either you have one de facto leader who owns what everyone
considers the main branch (this is what datarray is doing now, with
Fernando as the leader), or you create a GitHub "organization" that
owns the main branch and make a bunch of key people members of the
organization (which is what numpy is doing).
The way you'd usually get something merged in this kind of project is
to send a pull request to the leader using the "Pull Request" button.
But in this case, I'm basically making my pull request on the mailing
list, because it's not straightforward enough for a simple pull
request.
-- Rob
In reply to Lluís's message of Thu, Sep 30, 2010 at 12:22 PM.
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
On Thu, Sep 30, 2010 at 9:41 AM, Rob Speer
The way you'd usually get something merged in this kind of project is to send a pull request to the leader using the "Pull Request" button. But in this case, I'm basically making my pull request on the mailing list, because it's not straightforward enough for a simple pull request.
I just wanted to reply temporarily to say that I'm *not* ignoring this discussion, despite appearances to the contrary :) In the next week we hope to put some time into this at work, and I'll try to catch up with the discussion tomorrow.
One thing to note is that the new pull request system on GH is leaps and bounds better than the old. Now they automatically get an issue, a discussion page, a stable url, etc. So if anyone has anything on datarray that they feel is ready to pull, it would be great if you could click again on the pull request button (GH did not auto-migrate old pull requests to the new system; they need to be made again manually).
And we'll do our best to hold up our end of the bargain of collaborative development over the next few days :)
Regards,
f
One thing I'd like to throw out there is that I haven't really done
anything with my branch past maybe adding a gh-pages branch, and
probably won't be for a while, if at all. As it turns out, I have a
hard time concentrating on the intricacies of apis. >_<
--Josh (jesusabdullah :E )
In reply to Fernando Perez's message of Fri, Oct 1, 2010 at 11:10 AM.
Oh, I'm apparently confusing people's github usernames. Sorry about that.
Josh's branch (jesusabdullah/datarray) is indeed the one I branched
from, not Lluis's (xscript/datarray), though I merged in changes from
Lluis at one point.
Does anyone know if it's possible to change the "forked from" location
of my branch to be Fernando's branch?
-- Rob
In reply to Joshua Holbrook's message of Fri, Oct 1, 2010 at 3:22 PM.
On Fri, Oct 1, 2010 at 2:45 PM, Rob Speer
Does anyone know if it's possible to change the "forked from" location of my branch to be Fernando's branch?
The easiest way (I think) is:
- ensure your *local* repo is 100% complete and in good shape.
- delete your github repo.
- re-fork on github from fperez.
- push -f your local repo back to github
Cheers,
f
Fernando Perez writes:
One thing to note is that the new pull request system on GH is leaps and bounds better than the old. Now they get automatically an issue,
Which has been bugging me, as pull requests were from branches "associated" with already-existing issues... after looking, I couldn't find a way in GH to make a pull request associated with an existing issue (creating an automated comment and somehow annotating the issue would be ideal), instead of creating a new one :)
apa!
-- "And it's much the same thing with knowledge, for whenever you learn something new, the whole world becomes that much richer." -- The Princess of Pure Reason, as told by Norton Juster in The Phantom Tollbooth
Hi Rob, Josh and Lluis,
On Thu, Sep 30, 2010 at 9:41 AM, Rob Speer
The fact that I wasn't around for the sprint probably has a lot to do with how much the code had diverged. But it's not too bad -- I merged Fernando's branch into mine and only had to change a couple of things to make the tests pass.
There seem to be two general patterns for decentralized projects on GitHub: either you have one de facto leader who owns what everyone considers the main branch (this is what datarray is doing now, with Fernando as the leader), or you create a GitHub "organization" that owns the main branch and make a bunch of key people members of the organization (which is what numpy is doing).
The way you'd usually get something merged in this kind of project is to send a pull request to the leader using the "Pull Request" button. But in this case, I'm basically making my pull request on the mailing list, because it's not straightforward enough for a simple pull request.
sorry it took a bit longer than originally planned to merge this. Some of us got together on Wednesday on campus and worked through a lot of this code (we closed one pull request by Lluis and replied to the other requesting more work, as it broke many tests). We didn't announce an IRC sprint because last time it just proved a bit hard to be simultaneously productive on-premises and keep a good flow of activity on IRC for remote contributors, sorry. Perhaps with more people available that will be possible later on, but for now it seemed wiser for us just to push through with what little work we could do.
**Merged changes**
Here's a summary of the things we did merge in from your pull request:
- merged gitignore, thanks.
- your graft change to manifest.in wasn't needed; instead, the bug was that our setup file was not properly listing the testing package, and that's the right solution to apply:
    PACKAGES = ["datarray", "datarray/tests", "datarray/testing"]
  Thanks for catching that problem!
- merged readme improvements into readme.txt. Note that since this is a pure python project, markdown-formatted readme files are fairly out of place (they don't render correctly on pypi, for one thing, and our readme is the same as our long_description field in the setup.py). So the top-level readme must remain a reST file. But we did incorporate your text, thanks!
  As a future note when editing text files, please keep text lines to 80 characters just like code ones. Diff (and github) are mostly line-oriented, so it's best to format text files with hard linebreaks (even if many editors can handle soft linewraps correctly, it's just not very portable).
- merged your fancy printing support, excellent work! We did adjust the tests a little bit so they would pass without named/attribute support, since that will require more discussion (see below). But the main file is in unmodified, and the test changes are tiny.
We made a mini-release v0.0.5 now, with these changes in:
http://fperez.github.com/datarray-doc/0.0.5/index.html
**Workflow**
For future reference, while we were able to manually work through your pull request, I'd like to suggest that you adopt a more traditional workflow where a single pull request contains only a "conceptually atomic" set of changes related to each other. That way it can more easily either be merged in full or discussed and refined until it is ready to merge. Your pull request had work from multiple people, making changes of many unrelated types (gh-pages, .gitignore, named access, printing, etc...). I was able to cherry-pick one or two commits, but by and large I had to resort to manually copying files out of your repo, because there were some things that should not be merged, and there was no easy way to disentangle it.
Here, for example, are the currently active pull requests on ipython:
http://github.com/ipython/ipython/pulls
some of them are fairly extensive, but each is conceptually atomic, so each can be studied in isolation and either will require refinement or will be merged, but nobody is going to have to dissect it into pieces to commit some and drop others.
I hope this is clear, let me know if you need any more info on this. Ultimately it's a matter of making the process more efficient for all involved.
**Changes where further discussion is needed**
There were some things we did *not* merge. The sphinx extension isn't needed (we use a different mechanism for gh-pages that is cleaner), so we just ignored it.
But the important point is the changes to named access support. I realize since the conference you've wanted this, but we really would like to proceed more carefully and first implement only .axis.name access. The top-level access requires a changed __getattr__ method (which slows down *all* attribute access), and opens the door for name collisions. I think the best approach will be to follow the lead of numpy here: structured arrays offer only access by named key ['name'], and for plain .name access you need to make them a recarray. We should also offer in our base class only the simpler, safer mechanism, and then we can build one that uses the .name attributes as a subclass for those who want the convenience and understand the risks and tradeoffs.
How does this sound to you? We should make sure we agree on the api before writing too much more code along these lines. I realize you've actually contributed an implementation of this, but I think we need to make sure it's the right design before merging it in. We do need to have this design discussion to find a class that will work for as many people as possible, so many thanks for starting by offering working code!
Regards,
f
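The design proposed above (a base class with only namespaced axis access, plus an opt-in subclass with top-level attribute access) can be sketched in plain Python. This is an illustration under stated assumptions, not datarray's implementation: the class names `Axis`, `AxisNamespace`, `LabeledArray` and `AttrLabeledArray` are hypothetical, and the data is a nested list rather than an ndarray. The example uses an axis deliberately named `mean` to show the kind of name collision the message warns about.

```python
class Axis:
    """One named axis with its tick labels (hypothetical helper)."""
    def __init__(self, name, ticks):
        self.name = name
        self.ticks = list(ticks)


class AxisNamespace:
    """Holds axes as attributes so lookups read as arr.axis.<name>."""
    def __init__(self, axes):
        for ax in axes:
            setattr(self, ax.name, ax)


class LabeledArray:
    """Base class: axes are reachable only through the .axis namespace."""
    def __init__(self, data, axes):
        self.data = data
        self.axis = AxisNamespace(axes)

    def mean(self):
        flat = [v for row in self.data for v in row]
        return sum(flat) / len(flat)


class AttrLabeledArray(LabeledArray):
    """Opt-in subclass that also exposes axes as top-level attributes."""
    def __getattr__(self, name):
        # Python calls __getattr__ only after normal lookup has failed, so an
        # axis named like an existing attribute or method never reaches here.
        try:
            return getattr(self.__dict__["axis"], name)
        except (KeyError, AttributeError):
            raise AttributeError(name)


arr = AttrLabeledArray([[1.0, 2.0], [3.0, 4.0]],
                       [Axis("country", ["Spain", "Uruguay"]),
                        Axis("mean", [0, 1])])
print(arr.country.ticks)    # convenience access added by the subclass
print(arr.axis.mean.ticks)  # namespaced access always finds the axis
print(arr.mean())           # top-level .mean is still the method, not the axis
```

In this sketch the axis called `mean` stays reachable through `arr.axis.mean`, while `arr.mean` silently resolves to the method. That is the collision risk in miniature, and it is why keeping the attribute shortcut in a separate subclass leaves the base class unambiguous.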
In reply to Fernando Perez's message of Fri, Oct 8, 2010 at 6:21 PM:
What mechanism are you using for gh-pages, if I may ask? I would be interested in this for future projects. --Josh
Hi,
On Fri, Oct 8, 2010 at 8:20 PM, Joshua Holbrook
What mechanism are you using for gh-pages, if I may ask? I would be interested in this for future projects.
the default github implementation relies on a 'hidden' branch called gh-pages that lives in the main repo. I think this is a *horrible* idea because it requires polluting the real repo with builds of the documentation that are auto-generated. Furthermore, I like keeping live versions of the docs of a project for each release, and the gh-pages branch is thus likely to get rather large.
Instead, I made a *separate* repo to be used only for gh-pages:
http://github.com/fperez/datarray-doc
The only purpose of this repo is to provide the docs for datarray. The datarray doc/Makefile has a gh-pages target that updates the html build and then runs the gh-pages script. That's a simple script I wrote to populate the -doc repo with a ready-to-push build of the docs.
On each release, we simply do
    make gh-pages
and follow the instructions it prints, which amount to:
1. cd ../datarray-doc
2. check the docs look good
3. git push
That pushes the doc build.
This workflow is extremely simple, gives us builds of the documentation for all releases nicely indexed with a stable url:
http://fperez.github.com/datarray-doc/
and it produces zero pollution of the main repo with gh-pages junk.
Cheers,
f
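The workflow above might be sketched as follows. This is not the gh-pages script from the datarray repository, whose contents are not shown in this thread: the function name `publish_docs`, its arguments, and the generated index page are all assumptions. It only covers the "populate the -doc repo" step; checking the result and running git push remain manual, as in the numbered steps.

```python
import shutil
from pathlib import Path


def publish_docs(html_build, doc_repo, version):
    """Copy a finished HTML build into <doc_repo>/<version> and refresh the index.

    Hypothetical helper for illustration; paths and layout are assumptions.
    """
    html_build, doc_repo = Path(html_build), Path(doc_repo)
    target = doc_repo / version
    if target.exists():
        shutil.rmtree(target)  # replace an earlier build of the same version
    shutil.copytree(html_build, target)

    # Index every subdirectory that holds a build, sorted by name.
    versions = sorted(p.name for p in doc_repo.iterdir()
                      if p.is_dir() and (p / "index.html").exists())
    links = "\n".join('<li><a href="%s/index.html">%s</a></li>' % (v, v)
                      for v in versions)
    (doc_repo / "index.html").write_text("<ul>\n%s\n</ul>\n" % links)
    return target
```

A call such as `publish_docs("doc/build/html", "../datarray-doc", "0.0.5")` (both paths are placeholders) would leave the separate docs checkout ready to inspect and push, with one directory per release and nothing written to the main repository.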
In reply to Fernando Perez's message of Fri, Oct 8, 2010 at 9:05 PM:
I have to admit to not being a big fan of the oddball 'gh-pages' branch either. I wish there was a better way of doing gh-pages documentation than having to put it in a separate repo, but it's better than two branches with *completely different* files fighting and polluting each other (or worse, dumping build files into /tmp, checking out gh-pages, copying from /tmp back in, etc., etc., which is what I found myself doing). Plus, you could always make the docs a submodule, if that's how you roll (might be a good idea, actually).
Now, if I could just find a documentation tool that I like. I'm not huge on Sphinx, tbh, and RST doesn't really light my fire.
Anyways--I feel enlightened! Thanks!
--Josh
participants (4)
- Fernando Perez
- Joshua Holbrook
- Rob Speer
- xscript@gmx.net