[Numpy-discussion] datarray repositories have diverged

Fri Oct 8 23:20:58 EDT 2010

On Fri, Oct 8, 2010 at 6:21 PM, Fernando Perez <fperez.net at gmail.com> wrote:
> Hi Rob, Josh and Lluis,
>
> On Thu, Sep 30, 2010 at 9:41 AM, Rob Speer <rspeer at mit.edu> wrote:
>> The fact that I wasn't around for the sprint probably has a lot to do
>> with how much the code had diverged. But it's not too bad -- I merged
>> Fernando's branch into mine and only had to change a couple of things
>> to make the tests pass.
>>
>> There seem to be two general patterns for decentralized projects on
>> GitHub: either you have one de facto leader who owns what everyone
>> considers the main branch (this is what datarray is doing now, with
>> Fernando as the leader), or you create a GitHub "organization" that
>> owns the main branch and make a bunch of key people members of the
>> organization (which is what numpy is doing).
>>
>> The way you'd usually get something merged in this kind of project is
>> to send a pull request to the leader using the "Pull Request" button.
>> But in this case, I'm basically making my pull request on the mailing
>> list, because it's not straightforward enough for a simple pull
>> request.
>
> sorry it took a bit longer than originally planned to merge this.
> Some of us got together on Wednesday on campus and worked through a
> lot of this code (we closed one pull request by Lluis and replied to
> the other requesting more work, as it broke many tests).  We didn't
> announce an IRC sprint because last time it just proved a bit hard to
> be simultaneously productive on-premises and keep a good flow of
> activity on IRC for remote contributors, sorry.  Perhaps with more
> people available that will be possible later on, but for now it seemed
> wiser for us just to push through with what little work we could do.
>
> **Merged changes**
>
> Here's a summary of the things we did merge in from your pull request:
>
> - merged gitignore, thanks.
>
> - your graft change to manifest.in wasn't needed, instead the bug was
> that our setup file was missing properly listing the testing package,
> and that's the right solution to apply:
>
>  PACKAGES            = ["datarray", "datarray/tests", "datarray/testing"]
>
> Thanks for catching that problem!
>
> - merge readme improvements into readme.txt.  Note that since this is
> a pure python project, markdown-formatted readme files are fairly out
> of place (they don't render correctly on pypi, for one thing, and our
> readme is the same as our long_description field in the setup.py).  So
> the top-level readme must remain a reST file.  But we did incorporate
> your text, thanks!
>
> As a future note when editing text files, please keep text lines to 80
> characters just like code ones.  Diff (and github) are mostly
> line-oriented, so it's best to format text files with hard linebreaks
> (even if many editors can handle soft linewraps correctly, it's just
> not very portable).
>
> - merged your fancy printing support, excellent work!  We did adjust
> the tests a little bit so they would pass without named/attribute
> support, since that will require more discussion (see below).  But the
> main file is in unmodified, and the test changes are tiny.
>
> We made a mini-release v0.0.5 now, with these changes in:
>
> http://fperez.github.com/datarray-doc/0.0.5/index.html
>
> **Workflow**
>
> For future reference, while we were able to manually work through your
> pull request, I'd like to suggest that you adopt a more traditional
> workflow where a single pull request contains only a "conceptually
> atomic" set of changes related to each other. That way it can either
> be all merged or discussed and refined until merge more easily.  Your
> pull request had work from multiple people, making changes of many
> unrelated types (gh-pages, .gitignore, named access, printing,
> etc...).  I was able to cherry-pick one or two commits, but by and
> large I had to resort to manually copying files out of your repo,
> because there were some things that should not be merged, and there
> was no easy way to disentangle it.
>
> Here's for example the currently active pull requests on ipython:
>
> http://github.com/ipython/ipython/pulls
>
> some of them are fairly extensive, but each is conceptually atomic, so
> each can be studied in isolation and either will require refinement or
> will be merged, but nobody is going to have to dissect it into pieces
> to commit some and drop others.
>
> I hope this is clear, let me know if you need any more info on this.
> Ultimately it's a matter of making the process more efficient for all
> involved.
>
> **Changes where further discussion is needed**
>
> There were some things we did *not* merge.  The sphinx extension isn't
> needed (we use a different mechanism for gh-pages that is cleaner), so
> we just ignored it.
>
> But the important point are the changes to named access support.  I
> realize since the conference you've wanted this, but we really would
> like to proceed more carefully and implement first, only .axis.name
> access.  The top-level access requires a changed __getattr__ method
> (which slows down *all* attribute access), and opens the door for name
> collisions.  I think the best approach will be to follow the lead of
> numpy here: structured arrays offer only access by named key ['name'],
> and for plain .name access you need to make them a recarray.  We
> should also offer in our base class only the simpler, safer mechanism,
> and then we can build one that uses the .name attributes as a subclass
> for those who want the convenience and understand the risks and
> tradeoffs.
>
> How does this sound to you?  We should make sure we agree on the api
> before writing too much more code along these lines.  I realize you've
> actually contributed an implementation of this, but I think we need to
> make sure it's the right design before merging it in.  We do need to
> have this design discussion to find a class that will work for as many
> people as possible, so many thanks for starting by offering working
> code!
>
> Regards,
>
> f
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

What mechanism are you using for gh-pages, if I may ask? I would be
interested in this for future projects.

--Josh