Re: [Yt-dev] Parallelism, or, how I learned to stop worrying and love open source development
Hi all,
I'm going to top post, which I guess I do more than I ought to anyway,
because I'm going to try to address a number of issues that have been
brought up. I've spent some of the day thinking about this issue, and
what it says about yt as a community and about my level of involvement
in various areas.
So, I'll touch on those at the end, but first I'll hit back on the
issue of parallelism and how to address it.
= Parallelism =
I think what is becoming clear is that the step from serial to
parallel, in terms of user experience, should be better handled
than it currently is. As it stands, the section in the manual that
covers parallelism basically says, "These things work, go ahead and
give it a go!" This is my fault, and it's not really sufficient.
More detail has to be given, and in addition to a whitelist of actions
that are parallel safe we also need to include a *blacklist*.
The second step we need to take is to provide examples of how to submit
a parallel job -- how much it requires in terms of resources and so on.
Unfortunately, it's not entirely clear to me the best way to organize
the documentation, and I don't even really know where this would go.
Stephen did a really rad job of doing this in the halo finding paper,
and he's done an excellent job with his work on the halo finder as a
whole. (It's just that last 5% toward the user experience, I think.
:) My own work on the parallel projections should be better
documented and the UX there should be improved as well.
The third is to keep an eye on memory usage. Memory profiling is
difficult, but it's something we have tried before and that I believe
needs to be re-examined. Specifically, it seems that both projections
and the parallel halo finder suffer from this problem. As a note,
next week I will be spending some time swapping out the old projection
method for the new quad-tree method. This should improve both speed
and memory usage.
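As a rough illustration of the kind of memory bookkeeping meant here, the following is a minimal sketch using Python's standard tracemalloc module. The helper name peak_memory_of and the stand-in workload build_list are hypothetical and are not part of yt. Note also that tracemalloc only sees allocations that go through Python's allocator, so memory claimed directly by C or Fortran extensions would not be counted.

```python
import tracemalloc


def peak_memory_of(func, *args, **kwargs):
    """Run func and return (result, peak bytes traced during the call)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        # get_traced_memory() returns (current, peak) in bytes.
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def build_list(n):
    # Hypothetical stand-in for an expensive analysis step.
    return [float(i) for i in range(n)]


if __name__ == "__main__":
    _, small = peak_memory_of(build_list, 1000)
    _, large = peak_memory_of(build_list, 100000)
    print("peak for 1e3 elements: %d bytes" % small)
    print("peak for 1e5 elements: %d bytes" % large)
```

Wrapping a single analysis call this way on each task would at least give a per-task peak to compare between runs, which is the number users keep asking about.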
Okay, on to the larger problems that I think this relates to.
= Bugs =
First off, we need a mechanism for handling bugs. I don't want to
use the word "triage" here, but it is becoming clear that we need a
mechanism. Currently, we have a Trac site that really doesn't get
used at all. I've explored a couple mechanisms for encouraging bug
reports.
* I can enable OpenID login -- this means using something like your
GoogleName to log in and report a bug.
* I've already replicated the .htpasswd between mercurial and the
Trac site, so anyone who has an account there can log in to the Trac
site.
* yt could register a default excepthook that encourages the user to
report a bug. I'm leery of this because I'm not sure I want to muck
about with Python internals that much, but it could be done nicely, I
think.
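For the third option, a minimal sketch of what such a hook might look like is below. It relies only on the standard sys.excepthook mechanism; the function names make_reporting_excepthook and install are hypothetical, and nothing like this is claimed to exist in yt. The URL is the ticket page mentioned in this message.

```python
import sys

# Ticket URL taken from this message; the rest is an illustrative sketch.
BUG_REPORT_URL = "http://yt.enzotools.org/newticket"


def make_reporting_excepthook(previous_hook, stream=None):
    """Wrap an existing excepthook so it also prints a bug-report hint."""

    def reporting_excepthook(exc_type, exc_value, exc_tb):
        # Let the previous hook print the traceback as usual.
        previous_hook(exc_type, exc_value, exc_tb)
        # Do not nag the user when they simply pressed Ctrl-C.
        if issubclass(exc_type, KeyboardInterrupt):
            return
        out = stream if stream is not None else sys.stderr
        out.write(
            "\nIf you think this is a bug, please report it at\n"
            "  %s\n"
            "and include the traceback above.\n" % BUG_REPORT_URL
        )

    return reporting_excepthook


def install():
    """Install the wrapper around whatever hook is currently active."""
    sys.excepthook = make_reporting_excepthook(sys.excepthook)
```

Because the wrapper delegates to the previous hook rather than replacing it, the normal traceback is untouched, which keeps the mucking about with Python internals to a minimum.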
Overall, though, what really needs to happen is some kind of *buy-in*
on the part of the user -- which in this case is anyone who has had
trouble with yt. I have pulled back from yt-users, and I'm really
happy that everyone else has stepped up. But I'm worried that as time
goes on, people will pick up knowledge in ways that aren't indexable
by search engines and then this knowledge keeps getting re-learned.
Public reporting of bugs, particularly as it could relate to
improvements in documentation, is essential. But this can't happen if
it's just driven by one or two people. And if no one else is
motivated to encourage this, then perhaps that's just where we'll
stay. I can't force buy-in, I can only encourage people to see the
benefits to reporting bugs, sharing experiences, and all of that. We
need to have people to read and handle bugs, and then people to whom
they apply. I really would like for this not always to be me.
Anyway, if you have an hg account, you can log in:
http://yt.enzotools.org/login/
and then report a bug:
http://yt.enzotools.org/newticket
It helps if you paste the traceback with --paste on the command line
of your script.
= Fixing Bugs =
We've had great success with people taking ownership of different bugs
on the mailing list and fixing them. This is a huge success story,
and I thank all the developers that have made this happen. But I
think it's important that we continue to develop this sense of
ownership through the Trac site.
= Major Enhancements =
Adding on major enhancements is unfortunately an open problem. For
instance, I would really like to see the parallelism framework
essentially rewritten to be more modular and to take advantage of
nested MPI communicators. I have a sketch of how this would go, and
I've even written some code. But, I'm not employed to work on yt. I
mostly develop it either as it suits my research interests (and I am
operating under the working assumption that this is true for everyone
else) or as I find it something fun to do in the evenings. I want it
to be used, and to be useful, and I believe that my stewardship of the
project up to this point supports this conclusion.
I *truly* do believe in cross-code simulation analysis, sharing
facilities with other users, and reproducible research. But I am
reaching the limits of what I, alone, can do. So far we've had some
pretty major contributions from a number of developers, but I think
it's important that we communicate to the community that this is still
a volunteer project.
We don't have a team of dedicated software developers, we have a
handful of scientists who are working to both further their own
research interests while providing the best user experience possible
for an advanced analysis code. And, to be perfectly frank, I think
we're doing a pretty darn good job on both of those fronts. Many
people now use yt on a daily basis to analyze simulation outputs from
several different codes. We've got advanced analysis and viz
functionality, thanks to *you* developers, that has been published in a
dozen papers, been shown at the Adler Planetarium, taken home the
third place at the SciDAC visualization "Oscars," and even (ever so
briefly) been on the Discovery channel.
But, still, we have to keep our eye on the prize. And if the prize
the other developers have *their* eye on isn't the prize *you* have
your eye on, unfortunately some responsibility will fall back on to
your shoulders. I honestly wish I could spend more time helping
others use yt, developing yt, and building it to be the tool I really
wish it would be. Don't think that I don't see all the warts and
problems that you all see -- I do. In the docs, the source code, the
functionality, the user experience ... I see the warts too.
But even though developing yt is fun, I'm still developing it because
I'm a scientist who wants to ask questions of his data.
= Building Community =
We've done a good job of this, but it's becoming clear that there's a disconnect:
* We're doing a mediocre job of shepherding users into being
contributing developers. I'd like to help fix this by writing up more
suggestions on how to develop and share your changes. yt will
stagnate if we don't continue to churn the developer list.
* We need to articulate the vision for yt, and I'm not sure my vision
is the one anyone else has.
I'd love to hear suggestions about this aspect.
= Documentation =
Any help anyone can give with documentation would be great.
Organization, notes, suggestions, anything. Report it as a bug.
Commit changes. Email the list.
==
Anyway, that's basically what I've been thinking about, and what I
wanted to say. I think we have an opportunity with yt to build a real
community of collaboration and sharing of resources. And we've done a
great job with that so far. But it still has to be something of a
jumpstart approach -- jumpstarting development and then encouraging
others to pick up the torch and run with it. Grass roots,
science-driven development is kind of the name of the game here.
And when there *are* problems, I'm sure that lots of people are eager
to jump at helping you fix them. But we have to hear about 'em before
we can. :)
Thanks,
Matt
On Thu, Aug 19, 2010 at 1:12 PM, Stephen Skory wrote:
Hi Brian & Eric,
As you know (since we discussed it off-list), I'm the reason for this being mentioned to you. I had some pretty horrible problems with the various incarnations of HOP in yt being excruciatingly slow and consuming huge amounts of memory for a 1024^3 unigrid dataset, to the point where my grad student and I ended up just using P-GroupFinder, the standalone halo finder that comes with week-of-code enzo. Note that when I say "excruciatingly slow" and "consuming huge amounts of memory", I mean that when we used 256 nodes on Ranger, with 2 cores/node (so 512 cores total) for the 1024^3 dataset, it still ran Ranger out of memory, or, alternately, didn't finish in 24 hours.
A few notes in response:
- Recently I ran a 2048^3 dataset on 264 cores; it took about 2 hours and averaged about 8.5 GB per task, with a peak task of 10 GB. Your job is 1/8 the size and should have run, and I don't know why it didn't.
- If I wasn't trying to graduate I would have had more time to assist when your student (Brian) asked me for help. I'm sorry so much of your time was wasted.
- My tool as a public tool is not any good unless other people can use it too. Clearly I need to do some work on that.
- It *does* use much more memory than it needs to, you are right. I know where the problems are, and whoo-boy they are there, but they are not easy to fix.
- Speed could be better, but some of this has to do with how HOP itself works. For example, it needs to run the kD tree twice, unlike FOF, which needs to run it only once. The final group building step is a "global" operation, so that's slow as well. On 128^3 particles, (normal) HOP takes about 75 seconds, and FOF about 25. The C HOP and FOF in yt both use the same kD tree and the same data I/O methods, so that's a fair ratio of the increased workload.
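The "1/8 the size" comparison above can be written out as a small calculation. This is only a back-of-the-envelope sketch: it assumes total memory scales linearly with the number of particles and is spread evenly across tasks, which is a simplification and not a measured property of the halo finder. The function name scaled_memory_estimate is hypothetical; the input numbers are the ones quoted in this thread.

```python
def scaled_memory_estimate(ref_cells_per_side, ref_tasks, ref_gb_per_task,
                           cells_per_side, tasks):
    """Scale a reference run's memory to a new problem size and task count.

    Assumes total memory is proportional to the particle count and is
    spread evenly over tasks. This is a rough assumption, not a law.
    """
    ref_total_gb = ref_tasks * ref_gb_per_task
    size_ratio = (cells_per_side / ref_cells_per_side) ** 3
    total_gb = ref_total_gb * size_ratio
    return size_ratio, total_gb, total_gb / tasks


# Reference: 2048^3 on 264 tasks at about 8.5 GB per task (from the thread).
# Target: 1024^3 on 512 cores (from the thread).
ratio, total_gb, gb_per_task = scaled_memory_estimate(
    ref_cells_per_side=2048, ref_tasks=264, ref_gb_per_task=8.5,
    cells_per_side=1024, tasks=512,
)
print("size ratio: %.3f" % ratio)
print("estimated total: %.1f GB" % total_gb)
print("estimated per task: %.2f GB" % gb_per_task)
```

Under those assumptions the 1024^3 run would need roughly 280 GB in total, or a bit over half a GB per task on 512 cores, which is consistent with the expectation that the job should have fit.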
Stephen Skory, Graduate Student
sskory@physics.ucsd.edu
http://physics.ucsd.edu/~sskory/
_______________________________________________ Yt-dev mailing list Yt-dev@lists.spacepope.org http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org
Hi everyone,
I would like to chime in on some of the issues Matt has raised. These are
very important things to think about, which is why I stayed up all night to
read the whole email.
I will mostly stay out of the parallelism issue, but I'll only add that I
have been doing projections of 1024^3 unigrid data on kraken with 64 cores.
They have gone fine for me, taking roughly 10 seconds or so each. I also
think an explicit list of actions that do not run in parallel is a really
good idea.
On the bugs issue, it is not clear to me how a new user can tell the
difference between a bug and simply doing something wrong. Either way, I
think that the users list is succeeding in getting people's issues solved,
provided that the issues make it there. I think we really need to encourage
all users to be taking all potential issues through the list first, even if
the resolution eventually takes place off-list. Even if the feature author
is doing all of the talking, this makes any new knowledge public, and allows
other people to help out if they can. Most people already do this, but I
would suggest again that we ask that any requests for help we receive
directly be resent to the user list.
On the community, there clearly needs to be a balance between what users
expect to get from the code and what the developers are obligated to
provide. With exceptions, a vast majority of those contributing code are
not doing so in their spare time. Much of this code is related to their own
work, but a nonzero amount is stuff that simply needs to get done. Users of
the code need to recognize this. However, at the same time, we as
developers need to hold ourselves to some standards, namely, if we say the
code does something, it better do it. Clearly, there are situations where
we can not deliver on this, at least not right away. In general though, we
need to be clear about what the code will and will not do, and see that our
statements are and remain true.
I think we should consider setting up a wish list, where users can submit
ideas for features that they would like to see added to the code. This should
be viewable by everyone. I think this might help people stay conscious of
the fact that if they want something, that means someone has to physically
go and do it for them. Maybe this will even give people the notion that
they can do it on their own.
Britton
On Thu, Aug 19, 2010 at 11:54 PM, Matthew Turk wrote:
[...]
Hi all,
Captain Paininthebutt chiming in here. I wasn't up all night reading Matt's
email, but the owner of the local Starbucks franchise got a pretty
substantial addition to their kid's college fund, if you get my meaning.
I agree strongly with Matt's point that the level of parallelism of various
functions would be useful for the end user (case in point: we were startled
to learn that we're probably wasting our time doing slices in parallel). A
small table of function vs. parallelism would be great, with categories like
"scales embarrassingly well", "scales somewhat - use carefully", "should be
used in serial", and "must be used in serial". This is probably an
oversimplification, but some footnotes would help: "If you want to do a
projection through a very large simulation, use fixed resolution buffers."
A similar estimate of memory usage for problems would be very handy, at
least for the largest calculations. Furthermore, a couple of example batch
scripts would go a long way - a Kraken batch script for Parallel HOP made
its way to Brian Crosby and me, and we found that very informative.
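To make the table idea concrete, here is one possible shape for it as a small data structure. The four category strings are the ones proposed above; the class and function names are hypothetical, and the two example rows are placeholders chosen for illustration, not measured statements about how any yt function actually scales.

```python
from enum import Enum


class Scaling(Enum):
    """The four categories proposed in the message above."""
    EMBARRASSING = "scales embarrassingly well"
    SOMEWHAT = "scales somewhat - use carefully"
    PREFER_SERIAL = "should be used in serial"
    MUST_SERIAL = "must be used in serial"


# Placeholder entries only: the operation names and their categories are
# illustrative guesses, not measured yt behavior.
PARALLELISM_TABLE = {
    "projection": (Scaling.SOMEWHAT,
                   "for very large simulations, use fixed resolution buffers"),
    "slice": (Scaling.PREFER_SERIAL, ""),
}


def format_table(table):
    """Render the table as aligned plain-text rows, one per operation."""
    width = max(len(name) for name in table)
    lines = []
    for name in sorted(table):
        scaling, footnote = table[name]
        line = "%-*s  %s" % (width, name, scaling.value)
        if footnote:
            line += "  [%s]" % footnote
        lines.append(line)
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_table(PARALLELISM_TABLE))
```

Keeping the table as data rather than prose would let the same source feed both the manual and a runtime warning, though that is a design suggestion and not something the thread settles.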
I also think that Matt and Britton have made some very important points
regarding bugs and the yt-users list. My student and I have not really
helped this - I tend to go right to the person who I suspect can help us
most effectively, rather than through yt-users. There are various reasons
for that, but it's not what is best for the community, and we'll go through
yt-users as much as possible in the future, even if discussions get taken
off-list. If this is done, making a practice of posting a recap to the list
after a bug is solved would be useful, since Matt assures me that the lists
are archived.
Regarding expectations vs. obligations, I think that it is appropriate for
developers to fix bugs in features they create, and to give users some idea
of the resources that those features require. On the other side of the
coin, non-developer users (like myself) are obligated to give feedback on
features: what's useful? what's not? what appears to be broken? My
observation has been that a few of the yt features could become vastly more
useful to people other than their developer if a few widgets were added.
This is generally not trivial, but is far easier for the original developer
than for somebody who is new to yt and python in general, and a given yt
feature in particular. Documentation and examples are also key - I
personally have found the cookbook to be invaluable!
I very much like the idea of a "wish list", particularly if other
users/developers can comment on what wished-for features would be useful to
them, and the level of difficulty in implementing the feature. This would
also facilitate coordination between yt community members - if I need to
design a kitchen-sink-cleaning module for yt, and so does Britton, it's
probably better for the two of us to work together than to separately
implement our tools. Also, it's much more likely that the resulting product
will be generally usable.
Matt and Britton both mentioned something that I've been thinking about for
the last couple of days - nobody gets paid to develop yt. I really do think
this could change, if we wanted it to - the NSF does occasionally give out
money for code development, often as part of a larger science grant, and I'd
be happy to start writing this into my proposals. Of course, this would
require somebody with deep familiarity with yt to commit to working on
specific parts of yt for an extended period of time, but depending on the
circumstances might be good for all parties concerned, including the larger
yt community.
Anyway, that's my $0.02 of coffee-fueled ramblings.
--Brian
On Fri, Aug 20, 2010 at 3:04 PM, Britton Smith wrote:
[...]
We don't have a team of dedicated software developers; we have a handful of scientists who are working both to further their own research interests and to provide the best user experience possible for an advanced analysis code. And, to be perfectly frank, I think we're doing a pretty darn good job on both of those fronts. Many people now use yt on a daily basis to analyze simulation outputs from several different codes. We've got advanced analysis and viz functionality, thanks to *you* developers, that has been published in a dozen papers, been shown at the Adler Planetarium, taken home the third place at the SciDAC visualization "Oscars," and even (ever so briefly) been on the Discovery channel.
But, still, we have to keep our eye on the prize. And if the prize the other developers have *their* eye on isn't the prize *you* have your eye on, unfortunately some responsibility will fall back on to your shoulders. I honestly wish I could spend more time helping others use yt, developing yt, and building it to be the tool I really wish it would be. Don't think that I don't see all the warts and problems that you all see -- I do. In the docs, the source code, the functionality, the user experience ... I see the warts too.
But even though developing yt is fun, I'm still developing it because I'm a scientist who wants to ask questions of his data.
= Building Community =
We've done a good job of this, but it's becoming clear that there's a disjoint:
* We're doing a mediocre job of shepherding users into being contributing developers. I'd like to help fix this by writing up more suggestions on how to develop and share your changes. yt will stagnate if we don't continue to churn the developer list.
* We need to articulate the vision for yt, and I'm not sure my vision is the one anyone else has.
I'd love to hear suggestions about this aspect.
= Documentation =
Any help anyone can give with documentation would be great. Organization, notes, suggestions, anything. Report it as a bug. Commit changes. Email the list.
==
Anyway, that's basically what I've been thinking about, and what I wanted to say. I think we have an opportunity with yt to build a real community of collaboration and sharing of resources. And we've done a great job with that so far. But it still has to be something of a jumpstart approach -- jumpstarting development and then encouraging others to pick up the torch and run with it. Grass roots, science-driven development is kind of the name of the game here.
And when there *are* problems, I'm sure that lots of people are eager to jump at helping you fix them. But we have to hear about 'em before we can. :)
Thanks,
Matt
On Thu, Aug 19, 2010 at 1:12 PM, Stephen Skory wrote:
Hi Brian & Eric,
As you know (since we discussed it off-list), I'm the reason for this being mentioned to you. I had some pretty horrible problems with the various incarnations of HOP in yt being excruciatingly slow and consuming huge amounts of memory for a 1024^3 unigrid dataset, to the point where my grad student and I ended up just using P-GroupFinder, the standalone halo finder that comes with week-of-code enzo. Note that when I say "excruciatingly slow" and "consuming huge amounts of memory", I mean that when we used 256 nodes on Ranger, with 2 cores/node (so 512 cores total) for the 1024^3 dataset, it still ran Ranger out of memory, or, alternately, didn't finish in 24 hours.
A few notes in response:
- Recently I ran a 2048^3 dataset on 264 cores that took about 2 hours, which averaged about 8.5 GB per task with a peak task of 10 GB. Your job is 1/8 the size and should have run, and I don't know why it didn't.
- If I wasn't trying to graduate I would have had more time to assist when your student (Brian) asked me for help. I'm sorry so much of your time was wasted.
- My tool as a public tool is not any good unless other people can use it too. Clearly I need to do some work on that.
- It *does* use much more memory than it needs to, you are right. I know where the problems are, and whoo-boy they are there, but they are not easy to fix.
- Speed could be better, but some of this has to do with how HOP itself works. For example, it needs to run the kD tree twice, unlike FOF, which needs to run it only once. The final group building step is a "global" operation, so that's slow as well. On 128^3 particles, (normal) HOP takes about 75 seconds, and FOF about 25. The C HOP and FOF in yt both use the same kD tree and the same data I/O methods, so that's a fair ratio of the increased workload.
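As a rough sanity check on the numbers in these notes, here is some back-of-the-envelope arithmetic. It assumes memory scales linearly with dataset volume, which is an assumption made purely for illustration and not a measured property of the halo finder; the function names are made up.

```python
def volume_ratio(small_n, large_n):
    """Ratio of element counts for two N^3 datasets (hypothetical helper)."""
    return (small_n / large_n) ** 3


def total_memory_gb(tasks, avg_gb_per_task):
    """Aggregate memory across all tasks, using the quoted average."""
    return tasks * avg_gb_per_task


ratio = volume_ratio(1024, 2048)         # 1024^3 relative to 2048^3
big_run_gb = total_memory_gb(264, 8.5)   # the 2048^3 run on 264 cores

# ASSUMPTION: memory scales linearly with dataset size.
small_run_gb = big_run_gb * ratio
per_task_gb = small_run_gb / 512         # 512 cores in the Ranger attempt

hop_to_fof = 75.0 / 25.0                 # timings on 128^3 particles

print(ratio, big_run_gb, small_run_gb, per_task_gb, hop_to_fof)
```

Under that assumption the 1024^3 job would need roughly 280 GB in total, or a bit over half a GB per task on 512 cores, and HOP costs about three times what FOF does on the 128^3 test. That is consistent with "should have run", though it says nothing about why it didn't.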
_______________________________________________________
sskory@physics.ucsd.edu           o__  Stephen Skory
http://physics.ucsd.edu/~sskory/ _.>/ _Graduate Student
________________________________(_)_\(_)_______________
_______________________________________________ Yt-dev mailing list Yt-dev@lists.spacepope.org http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org
Hi Brian, I'm sure everyone's sick of my replies, but I owe you one on this.
I agree strongly with Matt's point that the level of parallelism of various functions would be useful for the end user (case in point: we were startled to learn that we're probably wasting our time doing slices in parallel).
I've added a ticket for adding a list of effective parallel tasks. (As a quick note to hammer this home, I would never, ever encourage someone to slice in parallel -- yt is exclusively load on demand. Slicing the 512^3 L7 is feasible in serial on a laptop. Slices in parallel are useful for when the data is already distributed, i.e., inline.)
A small table of function vs. parallelism would be great: "scales embarrassingly well", "scales somewhat - use carefully", "should be used in serial", and "must be used in serial" would be useful. This is probably an oversimplification, but some footnotes would help: "If you want to do a projection through a very large simulation, use fixed resolution buffers." A similar estimate of memory usage for problems would be very handy, at least for the largest calculations. Furthermore, a couple of example batch scripts would go a long way - a Kraken batch script for Parallel HOP made its way to Brian Crosby and me, and we found that very informative.
We should provide more examples of usages on TeraGrid resources. I have been stewing on the idea of a 'yt kit.' I'll see if I can put that more clearly into words/code sometime soon.
If this is done, making a practice of posting a recap to the list after a bug is solved would be useful, since Matt assures me that the lists are archived.
They are! :)
Regarding expectations vs. obligations, I think that it is appropriate for developers to fix bugs in features they create, and to give users some idea of the resources that those features require. On the other side of the coin, non-developer users (like myself) are obligated to give feedback on features: what's useful? what's not? what appears to be broken? My observation has been that a few of the yt features could become vastly more useful to people other than their developer if a few widgets were added. This is generally not trivial, but is far easier for the original developer than for somebody who is new to yt and python in general, and a given yt feature in particular. Documentation and examples are also key - I personally have found the cookbook to be invaluable!
I'm glad the cookbook has helped; I'm hopeful that we can improve it a bit, as well. Oliver and I have chatted about analysis modules, which ties into the 'yt kit' and which I am going to explore a bit more later. As for problems with things, I'm now encouraging that these get recorded in bug reports. I sympathize with what you say about things that are *almost* useful. :)
Thanks very much for bringing this all out into the light, and I hope that we can take this as a starting point for improving the code, the docs, and our community process.
Best,
Matt
Anyway, that's my $0.02 of coffee-fueled ramblings.
--Brian
On Fri, Aug 20, 2010 at 3:04 PM, Britton Smith wrote:
Hi everyone,
I would like to chime in on some of the issues Matt has raised. These are very important things to think about, which is why I stayed up all night to read the whole email.
I will mostly stay out of the parallelism issue, but I'll only add that I have been doing projections of 1024^3 unigrid data on kraken with 64 cores. They have gone fine for me, taking roughly 10 seconds or so each. I also think an explicit list of actions that do not run in parallel is a really good idea.
On the bugs issue, it is not clear to me how a new user can tell the difference between a bug and simply doing something wrong. Either way, I think that the users list is succeeding in getting people's issues solved, provided that the issues make it there. I think we really need to encourage all users to be taking all potential issues through the list first, even if the resolution eventually takes place off-list. Even if the feature author is doing all of the talking, this makes any new knowledge public, and allows other people to help out if they can. Most people already do this, but I would suggest again that we ask that any requests for help we receive directly be resent to the user list.
On the community, there clearly needs to be a balance between what users expect to get from the code and what the developers are obligated to provide. With exceptions, a vast majority of those contributing code are not doing so in their spare time. Much of this code is related to their own work, but a nonzero amount is stuff that simply needs to get done. Users of the code need to recognize this. However, at the same time, we as developers need to hold ourselves to some standards, namely, if we say the code does something, it better do it. Clearly, there are situations where we can not deliver on this, at least not right away. In general though, we need to be clear about what the code will and will not do, and see that our statements are and remain true.
I think we should consider setting up a wish list, where users can submit ideas for features that they would like to see added to the code. This should be viewable by everyone. I think this might help people stay conscious of the fact that if they want something, that means someone has to physically go and do it for them. Maybe this will even give people the notion that they can do it on their own.
Britton
On Thu, Aug 19, 2010 at 11:54 PM, Matthew Turk wrote:
Hi all,
Hi Britton, Thanks for your thoughtful reply. I'm going to address some of the technical aspects.
I will mostly stay out of the parallelism issue, but I'll only add that I have been doing projections of 1024^3 unigrid data on kraken with 64 cores. They have gone fine for me, taking roughly 10 seconds or so each. I also think an explicit list of actions that do not run in parallel is a really good idea.
I've assigned a ticket about the resource requirements and I'll handle the "blacklist" of parallel actions. For the issue of projection speed, I've never had trouble, but I also mainly use yt for projecting non-unigrid data. I believe yt scales well to unigrids, but it is with the fixed resolution projections (which are poorly documented) that a number of shortcuts are applied that would help with unigrid projections. For AMR projections I would put yt up against any other game in town. As of the end of next week, when I finally have time to do the QuadTree projections I wrote six months ago, we should get an order of magnitude speedup.
On the bugs issue, it is not clear to me how a new user can tell the difference between a bug and simply doing something wrong. Either way, I think that the users list is succeeding in getting people's issues solved, provided that the issues make it there. I think we really need to encourage all users to be taking all potential issues through the list first, even if the resolution eventually takes place off-list. Even if the feature author is doing all of the talking, this makes any new knowledge public, and allows other people to help out if they can. Most people already do this, but I would suggest again that we ask that any requests for help we receive directly be resent to the user list.
I agree with this. I think we should start encouraging *very* strongly that all questions and problems with the software be discussed on the list. I'm a leaky spot in the pipeline, I do confess, but I'll try my best to redirect questions I get off list back to on-list. For what it's worth (and I'll address this in my reply to Jeff) I've taken the time to clean up and add some niceties to Trac. This includes OpenID authentication and a link for reporting bugs. In the other email I'll expand a bit.
On the community, there clearly needs to be a balance between what users expect to get from the code and what the developers are obligated to provide. With exceptions, a vast majority of those contributing code are not doing so in their spare time. Much of this code is related to their own work, but a nonzero amount is stuff that simply needs to get done. Users of the code need to recognize this. However, at the same time, we as developers need to hold ourselves to some standards, namely, if we say the code does something, it better do it. Clearly, there are situations where we can not deliver on this, at least not right away. In general though, we need to be clear about what the code will and will not do, and see that our statements are and remain true.
I agree with this. It's a tough balance, between doing what you need to do and trying to share it, but also making sure it is useful to other people. I think you've summed that up nicely. Recently I added a level-of-support grid for astro codes, of features versus codes. Maybe we should consider adding coarse features and a level-of-reliability estimate.
I think we should consider setting up a wish list, where users can submit ideas for features that they would like to see added to the code. This should be viewable by everyone. I think this might help people stay conscious of the fact that if they want something, that means someone has to physically go and do it for them. Maybe this will even give people the notion that they can do it on their own.
That would be excellent. I've set up a new page:
http://yt.enzotools.org/wiki/WishList
that includes my previous list of project ideas from the GettingInvolved page and adds on a query to find all the open tickets. Anyone that validates with an OpenID (this may be a mistake ...) can edit the page and add new wishlist items. However, I've also mirrored the hg passwords, and I'd prefer if developers logged in with those.
Again, thanks for your thoughtful reply.
-Matt
Britton
On Thu, Aug 19, 2010 at 11:54 PM, Matthew Turk
wrote: Hi all,
I'm going to top post, which I guess I do more than I ought to anyway, because I'm going to try to address a number of issues that have been brought up. I've spent some of the day thinking about this issue, and what it says about yt as a community and about my level of involvement in various areas.
So, I'll touch on those at the end, but first I'll hit back on the issue of parallelism and how to address it.
= Parallelism =
I think what is becoming clear is that the step from serial to parallel, in terms of user experience, should be more well-handled than it currently is. As it stands, the section in the manual that covers parallelism basically says, "These things work, go ahead and give it a go!" This is my fault, and it's not really sufficient. More detail has to be given, and rather than a whitelist of actions that are parallel safe we need to also include a *blacklist*.
The second step we need to take is provide examples of how to submit a parallel job -- how much it requires in terms of resources and so on. Unfortunately, it's not entirely clear to me the best way to organize the documentation, and I don't even really know where this would go. Stephen did a really rad job of doing this in the halo finding paper, and he's done an excellent job with his work on the halo finder as a whole. (It's just that last 5% toward the user experience, I think. :) My own work on the parallel projections should be better documented and the UX there should be improved as well.
The third is to keep an eye on memory usage. Memory profiling is difficult, but it's something we have tried before and that I believe needs to be re-examined. Specifically, it seems that both projections and the parallel halo finder suffer from this problem. As a note, next week I will be spending some time swapping out the old projection method for the new quad-tree method. This should improve both speed and memory usage.
Okay, on to the larger problems that I think this relates to.
= Bugs =
First off, we need a mechanism for handling and bugs. I don't want to use the word "triage" here, but it is becoming clear that we need a mechanism. Currently, we have a Trac site that really doesn't get used at all. I've explored a couple mechanisms for encouraging bug reports.
* I can enable OpenID login -- this means using something like your GoogleName to log in and report a bug. * I've already replicated the .htpasswd between mercurial and the Trac site, so anyone who has a report there can log in to the Trac site. * yt could register a default excepthook that encourages the user to report a bug. I'm leery of this because I'm not sure I want to muck about with Python internals that much, but it could be done nicely, I think.
Overall, though, what really needs to happen is some kind of *buy-in* on the part of the user -- which in this case is anyone who has had trouble with yt. I have pulled back from yt-users, and I'm really happy that everyone else has stepped up. But I'm worried that as time goes on, people will pick up knowledge in ways that aren't indexable by search engines and then this knowledge keeps getting re-learned.
Public reporting of bugs, particularly as it could relate to improvements in documentation, is essential. But this can't happen if it's just driven by one or two people. And if no one else is motivated to encourage this, then perhaps that's just where we'll stay. I can't force buy-in, I can only encourage people to see the benefits to reporting bugs, sharing experiences, and all of that. We need to have people to read and handle bugs, and then people to whom they apply. I really would like for this not always to be me.
Anyway, if you have an hg account, you can login:
http://yt.enzotools.org/login/
and then report a bug:
http://yt.enzotools.org/newticket
It helps if you paste the traceback with --paste on the command line of your script.
= Fixing Bugs =
We've had great success with people taking ownership of different bugs on the mailing list and fixing them. This is a huge success story, and I thank all the developers that have made this happen. But I think it's important that we continue to develop this sense of ownership through the Trac site.
= Major Enhancements =
Adding on major enhancements is unfortunately an open problem. For instance, I would really like to see the parallelism framework essentially rewritten to be more modular and to take advantage of nested MPI communicators. I have a sketch of how this would go, and I've even written some code. But, I'm not employed to work on yt. I mostly develop it either as it suits my research interests (and I am operating under the working assumption that this is true for everyone else) or as I find it something fun to do in the evenings. I want it to be used, and to be useful, and I believe that my stewardship of the project up to this point supports this conclusion.
I *truly* do believe in cross-code simulation analysis, sharing facilities with other users, and reproducible research. But I am reaching the limits of what I, alone, can do. So far we've had some pretty major contributions from a number of developers, but I think it's important that we communicate to the community that this is still a volunteer project.
We don't have a team of dedicated software developers, we have a handful of scientists who are working both to further their own research interests and to provide the best user experience possible for an advanced analysis code. And, to be perfectly frank, I think we're doing a pretty darn good job on both of those fronts. Many people now use yt on a daily basis to analyze simulation outputs from several different codes. We've got advanced analysis and viz functionality, thanks to *you* developers, that has been published in a dozen papers, been shown at the Adler Planetarium, taken home the third place at the SciDAC visualization "Oscars," and even (ever so briefly) been on the Discovery channel.
But, still, we have to keep our eye on the prize. And if the prize the other developers have *their* eye on isn't the prize *you* have your eye on, unfortunately some responsibility will fall back on to your shoulders. I honestly wish I could spend more time helping others use yt, developing yt, and building it to be the tool I really wish it would be. Don't think that I don't see all the warts and problems that you all see -- I do. In the docs, the source code, the functionality, the user experience ... I see the warts too.
But even though developing yt is fun, I'm still developing it because I'm a scientist who wants to ask questions of his data.
= Building Community =
We've done a good job of this, but it's becoming clear that there's a disjoint:
* We're doing a mediocre job of shepherding users into being contributing developers. I'd like to help fix this by writing up more suggestions on how to develop and share your changes. yt will stagnate if we don't continue to churn the developer list.
* We need to articulate the vision for yt, and I'm not sure my vision is the one anyone else has.
I'd love to hear suggestions about this aspect.
= Documentation =
Any help anyone can give with documentation would be great. Organization, notes, suggestions, anything. Report it as a bug. Commit changes. Email the list.
==
Anyway, that's basically what I've been thinking about, and what I wanted to say. I think we have an opportunity with yt to build a real community of collaboration and sharing of resources. And we've done a great job with that so far. But it still has to be something of a jumpstart approach -- jumpstarting development and then encouraging others to pick up the torch and run with it. Grass roots, science-driven development is kind of the name of the game here.
And when there *are* problems, I'm sure that lots of people are eager to jump at helping you fix them. But we have to hear about 'em before we can. :)
Thanks,
Matt
On Thu, Aug 19, 2010 at 1:12 PM, Stephen Skory wrote:
Hi Brian & Eric,
As you know (since we discussed it off-list), I'm the reason for this being mentioned to you. I had some pretty horrible problems with the various incarnations of HOP in yt being excruciatingly slow and consuming huge amounts of memory for a 1024^3 unigrid dataset, to the point where my grad student and I ended up just using P-GroupFinder, the standalone halo finder that comes with week-of-code enzo. Note that when I say "excruciatingly slow" and "consuming huge amounts of memory", I mean that when we used 256 nodes on Ranger, with 2 cores/node (so 512 cores total) for the 1024^3 dataset, it still ran Ranger out of memory, or, alternately, didn't finish in 24 hours.
A few notes in response:
- Recently I ran a 2048^3 dataset on 264 cores that took about 2 hours which averaged about 8.5GB per task with a peak task of 10 GB. Your job is 1/8 the size and should have run, and I don't know why it didn't.
- If I wasn't trying to graduate I would have had more time to assist when your student (Brian) asked me for help. I'm sorry so much of your time was wasted.
- My tool as a public tool is not any good unless other people can use it too. Clearly I need to do some work on that.
- It *does* use much more memory than it needs to, you are right. I know where the problems are, and whoo-boy they are there, but they are not easy to fix.
- Speed could be better, but some of this has to do with how HOP itself works. For example, it needs to run the kD tree twice, unlike FOF which needs to only once. The final group building step is a "global" operation, so that's slow as well. On 128^3 particles, (normal) HOP takes about 75 seconds, and FOF about 25. The C HOP and FOF in yt both use the same kD tree, same data I/O methods, so that's a fair ratio of the increased workload.
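A rough back-of-the-envelope check of the figures in the notes above. This is only a sketch: it assumes one task per core and that total memory scales linearly with the number of cells, neither of which is stated in the thread, so the scaled numbers are estimates rather than measurements.

```python
# Figures quoted in the notes above.
ref_side = 2048      # reference run: 2048^3 dataset
ref_tasks = 264      # cores used (ASSUMPTION: one task per core)
ref_avg_gb = 8.5     # average memory per task, in GB

new_side = 1024      # the dataset that failed to run
new_tasks = 512      # 256 nodes x 2 cores/node

# A 1024^3 dataset has (1/2)^3 = 1/8 as many cells as a 2048^3 one.
size_ratio = (new_side / ref_side) ** 3

# Aggregate memory used by the reference run.
ref_total_gb = ref_tasks * ref_avg_gb

# ASSUMPTION: memory scales linearly with cell count.
est_total_gb = ref_total_gb * size_ratio
est_per_task_gb = est_total_gb / new_tasks

# HOP versus FOF wall-clock time on 128^3 particles.
hop_seconds, fof_seconds = 75.0, 25.0
hop_to_fof = hop_seconds / fof_seconds

print("size ratio:          %.3f" % size_ratio)
print("reference total:     %.1f GB" % ref_total_gb)
print("estimated total:     %.1f GB" % est_total_gb)
print("estimated per task:  %.2f GB" % est_per_task_gb)
print("HOP / FOF time:      %.1fx" % hop_to_fof)
```

Under those assumptions the smaller job would need a few hundred GB in aggregate, well under a gigabyte per task on 512 cores, which is consistent with the expectation that it should have run.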
Stephen Skory
Graduate Student
sskory@physics.ucsd.edu
http://physics.ucsd.edu/~sskory/
_______________________________________________ Yt-dev mailing list Yt-dev@lists.spacepope.org http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org
Hi All, I have a few brief thoughts to add.
Overall, though, what really needs to happen is some kind of *buy-in* on the part of the user -- which in this case is anyone who has had trouble with yt. I have pulled back from yt-users, and I'm really happy that everyone else has stepped up. But I'm worried that as time goes on, people will pick up knowledge in ways that aren't indexable by search engines and then this knowledge keeps getting re-learned.
I'm 100% in agreement here about increasing reporting of bugs. I myself am quite guilty of not formally reporting bugs. One thing we might do is to attempt to force yt to ask people to report bugs when it crashes. I'm not sure how easy this might be, but I often report bugs when software I use explicitly tells me to. On the other hand, I think we need to make it as easy as possible for people to report bugs. OpenID is a great idea, and would definitely help. I personally receive a few bug reports from users privately every now and again. These bugs easily fall into two categories: ones I fix right away, and those that fall into purgatory. I would love to simply say "report the bug on trac, please". Of course, I acknowledge my own fault in not doing so myself when handed these bugs.
Public reporting of bugs, particularly as it could relate to improvements in documentation, is essential. But this can't happen if it's just driven by one or two people. And if no one else is motivated to encourage this, then perhaps that's just where we'll stay. I can't force buy-in, I can only encourage people to see the benefits to reporting bugs, sharing experiences, and all of that. We need to have people to read and handle bugs, and then people to whom they apply. I really would like for this not always to be me.
This is a great idea. Can we have trac email us our bugs, or let us know when a bug is assigned to us, or when a new bug comes in? This way we, the non-Matt yt developing community, can assign incoming bugs? I apologize if these are simple things that are already well known, but I'd like to make sure we can make it as easy as possible to have the entire yt development team responding to and categorizing bugs in addition to fixing them.
= Building Community =
* We need to articulate the vision for yt, and I'm not sure my vision is the one anyone else has.
I think we need to create a roadmap for future versions of yt. I think we need to think seriously about what needs rewriting in the internals, what features we want to add, and how long we want to allot ourselves to achieve those goals. Furthermore, we need to explicitly and publicly commit to doing those things we're interested in. Any unassigned features/rewrites would then be added to a wishlist, which we could advertise to new users as a way to get in the door.
One uphill battle is that most users of yt are astronomers, who do not necessarily have much exposure to free software nor do they necessarily have a lot of desire to get really good at software development. This is very sad, of course, because being a bit better at software development is something that would help anyone using any large AMR simulation codebase.
I also agree that documentation features should be classified as bugs. Perhaps we should poll yt-users fairly frequently (once or twice a year) in order to solicit documentation bugs/feature requests.
thanks, and many thanks to Matt for the amazing amount of work he's put in to yt thus far.
J.S.
Hi Jeff,
Thanks for your thoughtful reply.
On Fri, Aug 20, 2010 at 3:01 PM, j s oishi wrote:
Hi All,
I have a few brief thoughts to add.
Overall, though, what really needs to happen is some kind of *buy-in* on the part of the user -- which in this case is anyone who has had trouble with yt. I have pulled back from yt-users, and I'm really happy that everyone else has stepped up. But I'm worried that as time goes on, people will pick up knowledge in ways that aren't indexable by search engines and then this knowledge keeps getting re-learned.
I'm 100% in agreement here about increasing reporting of bugs. I myself am quite guilty of not formally reporting bugs. One thing we might do is to attempt to force yt to ask people to report bugs when it crashes. I'm not sure how easy this might be, but I often report bugs when software I use explicitly tells me to.
After I read your email, I set up a quick demo of how this might go. I've placed it here:
http://paste.enzotools.org/show/1107/
Unless asked not to, or unless it detects that someone else has already done so, it registers an excepthook. (yt already has two mechanisms that do this, --paste and --rpdb, and today I added two more, --detailed and --paste-detailed, which print out some better contextual information.) If an exception is handled by the excepthook, the code will print the exception and ask if the traceback should be submitted to the pastebin. It times out after ten seconds.
This is an option. In fact, we could include a little blurb about how to report it as a bug. What does everyone think? I'm sort of on the fence about this mechanism.
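For anyone who doesn't want to click through, the behaviour described above could be sketched roughly as follows. This is not the pasted demo itself: all names are hypothetical, `submit` stands in for whatever actually talks to the pastebin, and the timed prompt relies on select() over stdin, which only works on Unix-like systems.

```python
import select
import sys
import traceback


def ask_with_timeout(prompt, timeout=10.0):
    """Ask a yes/no question on the terminal; no answer in time means no.

    ASSUMPTION: select() on stdin, so Unix-like systems only.
    """
    sys.stderr.write(prompt)
    sys.stderr.flush()
    ready, _, _ = select.select([sys.stdin], [], [], timeout)
    if not ready:
        return False
    return sys.stdin.readline().strip().lower() in ("y", "yes")


def make_paste_hook(submit, ask=ask_with_timeout):
    """Build an excepthook that prints the traceback, then offers to submit it.

    `submit` is a hypothetical callable that takes the traceback text.
    """
    def hook(exc_type, exc_value, tb):
        text = "".join(traceback.format_exception(exc_type, exc_value, tb))
        sys.stderr.write(text)
        if ask("Submit this traceback to the pastebin? [y/N] "):
            submit(text)
    return hook


def register_paste_hook(submit, enabled=True):
    """Install the hook unless disabled or another hook is already in place."""
    if not enabled or sys.excepthook is not sys.__excepthook__:
        return False
    sys.excepthook = make_paste_hook(submit)
    return True
```

Comparing against `sys.__excepthook__` is one simple way to detect that someone else (an IDE, a debugger, another library) has already replaced the hook, in which case this sketch leaves it alone.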
On the other hand, I think we need to make it as easy as possible for people to report bugs. OpenID is a great idea, and would definitely help. I personally receive a few bug reports from users privately every now and again. These bugs easily fall into two categories: ones I fix right away, and those that fall into purgatory. I would love to simply say "report the bug on trac, please". Of course, I acknowledge my own fault in not doing so myself when handed these bugs.
I have added OpenID support. It's a bit glitchy, in that I can't quite decipher how to make Trac only show one login or one login-name at a time. I'd prefer if developers logged in here: http://yt.enzotools.org/login but it's been tricky to get a unified interface going. Anyone who logs in with OpenID is now able to report bugs. Furthermore, I've added links to bug reporting on the main pages. And, finally, I've set it such that all changes to tickets get emailed to yt-svn. This should help to make sure we don't miss anything.
This is a great idea. Can we have trac email us our bugs, or let us know when a bug is assigned to us, or when a new bug comes in? This way we, the non-Matt yt developing community, can assign incoming bugs? I apologize if these are simple things that are already well known, but I'd like to make sure we can make it as easy as possible to have the entire yt development team responding to and categorizing bugs in addition to fixing them.
Yup, I have set it to email yt-svn. I think this is the best way of handling it. I've also re-worked the components for tickets a bit, so that instead of module names they are coarse names of regions in the code: documentation, enzo, orion, cookbook, yt, halo_finding ...
I think we need to create a roadmap for future versions of yt. I think we need to think seriously about what needs rewriting in the internals, what features we want to add, and how long we want to allot ourselves to achieve those goals. Furthermore, we need to explicitly and publicly commit to doing those things we're interested in. Any unassigned features/rewrites would then be added to a wishlist, which we could advertise to new users as a way to get in the door.
This is the best articulation of a gameplan I've heard yet, and I like it. I nominate you to spearhead this, but I think a roadmap will probably write itself ...
One uphill battle is that most users of yt are astronomers, who do not necessarily have much exposure to free software nor do they necessarily have a lot of desire to get really good at software development. This is very sad, of course, because being a bit better at software development is something that would help anyone using any large AMR simulation codebase.
Both of these are true. I think that the lack of exposure to Free Software is damaging, because I think that for myself (and I know you feel similarly) my feelings on collaboration and communication have been shaped by my exposure to the ideals of the free and open source software movement, and I think they have been beneficial to me.
I also agree that documentation features should be classified as bugs. Perhaps we should poll yt-users fairly frequently (once or twice a year) in order to solicit documentation bugs/feature requests.
Excellent idea, and I like this form submit mechanism -- it's gotten very nice uptake.
thanks, and many thanks to Matt for the amazing amount of work he's put in to yt thus far.
Thank *you*. Best, Matt
J.S.
participants (4)
- Brian O'Shea
- Britton Smith
- j s oishi
- Matthew Turk