This is already a great discussion, and it seems to me that there are a lot of great ideas that I only want to echo. First, I think what Matt started with is a great foundation: yt should continue to be a collaborative, open-source tool for reproducible physical analysis of simulation data. I think the idea that not only is yt itself open source, but the *entire software stack upon which it rests* is also open source is a point worth emphasizing. If nothing else, a user gets not only yt but also an amazing toolkit for doing numerical computation free of cost and restrictive licenses.
Second, I think that Brian's point about data provenance and reproducibility of an entire project is really a direction I would love to see yt move in. yt should allow (and encourage!) reproducibility beyond analysis to include simulation initialization, runtime, and final, reduced data products. Furthermore, I believe it should be able to do this in a cross-code manner: imagine having a set of descriptions (perhaps in the form of yt scripts, perhaps in some other machine/human readable format) that describe initial conditions, runtime parameters, analysis outputs and data products that could be run on Enzo and Ramses. We could move beyond code comparison test problems to real inter-code reproducibility.
Finally, I think that Britton is right that we should also continue to emphasize that yt is a tool for physical reasoning on simulation data, and that *it*, not *you the user*, makes all the manipulations necessary to get from simulation data to physical quantities.
Thanks again for starting such an interesting discussion. I look forward to moving forward with yt.
On Mon, Jun 13, 2011 at 4:23 PM, Brian O'Shea email@example.com wrote:
This may not be something specifically for the mission statement (depending on how wordy we want to get), but I'm very interested in using yt (or something that encompasses yt) as a workflow tool so that my simulations are completely reproducible. What I could imagine is something like this:
1. Generate initial conditions, cosmological or otherwise. IC parameter file goes into a database, along with details about the code that's used to generate my ICs (inits/MUSIC/grafic hash, outputs of make show-config and make show-flags, etc.)
2. Run simulation. Run-related and performance information is collected in a database. (Which supercomputer? How many CPUs? Environment variables? Which version of MPI? What date(s) did the job run on? Which nodes? A copy of the Enzo restart parameter files, and perhaps hierarchy files, for later query?)
3. Back up Enzo data to mass storage (or perhaps some subset of the data, depending on how big the sim is). What directory is it in? Should it be world-readable, group-readable, etc.?
4. Do analysis. Record all details of analysis and plot making, so that I can go back and retrace all details.
At that point, I would know _precisely_ how, and with what commands/code/parameter files/etc., the plots that are in my papers, and everything leading up to them, were generated. This helps when you go back to deal with the referee ("how DID I make that stupid plot? What was sigma8 again?"), but also for reproducibility, since in principle somebody could just go back, look at the database, and be able to do precisely what I did. Also, if somebody wanted to use archival data - something we hope to do more of in the future, as simulations grow in expense and complexity - there'd be no confusion about the provenance of that data.
If I had to sum this up in a sentence, it'd be "Transform yt into a tool for easily and transparently tracking all aspects of simulation generation, execution, and analysis for the purposes of reproducibility."
Anyway, maybe that's unrealistic, but it'd be awesome. The few workflow tools that I have been exposed to suffer from excessive generality, and thus are a bit too cumbersome to be easy to use, and thus too cumbersome to be actually used.
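As a rough illustration of the four-step workflow described above, here is a minimal sketch of a provenance log built only on the Python standard library. It is not part of yt, and every name in it (the `steps` table, `record_step`, `history`, the run name, and the sample field values) is a hypothetical choice made for this example. It assumes one row per workflow step, with the step's details stored as JSON and files identified by their SHA-256 digest.

```python
import hashlib
import json
import sqlite3
import time


def file_digest(data: bytes) -> str:
    """Return a SHA-256 hex digest identifying a parameter file or script."""
    return hashlib.sha256(data).hexdigest()


def open_provenance_db(path=":memory:"):
    """Create (or open) a small SQLite database with one row per workflow step."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS steps (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               run_name TEXT NOT NULL,
               stage TEXT NOT NULL,
               recorded_at REAL NOT NULL,
               details TEXT NOT NULL
           )"""
    )
    return conn


def record_step(conn, run_name, stage, details):
    """Store one step; `details` is any JSON-serializable dict."""
    cur = conn.execute(
        "INSERT INTO steps (run_name, stage, recorded_at, details) "
        "VALUES (?, ?, ?, ?)",
        (run_name, stage, time.time(), json.dumps(details, sort_keys=True)),
    )
    conn.commit()
    return cur.lastrowid


def history(conn, run_name):
    """Return (stage, details) pairs for a run, in the order they were recorded."""
    rows = conn.execute(
        "SELECT stage, details FROM steps WHERE run_name = ? ORDER BY id",
        (run_name,),
    )
    return [(stage, json.loads(details)) for stage, details in rows]


# All values below are made up for the example.
conn = open_provenance_db()
ic_params = b"sigma8 = 0.8\nboxsize = 64\n"
record_step(conn, "demo_run", "initial_conditions",
            {"param_sha256": file_digest(ic_params),
             "generator": "hypothetical-ic-tool"})
record_step(conn, "demo_run", "simulation",
            {"machine": "example-cluster", "n_cpus": 128})
record_step(conn, "demo_run", "archive",
            {"directory": "/archive/demo_run", "access": "group-readable"})
record_step(conn, "demo_run", "analysis",
            {"script_sha256": file_digest(b"# plotting script"),
             "output": "plot.png"})

# Retrace everything that led up to a plot, in order.
for stage, details in history(conn, "demo_run"):
    print(stage, details)
```

Keeping the details as free-form JSON is a deliberate trade-off in this sketch: it avoids fixing a schema for every code and machine up front, at the cost of less structured queries later.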
On Mon, Jun 13, 2011 at 12:43 PM, Matthew Turk firstname.lastname@example.org wrote:
I hope you'll take the opportunity to read and respond to this email, even if you're not a heavy developer, or even a heavy user, of yt. Your feedback and contributions would be greatly, greatly appreciated, particularly as this will help guide where yt development, community-building and (optimistically) use will go. I know that sometimes the signal-to-noise on the yt lists can be a bit low, but I think this is a particularly useful discussion to have.
A few of us have been brainstorming, in person, in IRC, etc., about the direction yt has been going. There are a number of reasons for doing this -- to provide focus, to provide an idea of the off-in-the-distance goal, and to have a public statement of what we're about, which shows ambition, concern for the values that go into a scientific code, and an interest in providing access to that code. This boils down to coming up with a mission statement, which will help focus our goals on what we want to provide and also describe those areas we do not want to provide. Much of this is based on the contents of “The Art of Community” by Jono Bacon, specifically around page 71 in the PDF available on www.artofcommunityonline.org/get/.
“Mission statements are intended to be consistent and should rarely change, even if the tasks that achieve that mission change regularly. When building your mission statement, always have its longevity in mind. Remember, your mission statement is your slam-dunking, audacious goal. For many communities these missions can take decades or even longer to achieve. Their purpose is to not only describe the finish line, but to help the community stay on track.”
To develop a mission statement, which will act as a precursor to a strategic plan, we need to construct answers to three questions. These will provide the initial basis for a broader mission statement. For reference, here are some “principles” we came up with several years ago:
As I mentioned above, a few of us have been spitballing answers to these questions, and it has reached the point where we really need to bring this forward, to conduct these discussions in public, to bring some clarity and engagement to the process. Ultimately, once we have sketched out a couple broad goals and bullet points, this can then be distilled into a short, pithy block of text that serves as a "Mission Statement." Below are some potential bullet points, but I feel strongly that it's important that these get refined and discussed.
= What is the mission? =
* To create a fun, community-led, open source tool for asking and answering astrophysical questions through simulations, analysis and visualization
* To create reproducible, cross-code questions and answers from astrophysical data
* To construct a consistent language for asking questions of simulation data from many sources
* To encourage researchers to participate in constructing a community code
= What are the opportunities and areas of collaboration? =
* Development of new tools, new techniques, and adding support for new codes.
* Adding components to the GUI
* Providing outreach-capable frontends
* Improving visualization qualities
* Adding new methods of accessing data
* Performance analysis & optimization
* Deployment to new platforms
* Designing new web pages
* Writing documentation and recipes
* Spreading the word
* Support for Cartesian non-astrophysical simulations (weather, earthquakes)
* Extension to non-Cartesian coordinate systems
* Mentoring new developers
= What are the skills required? =
* Thoughtful process
* Careful quality control
* Ability to communicate
* An investment in “the answer”
* Eagerness to participate in an open fashion
What other bullets, ideas, inclinations do people have? If we can start a discussion, maybe we can draft some text. This would certainly help with focusing our strategies for presenting yt to others, directing our development in conjunction with our scientific goals, and collaborating as a community.
Thanks very much for any thoughts,
Matt
_______________________________________________
Yt-dev mailing list
Ytemail@example.com
http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org