A Mission Statement for yt
Hi everyone,
I hope you'll take the opportunity to read and respond to this email, even if you're not a heavy developer, or even a heavy user, of yt. Your feedback and contributions would be greatly, greatly appreciated, particularly as this will help guide where yt development, community-building and (optimistically) use will go. I know that sometimes the signal-to-noise on the yt lists can be a bit low, but I think this is a particularly useful discussion to have.
A few of us have been brainstorming, in person, on IRC, etc., about the direction yt has been going. There are a number of reasons for doing this: to provide focus, to provide an idea of the off-in-the-distance goal, and to have a public statement of what we're about, which shows ambition, concern for the values that go into a scientific code, and an interest in providing access to that code. This boils down to coming up with a mission statement, which will help both to focus our goals on what we want to provide and to describe those areas we do not want to provide. Much of this is based on the contents of “The Art of Community” by Jono Bacon, specifically around page 71 in the PDF available at www.artofcommunityonline.org/get/ .
“Mission statements are intended to be consistent and should rarely change, even if the tasks that achieve that mission change regularly. When building your mission statement, always have its longevity in mind. Remember, your mission statement is your slam-dunking, audacious goal. For many communities these missions can take decades or even longer to achieve. Their purpose is to not only describe the finish line, but to help the community stay on track.”
To develop a mission statement, which will act as a precursor to a strategic plan, we need to construct answers to three questions. These will provide the initial basis for a broader mission statement. For reference, here are some “principles” we came up with several years ago:
http://yt.enzotools.org/principles.html
As I mentioned above, a few of us have been spitballing answers to these questions, and it has reached the point where we really need to bring this forward, to conduct these discussions in public, and to bring some clarity and engagement to the process. Ultimately, once we have sketched out a couple of broad goals and bullet points, these can then be distilled into a short, pithy block of text that serves as a "Mission Statement." Below are some potential bullet points, but I feel strongly that it's important that these get refined and discussed.
= What is the mission? =
* To create a fun, community-led, open source tool for asking and answering astrophysical questions through simulations, analysis and visualization
* To create reproducible, cross-code questions and answers from astrophysical data
* To construct a consistent language for asking questions of simulation data from many sources
* To encourage researchers to participate in constructing a community code
= What are the opportunities and areas of collaboration? =
* Development of new tools, new techniques, and adding support for new codes
* Adding components to the GUI
* Providing outreach-capable frontends
* Improving visualization qualities
* Adding new methods of accessing data
* Performance analysis & optimization
* Deployment to new platforms
* Designing new web pages
* Writing documentation and recipes
* Spreading the word
* Support for Cartesian non-astrophysical simulations (weather, earthquakes)
* Extension to non-Cartesian coordinate systems
* Mentoring new developers
= What are the skills required? =
* Thoughtful process
* Careful quality control
* Ability to communicate
* An investment in “the answer”
* Eagerness to participate in an open fashion
What other bullets, ideas, inclinations do people have? If we can start a discussion, maybe we can draft some text. This would certainly help with focusing our strategies for presenting yt to others, directing our development in conjunction with our scientific goals, and collaborating as a community.
Thanks very much for any thoughts,
Matt
Hi Matt,
This may not be something specifically for the mission statement (depending
on how wordy we want to get), but I'm very interested in using yt (or
something that encompasses yt) as a workflow tool so that my simulations are
completely reproducible. What I could imagine is something like this:
1. Generate initial conditions, cosmological or otherwise. IC parameter
file goes into a database, along with details about the code that's used to
generate my ICs (inits/MUSIC/grafic hash, outputs of make show-config and
make show-flags, etc.)
2. Run simulation. Run-related and performance information is collected in
a database. (What supercomputer? How many CPUs? Environment variables?
Which version of MPI? What date(s) did the job run on? What nodes? Copy
of Enzo restart parameter files and perhaps hierarchy files, for later
query?)
3. Back up Enzo data to mass storage (or perhaps some subset of the data,
depending on how big the sim is). What directory is it in? Should it be
world-readable, group-readable, etc.?
4. Do analysis. Record all details of analysis and plot making, so that I
can go back and retrace all details.
At that point, I would know _precisely_ how and with what
commands/code/parameter files/etc. the plots that are in my papers, and
everything leading up to that, were generated. This helps when you go back to
deal with the referee ("how DID I make that stupid plot? What was sigma8
again?"), but also for reproducibility, since in principle somebody could
just go back, look at the database, and be able to do precisely what I did.
Also, if somebody wanted to use archival data - something we hope to do more
of in the future, as simulations grow in expense and complexity - there'd be
no confusion about the provenance of that data.
If I had to sum this up in a sentence, it'd be "Transform yt into a tool for
easily and transparently tracking all aspects of simulation generation,
execution, and analysis for the purposes of reproducibility."
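A minimal sketch of the record-keeping described above, assuming a local SQLite file is an acceptable "database" and that each workflow stage (initial conditions, run, backup, analysis) can be stored as a JSON blob. The function names (`open_provenance_db`, `record_stage`, `stage_history`, `file_sha256`) and the table layout are hypothetical and not part of yt; only the Python standard library is used.

```python
import datetime
import hashlib
import json
import os
import platform
import sqlite3


def file_sha256(path):
    """Return the SHA-256 hex digest of a file (e.g. an IC parameter file)."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def open_provenance_db(path=":memory:"):
    """Open (or create) a provenance database with one table of stage records."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS provenance ("
        " id INTEGER PRIMARY KEY,"
        " stage TEXT NOT NULL,"  # e.g. 'initial_conditions', 'run', 'backup', 'analysis'
        " recorded_at TEXT NOT NULL,"
        " details TEXT NOT NULL)"  # JSON blob of stage-specific metadata
    )
    return conn


def record_stage(conn, stage, details, env_keys=("OMP_NUM_THREADS",)):
    """Store one workflow stage together with basic machine/environment info."""
    record = dict(details)
    record["hostname"] = platform.node()
    record["python"] = platform.python_version()
    # Hypothetical choice of variables; a real site would list its own.
    record["environment"] = {key: os.environ.get(key) for key in env_keys}
    now = datetime.datetime.now(datetime.timezone.utc).isoformat()
    conn.execute(
        "INSERT INTO provenance (stage, recorded_at, details) VALUES (?, ?, ?)",
        (stage, now, json.dumps(record, sort_keys=True)),
    )
    conn.commit()


def stage_history(conn, stage):
    """Return the decoded detail records for one stage, oldest first."""
    rows = conn.execute(
        "SELECT details FROM provenance WHERE stage = ? ORDER BY id", (stage,)
    )
    return [json.loads(row[0]) for row in rows]
```

In use, one might call `record_stage(conn, "initial_conditions", {"parameter_file_sha256": file_sha256(path)})` when the ICs are generated and `record_stage(conn, "run", {...})` from the job script. Things like the MPI version or the node list would need site-specific commands that this sketch does not attempt.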
Anyway, maybe that's unrealistic, but it'd be awesome. The few workflow
tools that I have been exposed to suffer from excessive generality, and thus
are a bit too cumbersome to be easy to use, and thus too cumbersome to be
actually used.
--Brian
On Mon, Jun 13, 2011 at 12:43 PM, Matthew Turk wrote:
_______________________________________________
Yt-dev mailing list
Yt-dev@lists.spacepope.org
http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org
Hi all,
This is already a great discussion, and it seems to me that there are
a lot of great ideas that I only want to echo. First, I think what
Matt started with is a great foundation: yt should continue to be a
collaborative, open-source tool for reproducible physical analysis of
simulation data. I think the idea that not only is yt itself open
source, but the *entire software stack upon which it rests* is also
open source is a point worth emphasizing. If nothing else, a user gets
not only yt but also an amazing toolkit for doing numerical
computation free of cost and restrictive licenses.
Second, I think that Brian's point about data provenance and
reproducibility of an entire project is really a direction I would
love to see yt move in. yt should allow (and encourage!)
reproducibility beyond analysis to include simulation initialization,
runtime, and final, reduced data products. Furthermore, I believe it
should be able to do this in a cross-code manner: imagine having a set
of descriptions (perhaps in the form of yt scripts, perhaps in some
other machine/human readable format) that describe initial conditions,
runtime parameters, analysis outputs and data products that could be
run on Enzo and Ramses. We could move beyond code comparison test
problems to real inter-code reproducibility.
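One way to picture such a cross-code description, as a hedged sketch: a code-neutral dictionary plus per-code translation tables, with a hash of the canonical JSON serving as a shared identifier for every run derived from the same description. The schema, the `fingerprint`/`render_runtime` helpers, and every parameter name in `TRANSLATIONS` are invented placeholders; real Enzo or Ramses parameter names would have to come from each code's own documentation.

```python
import hashlib
import json

# Hypothetical code-neutral description of one experiment; the schema is
# illustrative only and does not exist in yt.
DESCRIPTION = {
    "initial_conditions": {"generator": "MUSIC", "box_size_mpc_h": 32.0, "seed": 12345},
    "runtime": {"final_redshift": 0.0, "root_grid_cells": 128},
    "analysis": ["halo_mass_function", "density_temperature_phase"],
}

# Hypothetical translation tables: code-neutral runtime keys -> parameter
# names for two unnamed codes. These names are placeholders, not real ones.
TRANSLATIONS = {
    "code_a": {"final_redshift": "FinalRedshiftA", "root_grid_cells": "RootCellsA"},
    "code_b": {"final_redshift": "z_end_b", "root_grid_cells": "ncell_b"},
}


def fingerprint(description):
    """Hash a canonical JSON form so runs of the same description share an ID."""
    canonical = json.dumps(description, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def render_runtime(description, code):
    """Map the code-neutral runtime block onto one code's parameter names."""
    table = TRANSLATIONS[code]
    params = {table[key]: value for key, value in description["runtime"].items()}
    # Hypothetical provenance tag carried into each code's parameter set.
    params["DescriptionFingerprint"] = fingerprint(description)
    return params
```

Sorting keys before hashing makes the fingerprint independent of dictionary ordering, so two groups writing down the same description would arrive at the same ID for their Enzo and Ramses runs.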
Finally, I think that Britton is right that we should also continue to
emphasize that yt is a tool for physical reasoning on simulation data,
and that *it*, not *you the user*, makes all necessary manipulations to
get simulation data to physical quantities.
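To make the "it, not you the user" point concrete, here is a small illustrative sketch, assuming the factors that convert code units to CGS can be read from an output's metadata. The `SimulationOutput` class, its fields, and the numerical factors in the example are hypothetical and are not yt's API; the point is only that derived quantities are built from fields the tool has already converted.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class SimulationOutput:
    """Hypothetical container for one output; not yt's actual API."""

    code_fields: Dict[str, List[float]]  # raw values as written, in code units
    to_cgs: Dict[str, float]  # code-unit -> CGS factor, e.g. parsed from output metadata
    cgs_units: Dict[str, str]  # unit label for each converted field

    def physical(self, name: str) -> Tuple[List[float], str]:
        """Return a field in CGS, so callers never handle code units directly."""
        factor = self.to_cgs[name]
        return [value * factor for value in self.code_fields[name]], self.cgs_units[name]

    def cell_mass(self) -> Tuple[List[float], str]:
        """A derived quantity built only from already-converted fields."""
        density, _ = self.physical("density")
        volume, _ = self.physical("cell_volume")
        return [rho * vol for rho, vol in zip(density, volume)], "g"


# Illustrative numbers only: two cells with made-up conversion factors.
example = SimulationOutput(
    code_fields={"density": [1.0, 2.0], "cell_volume": [1.0, 1.0]},
    to_cgs={"density": 1.0e-24, "cell_volume": 1.0e60},
    cgs_units={"density": "g/cm**3", "cell_volume": "cm**3"},
)
```

Keeping the factors inside the output object means a script asking for `cell_mass()` reads the same way whichever code wrote the data; only the metadata parsing would differ per code.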
Thanks again for starting such an interesting discussion. I look
forward to moving forward with yt.
j
On Mon, Jun 13, 2011 at 4:23 PM, Brian O'Shea wrote:
Hi all,
tl;dr summary: New bullet points below, along with a first attempt at
distilling them into proper prose. More comments still requested.
Thanks everyone for your thoughtful responses. Having this discussion
as a group is really the only way to have the outcome of it be
meaningful; I'm glad we could have as much of a discussion as we
already have, and I hope that moving forward we can keep talking about
this and refining it to sort of steer ourselves.
In reading over your replies, it's become clear that the mission goal
bullet points in that original list were a bit ... well, shall we say,
under-ambitious? So let's take the gloves off a bit. Chris, your
comments in particular made me realize that my own feelings about this
project we've got going are a bit more ambitious; Brian, Britton and
Jeff, yours did as well. And this didn't come through in the bullet
points, although it was alluded to. I'll put my comments at the
bottom, along with an updated list of bullet points, after I respond
to a couple things that were brought up.
On Mon, Jun 13, 2011 at 5:06 PM, j s oishi wrote:
> Hi all,
> This is already a great discussion, and it seems to me that there are a lot of great ideas that I only want to echo. First, I think what Matt started with is a great foundation: yt should continue to be a collaborative, open-source tool for reproducible physical analysis of simulation data. I think the idea that not only is yt itself open source, but the *entire software stack upon which it rests* is also open source is a point worth emphasizing. If nothing else, a user gets not only yt but also an amazing toolkit for doing numerical computation free of cost and restrictive licenses.
That is a very, very good point -- and the impetus for its initial creation, actually.
> Second, I think that Brian's point about data provenance and reproducibility of an entire project is really a direction I would love to see yt move in. yt should allow (and encourage!) reproducibility beyond analysis to include simulation initialization, runtime, and final, reduced data products. Furthermore, I believe it should be able to do this in a cross-code manner: imagine having a set of descriptions (perhaps in the form of yt scripts, perhaps in some other machine/human readable format) that describe initial conditions, runtime parameters, analysis outputs and data products that could be run on Enzo and Ramses. We could move beyond code comparison test problems to real inter-code reproducibility.
Yes. Yes, and more yes; I firmly believe in this. Looking, realistically, at where we are and where we are going, I believe this is an utterly feasible goal, and the timescale is not terribly long -- effort simply needs to be applied in that direction. We can have a longer discussion about this, but I think having this item in the mission statement for now is sufficient.
> Finally, I think that Britton is right that we should also continue to emphasize that yt is a tool for physical reasoning on simulation data, and that *it*, not *you the user*, makes all necessary manipulations to get simulation data to physical quantities.
> Thanks again for starting such an interesting discussion. I look forward to moving forward with yt.
> j
> On Mon, Jun 13, 2011 at 4:23 PM, Brian O'Shea wrote:
>> Hi Matt,
>> This may not be something specifically for the mission statement (depending on how wordy we want to get), but I'm very interested in using yt (or something that encompasses yt) as a workflow tool so that my simulations are completely reproducible. What I could imagine is something like this:
I very much like the workflow you laid out, although I would contend we should address more directly the task of running the simulation. On some level, it becomes a realizable goal to execute the main loop of the code in Python without any real overhead. This will have the side effect of providing much easier access to the data during the course of the simulation. I would also scratch out "Enzo" and replace it with "Simulation Code" -- while pragmatically I recognize your simulations will likely be conducted in Enzo for the purposes of this provenance tracking, I feel it should be said that, for the mission statement, I believe in a code-neutral direction. One difficulty here is the idea of actually moving the data. It is not clear to me that moving data around in file systems is a tractable, solvable problem. That is a good thing to strive for, but I personally can't wrap my head around it. Stephen? Britton?
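A toy sketch of what a Python-owned main loop with in-situ data access might look like. A 1D periodic diffusion update stands in for the compiled solver step; in a real code that step would be a call into compiled routines, and whether that can be done "without any real overhead" is not something this sketch demonstrates. All names (`diffusion_step`, `run`, `Callback`) are hypothetical.

```python
from typing import Callable, List

# Signature of an analysis hook: (cycle number, current state) -> None.
Callback = Callable[[int, List[float]], None]


def diffusion_step(state: List[float], coefficient: float = 0.25) -> List[float]:
    """Toy stand-in for a compiled solver: one explicit diffusion update on a
    periodic 1D grid. Periodic boundaries keep the total conserved."""
    n = len(state)
    return [
        state[i]
        + coefficient * (state[(i - 1) % n] - 2.0 * state[i] + state[(i + 1) % n])
        for i in range(n)
    ]


def run(
    state: List[float],
    n_steps: int,
    callbacks: List[Callback],
    step: Callable[[List[float]], List[float]] = diffusion_step,
) -> List[float]:
    """Drive the time loop from Python, handing the live state to each
    callback after every cycle (in-situ analysis without writing to disk)."""
    for cycle in range(1, n_steps + 1):
        state = step(state)
        for callback in callbacks:
            callback(cycle, state)
    return state
```

Because the loop lives in Python, an analysis routine is just another callback in the list, and swapping `step` for a wrapper around a different code's solver leaves the analysis side untouched.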
>> 1. Generate initial conditions, cosmological or otherwise. IC parameter file goes into a database, along with details about the code that's used to generate my ICs (inits/MUSIC/grafic hash, outputs of make show-config and make show-flags, etc.)
>> 2. Run simulation. Run-related and performance information is collected in a database. (What supercomputer? How many CPUs? Environment variables? Which version of MPI? What date(s) did the job run on? What nodes? Copy of Enzo restart parameter files and perhaps hierarchy files, for later query?)
>> 3. Back up Enzo data to mass storage (or perhaps some subset of the data, depending on how big the sim is). What directory is it in? Should it be world-readable, group-readable, etc.?
>> 4. Do analysis. Record all details of analysis and plot making, so that I can go back and retrace all details.
>> At that point, I would know _precisely_ how and with what commands/code/parameter files/etc. the plots that are in my papers, and everything leading up to that, were generated. This helps when you go back to deal with the referee ("how DID I make that stupid plot? What was sigma8 again?"), but also for reproducibility, since in principle somebody could just go back, look at the database, and be able to do precisely what I did. Also, if somebody wanted to use archival data - something we hope to do more of in the future, as simulations grow in expense and complexity - there'd be no confusion about the provenance of that data.
>> If I had to sum this up in a sentence, it'd be "Transform yt into a tool for easily and transparently tracking all aspects of simulation generation, execution, and analysis for the purposes of reproducibility."
That's a great sentence.
Britton: I like your additions very much. It had completely slipped my mind that one of the most useful features of yt is its physical and geometric object selection.
Chris: I don't think you are stepping outside the scope of what we could generously call "The yt project" with what you mention. There are the technical goals, and the broader community goals. The goals of open science, reproducibility, and cultivating a community of scientists willing to share scripts, analysis routines, and even analysis modules are certainly part of what I think we are all striving for. And this goal isn't just about deploying infrastructure and rolling it out; it's also about providing a welcoming and friendly community of people willing to help. For instance, it's great that scripts written to generate phase plots from Orion outputs can be used nearly unmodified on Enzo outputs. (I remember when Jeff worked so hard to make this so -- http://www.flickr.com/photos/matthewturk/2598141965/ ) But even more than that, I think it's amazing that people are willing to share these scripts. So yes, let's bake that right into the mission statement.
As for your comments about the microphysical solvers, believe me when I say they are not falling on deaf ears. Moving toward an open source, community-driven model for microphysical solvers is an issue near and dear to my own heart; I spent several years of my life writing a primordial chemistry solver. I believe there is a place for interfacing with that sort of project and endeavor inside yt -- in particular, interfacing with specific APIs and so forth to seamlessly calculate cooling times or EOS or opacities. Let's revisit this issue in the future. (Although, if we step back for a second and look at what's in yt ... boundary condition calculations, cooling time calculations, gravity, ... the mind does wander.)
The revised bullet points I have:
= What is the mission? =
* To create a fun, community-led, open source tool for asking and answering astrophysical questions of simulation data through simulations, analysis and visualization, independent of the code used to produce that data
* To create a friendly, helpful community of scientists
* To further the goals of Open Science
* To construct an environment that encompasses the generation of data, starting from initial conditions, through simulations, and finally resulting in publication-quality plots
* To create reproducible, cross-code questions and answers from astrophysical data
* To present simulation data in physical terms, rather than strictly in simulation and data format terms
* To construct a consistent language for asking questions of simulation data from many sources
* To encourage researchers to participate in constructing a community code
* To provide a place to create and share analysis codes, recipes, and other things that can be helpful to others seeking to answer similar scientific questions
The next step is to try to distill this down into a sentence or two. I've included my first pass at this. Not all items have to be included -- they can be left implicit in the proper mission statement, but can show up in the broader directions. The ultimate goal is to provide the short-form "elevator pitch" and then augment it with what we could generously call strategy documents.
Draft 1: The yt project aims to produce an integrated science environment for asking and answering astrophysical questions, encompassing the creation of initial conditions, the execution of simulations and the detailed exploration and visualization of the resultant data. Development of yt is driven by a commitment to Open Science principles as manifested in participatory development, reproducibility, a friendly and helpful community of users and developers, and Free and Libre Open Source Software.
I'm not terribly satisfied with this draft. I don't quite know how to work in two things that I think should be stated: that the end goal is, ideally, a community project (whose bus factor is equal to the number of users :) and that we want to focus on the physical underpinnings of simulations when asking questions rather than, say, the specifics of unformatted Fortran or HDF5. I think that the broader focus (as an integrated science environment) comes across, but the other core aspects are a bit underserved.
Edits and suggestions?
Thanks again, everyone. I'm glad we're having this conversation.
-Matt
Thanks very much for any thoughts,
Matt _______________________________________________ Yt-dev mailing list Yt-dev@lists.spacepope.org http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org
_______________________________________________ Yt-dev mailing list Yt-dev@lists.spacepope.org http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org
_______________________________________________ Yt-dev mailing list Yt-dev@lists.spacepope.org http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org
Everyone, I'm sorry it has taken me so long to join this conversation. I definitely agree with the things that are being said so far in terms of the long-term goals of yt. I might also add that because yt is being developed by and for a community of scientists, we may want to mention something about collaboration in the statement. Tools like the pastebin, codr.cc, imgur, and soon Reason will allow real-time interactive collaborations, which I think will really change the picture of computational science. I don't know what other people think, but I think this is a great step forward for the community. In addition, yt has a growing user community, and I think one of the main reasons for that is our group of developers who are actively trying to make it approachable to the outside world (IRC, mailing lists like this one, square-one documentation, etc.), and I think that is super-important. I know yt can do some super cool things, and we're planning on making it even cooler, but it's important to make it accessible to new and intermediate users, so that we continue to expand the user community and this doesn't turn into something esoteric like IRAF or AIPS. What good are hyper-realistic, volume-rendered, 3D-projected wormhole movies if no one but the author can create them? Of course, that is hyperbole and doesn't apply to the project at this point, but I think we should include a word or two about actively making the code usable and approachable to new users. I've made a new potential draft of the statement below. Feel free to modify or delete my changes. Good suggestions by everyone so far!
Cameron
Draft 2:
The yt project aims to produce an integrated science environment for collaboratively asking and answering astrophysical questions, encompassing the creation of initial conditions, the execution of simulations and the detailed exploration and visualization of the resultant data. It will provide a standard framework based on physical quantities for interoperability between codes. Development of yt is driven by a commitment to Open Science principles as manifested in participatory development, reproducibility, documented and approachable code, a friendly and helpful community of users and developers, and Free and Libre Open Source Software.
On 6/14/11 1:02 AM, Matthew Turk wrote:
Hi all,
tl;dr summary: New bullet points below, along with a first draft at a proper prose solidification. More comments still requested.
Thanks everyone for your thoughtful responses. Having this discussion as a group is really the only way to have the outcome of it be meaningful; I'm glad we could have as much of a discussion as we already have, and I hope that moving forward we can keep talking about this and refining it to sort of steer ourselves.
In reading over your replies, it's become clear that the mission goal bullet points in that original list were a bit ... well, shall we say, under-ambitious? So let's take the gloves off a bit. Chris, your comments in particular made me realize that my own feelings about this project we've got going are a bit more ambitious; Brian, Britton and Jeff, yours did as well. And this didn't come through in the bullet points, although it was alluded to. I'll put my comments at the bottom, along with an updated list of bullet points, after I respond to a couple things that were brought up.
On Mon, Jun 13, 2011 at 5:06 PM, j s oishi wrote:
Hi all,
This is already a great discussion, and it seems to me that there are a lot of great ideas that I only want to echo. First, I think what Matt started with is a great foundation: yt should continue to be a collaborative, open-source tool for reproducible physical analysis of simulation data. I think the idea that not only is yt itself open source, but the *entire software stack upon which it rests* is also open source is a point worth emphasizing. If nothing else, a user gets not only yt but also an amazing toolkit for doing numerical computation free of cost and restrictive licenses.
That is a very, very good point -- and the impetus for its initial creation, actually.
Second, I think that Brian's point about data provenance and reproducibility of an entire project is really a direction I would love to see yt move in. yt should allow (and encourage!) reproducibility beyond analysis to include simulation initialization, runtime, and final, reduced data products. Furthermore, I believe it should be able to do this in a cross-code manner: imagine having a set of descriptions (perhaps in the form of yt scripts, perhaps in some other machine/human readable format) that describe initial conditions, runtime parameters, analysis outputs and data products that could be run on Enzo and Ramses. We could move beyond code comparison test problems to real inter-code reproducibility.
Yes. Yes, and more yes; I firmly believe in this. Looking, realistically, at where we are and where we are going, I believe this is an utterly feasible goal, and the timescale is not terribly great -- effort simply needs to be applied in that direction.
We can have a longer discussion about this, but I think having this item in the mission statement for now is sufficient.
Finally, I think that Britton is right that we should also continue to emphasize that yt is a tool for physical reasoning on simulation data, and that *it*, not *you the user*, makes all necessary manipulations to get simulation data to physical quantities.
Thanks again for starting such an interesting discussion. I look forward to moving forward with yt.
j
On Mon, Jun 13, 2011 at 4:23 PM, Brian O'Shea wrote:
Hi Matt,
This may not be something specifically for the mission statement (depending on how wordy we want to get), but I'm very interested in using yt (or something that encompasses yt) as a workflow tool so that my simulations are completely reproducible. What I could imagine is something like this:
I very much like the workflow you laid out, although I would contend we should address more directly the task of running the simulation. On some level, it becomes a realizable goal to execute the main loop of the code in Python without any real overhead. This will have the side effect of providing much easier access to the data during the course of the simulation.
I would also scratch out "Enzo" and replace it with "Simulation Code" -- while pragmatically I recognize your simulations will likely be conducted in Enzo for the purposes of this provenance tracking, I feel it should be said that for the mission statement I believe in a code-neutral direction.
One difficulty here is the idea of actually moving the data. It is not clear to me that moving data around in file systems is a tractable, solvable problem. That is a good thing to strive for, but I personally can't wrap my head around it. Stephen? Britton?
1. Generate initial conditions, cosmological or otherwise. IC parameter file goes into a database, along with details about the code that's used to generate my ICs (inits/MUSIC/grafic hash, outputs of make show-config and make show-flags, etc.)
2. Run simulation. Run-related and performance information is collected in a database. (What supercomputer? How many CPUs? Environment variables? Which version of MPI? What date(s) did the job run on? What nodes? Copy of Enzo restart parameter files and perhaps hierarchy files, for later query?)
3. Back up Enzo data to mass storage (or perhaps some subset of the data, depending on how big the sim is). What directory is it in? Should it be world-readable, group-readable, etc.?
4. Do analysis. Record all details of analysis and plot making, so that I can go back and retrace all details.
At that point, I would know _precisely_ how, and with what commands/code/parameter files/etc., the plots in my papers, and everything leading up to them, were generated. This helps when you go back to deal with the referee ("how DID I make that stupid plot? What was sigma8 again?"), but also for reproducibility, since in principle somebody could just go back, look at the database, and be able to do precisely what I did. Also, if somebody wanted to use archival data - something we hope to do more of in the future, as simulations grow in expense and complexity - there'd be no confusion about the provenance of that data.
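Brian's four steps amount to an append-only provenance log. The sketch below is only an illustration, assuming a local SQLite file (via Python's standard-library `sqlite3` module) is an acceptable stand-in for "a database"; the thread does not name one. The table layout, the stage names, and the helper functions `open_store`, `record`, and `history` are all hypothetical and are not part of yt. Each stage is stored as a JSON blob next to a SHA-256 digest, so a later reader can at least detect a record that was edited without its digest being updated.

```python
import hashlib
import json
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: one row per workflow stage, metadata kept as JSON.
SCHEMA = """
CREATE TABLE IF NOT EXISTS provenance (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    stage TEXT NOT NULL,
    recorded_at TEXT NOT NULL,
    details TEXT NOT NULL,
    digest TEXT NOT NULL
)
"""


def open_store(path=":memory:"):
    """Open (or create) the provenance database; ':memory:' is for demos."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record(conn, stage, details):
    """Append one stage's metadata and return the new row id."""
    # sort_keys makes the serialization, and therefore the digest, deterministic.
    blob = json.dumps(details, sort_keys=True)
    digest = hashlib.sha256(blob.encode("utf-8")).hexdigest()
    now = datetime.now(timezone.utc).isoformat()
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO provenance (stage, recorded_at, details, digest) "
            "VALUES (?, ?, ?, ?)",
            (stage, now, blob, digest),
        )
    return cur.lastrowid


def history(conn):
    """Return (stage, details, intact) tuples in the order they were recorded."""
    out = []
    for stage, blob, digest in conn.execute(
        "SELECT stage, details, digest FROM provenance ORDER BY id"
    ):
        # intact is False if the stored JSON no longer matches its digest.
        intact = hashlib.sha256(blob.encode("utf-8")).hexdigest() == digest
        out.append((stage, json.loads(blob), intact))
    return out


if __name__ == "__main__":
    conn = open_store()
    # Placeholder values for illustration only, not real run metadata.
    record(conn, "initial_conditions",
           {"generator": "MUSIC", "parameter_file": "ics.conf", "sigma8": 0.8})
    record(conn, "simulation", {"machine": "example-cluster", "mpi_tasks": 512})
    record(conn, "storage", {"path": "/archive/example-run"})
    record(conn, "analysis", {"script": "phase_plot.py", "output": "fig3.png"})
    for stage, details, intact in history(conn):
        print(stage, details, intact)
```

Storing free-form JSON rather than one column per field is a deliberate choice in this sketch: the metadata listed above (MPI version, node lists, `make show-config` output) differs between machines and codes, so a fixed schema would likely need frequent changes.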
If I had to sum this up in a sentence, it'd be "Transform yt into a tool for easily and transparently tracking all aspects of simulation generation, execution, and analysis for the purposes of reproducibility."
That's a great sentence.
Britton: I like your additions very much. It had completely slipped my mind that one of the most useful features of yt is its physical and geometric object selection.
Chris: I don't think you are stepping outside the scope of what we could generously call "The yt project" with what you mention. There are the technical goals, and the broader community goals. The goals of open science, reproducibility, and cultivating a community of scientists willing to share scripts, analysis routines, and even analysis modules are certainly part of what I think we are all striving for. And this goal isn't just about deploying infrastructure and rolling it out, but also about providing a welcoming and friendly community of people willing to help. For instance, it's great that scripts written to generate phase plots from Orion outputs can be used nearly unmodified on Enzo outputs. (I remember when Jeff worked so hard to make this so -- http://www.flickr.com/photos/matthewturk/2598141965/ ) But even more than that, I think it's amazing that people are willing to share these scripts.
So yes, let's bake that right into the mission statement.
As for your comments about the microphysical solvers, believe me when I say they are not falling on deaf ears. Moving toward an open source, community-driven model for microphysical solvers is an issue near and dear to my own heart, having spent several years of my life writing a primordial chemistry solver. I believe there is a place for interfacing with that sort of project and endeavor inside yt -- in particular, interfacing with specific APIs and so forth to seamlessly calculate cooling times or EOS or opacities. Let's revisit this issue in the future.
(Although, if we step back for a second and look at what's in yt ... boundary condition calculations, cooling time calculations, gravity, ... the mind does wander.)
The revised bullet points I have:
= What is the mission? =
* To create a fun, community-led, open source tool for asking and answering astrophysical questions through simulations, analysis and visualization that allows one to ask astrophysical questions of simulation data independent of the code used to produce that data.
* To create a friendly, helpful community of scientists
* To further the goals of Open Science
* To construct an environment that encompasses the generation of data, starting from initial conditions, through simulations, and finally resulting in publication-quality plots
* To create reproducible, cross-code questions and answers from astrophysical data
* To present simulation data in physical terms, rather than strictly in simulation and data format terms
* To construct a consistent language for asking questions of simulation data from many sources
* To encourage researchers to participate in constructing a community code
* To provide a place to create and share analysis codes, recipes, and other things that can be helpful to others seeking to answer similar scientific questions.
The next step in this is to try to distill it down into a sentence or two. I've included my first pass at this. Not all items have to be included -- they can be shuffled off and left implicit in the proper mission statement, but can show up in the broader directions. The ultimate goal of this is to provide both the short-form "elevator pitch" and then augment that with what we could generously call strategy documents.
Draft 1:
The yt project aims to produce an integrated science environment for asking and answering astrophysical questions, encompassing the creation of initial conditions, the execution of simulations and the detailed exploration and visualization of the resultant data. Development of yt is driven by a commitment to Open Science principles as manifested in participatory development, reproducibility, a friendly and helpful community of users and developers, and Free and Libre Open Source Software.
I'm not terribly satisfied with this draft. I don't quite know how to work in two things that I think should be stated -- that the end goal is, ideally, a community project (whose bus factor is equal to the number of users :) and that we want to focus on the physical underpinnings of simulations when asking questions rather than, say, the specifics of unformatted Fortran or HDF5. I think that the broader focus (as an integrated science environment) comes across, but the other core aspects are a bit underserved.
Edits and suggestions?
Thanks again, everyone. I'm glad we're having this conversation.
-Matt
Anyway, maybe that's unrealistic, but it'd be awesome. The few workflow tools that I have been exposed to suffer from excessive generality, and thus are a bit too cumbersome to be easy to use, and thus too cumbersome to be actually used.
--Brian
This is looking awesome. Let me add a tiny bit of refinement:
Draft 3
--------
The yt project aims to produce an integrated science environment for collaboratively asking and answering astrophysical questions. To do so, it will encompass the creation of initial conditions, the execution of simulations, and the detailed exploration and visualization of the resultant data. It will also provide a standard framework based on physical quantities for interoperability between codes. Development of yt is driven by a commitment to Open Science principles as manifested in participatory development, reproducibility, documented and approachable code, a friendly and helpful community of users and developers, and Free and Libre Open Source Software.
The yt project aims to produce an integrated science environment for collaboratively asking and answering astrophysical questions, encompassing the creation of initial conditions, the execution of simulations and the detailed exploration and visualization of the resultant data. It will provide a standard framework based on physical quantities for interoperability between codes. Development of yt is driven by a commitment to Open Science principles as manifested in participatory development, reproducibility, documented and approachable code, a friendly and helpful community of users and developers, and Free and Libre Open Source Software.
On 6/14/11 1:02 AM, Matthew Turk wrote:
Hi all,
tl;dr summary: New bullet points below, along with a first draft at a proper prose solidification. More comments still requested.
Thanks everyone for your thoughtful responses. Having this discussion as a group is really the only way to have the outcome of it be meaningful; I'm glad we could have as much of a discussion as we already have, and I hope that moving forward we can keep talking about this and refining it to sort of steer ourselves.
In reading over your replies, it's become clear that the mission goal bullet points in that original list were a bit ... well, shall we say, under-ambitious? So let's take the gloves off a bit. Chris, your comments in particular made me realize that my own feelings about this project we've got going are a bit more ambitious; Brian, Britton and Jeff, yours did as well. And this didn't come through in the bullet points, although it was alluded to. I'll put my comments at the bottom, along with an updated list of bullet points, after I respond to a couple things that were brought up.
On Mon, Jun 13, 2011 at 5:06 PM, j s oishi
wrote: Hi all,
This is already a great discussion, and it seems to me that there are a lot of great ideas that I only want to echo. First, I think what Matt started with is a great foundation: yt should continue to be a collaborative, open-source tool for reproducible physical analysis of simulation data. I think the idea that not only is yt itself open source, but the *entire software stack upon which it rests* is also open source is a point worth emphasizing. If nothing else, a user gets not only yt but also an amazing toolkit for doing numerical computation free of cost and restrictive licenses.
That is a very, very good point -- and the impetus for its initial creation, actually.
Second, I think that Brian's point about data provenance and reproducibility of an entire project is really a direction I would love to see yt move in. yt should allow (and encourage!) reproducibility beyond analysis to include simulation initialization, runtime, and final, reduced data products. Furthermore, I believe it should be able to do this in a cross-code manner: imagine having a set of descriptions (perhaps in the form of yt scripts, perhaps in some other machine/human readable format) that describe initial conditions, runtime parameters, analysis outputs and data products that could be run on Enzo and Ramses. We could move beyond code comparison test problems to real inter-code reproducibility.
Yes. Yes, and more yes; I firmly believe in this. Looking, realistically, at where we are and where we are going, I believe this is an utterly feasible goal, and the timescale is not terribly great -- effort simply needs to be applied in that direction.
We can have a longer discussion about this, but I think having this item in the mission statement for now is sufficient.
Finally, I think that Britton is right that we should also continue to emphasize that yt is a tool for physical reasoning on simulation data, and that *it*, not *you the user* make all necessary manipulations to get simulation data to physical quantities.
Thanks again for starting such an interesting discussion. I look forward to moving forward with yt.
j
On Mon, Jun 13, 2011 at 4:23 PM, Brian O'Shea
wrote: Hi Matt,
This may not be something specifically for the mission statement (depending on how wordy we want to get), but I'm very interested in using yt (or something that encompasses yt) as a workflow tool so that my simulations are completely reproducible. What I could imagine is something like this:
I very much like the workflow you laid out, although I would contend we should address more directly the task of running the simulation. On some level, it becomes a realizable goal to execute the main loop of the code in Python without any real overhead. This will have the side effect of providing much easier access to the data during the course of the simulation.
I would also scratch out "Enzo" and replace it with "Simulation Code" -- while pragmatically I recognize your simulations will likely be conducted in Enzo for the purposes of this provenance tracking, I feel it should be said that for the mission statement I believe in a code-neutral direction.
One difficulty here is the idea of actually moving the data. It is not clear the me that moving data around in file systems is a tractable, solvable problem. That is a good thing to strive for, but I personally can't wrap my head around it. Stephen? Britton?
1. Generate initial conditions, cosmological or otherwise. IC parameter file goes into a database, along with details about the code that's used to generate my ICs (inits/MUSIC/grafic hash, outputs of make show-config and make show-flags, etc.)
2. Run simulation. Run-related and performance information is collected in a database. (what supercomputer? How many CPUs? Environmental variables? Which version of MPI? What date(s) did the job run on? What nodes? Copy of Enzo restart parameter files and perhaps hierarchy files, for later query?)
3. Back up Enzo data to mass storage (or perhaps some subset of the data, depending on how big the sim is). What directory is it in? Should it be world-readable, group-readable, etc.?
4. Do analysis. Record all details of analysis and plot making, so that I can go back and retrace all details.
At that point, I would know _precisely_ how and with what commands/code/parameter files/etc. the plots that are in my papers, and everything leading up to that, is generated. This helps when you go back to deal with the referee ("how DID I make that stupid plot? What was sigma8 again?"), but also for reproducibility, since in principle somebody could just go back, look at the database, and be able to do precisely what I did. Also, if somebody wanted to use archival data - something we hope to do more of in the future, as simulations grow in expense and complexity - there'd be no confusion about the provenance of that data.
If I had to sum this up in a sentence, it'd be "Transform yt into a tool for easily and transparently tracking all aspects of simulation generation, execution, and analysis for the purposes of reproducibility."
That's a great sentence.
Britton: I like your additions very much. It had completely slipped my mind that one of the most useful features of yt is its physical and geometric object selection.
Chris: I don't think you are stepping outside the scope of what we could generously call "The yt project" with what you mention. There are the technical goals, and the broader community goals. The goals of open science, reproducibility, and cultivating a community of scientists willing to share scripts, analysis routines, and even analysis modules are certainly part of what I think we are all striving for. And this goal isn't composed of just deploying infrastructure, rolling it out, but also providing a welcoming and friendly community of people willing to help. For instance, it's great that scripts written to generate phase plots from Orion outputs can be used nearly unmodified on Enzo outputs. (I remember when Jeff worked so hard to make this so -- http://www.flickr.com/photos/matthewturk/2598141965/ ) But even more than that, I think it's amazing that people are willing to share these scripts.
So yes, let's bake that right into the mission statement.
As for your comments about the microphysical solvers, believe me when I say they are not falling on deaf ears. Moving toward an open source, community-driven model for microphysical solvers is an issue near and dear to my own heart, having spent several years of my life writing a primordial chemistry solver. I believe there is a place for interfacing with that sort of project and endeavor inside yt -- in particular, interfacing with specific APIs and so forth to seamlessly calculate cooling times or EOS or opacities. Let's revisit this issue in the future.
(Although, if we step back for a second and look at what's in yt ... boundary condition calculations, cooling time calculations, gravity, ... the mind does wander.)
The revised bullet points I have:
= What is the mission? = * To create a fun, community-led, open source tool for asking and answering astrophysical questions through simulations, analysis and visualization that allows one to ask astrophysical questions of simulation data independent of the code used to produce that data. * To create a friendly, helpful community of scientists * To further the goals of Open Science * To construct an environment that encompasses the generation of data, starting from initial conditions, through simulations, and finally resulting in publication-quality plots * To create reproducible, cross-code questions and answers from astrophysical data * To present simulation data in physical terms, rather than strictly in simulation and data format terms * To construct a consistent language for asking questions of simulation data from many sources * To encourage researchers to participate in constructing a community code * To provide a place to create and share analysis codes, recipes, and other things that can be helpful to others seeking to answer similar scientific questions.
The next step in this is to try to distill it down into a sentence or two. I've included my first pass at this. Not all items have to be included -- they can be shuffled off and left implicit in the proper mission statement, but can show up in the broader directions. The ultimate goal of this is to provide both the short-form "elevator pitch" and then augment that with what we could generously call strategy documents.
Draft 1:
The yt project aims to produce an integrated science environment for asking and answering astrophysical questions, encompassing the creation of initial conditions, the execution of simulations and the detailed exploration and visualization of the resultant data. Development of yt is driven by a commitment to Open Science principles as manifested in participatory development, reproducibility, a friendly and helpful community of users and developers, and Free and Libre Open Source Software.
I'm not terribly satisfied with this draft. I don't quite know how to work in two things that I think should be stated -- that the end goal is, ideally, a community project (whose bus factor is equal to the number of users :) and that we want to focus on the physical underpinnings of simulations when asking questions rather than, say, the specifics of unformatted Fortran or HDF5. I think that the broader focus (as an integrated science environment) comes across, but the other core aspects are a bit underserved.
Edits and suggestions?
Thanks again, everyone. I'm glad we're having this conversation.
-Matt
Anyway, maybe that's unrealistic, but it'd be awesome. The few workflow tools that I have been exposed to suffer from excessive generality, and thus are a bit too cumbersome to be easy to use, and thus too cumbersome to be actually used.
--Brian
On Mon, Jun 13, 2011 at 12:43 PM, Matthew Turk wrote:
Hi everyone,
I hope you'll take the opportunity to read and respond to this email, even if you're not a heavy-developer, or even a heavy-user, of yt. Your feedback and contributions would be greatly, greatly appreciated, particularly as this will help guide where yt development, community-building and (optimistically) use will go. I know that sometimes the signal-to-noise on the yt lists can be a bit low, but I think this is a particularly useful discussion to have.
A few of us have been brainstorming, in person, in IRC, etc about the direction yt has been going. There are a number of reasons for doing this -- to provide focus, to provide an idea of the off-in-the-distance goal, and to have a public statement of what we're about, which shows ambition, concern for the values that go into a scientific code, and an interest in providing access to that code. This boils down to coming up with a mission statement, which will help both focus our goals on what we want to provide, as well as describe those areas we do not want to provide. Much of this is based on the contents of “The Art of Community” by Jono Bacon, specifically around page 71 in the PDF available on www.artofcommunityonline.org/get/ .
“Mission statements are intended to be consistent and should rarely change, even if the tasks that achieve that mission change regularly. When building your mission statement, always have its longevity in mind. Remember, your mission statement is your slam-dunking, audacious goal. For many communities these missions can take decades or even longer to achieve. Their purpose is to not only describe the finish line, but to help the community stay on track.”
To develop a mission statement, which will act as a precursor to a strategic plan, we need to construct answers to three questions. These will provide the initial basis for a broader mission statement. For reference, here are some “principles” we came up with several years ago:
http://yt.enzotools.org/principles.html
As I mentioned above, a few of us have been spitballing answers to these questions, and it has reached the point where we really need to bring this forward, to conduct these discussions in public, to bring some clarity and engagement to the process. Ultimately, once we have sketched out a couple broad goals and bullet points, this can then be distilled into a short, pithy block of text that serves as a "Mission Statement." Below are some potential bullet points, but I feel strongly that it's important that these get refined and discussed.
= What is the mission? =
* To create a fun, community-led, open source tool for asking and answering astrophysical questions through simulations, analysis and visualization
* To create reproducible, cross-code questions and answers from astrophysical data
* To construct a consistent language for asking questions of simulation data from many sources
* To encourage researchers to participate in constructing a community code
= What are the opportunities and areas of collaboration? =
* Development of new tools, new techniques, and adding support for new codes
* Adding components to the GUI
* Providing outreach-capable frontends
* Improving visualization qualities
* Adding new methods of accessing data
* Performance analysis & optimization
* Deployment to new platforms
* Designing new web pages
* Writing documentation and recipes
* Spreading the word
* Support for Cartesian non-astrophysical simulations (weather, earthquakes)
* Extension to non-Cartesian coordinate systems
* Mentoring new developers
= What are the skills required? =
* Thoughtful process
* Careful quality control
* Ability to communicate
* An investment in “the answer”
* Eagerness to participate in an open fashion
What other bullets, ideas, inclinations do people have? If we can start a discussion, maybe we can draft some text. This would certainly help with focusing our strategies for presenting yt to others, directing our development in conjunction with our scientific goals, and collaborating as a community.
Thanks very much for any thoughts,
Matt
_______________________________________________
Yt-dev mailing list
Yt-dev@lists.spacepope.org
http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org
This looks great to me. It's a good mix of what we can do now, what we want to do in the future, and the manner in which we'll get there. +1
On Fri, Jun 17, 2011 at 3:20 PM, j s oishi wrote:
This is looking awesome. Let me add a tiny bit of refinement:
Draft 3
--------
The yt project aims to produce an integrated science environment for collaboratively asking and answering astrophysical questions. To do so, it will encompass the creation of initial conditions, the execution of simulations, and the detailed exploration and visualization of the resultant data. It will also provide a standard framework based on physical quantities for interoperability between codes.
Development of yt is driven by a commitment to Open Science principles as manifested in participatory development, reproducibility, documented and approachable code, a friendly and helpful community of users and developers, and Free and Libre Open Source Software.
The yt project aims to produce an integrated science environment for collaboratively asking and answering astrophysical questions, encompassing the creation of initial conditions, the execution of simulations and the detailed exploration and visualization of the resultant data. It will provide a standard framework based on physical quantities for interoperability between codes. Development of yt is driven by a commitment to Open Science principles as manifested in participatory development, reproducibility, documented and approachable code, a friendly and helpful community of users and developers, and Free and Libre Open Source Software.
On 6/14/11 1:02 AM, Matthew Turk wrote:
Hi all,
tl;dr summary: New bullet points below, along with a first draft at a proper prose solidification. More comments still requested.
Thanks everyone for your thoughtful responses. Having this discussion as a group is really the only way to have the outcome of it be meaningful; I'm glad we could have as much of a discussion as we already have, and I hope that moving forward we can keep talking about this and refining it to sort of steer ourselves.
In reading over your replies, it's become clear that the mission goal bullet points in that original list were a bit ... well, shall we say, under-ambitious? So let's take the gloves off a bit. Chris, your comments in particular made me realize that my own feelings about this project we've got going are a bit more ambitious; Brian, Britton and Jeff, yours did as well. And this didn't come through in the bullet points, although it was alluded to. I'll put my comments at the bottom, along with an updated list of bullet points, after I respond to a couple things that were brought up.
On Mon, Jun 13, 2011 at 5:06 PM, j s oishi wrote:
Hi all,
This is already a great discussion, and it seems to me that there are a lot of great ideas that I only want to echo. First, I think what Matt started with is a great foundation: yt should continue to be a collaborative, open-source tool for reproducible physical analysis of simulation data. I think the idea that not only is yt itself open source, but the *entire software stack upon which it rests* is also open source is a point worth emphasizing. If nothing else, a user gets not only yt but also an amazing toolkit for doing numerical computation free of cost and restrictive licenses.
That is a very, very good point -- and the impetus for its initial creation, actually.
Second, I think that Brian's point about data provenance and reproducibility of an entire project is really a direction I would love to see yt move in. yt should allow (and encourage!) reproducibility beyond analysis to include simulation initialization, runtime, and final, reduced data products. Furthermore, I believe it should be able to do this in a cross-code manner: imagine having a set of descriptions (perhaps in the form of yt scripts, perhaps in some other machine/human readable format) that describe initial conditions, runtime parameters, analysis outputs and data products that could be run on Enzo and Ramses. We could move beyond code comparison test problems to real inter-code reproducibility.
Yes. Yes, and more yes; I firmly believe in this. Looking, realistically, at where we are and where we are going, I believe this is an utterly feasible goal, and the timescale is not terribly great -- effort simply needs to be applied in that direction.
We can have a longer discussion about this, but I think having this item in the mission statement for now is sufficient.
Finally, I think that Britton is right that we should also continue to emphasize that yt is a tool for physical reasoning on simulation data, and that *it*, not *you the user*, makes all the manipulations necessary to turn simulation data into physical quantities.
Thanks again for starting such an interesting discussion. I look forward to moving forward with yt.
j
On Mon, Jun 13, 2011 at 4:23 PM, Brian O'Shea wrote:
Hi Matt,
This may not be something specifically for the mission statement (depending on how wordy we want to get), but I'm very interested in using yt (or something that encompasses yt) as a workflow tool so that my simulations are completely reproducible. What I could imagine is something like this:
I very much like the workflow you laid out, although I would contend we should address more directly the task of running the simulation. On some level, it becomes a realizable goal to execute the main loop of the code in Python without any real overhead. This will have the side effect of providing much easier access to the data during the course of the simulation.
I would also scratch out "Enzo" and replace it with "Simulation Code" -- while pragmatically I recognize your simulations will likely be conducted in Enzo for the purposes of this provenance tracking, I feel it should be said that for the mission statement I believe in a code-neutral direction.
One difficulty here is the idea of actually moving the data. It is not clear to me that moving data around in file systems is a tractable, solvable problem. That is a good thing to strive for, but I personally can't wrap my head around it. Stephen? Britton?
1. Generate initial conditions, cosmological or otherwise. IC parameter file goes into a database, along with details about the code that's used to generate my ICs (inits/MUSIC/grafic hash, outputs of make show-config and make show-flags, etc.)
2. Run simulation. Run-related and performance information is collected in a database. (What supercomputer? How many CPUs? Environment variables? Which version of MPI? What date(s) did the job run on? What nodes? Copy of Enzo restart parameter files and perhaps hierarchy files, for later query?)
3. Back up Enzo data to mass storage (or perhaps some subset of the data, depending on how big the sim is). What directory is it in? Should it be world-readable, group-readable, etc.?
4. Do analysis. Record all details of analysis and plot making, so that I can go back and retrace all details.
At that point, I would know _precisely_ how, and with what commands/code/parameter files/etc., the plots in my papers (and everything leading up to them) were generated. This helps when you go back to deal with the referee ("how DID I make that stupid plot? What was sigma8 again?"), but also for reproducibility, since in principle somebody could just go back, look at the database, and be able to do precisely what I did. Also, if somebody wanted to use archival data - something we hope to do more of in the future, as simulations grow in expense and complexity - there'd be no confusion about the provenance of that data.
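The four steps Brian describes amount to an append-only log of workflow stages that can be queried later. As a minimal sketch of that idea only (this is not an existing yt feature), assuming a local SQLite database through Python's standard-library `sqlite3` module: the table layout, the functions `open_provenance_db`, `record_step`, `content_hash` and `history`, and every recorded value below are hypothetical placeholders.

```python
import hashlib
import json
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: one append-only table. "stage" is one of the four steps
# ("initial_conditions", "run", "archive", "analysis"); "details" holds
# stage-specific metadata as a JSON string; "recorded_at" is a UTC timestamp.
SCHEMA = """
CREATE TABLE IF NOT EXISTS provenance (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    stage TEXT NOT NULL,
    recorded_at TEXT NOT NULL,
    details TEXT NOT NULL
)
"""


def open_provenance_db(path=":memory:"):
    """Open (or create) the provenance database; ':memory:' is for trying it out."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def content_hash(text):
    """SHA-256 of a parameter file's contents, so a later reader can check it is unchanged."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def record_step(conn, stage, details):
    """Append one workflow step. Rows are only inserted, never updated."""
    conn.execute(
        "INSERT INTO provenance (stage, recorded_at, details) VALUES (?, ?, ?)",
        (stage, datetime.now(timezone.utc).isoformat(), json.dumps(details, sort_keys=True)),
    )
    conn.commit()


def history(conn, stage=None):
    """Return (stage, details) pairs in the order they were recorded, optionally for one stage."""
    if stage is None:
        rows = conn.execute("SELECT stage, details FROM provenance ORDER BY id")
    else:
        rows = conn.execute(
            "SELECT stage, details FROM provenance WHERE stage = ? ORDER BY id", (stage,)
        )
    return [(s, json.loads(d)) for s, d in rows]


if __name__ == "__main__":
    conn = open_provenance_db()
    # Placeholder values only; nothing here comes from a real run.
    ic_params = "sigma8 = 0.8\n"
    record_step(conn, "initial_conditions",
                {"generator": "MUSIC", "param_hash": content_hash(ic_params), "sigma8": 0.8})
    record_step(conn, "run", {"machine": "example-cluster", "n_cpus": 512})
    record_step(conn, "analysis", {"script": "phase_plot.py"})
    # "What was sigma8 again?" becomes a query against the log.
    print(history(conn, "initial_conditions")[0][1]["sigma8"])  # prints 0.8
```

Keeping the log append-only is what lets the database answer a referee's question after the fact; a real tool would also need to capture build configuration, MPI versions and archive locations, which this sketch simply leaves to whatever is placed in `details`.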
If I had to sum this up in a sentence, it'd be "Transform yt into a tool for easily and transparently tracking all aspects of simulation generation, execution, and analysis for the purposes of reproducibility."
That's a great sentence.
Britton: I like your additions very much. It had completely slipped my mind that one of the most useful features of yt is its physical and geometric object selection.
Chris: I don't think you are stepping outside the scope of what we could generously call "The yt project" with what you mention. There are the technical goals, and the broader community goals. The goals of open science, reproducibility, and cultivating a community of scientists willing to share scripts, analysis routines, and even analysis modules are certainly part of what I think we are all striving for. And this goal isn't just about deploying and rolling out infrastructure; it also means providing a welcoming and friendly community of people willing to help. For instance, it's great that scripts written to generate phase plots from Orion outputs can be used nearly unmodified on Enzo outputs. (I remember when Jeff worked so hard to make this so -- http://www.flickr.com/photos/matthewturk/2598141965/ ) But even more than that, I think it's amazing that people are willing to share these scripts.
So yes, let's bake that right into the mission statement.
As for your comments about the microphysical solvers, believe me when I say they are not falling on deaf ears. Moving toward an open source, community-driven model for microphysical solvers is an issue near and dear to my own heart, having spent several years of my life writing a primordial chemistry solver. I believe there is a place for interfacing with that sort of project and endeavor inside yt -- in particular, interfacing with specific APIs and so forth to seamlessly calculate cooling times or EOS or opacities. Let's revisit this issue in the future.
(Although, if we step back for a second and look at what's in yt ... boundary condition calculations, cooling time calculations, gravity, ... the mind does wander.)
I am also +1 on this draft.
On Sat, Jun 18, 2011 at 7:03 AM, Sam Skillman
This looks great to me. It's a good mix of what we can do now, what we want to do in the future, and the manner in which we'll get there. +1
On Fri, Jun 17, 2011 at 3:20 PM, j s oishi
wrote: This is looking awesome. Let me add a tiny bit of refinement:
Draft 3 -------- The yt project aims to produce an integrated science environment for collaboratively asking and answering astrophysical questions. To do so, it will encompass the creation of initial conditions, the execution of simulations, and the detailed exploration and visualization of the resultant data. It will also provide a standard framework based on physical quantities for interoperability between codes.
Development of yt is driven by a commitment to Open Science principles as manifested in participatory development, reproducibility, documented and approachable code, a friendly and helpful community of users and developers, and Free and Libre Open Source Software.
The yt project aims to produce an integrated science environment for collaboratively asking and answering astrophysical questions, encompassing the creation of initial conditions, the execution of simulations and the detailed exploration and visualization of the resultant data. It will provide a standard framework based on physical quantities for interoperability between codes. Development of yt is driven by a commitment to Open Science principles as manifested in participatory development, reproducibility, documented and approachable code, a friendly and helpful community of users and developers, and Free and Libre Open Source Software.
On 6/14/11 1:02 AM, Matthew Turk wrote:
Hi all,
tl;dr summary: New bullet points below, along with a first draft at a proper prose solidification. More comments still requested.
Thanks everyone for your thoughtful responses. Having this discussion as a group is really the only way to have the outcome of it be meaningful; I'm glad we could have as much of a discussion as we already have, and I hope that moving forward we can keep talking about this and refining it to sort of steer ourselves.
In reading over your replies, it's become clear that the mission goal bullet points in that original list were a bit ... well, shall we say, under-ambitious? So let's take the gloves off a bit. Chris, your comments in particular made me realize that my own feelings about this project we've got going are a bit more ambitious; Brian, Britton and Jeff, yours did as well. And this didn't come through in the bullet points, although it was alluded to. I'll put my comments at the bottom, along with an updated list of bullet points, after I respond to a couple things that were brought up.
On Mon, Jun 13, 2011 at 5:06 PM, j s oishi
wrote: Hi all,
This is already a great discussion, and it seems to me that there are a lot of great ideas that I only want to echo. First, I think what Matt started with is a great foundation: yt should continue to be a collaborative, open-source tool for reproducible physical analysis of simulation data. I think the idea that not only is yt itself open source, but the *entire software stack upon which it rests* is also open source is a point worth emphasizing. If nothing else, a user gets not only yt but also an amazing toolkit for doing numerical computation free of cost and restrictive licenses.
That is a very, very good point -- and the impetus for its initial creation, actually.
Second, I think that Brian's point about data provenance and reproducibility of an entire project is really a direction I would love to see yt move in. yt should allow (and encourage!) reproducibility beyond analysis to include simulation initialization, runtime, and final, reduced data products. Furthermore, I believe it should be able to do this in a cross-code manner: imagine having a set of descriptions (perhaps in the form of yt scripts, perhaps in some other machine/human readable format) that describe initial conditions, runtime parameters, analysis outputs and data products that could be run on Enzo and Ramses. We could move beyond code comparison test problems to real inter-code reproducibility.
Yes. Yes, and more yes; I firmly believe in this. Looking, realistically, at where we are and where we are going, I believe this is an utterly feasible goal, and the timescale is not terribly great -- effort simply needs to be applied in that direction.
We can have a longer discussion about this, but I think having this item in the mission statement for now is sufficient.
Finally, I think that Britton is right that we should also continue to emphasize that yt is a tool for physical reasoning on simulation data, and that *it*, not *you the user* make all necessary manipulations to get simulation data to physical quantities.
Thanks again for starting such an interesting discussion. I look forward to moving forward with yt.
j
On Mon, Jun 13, 2011 at 4:23 PM, Brian O'Shea
wrote: Hi Matt,
This may not be something specifically for the mission statement (depending on how wordy we want to get), but I'm very interested in using yt (or something that encompasses yt) as a workflow tool so that my simulations are completely reproducible. What I could imagine is something like this:
I very much like the workflow you laid out, although I would contend we should address more directly the task of running the simulation. On some level, it becomes a realizable goal to execute the main loop of the code in Python without any real overhead. This will have the side effect of providing much easier access to the data during the course of the simulation.
I would also scratch out "Enzo" and replace it with "Simulation Code" -- while pragmatically I recognize your simulations will likely be conducted in Enzo for the purposes of this provenance tracking, I feel it should be said that for the mission statement I believe in a code-neutral direction.
One difficulty here is the idea of actually moving the data. It is not clear the me that moving data around in file systems is a tractable, solvable problem. That is a good thing to strive for, but I personally can't wrap my head around it. Stephen? Britton?
1. Generate initial conditions, cosmological or otherwise. IC parameter file goes into a database, along with details about the code that's used to generate my ICs (inits/MUSIC/grafic hash, outputs of make show-config and make show-flags, etc.)
2. Run simulation. Run-related and performance information is collected in a database. (what supercomputer? How many CPUs? Environmental variables? Which version of MPI? What date(s) did the job run on? What nodes? Copy of Enzo restart parameter files and perhaps hierarchy files, for later query?)
3. Back up Enzo data to mass storage (or perhaps some subset of the data, depending on how big the sim is). What directory is it in? Should it be world-readable, group-readable, etc.?
4. Do analysis. Record all details of analysis and plot making, so that I can go back and retrace all details.
At that point, I would know _precisely_ how and with what commands/code/parameter files/etc. the plots that are in my papers, and everything leading up to that, is generated. This helps when you go back to deal with the referee ("how DID I make that stupid plot? What was sigma8 again?"), but also for reproducibility, since in principle somebody could just go back, look at the database, and be able to do precisely what I did. Also, if somebody wanted to use archival data - something we hope to do more of in the future, as simulations grow in expense and complexity - there'd be no confusion about the provenance of that data.
If I had to sum this up in a sentence, it'd be "Transform yt into a tool for easily and transparently tracking all aspects of simulation generation, execution, and analysis for the purposes of reproducibility."
That's a great sentence.
Britton: I like your additions very much. It had completely slipped my mind that one of the most useful features of yt is its physical and geometric object selection.
Chris: I don't think you are stepping outside the scope of what we could generously call "The yt project" with what you mention. There are the technical goals, and the broader community goals. The goals of open science, reproducibility, and cultivating a community of scientists willing to share scripts, analysis routines, and even analysis modules are certainly part of what I think we are all striving for. And this goal isn't composed of just deploying infrastructure, rolling it out, but also providing a welcoming and friendly community of people willing to help. For instance, it's great that scripts written to generate phase plots from Orion outputs can be used nearly unmodified on Enzo outputs. (I remember when Jeff worked so hard to make this so -- http://www.flickr.com/photos/matthewturk/2598141965/ ) But even more than that, I think it's amazing that people are willing to share these scripts.
So yes, let's bake that right into the mission statement.
As for your comments about the microphysical solvers, believe me when I say they are not falling on deaf ears. Moving toward an open source, community-driven model for microphysical solvers is an issue near and dear to my own heart, having spent several years of my life writing a primordial chemistry solver. I believe there is a place for interfacing with that sort of project and endeavor inside yt -- in particular, interfacing with specific APIs and so forth to seamlessly calculate cooling times or EOS or opacities. Let's revisit this issue in the future.
(Although, if we step back for a second and look at what's in yt ... boundary condition calculations, cooling time calculations, gravity, ... the mind does wander.)
The revised bullet points I have:
= What is the mission? =
* To create a fun, community-led, open source tool for asking and answering astrophysical questions through simulations, analysis and visualization that allows one to ask astrophysical questions of simulation data independent of the code used to produce that data.
* To create a friendly, helpful community of scientists
* To further the goals of Open Science
* To construct an environment that encompasses the generation of data, starting from initial conditions, through simulations, and finally resulting in publication-quality plots
* To create reproducible, cross-code questions and answers from astrophysical data
* To present simulation data in physical terms, rather than strictly in simulation and data format terms
* To construct a consistent language for asking questions of simulation data from many sources
* To encourage researchers to participate in constructing a community code
* To provide a place to create and share analysis codes, recipes, and other things that can be helpful to others seeking to answer similar scientific questions.
The next step in this is to try to distill it down into a sentence or two. I've included my first pass at this. Not all items have to be included -- they can be shuffled off and left implicit in the proper mission statement, but can show up in the broader directions. The ultimate goal of this is to provide both the short-form "elevator pitch" and then augment that with what we could generously call strategy documents.
Draft 1:
The yt project aims to produce an integrated science environment for asking and answering astrophysical questions, encompassing the creation of initial conditions, the execution of simulations and the detailed exploration and visualization of the resultant data. Development of yt is driven by a commitment to Open Science principles as manifested in participatory development, reproducibility, a friendly and helpful community of users and developers, and Free and Libre Open Source Software.
I'm not terribly satisfied with this draft. I don't quite know how to work in two things that I think should be stated -- that the end goal is, ideally, a community project (whose bus factor is equal to the number of users :) and that we want to focus on the physical underpinnings of simulations when asking questions rather than, say, the specifics of unformatted Fortran or HDF5. I think that the broader focus (as an integrated science environment) comes across, but the other core aspects are a bit underserved.
Edits and suggestions?
Thanks again, everyone. I'm glad we're having this conversation.
-Matt
Anyway, maybe that's unrealistic, but it'd be awesome. The few workflow tools that I have been exposed to suffer from excessive generality, and thus are a bit too cumbersome to be easy to use, and thus too cumbersome to be actually used.
--Brian
On Mon, Jun 13, 2011 at 12:43 PM, Matthew Turk wrote:
> Hi everyone,
>
> I hope you'll take the opportunity to read and respond to this email, even if you're not a heavy-developer, or even a heavy-user, of yt. Your feedback and contributions would be greatly, greatly appreciated, particularly as this will help guide where yt development, community-building and (optimistically) use will go. I know that sometimes the signal-to-noise on the yt lists can be a bit low, but I think this is a particularly useful discussion to have.
>
> A few of us have been brainstorming, in person, in IRC, etc about the direction yt has been going. There are a number of reasons for doing this -- to provide focus, to provide an idea of the off-in-the-distance goal, and to have a public statement of what we're about, which shows ambition, concern for the values that go into a scientific code, and an interest in providing access to that code. This boils down to coming up with a mission statement, which will help both focus our goals on what we want to provide, as well as describe those areas we do not want to provide. Much of this is based on the contents of “The Art of Community” by Jono Bacon, specifically around page 71 in the PDF available on www.artofcommunityonline.org/get/ .
>
> “Mission statements are intended to be consistent and should rarely change, even if the tasks that achieve that mission change regularly. When building your mission statement, always have its longevity in mind. Remember, your mission statement is your slam-dunking, audacious goal. For many communities these missions can take decades or even longer to achieve. Their purpose is to not only describe the finish line, but to help the community stay on track.”
>
> To develop a mission statement, which will act as a precursor to a strategic plan, we need to construct answers to three questions. These will provide the initial basis for a broader mission statement.
>
> For reference, here are some “principles” we came up with several years ago:
>
> http://yt.enzotools.org/principles.html
>
> As I mentioned above, a few of us have been spitballing answers to these questions, and it has reached the point where we really need to bring this forward, to conduct these discussions in public, to bring some clarity and engagement to the process. Ultimately, once we have sketched out a couple broad goals and bullet points, this can then be distilled into a short, pithy block of text that serves as a "Mission Statement." Below are some potential bullet points, but I feel strongly that it's important that these get refined and discussed.
>
> = What is the mission? =
> * To create a fun, community-led, open source tool for asking and answering astrophysical questions through simulations, analysis and visualization
> * To create reproducible, cross-code questions and answers from astrophysical data
> * To construct a consistent language for asking questions of simulation data from many sources
> * To encourage researchers to participate in constructing a community code
>
> = What are the opportunities and areas of collaboration? =
> * Development of new tools, new techniques, and adding support for new codes.
> * Adding components to the GUI
> * Providing outreach-capable frontends
> * Improving visualization qualities
> * Adding new methods of accessing data
> * Performance analysis & optimization
> * Deployment to new platforms
> * Designing new web pages
> * Writing documentation and recipes
> * Spreading the word
> * Support for Cartesian non-astrophysical simulations (weather, earthquakes)
> * Extension to non-Cartesian coordinate systems
> * Mentoring new developers
>
> = What are the skills required? =
> * Thoughtful process
> * Careful quality control
> * Ability to communicate
> * An investment in “the answer”
> * Eagerness to participate in an open fashion
>
> What other bullets, ideas, inclinations do people have? If we can start a discussion, maybe we can draft some text. This would certainly help with focusing our strategies for presenting yt to others, directing our development in conjunction with our scientific goals, and collaborating as a community.
>
> Thanks very much for any thoughts,
>
> Matt
_______________________________________________
Yt-dev mailing list
Yt-dev@lists.spacepope.org
http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org
participants (5)
- Brian O'Shea
- Cameron Hummels
- j s oishi
- Matthew Turk
- Sam Skillman