Hi all,

this is not finished, but I wanted to show you all what I've been working on for the last couple of days. I think it's interesting and promising. To remind you, the idea is that whenever you load() a dataset in yt, it will update a database on Amazon AWS with some information about that dataset. This will provide you with a central location where you can see all datasets you've ever touched with yt. I've written a cgi-bin python script that, given your AWS credentials, will display entries in your database and allow you to build queries to narrow things down. To give you an idea of what I've done, to solicit ideas for improvements, and to play around with this whole screencasting thing, I've made a screencast showing what I've got so far. This is linked below.

So - let me know if you have any ideas or comments. Like I said, it's unfinished, but it works well enough to show off. Thanks!

http://vimeo.com/28797703

-- Stephen Skory s@skory.us http://stephenskory.com/ 510.621.3687 (google voice)
Stephen,

This is great! What an excellent way of keeping track of your datasets. A couple of comments:

1) It might be nice to have a function which instantiates the datadumps into the Amazon DB without having to actually load them in yt (to save the time of actually loading them one-by-one).

2) It might also be nice to have a keyword / description that one could set in the PF or perhaps in the DB when loading the DDs, for identification purposes (e.g. This is the run I did to look at metal cooling with feedback, etc.). Perhaps that is something that should be built into enzo's PF though. Hmm...

3) Is this functionality specific to enzo datafiles, or is it universal to anything yt accesses? I see the "ProblemType" header, which suggests that it may be unique to enzo. Or is it just pulling from a variety of different headers which are defined in each datafile, regardless of simulation-code origin? It would be good for this to keep with yt's goals of universal acceptance of other sim codes.

Also, the screencast is great for demonstrating this functionality (much better than had you written a detailed email message explicating it).

Anyway, good work! I look forward to this sort of functionality being included in yt, so that I can better track my different datasets instead of just by filename (e.g. I just finished a job named L20S2D9H26SPMZF1 !!)

Cameron

On 09/08/2011 11:16 PM, Stephen Skory wrote:
Hi all,
this is not finished, but I wanted to show you all what I've been working on for the last couple of days. I think it's interesting and promising. To remind you, the idea is that whenever you load() a dataset in yt, it will update a database on Amazon AWS with some information about that dataset. This will provide you with a central location where you can see all datasets you've ever touched with yt. I've written a cgi-bin python script that, given your AWS credentials, will display entries in your database and allow you to build queries to narrow things down. To give you an idea of what I've done, to solicit ideas for improvements, and to play around with this whole screencasting thing, I've made a screencast showing what I've got so far. This is linked below.
So - let me know if you have any ideas or comments. Like I said, it's unfinished, but it works well enough to show off. Thanks!
-- Cameron Hummels PhD Candidate, Astronomy Department of Columbia University Public Outreach Director, Astronomy Department of Columbia University NASA IYA New York State Student Ambassador http://outreach.astro.columbia.edu PGP: 0x06F886E3
Hi Cameron,
1) It might be nice to have a function which instantiates the datadumps into the Amazon DB without having to actually load them in yt (to save the time of actually loading them one-by-one).
This is something I've already thought of but haven't implemented. Something like 'yt touch DD????/DD????' or some similarly descriptive verb of what's going on. I'd like to have the SQLite local database stuff working before trying to set that up (with Matt's help).
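A rough sketch of what such a "touch" step might look like against a local SQLite database is below. Nothing here is part of yt: the function name `touch_datasets`, the `datasets` table, and its two columns are hypothetical, and the sketch records only the path and modification time rather than any simulation parameters.

```python
import glob
import os
import sqlite3


def touch_datasets(pattern, conn):
    """Record every file matching `pattern` in a local SQLite table.

    Hypothetical helper: it stores only the path and modification time,
    and never opens the dataset with yt.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS datasets ("
        "path TEXT PRIMARY KEY, last_modified REAL)"
    )
    paths = sorted(glob.glob(pattern))
    for path in paths:
        full_path = os.path.abspath(path)
        # INSERT OR REPLACE keeps one row per path if it is touched again.
        conn.execute(
            "INSERT OR REPLACE INTO datasets VALUES (?, ?)",
            (full_path, os.path.getmtime(full_path)),
        )
    conn.commit()
    return len(paths)
```

Under these assumptions, something like `touch_datasets("DD????/DD????", sqlite3.connect("datasets.db"))` would register every matching dump in one pass, and touching the same dump twice would leave a single row.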
2) It might also be nice to have a keyword / description that one could set in the PF or perhaps in the DB when loading the DDs, for identification purposes (e.g. This is the run I did to look at metal cooling with feedback, etc.). Perhaps that is something that should be built into enzo's PF though. Hmm...
I think there are few ways that this could be done. It would not be an awful thing if there was a "Description" parameter added to Enzo that was attached to every restart file. This could then be included with the data added to the database. Or, a description field could be added to SimpleDB that is editable through the web script. I think in any case, the second option is probably a good one, but the first one might be worth thinking about.
3) Is this functionality specific to enzo datafiles, or is it universal to anything yt accesses? I see the "ProblemType" header, which suggests that it may be unique to enzo. Or is it just pulling from a variety of different headers which are defined in each datafile, regardless of simulation-code origin? It would be good for this to keep with yt's goals of universal acceptance of other sim codes.
I would like to make things universal, and the nature of SimpleDB is flexible enough that it's not a problem on that end of things. Unlike a traditional SQL database, items stored don't all have to have the same columns. So yes, your suspicion is mostly correct - the column headers are created based on the data returned. But the search input boxes at the bottom are done by hand, since there is no obvious way to do them automatically (I think...).

I think that when this is all (SQLite and SimpleDB) working for enzo datasets, which it is not currently, it won't be too difficult to get it to work for other simulation datasets.

Thanks for the comments!

-- Stephen Skory s@skory.us http://stephenskory.com/ 510.621.3687 (google voice)
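To illustrate how column headers can be derived from the data returned when items do not share the same attributes, here is a minimal sketch in plain Python. It does not use the SimpleDB API or the actual web script; `build_table` and the sample records are made up for illustration.

```python
def build_table(items):
    """Return (headers, rows) for items that may not share attributes."""
    headers = []
    for item in items:
        for key in item:
            if key not in headers:
                headers.append(key)  # keep first-seen order
    # Attributes missing from an item become empty cells.
    rows = [[item.get(key, "") for key in headers] for item in items]
    return headers, rows


# Made-up records: the second lacks ProblemType and adds its own attribute.
items = [
    {"name": "DD0001", "ProblemType": "30"},
    {"name": "plt0005", "code": "other"},
]
headers, rows = build_table(items)
```

The headers are the union of every attribute seen, which is why a dataset from another simulation code would simply add columns rather than break the table.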
A brief follow-up: Matt pointed out that there is already a description field, "MetaDataIdentifier = whateverstringyouwant", but it doesn't allow spaces in the description string. So, should that be included in the database(s)?

-- Stephen Skory s@skory.us http://stephenskory.com/ 510.621.3687 (google voice)
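If that field were included, pulling it out of a parameter file could look something like the sketch below. It assumes only the "Key = value" line format quoted above; `read_metadata_identifier` is a hypothetical helper, not an existing yt or Enzo function.

```python
def read_metadata_identifier(parameter_text):
    """Return the MetaDataIdentifier value from parameter-file text, or None."""
    for line in parameter_text.splitlines():
        key, sep, value = line.partition("=")
        # Only accept lines that really contain "=" and the exact key name.
        if sep and key.strip() == "MetaDataIdentifier":
            return value.strip()
    return None
```

A file without the parameter yields `None`, so the database entry could leave the description empty in that case.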
participants (2)
- Cameron Hummels
- Stephen Skory