On Mon, Apr 4, 2011 at 5:00 PM, Stephen Skory email@example.com wrote:
Hi Matt & others,
I've done some more thinking about this, and I have some questions and ideas I'd like your thoughts on.
I would like to set down some solid ideas about what we would like to get out of this. That will allow us to design the system better from the get-go. Here's what I've got:
Yes, I agree. The goal should be to identify the outputs from a given "simulation run", ordered in time. For Enzo this will be more straightforward than perhaps for other codes, because each output records the UUIDs of both the current and the previous output. (My recollection is that this is why UUIDs were added in the first place.) I would say that inferring order from file paths is fine for other simulation types, as well as for Enzo datasets that were created before the UUIDs were added.
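As a rough sketch of that chaining, something like the following could work; the helper names and the "CurrentUUID"/"PreviousUUID" keys are invented stand-ins for whatever the parameter file actually stores:

    # Sketch: order a set of Enzo outputs by chaining UUIDs. The key names
    # "CurrentUUID" and "PreviousUUID" are hypothetical stand-ins.
    def read_uuids(pf_path):
        uuids = {}
        with open(pf_path) as f:
            for line in f:
                key, sep, value = line.partition("=")
                if sep and key.strip() in ("CurrentUUID", "PreviousUUID"):
                    uuids[key.strip()] = value.strip()
        return uuids

    def order_outputs(pf_paths):
        # Map each output's current UUID to (path, previous UUID).
        by_current = {}
        for path in pf_paths:
            u = read_uuids(path)
            by_current[u["CurrentUUID"]] = (path, u.get("PreviousUUID"))
        # The root is the output whose predecessor we have never seen.
        root = next(c for c, (_, prev) in by_current.items()
                    if prev not in by_current)
        # Walk forward along the chain via a successor map.
        successor = {prev: c for c, (_, prev) in by_current.items()}
        chain, cur = [], root
        while cur is not None:
            chain.append(by_current[cur][0])
            cur = successor.get(cur)
        return chain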
This is where I become less certain. My initial feeling was that we'd want a time indicator (redshift, time, or both), the full path to the dataset, and its position in a graph of simulation outputs. It makes sense to include additional parameters, but as you note, the only mechanism we have at the moment for doing so may be a set of heterogeneous table formats.
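To make that concrete, here's one possible layout, assuming SQLite; the table and column names are invented for the sketch, not a settled schema:

    import sqlite3

    # One possible layout: the fields we agree on live in a common
    # "outputs" table, and code-specific parameters fall back to a
    # simple key/value table (the heterogeneous part).
    conn = sqlite3.connect("simulation_outputs.db")
    conn.executescript("""
    CREATE TABLE IF NOT EXISTS outputs (
        uuid        TEXT PRIMARY KEY,
        parent_uuid TEXT,           -- previous output: the graph position
        path        TEXT NOT NULL,  -- full path to the dataset
        time        REAL,
        redshift    REAL
    );
    CREATE TABLE IF NOT EXISTS parameters (
        uuid  TEXT REFERENCES outputs(uuid),
        name  TEXT,
        value TEXT                  -- heterogeneous parameters, stringified
    );
    """)
    conn.commit()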
My initial hope for this database was twofold --
1) Provide a list of all "known" datasets in Reason (which we could get from parameter_files.csv).
2) Provide a shareable database of simulations and their parameters (i.e., manual and simple publishing of datasets on shared filesystems).
Both of these kind of tie together. On a shared filesystem like Kraken, one could imagine opening a dataset with:
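something along these lines, where load_known, the database path, and the hash are all invented names for the sake of the sketch (the table layout is the one sketched above):

    import sqlite3
    from yt.mods import load  # yt's standard loader

    def load_known(ds_hash, db_path="/shared/yt_datasets.db"):
        # Hypothetical: resolve a dataset hash/ID to a path via the
        # shared database, then hand it to yt's regular load machinery.
        conn = sqlite3.connect(db_path)
        row = conn.execute("SELECT path FROM outputs WHERE uuid = ?",
                           (ds_hash,)).fetchone()
        conn.close()
        if row is None:
            raise KeyError("dataset %s is not in the shared database" % ds_hash)
        return load(row[0])

    pf = load_known("f3a91b")  # any user of the shared filesystem can do this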
Anyway, I don't think we need to re-implement full parameter scraping, but a lighter version would not add so much overhead as to be undesirable.
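For instance, a light-touch scrape could pull just a handful of keys out of an Enzo-style "key = value" parameter file; the particular keys here are only illustrative:

    # Light-touch scraping, not a full parameter parse: grab only a few
    # keys from a "key = value" style parameter file. The chosen keys
    # are illustrative, not a final list.
    WANTED = ("InitialTime", "CosmologyCurrentRedshift", "TopGridDimensions")

    def scrape_parameters(pf_path, wanted=WANTED):
        params = {}
        with open(pf_path) as f:
            for line in f:
                key, sep, value = line.partition("=")
                if sep and key.strip() in wanted:
                    params[key.strip()] = value.strip()
        return params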
This is exactly right, and I completely agree. I don't want to completely reinvent the VO (it's done a good job of inventing itself); instead I'd apply a simple layer of querying and comparison on top. It can be pretty DIY, I think.
This is where this differs from parameter_files.csv. That system acts as a FIFO of the last, say, 200 datasets. When a dataset is loaded, its unique hash/ID is looked up, and if it's found in the .csv, the entry is updated to point to the new location. Rather than providing an explicit update mechanism, the update happens implicitly. I don't see why the new system couldn't work the same way.
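A sketch of that implicit update, assuming a two-column hash/location layout for the file (an assumption made for illustration):

    import csv, os

    MAX_ENTRIES = 200  # keep only the most recent datasets, FIFO-style

    def record_dataset(csv_path, ds_hash, location):
        # Implicit update: drop any stale entry for this hash, append
        # the new location, and truncate to the newest MAX_ENTRIES rows.
        rows = []
        if os.path.exists(csv_path):
            with open(csv_path) as f:
                rows = [r for r in csv.reader(f) if r]
        rows = [r for r in rows if r[0] != ds_hash]
        rows.append([ds_hash, location])
        del rows[:-MAX_ENTRIES]
        with open(csv_path, "w", newline="") as f:
            csv.writer(f).writerows(rows)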
Thanks for the comments!
--
Stephen Skory
firstname.lastname@example.org
http://stephenskory.com/
510.621.3687 (google voice)