Matt,
Anyway, just a handful of thoughts. Stephen, you have quite a bit of experience with SQLite; do you think this could be a fun application of it? Or would that be overkill? Maybe the whole system could be handled with just filenames?
I think there is a fair bit of upside to something like this. Since it would be nice to have (semi-)permanent storage of records in the SQLite database, we could leverage the synergy of the collaboration... wait, I lapsed into corporate-speak. Getting back to the point, we could use the UUIDs of simulations and create a single table per simulation (MetaDataSimulationUUID) with an entry for each MetaDataDatasetUUID. The tables would have columns for useful things: redshift, etc. We could then write some functions that very easily return everything yt knows about a dataset and all its peers, which could then be used to drive time-ordered analysis, or simply to inform a user about the Big Picture of this dataset & simulation.
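To make the table-per-simulation idea concrete, here is a rough sketch using Python's built-in sqlite3 module. Nothing here is existing yt code: the function names (table_name, add_dataset, peers), the column set, and the exact table naming are all made up for illustration. It assumes a higher redshift means an earlier output, so sorting by descending redshift gives time order.

```python
import sqlite3
import uuid


def table_name(sim_uuid):
    # Hypothetical naming scheme. Parsing with uuid.UUID validates the input,
    # and .hex drops the hyphens so the name is safe to embed in SQL.
    return "MetaDataSimulation" + uuid.UUID(sim_uuid).hex


def add_dataset(conn, sim_uuid, ds_uuid, redshift, path):
    # One table per simulation, one row per dataset (columns are assumptions).
    t = table_name(sim_uuid)
    conn.execute(
        f'CREATE TABLE IF NOT EXISTS "{t}" '
        "(dataset_uuid TEXT PRIMARY KEY, redshift REAL, path TEXT)"
    )
    conn.execute(
        f'INSERT OR REPLACE INTO "{t}" VALUES (?, ?, ?)',
        (ds_uuid, redshift, path),
    )
    conn.commit()


def peers(conn, sim_uuid):
    # Every dataset recorded for this simulation, earliest output first
    # (highest redshift first).
    t = table_name(sim_uuid)
    return conn.execute(
        f'SELECT dataset_uuid, redshift, path FROM "{t}" ORDER BY redshift DESC'
    ).fetchall()
```

Something like peers(conn, sim_uuid) could then feed a loop that walks the outputs in order for time-ordered analysis.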
It could be wise to include some date information in the database (date first read in, or date last accessed), which could be useful for user information or housekeeping. Gmail tells you when you last accessed your account, so why can't yt tell you how lazy you've been? We could do some neat things with a SQLite database.
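As a sketch of the date bookkeeping, something like the following would record when a dataset was first read and when it was last touched. Again this is only an illustration with the standard sqlite3 module: the dataset_access table, its columns, and the touch / days_since_access helpers are hypothetical names, and timestamps are stored as ISO 8601 text in UTC.

```python
import sqlite3
from datetime import datetime, timezone


def _ensure_table(conn):
    # Hypothetical bookkeeping table; timestamps are ISO 8601 strings.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dataset_access "
        "(dataset_uuid TEXT PRIMARY KEY, first_read TEXT, last_accessed TEXT)"
    )


def touch(conn, ds_uuid, now=None):
    # Record an access: first_read is set only once, last_accessed every time.
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    _ensure_table(conn)
    conn.execute(
        "INSERT OR IGNORE INTO dataset_access VALUES (?, ?, ?)",
        (ds_uuid, stamp, stamp),
    )
    conn.execute(
        "UPDATE dataset_access SET last_accessed = ? WHERE dataset_uuid = ?",
        (stamp, ds_uuid),
    )
    conn.commit()


def days_since_access(conn, ds_uuid, now=None):
    # Whole days since the dataset was last touched, or None if never seen.
    now = now or datetime.now(timezone.utc)
    _ensure_table(conn)
    row = conn.execute(
        "SELECT last_accessed FROM dataset_access WHERE dataset_uuid = ?",
        (ds_uuid,),
    ).fetchone()
    if row is None:
        return None
    return (now - datetime.fromisoformat(row[0])).days
```

Calling touch() whenever a dataset is loaded would be enough to drive a "you last looked at this N days ago" message, or a cleanup pass over stale entries.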
The drawbacks have mostly been dealt with already, now that yt includes SQLite. I have told Matt that I would implement a SQLite abstraction layer for the merger tree, and this may be an opportunity for us to dive into that.