Stephen and I are at the SciPy 2009 workshop. I saw a talk yesterday on time series analysis of neuroimaging data (all the SciPy talks are online here: http://www.archive.org/search.php?query=scipy ; the talk in question is here: http://www.archive.org/details/scipy09_day1_04-Ariel_Rokem ; and Peter Norvig's is especially good: http://www.archive.org/details/scipy09_day1_03-Peter_Norvig ) and got really jazzed up about time series analysis.
So, starting out initially on my own (and later to incorporate Britton's EnzoSimulation code) I've written up a first pass at a time series analysis framework. This, as with all the experimental stuff that's not ready for trunk, is in the hg repo in the branch timeseries.
Right now it works as follows -- you instantiate an EnzoTimeSeries, with an optional name and an output log. (Currently the entire thing is populated via the "OutputLog" which is in Enzo1.5 and beyond.) It reads in all the parameter files and instantiates them (see below for why this is not awesome) and then provides some mechanisms for getting data about and from them.
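To make the population step concrete, here is a minimal sketch of how a time series might be populated from an OutputLog file. The class name `SimpleTimeSeries` and the assumption that each OutputLog line ends with the path to a parameter file are mine for illustration; this is not the actual EnzoTimeSeries code.

```python
# Hypothetical sketch: populating a time series from an Enzo "OutputLog".
# Assumes each non-empty line ends with the path to a parameter file.
class SimpleTimeSeries:
    def __init__(self, output_log):
        self.outputs = []
        for line in open(output_log):
            line = line.strip()
            if not line:
                continue
            # take the last whitespace-separated token as the parameter file
            self.outputs.append(line.split()[-1])
```

The real framework would then instantiate a parameter file object for each entry, which is exactly the step that causes the memory concerns noted below.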
The architecture is separated into three main sections -- time series objects, data objects, and analysis tasks. The first two shouldn't ever really have to be written by the user; anything that exists as a data object in yt already (spheres, regions, etc etc) is automatically converted into a time-series data object. The analysis tasks are a bit more user-approachable. You can see what's already been created here:
For example, the MaximumValue task:
```python
class MaximumValue(AnalysisTask):
    _params = ['field']

    def eval(self, data_object):
        v = data_object.quantities["MaxLocation"](
            self.field, lazy_reader=True)
        return v
```
What this does is say, I will accept a 'field' parameter, and then inside 'eval' it returns the appropriate value. (Note that the time series framework introspects the 'eval' method, so if it accepts a 'pf' instead of 'data_object' it knows what to do.)
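The introspection described here can be sketched as follows. This is an illustrative reimplementation, not yt's actual dispatch code; the task classes and the `dispatch` helper are hypothetical.

```python
import inspect

# Sketch of signature-based dispatch: inspect the task's eval() method
# and pass either the parameter file ('pf') or a derived data object,
# depending on what the first argument is named.
class PFTask:
    def eval(self, pf):
        return ("pf", pf)

class DataObjectTask:
    def eval(self, data_object):
        return ("data", data_object)

def dispatch(task, pf, data_object):
    # on a bound method, signature() omits 'self', so the first
    # parameter name tells us what the task wants
    params = list(inspect.signature(task.eval).parameters)
    if params[0] == "pf":
        return task.eval(pf)
    return task.eval(data_object)
```

The appeal of this pattern is that task authors only declare what they need, and the framework supplies it at each step of the time series.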
So, you might say, whatever, right? How would I use this? Well, I've placed an example file here:
```python
from yt.mods import *
from yt.data_objects import *

ts = EnzoTimeSeries("OutputLog")
ret1 = ts.eval([MaximumValue("Density"),
                SlicePlotDataset("Density", 0, [0.5, 0.5, 0.5])])
sp = ts.sphere([0.5, 0.5, 0.5], 1.0)
ret2 = sp.eval([MaximumValue("Density"), CurrentTimeYears()])
```
So let's examine this one line at a time.
The first line:
ts = EnzoTimeSeries("OutputLog")
initializes the time series. Admittedly, I've made the argument ordering confusing here: the first argument is the name, so the name ends up being "OutputLog", and the location of the OutputLog itself falls back to its default value of ... "OutputLog".
The next line:
ret1 = ts.eval([MaximumValue("Density"), SlicePlotDataset("Density", 0, [0.5, 0.5, 0.5])])
creates two tasks, MaximumValue and SlicePlotDataset, each with its own arguments: MaximumValue is given "Density", while SlicePlotDataset accepts "Density", the axis, and the center. These are handed to the time series, which is told to go ahead and run them. It does, and it returns a list of lists of values as ret1 -- the first being the max values, the second being the slice filenames.
Okay, now, if we keep going:
```python
sp = ts.sphere([0.5, 0.5, 0.5], 1.0)
ret2 = sp.eval([MaximumValue("Density"), CurrentTimeYears()])
```
The first line here constructs a 'time series data operator' -- a sphere. The sphere is centered at (0.5, 0.5, 0.5) and has a radius of 1.0. We can then call eval on it, just like we did on the time series object, and it creates and operates on that sphere at every step in the time series.
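A rough sketch of what such an operator could look like internally: for each output in the series, construct the data object, then hand it to every task. Everything here (the class, the dict standing in for a parameter file, tasks as plain callables) is illustrative, not yt's real implementation.

```python
# Illustrative "time series data operator": one sphere per output.
class TimeSeriesSphere:
    def __init__(self, outputs, center, radius):
        self.outputs = outputs  # one parameter file (here: a dict) per step
        self.center, self.radius = center, radius

    def eval(self, tasks):
        results = []
        for pf in self.outputs:
            # stand-in for constructing a real yt sphere from pf
            sphere = {"pf": pf, "center": self.center, "radius": self.radius}
            # run every task against this step's sphere
            results.append([task(sphere) for task in tasks])
        return results
```

The key design point is that the operator is disconnected from any single dataset: the same sphere specification is re-instantiated at every step.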
This is basically what I was hoping to move toward with yt-2.0 -- a disconnected operator/data model. The framework isn't sufficient yet to do this, but it's getting there. I'll be merging in Britton's work with EnzoSimulation for grabbing and selecting cosmology information. Additionally, I'll be splitting out the reliance on OutputLog into a generalized time series creator, which CAN but NEEDN'T accept an OutputLog. The population step and the time series will be separate, and Britton's got a mechanism for populating from an input parameter file.
A few notes and open questions:

* All the hierarchies stick around because the parameter files are stored. This won't work for large datasets, as it'll quickly blow out the RAM. I'm working on a workaround that will also handle slicing, etc.
* The plotting isn't done yet, but it will be.
* Everything is in lists right now, because the return values need not be floating point -- they can be strings, objects, 3D arrays, whatever.
* Some basic tasks should be built in -- mostly things like "filename" and "time" and "redshift", which you should be able to just access and grab.
* Storing the results of evaluation tasks is also not clear to me yet. Recarrays would be great for fixed-size items, but not objects. Maybe the user should just have the option to eval-and-store.
* Adding tasks requires some object instantiation, and usage of class names. Do y'all have any thoughts on this? Should we move to the same paradigm used in the callback method, where a plaintext addressor name is used? (i.e. modify["grids"] instead of add_callback(GridBoundaryCallback())...)
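On the plaintext-addressor question raised above, one common way to get there is a string-keyed registry that task classes register themselves into. The names and classes below are hypothetical, just to show the shape of the idea:

```python
# Sketch of a string-keyed task registry, so users could write
# tasks["max_value"]("Density") instead of instantiating class names.
task_registry = {}

def register_task(name):
    def decorator(cls):
        task_registry[name] = cls
        return cls
    return decorator

@register_task("max_value")
class MaximumValue:
    def __init__(self, field):
        self.field = field

def make_task(name, *args):
    # look up the class by its plaintext name and instantiate it
    return task_registry[name](*args)
```

This keeps the class-based implementation intact while exposing the friendlier string interface, much like the callback/modify["grids"] paradigm.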
I really feel like moving toward time series analysis is the next major step, and I'd like to continue working on this, as I'll be using it more and more in the future. Additionally, I think we could move toward something of a pipeline, as well -- stringing together inputs and outputs. Something like:
```python
my_pipeline = TaskPipeline(MaximumValueLocation("Density"),
                           VirialRadius(), ...)
ts.eval(my_pipeline)
```
and then have the outputs of tasks fed from one to the next.
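The chaining itself could be as simple as this sketch, where each task is a callable and the pipeline threads one task's output into the next (again, an illustration of the idea, not a proposed final API):

```python
# Minimal sketch of output-to-input chaining for a task pipeline.
class TaskPipeline:
    def __init__(self, *tasks):
        self.tasks = tasks

    def __call__(self, value):
        # feed each task's output into the next task
        for task in self.tasks:
            value = task(value)
        return value
```

The interesting design questions are what happens when a task returns multiple values, and whether intermediate results should be stored alongside the final one.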
Anyway, what do you all think?
PS The reorganization should happen around the time we focus on time series stuff -- which is why it's under data_objects.