Greetings,
I have just finished an extensive overhaul of the HaloProfiler that adds
a number of new features and completely changes the running syntax.
Beyond the various new features, the main goals of this overhaul were:
1) to make the tool more general (mostly by removing hard-coded filtering
by virial quantities), essentially allowing someone to profile an arbitrary
list of points that may or may not be actual halos.
Easy filtering of halos by virial quantities remains an option, but is now
simply a specific instance of something more general and far more powerful:
a halo filtering device that allows the user to create their own filter
functions to comb through radial profile data and decide whether a halo
meets certain criteria.
2) to remove the dependence on an additional parameter file for the various
options.
The parameter file is now gone. Most of the HaloProfiler parameters have
been turned into instantiation keyword args. The keyword list is now
considerably longer, but the benefit is that the number of files needed to
run this thing has been reduced from 2 (running script and par file) to just
1 (running script). A large number of keyword options specific to either
the profile or projection routines are taken in at instantiation and stored
as attributes; this was done to keep those function calls simple. I'm
curious to know people's thoughts on whether these keyword args should stay
put or move to the individual function calls. Adding fields for profiling
and projections has been moved to the functions addProfile and
addProjection.
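Roughly, a running script now looks something like this (a sketch only:
everything except the class name and the two add functions is a
hypothetical placeholder; see runHaloProfiler.py, linked below, for the
real syntax):

from yt.extensions.HaloProfiler import HaloProfiler  # assumed import path

hp = HaloProfiler("DD0042/DD0042",       # hypothetical dataset name
                  n_profile_bins=50)     # hypothetical keyword arg
hp.addProfile("Temperature", weight_field="CellMassMsun")  # guessed kwargs
hp.addProjection("Density")              # guessed signature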
Here is a brief list of the new features that I can remember:
- the halo list read routine can be easily customized to read in columned
ascii data of varying formats through the use of a dictionary (see attribute
self.halo_list_format)
- ability to change the function call, args, and kwargs for the halo finder
Profiles:
- filter the halo list with user-written filter functions (see the new file
HaloFilters.py for an example of a function that filters based on virial
quantities; a sketch of the idea appears after this list)
- extract and output scalar quantities from halos using the filter
functions
- pre-filter halos based on values in the initial halo list (skipping the
profiling altogether and saving a load of time)
Projections:
- choose the axes to be projected (instead of hard-coded to all three)
- easily select the list of halos to be projected (the total list, filtered
list, a new file, or an actual list)
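To make the filter functions concrete, here is a sketch of what one might
look like (the real example is in HaloFilters.py, linked below; the field
names and return convention here are placeholders):

def quantity_filter(profile, field="TotalMassMsun", threshold=1e12):
    # profile: the radial profile data for one candidate halo.
    # Returns whether the halo passes, plus any scalars worth writing out.
    value = profile[field][-1]  # outermost bin; hypothetical layout
    return value >= threshold, {field: value}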
Before I commit all this, I'd like a little feedback, mostly on the
migration of the par file parameters to keyword args. I put three files in
the pastebin for people to download. I chose not to submit a diff since the
changes were so sweeping.
The new HaloProfiler.py: http://paste.enzotools.org/show/178/
HaloFilters.py (should go in yt/extensions with HaloProfiler.py)
http://paste.enzotools.org/show/179/
runHaloProfiler.py (an example script to run the new HaloProfiler)
http://paste.enzotools.org/show/180/
I will try my best to write up full documentation for this as soon as
possible, perhaps even today.
Please let me know what you think.
Britton
Hi everyone,
The rewritten hierarchy now passes all the unit tests. You can
inspect this branch in this repository:
http://hg.enzotools.org/yt/
in the named branch hierarchy-opt. The Orion hierarchy will need to
be converted (Jeff and I can talk about this offline), but overall I
think the new mechanism for instantiating and creating hierarchies is
now extremely straightforward and will be better suited to new AMR
formats. You can see the outline here:
http://hg.enzotools.org/yt/file/hierarchy-opt/yt/lagos/HierarchyType.py#l201
This is the base class; other hierarchies just have to implement
those functions to be usable. The Enzo hierarchy instantiation should
be marginally faster and might use a bit less RAM. However, I still
need to work on reorganizing both the DataQueue and the Grid objects,
which will become smaller (eliminating some data members) and faster.
I intend to eliminate at least one dictionary in the grid object.
(Each dictionary removal drops 1K per object -- for the L7, this would
be 300 megs per dict.)
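For reference, the standard mechanism for dropping the per-object
dictionary is __slots__. This is only an illustration of the trick, not
necessarily what the grid objects will end up doing:

class SlimGrid(object):
    # __slots__ replaces the per-instance __dict__ with fixed storage,
    # so each instance no longer carries its own ~1K dictionary.
    __slots__ = ("id", "LeftEdge", "RightEdge", "Level")
    def __init__(self, id, left, right, level):
        self.id = id
        self.LeftEdge = left
        self.RightEdge = right
        self.Level = level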
If you'd like to take a look, please do so, and let me know how it
works out. You can do this with:
hg clone http://hg.enzotools.org/yt/ ./experimental-yt
cd experimental-yt
hg up -C hierarchy-opt
Best,
-Matt
Hi guys,
Just as a note, if you're doing heavily performance-intensive stuff,
you can use the latest version of Cython to auto-cythonize (i.e.,
compile down to C) your code. This requires that you have Cython >
0.11.2 or so installed.
http://docs.cython.org/docs/tutorial.html#pyximport-cython-compilation-the-…
At the top of your script using yt, put this line:
import pyximport; pyximport.install(pyimport=True)
And it'll try compiling to C all of your code before executing.
Without annotations, speedups of 10-30% have been reported.
Currently, even after hacking out the "exec" calls, it doesn't work
with yt ... but I'm going to play with this some more later.
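For a script that doesn't touch yt, the whole recipe is just the
following (a sketch; the module and function names are made up):

# hotloop.py -- a deliberately loop-heavy, pure-Python module, the kind
# of code that benefits most from being compiled down to C.
def pairwise_count(xs, eps=0.1):
    n = 0
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            if abs(xs[i] - xs[j]) < eps:
                n += 1
    return n

# driver.py -- the install line goes at the very top, before any other
# imports; modules imported after it get compiled to C first.
import pyximport; pyximport.install(pyimport=True)
import hotloop
print(hotloop.pairwise_count([0.05 * i for i in range(200)]))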
-Matt
As some of you know, yt's GUI right now is based on hand-coded
wxPython. The next generation, available in the yt hg repo, is based
on TraitsUI. Right now it uses some wx-specific code for generating
Matplotlib canvases, but it's largely portable. It looks like going
forward TraitsUI will primarily use Qt. I've used the nextgen GUI
with Qt (switching toolkits is trivial with TraitsUI) and it looks
good, but this will still be something of a change, when it happens --
I anticipate we'll track TraitsUI on this front. It's not clear to me
how easy installing Qt is from an automated standpoint.
The second message contains Eric Jones's response about PyQt/PySide
and the future of TraitsUI.
Forwarded conversation
Subject: [Enthought-Dev] PySide to replace PyQt?
------------------------
From: Glenn Tarbox, PhD <glenn(a)tarbox.org>
Date: Tue, Aug 25, 2009 at 8:31 PM
To: IPython-dev(a)scipy.org, Enthought-Dev(a)enthought.com
Sooner or later, something was gonna need to happen WRT Riverbank and
the PyQt licensing. I had hoped that an entirely new project wasn't
going to be necessary.... but apparently it is.
PySide Released to the Wild:
http://labs.trolltech.com/blogs/2009/08/25/pyside-released-to-the-wild/
From the PySide FAQ:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
What about PyQt?
Nokia's initial research into Python bindings for Qt involved speaking
with Riverbank Computing, the makers of PyQt. We had several
discussions with them to see if it was possible to use PyQt to achieve
our goals. Unfortunately, a common agreement could not be found, so
in the end we decided to proceed with PySide.
We will however maintain API compatibility with PyQt (you can use the
same method names but can't inter-operate with PyQt), at least for the
initial release. To import PySide you have to use "import PySide"
instead of "import PyQt4".
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
I didn't know where to post this. PySide needs to mature a bit
(support more than Linux, for example), but both Matplotlib and
Enthought are affected. PyQt will likely need to be replaced in both
packages once PySide becomes more mature, as the licensing of PyQt is
problematic now that Qt is LGPL. It's also likely that with Nokia's
backing, the PySide API will eventually dominate.
Hopefully, the above statement regarding the similarity of the API
will make moving over easy. Personally, I'd like to see a focus on Qt
vs. Wx by Enthought, as I believe it to be much more powerful... but
that's my personal opinion and what I use.
As a side note, I've successfully nailed my C++ Qt code to IPython
using a Cython shim. The PyQt event loop is available and all seems
to work great.
-glenn
--
Glenn H. Tarbox, PhD || 206-274-6919
http://www.tarbox.org
_______________________________________________
Enthought-Dev mailing list
Enthought-Dev(a)enthought.com
https://mail.enthought.com/mailman/listinfo/enthought-dev
----------
From: Eric Jones <eric(a)enthought.com>
Date: Wed, Aug 26, 2009 at 1:04 AM
To: enthought-dev(a)enthought.com
Here are my thoughts and comments:
1. We are switching our focus to Qt and away from wx for all new work.
That happened when the Qt LGPL was announced. We aren't abandoning wx
in the near future, though. Too much of our legacy stuff relies on it.
2. Evan Patterson has done excellent work this summer, and switching
between wx and qt backends on traits.ui/Envisage applications works
pretty dang well. At least two of our large (consulting) applications
work well under either one without a line of code changed to switch
between the two.
3. We're going to use PyQt (purchasing the requisite licenses) for
consulting project releases later this year.
4. We are pained to see that Nokia and Riverbank didn't come to an
agreement.
   a. It has led to an awkward in-between period where Nokia's solution
      isn't ready, and Phil's is inevitably losing part of its mind share.
   b. We like Phil and appreciate the huge contributions he has made with
      PyQt and also all the work he put into traits.ui for the Qt backend.
      I really would like to see the code purchased by Nokia. With the
      limited information we all have, it does seem that all parties
      (Nokia, Phil, the community) would have come out ahead had they
      reached an agreement. Of course, limited information is of limited
      value...
   c. We feel there are several iterations before PySide performs as well
      (cpu and memory) and is as robust as PyQt. That leads to more work
      for all of us to use PyQt now and switch to/test on PySide later.
   d. It just hurts to think about talented engineers re-solving a
      well-solved problem with little engineering benefit. The planet is
      losing many man-years of engineering effort that could be applied
      to new problems. Talented man-years are harder to come by than
      people think...
5. QScintilla is part of PyQt, but I am not sure (even doubt) it will be
in PySide. It is a useful widget in the software IDE world, and it'll
take a lot of work to reproduce the wrapper in a clean room. I'm not
sure QTextEdit will suffice as a replacement. We'll have to see how
this pans out.
6. Inevitably the licensing issue will lead to a larger community using
PySide.
7. Much work is being done on IPython to make it easier to fit into a
GUI. We also have a version that Gael put together last year for wx. We
are waiting to move this to Qt until the huge IPython re-factor going on
is finished. There are many subtleties in the corners of IPython and
GUIs that are hard to get right. The wx version we have is passable, but
not robust enough to be an IPython-in-console replacement. Sigh.
eric
_______________________________________________
Enthought-Dev mailing list
Enthought-Dev(a)enthought.com
https://mail.enthought.com/mailman/listinfo/enthought-dev
Hi all,
In the spirit started by Britton of announcing active development, some of
you may not know that I have been working on a highly parallel halo finder
based on the HOP method.
The current method of parallelizing HOP that is in 1.5 and svn-trunk
essentially runs HOP serially on each subregion and glues the results back
together at the very end. It works wonderfully when it works. Its biggest
constraint is that every halo must exist fully inside at least one of the
subregions. For high-particle-count, small-cosmology simulations, one can
end up with too many particles in one region, and this method breaks down.
The new halo finder, called 'chainHOP' (for now) for lack of a better
term, takes a different approach to parallelizing halo finding. Without
getting into the heavy details, chainHOP parallelizes and glues haloes
together based on the membership of particles in the chains that make up
the haloes. A single halo may exist in several subregions.
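To give a flavor of the gluing step (this is only a conceptual sketch,
not the actual chainHOP code): if every subregion reports, for each
particle it owns, the chain that particle landed in, then any two chains
that share a particle must belong to the same halo, and a union-find pass
stitches them together across subregion boundaries.

from collections import defaultdict

def glue_chains(particle_chains):
    # particle_chains: particle id -> list of (subregion, chain id)
    # pairs that claimed that particle.
    parent = {}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for chains in particle_chains.values():
        for c in chains:
            parent.setdefault(c, c)
        # Chains that co-own a particle get merged.
        for a, b in zip(chains, chains[1:]):
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb
    groups = defaultdict(list)
    for c in parent:
        groups[find(c)].append(c)
    return list(groups.values())

# Chains (0, 3) and (1, 7) share particle 42, so they merge into one halo.
print(glue_chains({42: [(0, 3), (1, 7)], 43: [(1, 7)], 99: [(2, 5)]}))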
Unfortunately, the results are not identical to good old HOP, and never
will be. A primary reason is that I'm using a different kd tree from HOP,
and the kd tree HOP uses gives wrong answers (it calculates the distances
between particles incorrectly, starting at something like the 6th decimal
place, but it's been a while since I compared the two). However, the
haloes found are very similar in what counts: size, center of mass, and
number of haloes.
The upshot of all of this is that I can, for example, find the haloes in
the z=0 L7 (512**3 particles) dataset in about 16 minutes on 128 cores on
Triton. I have done the same for a 1024**3 unigrid dataset on Triton
(large nodes) on 512 cores in about 6 hours, and since then I have made a
couple of improvements, so it should be more like 5 hours now. Much of
that increase in time comes from a 100x increase in the number of 'chains'
that need to be merged.
My work is not ready for prime-time yet, but I thought you might be interested to know!
_______________________________________________________
sskory(a)physics.ucsd.edu o__ Stephen Skory
http://physics.ucsd.edu/~sskory/ _.>/ _Graduate Student
________________________________(_)_\(_)_______________
Hi everyone,
Stephen and I are at the SciPy 2009 workshop. I saw a talk yesterday
on time series analysis of neuroimaging data (all the SciPy talks are
online here: http://www.archive.org/search.php?query=scipy but this is
the talk in question:
http://www.archive.org/details/scipy09_day1_04-Ariel_Rokem and Peter
Norvig's is especially good:
http://www.archive.org/details/scipy09_day1_03-Peter_Norvig ) and got
really jazzed up about time series analysis.
So, starting out on my own (with plans to later incorporate Britton's
EnzoSimulation code), I've written up a first pass at a time series
analysis framework. This, as with all the experimental stuff that's not
ready for trunk, is in the hg repo in the branch timeseries.
Right now it works as follows -- you instantiate an EnzoTimeSeries,
with an optional name and an output log. (Currently the entire thing
is populated via the "OutputLog" which is in Enzo1.5 and beyond.) It
reads in all the parameter files and instantiates them (see below for
why this is not awesome) and then provides some mechanisms for getting
data about and from them.
The architecture is separated into three main sections -- time series
objects, data objects, and analysis tasks. The first two shouldn't
ever really have to be written by the user; anything that exists as a
data object in yt already (spheres, regions, etc.) is automatically
converted into a time-series data object. The analysis tasks are a bit
more user-approachable. You can see what's already been created here:
http://hg.enzotools.org/yt/file/timeseries/yt/data_objects/analyzer_objects…
For example, the MaximumValue task:
class MaximumValue(AnalysisTask):
    _params = ['field']

    def eval(self, data_object):
        v = data_object.quantities["MaxLocation"](
                self.field, lazy_reader=True)[0]
        return v
What this does is say, I will accept a 'field' parameter, and then
inside 'eval' it returns the appropriate value. (Note that the time
series framework introspects the 'eval' method, so if it accepts a
'pf' instead of 'data_object' it knows what to do.)
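That introspection can be as simple as looking at the argument names on
eval (a sketch of the idea, not the code in the branch):

import inspect

def dispatch(task, pf, data_object):
    # Hand eval() whichever object its signature asks for.
    arg_names = inspect.getargspec(task.eval).args  # e.g. ['self', 'pf']
    if "pf" in arg_names:
        return task.eval(pf)
    return task.eval(data_object)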
So, you might say, whatever, right? How would I use this? Well, I've
placed an example file here:
http://hg.enzotools.org/test_scripts/file/tip/time_series.py
from yt.mods import *
from yt.data_objects import *

ts = EnzoTimeSeries("OutputLog")
ret1 = ts.eval([MaximumValue("Density"),
                SlicePlotDataset("Density", 0, [0.5, 0.5, 0.5])])
sp = ts.sphere([0.5, 0.5, 0.5], 1.0)
ret2 = sp.eval([MaximumValue("Density"), CurrentTimeYears()])
So let's examine this one at a time.
The first line:
ts = EnzoTimeSeries("OutputLog")
initializes the time series. Here I've screwed up the arguments, so
the name is OutputLog, and the location of the OutputLog itself goes
to the default value of ... "OutputLog".
The next line:
ret1 = ts.eval([MaximumValue("Density"),
                SlicePlotDataset("Density", 0, [0.5, 0.5, 0.5])])
Creates two tasks, MaximumValue and SlicePlotDataset, each with its own
arguments. MaximumValue is given "Density", and SlicePlotDataset accepts
"Density", the axis, and the center. These are handed to the time series,
which is told to go ahead and run them. It does, and it returns a list of
lists of values as ret1 -- the first being the max values, the second
being the slice filenames.
Okay, now, if we keep going:
sp = ts.sphere([0.5, 0.5, 0.5], 1.0)
ret2 = sp.eval([MaximumValue("Density"), CurrentTimeYears()])
The first line here constructs a 'time series data operator' -- a
sphere. The sphere is centered at (0.5, 0.5, 0.5) and has a radius of
1.0. We can then call eval on it, just like we did on the time series
object, and it creates and operates on that sphere at every step in
the time series.
This is basically what I was hoping to move toward with yt-2.0 -- a
disconnected operator/data model. The framework isn't sufficient yet
to do this, but it's getting there. I'll be merging in Britton's work
with EnzoSimulation for grabbing and selecting cosmology information.
Additionally, I'll be splitting out the reliance on OutputLog into a
generalized time series creator, which CAN but NEEDN'T accept an
OutputLog. The population step and the time series will be separate,
and Britton's got a mechanism for populating from an input parameter
file.
Shortcomings:
* All the hierarchies stick around because the parameter files are
stored. This won't work for large datasets, as it'll quickly blow out
the RAM. I'm working on a workaround that will also handle slicing,
etc.
* The plotting isn't done yet, but it will be.
* Everything is in lists right now, because the return values need
not be floating point, but can be strings, objects, 3D arrays,
whatever.
* Some basic tasks should be built in -- mostly things like
"filename" and "time" and "redshift" which you should be able to just
access and grab.
* Storing results of evaluation tasks is also not clear to me yet.
Recarrays would be great for fixed-size items, but not objects. Maybe
the user should just have the option to eval-and-store.
* Adding tasks requires some object instantiation, and usage of
class names. Do y'all have any thoughts on this? Should we move to
the same paradigm used in the callback method, where a plaintext
addressor name is used? (i.e. modify["grids"] instead of
add_callback(GridBoundaryCallback())...)
I really feel like moving toward time series analysis is the next
major step, and I'd like to continue working on this, as I'll be using
it more and more in the future. Additionally, I think we could move
toward something of a pipeline, as well -- stringing together inputs
and outputs. Something like:
my_pipeline = TaskPipeline(MaximumValueLocation("Density"), VirialRadius(), ...)
ts.eval(my_pipeline)
and then have the outputs of tasks fed from one to the next.
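Under the hood that could be as simple as folding each task's output into
the next task's input; a sketch, assuming tasks keep the eval interface
above:

class TaskPipeline(object):
    def __init__(self, *tasks):
        self.tasks = tasks

    def eval(self, data_object):
        result = data_object
        # Each task consumes the previous task's output.
        for task in self.tasks:
            result = task.eval(result)
        return result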
Anyway, what do you all think?
-Matt
PS The reorganization should happen around the time we focus on time
series stuff -- which is why it's under data_objects.
Hi everyone,
After talking to Sam and Devin at PiTP and then to Dave yesterday (at
UCSD, my new residence) it's become clear to me that the PlotTypes and
PlotCollections files -- which are remarkably unchanged over the last
2+ years of development, since they were first implemented -- have
become a bit stale and have some creeping bugs. People have worked
around them, but it's time for an overhaul.
I'm writing with a new plan for how to handle plots, and I'm writing
because it might affect some of you. The basic idea is that plots
will have valid and invalid states; invalid plots will be redrawn as
necessary. Additionally, pixelization into buffers will occur only
inside FixedResolutionBuffers, and every plot will have an associated
FRB.
So, a plot starts out, and it sets itself up and marks itself and its
colorbar as invalid. We then save it, and it sees that it's invalid,
so it redraws. This marks the colorbar as invalid (even though it
already is), and then the colorbar gets redrawn. Both are now valid.
We then change the zlim, which marks the colorbar as invalid while the
image remains valid. (Colorbar and colormap are essentially
inextricable here.)
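In code, I imagine the bookkeeping looking roughly like this (a sketch of
the proposed scheme, not anything in the repo yet):

class ValidatingPlot(object):
    def __init__(self):
        self.image_valid = False
        self.colorbar_valid = False

    def set_zlim(self, zmin, zmax):
        # Changing limits only touches the colorbar/colormap pairing.
        self.zlim = (zmin, zmax)
        self.colorbar_valid = False

    def set_width(self, width):
        # A new width means the image must be re-pixelized from the FRB.
        self.width = width
        self.image_valid = False

    def save(self, filename):
        if not self.image_valid:
            self._redraw_image()         # pixelize via the associated FRB
            self.colorbar_valid = False  # an image redraw invalidates it
        if not self.colorbar_valid:
            self._redraw_colorbar()
        self.image_valid = self.colorbar_valid = True
        # ... write out filename ...

    def _redraw_image(self):
        pass  # placeholder for the FRB-based pixelization

    def _redraw_colorbar(self):
        pass  # placeholder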
So far, I think it's fine -- but the question comes in during user intervention.
So we have a valid state for our plot. (The user can then mess with
it however they like, and on the next save, that messing with it will
still be there -- because it won't be marked as invalid -- which is a
huge advantage!) When we change the width of the plot, then the image
becomes invalid. But my real question is, what do we do if the user
then sets the zlim and then changes the width?
Set zlim: invalidate colorbar
Save: revalidate both
Set width: invalidate image
But, what happens to the colorbar? Do we invalidate it, and then
reset it on next save? That is, when we change the width, does that
override the user setting the colorbar?
Does anyone have any thoughts? Thanks!
-Matt
I forgot to forward this along. Let me know if you have problems!
_______________________________________________________
sskory(a)physics.ucsd.edu o__ Stephen Skory
http://physics.ucsd.edu/~sskory/ _.>/ _Graduate Student
________________________________(_)_\(_)_______________
----- Forwarded Message ----
> From: "help(a)teragrid.org" <help(a)teragrid.org>
> To: sskory(a)physics.ucsd.edu
> Sent: Wednesday, August 12, 2009 8:13:00 AM
> Subject: Re: Project Space on Kraken
>
> FROM: Lucio, Daniel
> (Concerning ticket No. 174658)
>
> Hi,
>
> I want to let you know that you have two project directories:
>
> lucio@krakenpf5(XT5):~> ls -ld /lustre/scratch/proj/yt_common
> drwxr-x--- 2 sskory ytcommon 4.0K 2009-08-11 15:57
> /lustre/scratch/proj/yt_common/
>
> and
>
> lucio@krakenpf5(XT5):~> ls -ld /nics/a/proj/yt_common
> drwxr-x--- 2 sskory ytcommon 4.0K 2009-08-11 15:18 /nics/a/proj/yt_common/
>
>
> Please let us know in the future if you need more assistance with this.
>
> Best regards!
>
> Daniel
> NICS Team
>
>
> Stephen Skory writes:
> >Hi Daniel,
> >
> >
> >> please provide us with the following:
> >>
> >> - desired project directory name
> >> - who or what groups of people you want to provide access to.
> >>
> >> Best regards!
> >>
> >> Daniel.
> >> NICS Team
> >
> >Can it please be named 'yt_common'? I'd like these usernames to have write
> access:
> >
> >sskory
> >turk
> >skillman
> >jsoishi
> >collins
> >
> >Thanks!
> >
> >Stephen Skory
see attached...
so what do we say?
_______________________________________________________
sskory(a)physics.ucsd.edu o__ Stephen Skory
http://physics.ucsd.edu/~sskory/ _.>/ _Graduate Student
________________________________(_)_\(_)_______________
----- Forwarded Message ----
> From: "help(a)teragrid.org" <help(a)teragrid.org>
> To: sskory(a)physics.ucsd.edu
> Sent: Friday, August 7, 2009 9:05:56 AM
> Subject: Re: Project Space on Kraken
>
> FROM: Lucio, Daniel
> (Concerning ticket No. 174658)
>
> Hi,
>
> please provide us with the following:
>
> - desired project directory name
> - who or what groups of people you want to provide access to.
>
> Best regards!
>
> Daniel.
> NICS Team
>
> Stephen Skory writes:
> >Hi,
> >
> >We would like to request some project space on Kraken's lustre scratch disk. We
> would like to have space to install the Enzo [1] data analysis toolkit yt [2]
> that can be accessed by our colleagues. yt uses Python, which must be built
> statically due to the specialized CNL kernel. It is not trivial to build Python
> statically, and having this installation made permanent on lustre would make
> things easier for us.
> >
> >We don't need very much space. A few gigabytes (10GB is more than we would ever
> need) will be plenty.
> >
> >Thanks!
> >
> >Stephen Skory (on behalf of the yt developers)
> >
> >
> >[1] http://lca.ucsd.edu/projects/enzo/wiki/Enzo1.5
> >[2] http://yt.enzotools.org/