I have just finished an extensive overhaul of the HaloProfiler that adds
a bunch of new features and completely changes the running syntax.
Other than the addition of various new features, the main goals of this
overhaul were:
1) to make the tool more general (mostly by removing hard-coded filtering by
virial quantities), essentially allowing someone to profile a random list of
points that may or may not be actual halos
Easy filtering of halos by virial quantities remains an option, but is now
simply a specific instance of something more general and far more powerful:
a halo filtering device that allows the user to create their own filter
functions to comb through radial profile data and decide whether a halo
meets certain criteria.
2) to remove the dependence on an additional parameter file for the various
parameters.
The parameter file is now gone. Most of the HaloProfiler parameters have
been turned into instantiation keyword args. The keyword list is now
considerably longer, but the benefit is that the number of files needed to
run this thing has been reduced from 2 (running script and par file) to just
1 (running script). There are a large number of keyword options that are
specific to either the profile or projection routines; these are taken in at
instantiation and stored as attributes. This was done to keep those
function calls simple. I'm curious to know people's thoughts on whether
these keyword args should stay put or move to the individual function
calls. Adding fields for profiling and projections is now done with the
functions addProfile and addProjection.
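To make the keyword-argument question concrete, here is a toy class showing options taken in at instantiation and stored as attributes, so the later per-halo calls stay argument-free. Everything here is hypothetical: ProfilerSketch, its keywords and projection_jobs are made up for illustration, and the addProfile/addProjection signatures are assumptions, not the real HaloProfiler API.

```python
class ProfilerSketch:
    """Hypothetical stand-in for the HaloProfiler keyword-argument design."""

    def __init__(self, dataset, halo_list_file="halos.out", n_profile_bins=50,
                 projection_axes=(0, 1, 2), projection_width=8.0):
        self.dataset = dataset
        self.halo_list_file = halo_list_file
        # Profile-only option, stored so a later profiling call needs no arguments.
        self.n_profile_bins = n_profile_bins
        # Projection-only options, stored for the same reason.
        self.projection_axes = tuple(projection_axes)
        self.projection_width = projection_width
        self.profile_fields = []
        self.projection_fields = []

    def addProfile(self, field, weight_field=None, accumulation=False):
        # Register a field to be radially profiled around each halo.
        self.profile_fields.append(
            {"field": field, "weight_field": weight_field,
             "accumulation": accumulation})

    def addProjection(self, field, weight_field=None):
        # Register a field to be projected for each halo.
        self.projection_fields.append(
            {"field": field, "weight_field": weight_field})

    def projection_jobs(self):
        # Expand registered fields into (field, axis, width) jobs using only
        # the options captured at instantiation.
        return [(p["field"], axis, self.projection_width)
                for p in self.projection_fields
                for axis in self.projection_axes]


hp = ProfilerSketch("DD0010/moving7_0010", projection_axes=[0, 2])
hp.addProfile("Density", weight_field="CellMassMsun")
hp.addProjection("Density")
print(hp.projection_jobs())  # [('Density', 0, 8.0), ('Density', 2, 8.0)]
```

Moving the projection keywords to the individual calls would mean passing projection_axes and projection_width to something like projection_jobs() instead; the trade-off is a longer constructor versus longer call sites.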
Here is a brief list of the new features that I can remember:
- the halo list read routine can be easily customized to read in columned
ascii data of varying formats through the use of a dictionary (see attribute
- ability to change the function call, args, and kwargs for the halo finder
- filter halo list with user-written filter functions (see new file
HaloFilters.py for an example of a function to filter based on virial quantities)
- extract and output scalar quantities from halos using the filter
- pre-filter halos based on values in the initial halo list (skipping the
profiling altogether and saving a load of time)
- choose the axes to be projected (instead of hard-coded to all three)
- easily select the list of halos to be projected (the total list, filtered
list, a new file, or an actual list)
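For anyone who hasn't opened HaloFilters.py yet, a filter function in this scheme takes one halo's radial profile and returns whether the halo passes, plus any scalar quantities worth keeping. The sketch below is hypothetical and is not the code in HaloFilters.py: it assumes each profile is a dict of equal-length lists keyed by 'RadiusMpc', 'ActualOverdensity' and 'TotalMassMsun', finds where the overdensity first drops below a threshold using plain linear interpolation, and applies a mass cut.

```python
def virial_filter_sketch(profile, overdensity=200.0, min_mass=1e14):
    """Hypothetical filter: return (passes, quantities) for one halo profile."""
    radius = profile["RadiusMpc"]
    delta = profile["ActualOverdensity"]
    mass = profile["TotalMassMsun"]
    # Walk outward until the overdensity first drops below the threshold.
    for i in range(1, len(radius)):
        if delta[i - 1] >= overdensity > delta[i]:
            # Linear interpolation between bins i-1 and i.
            frac = (delta[i - 1] - overdensity) / (delta[i - 1] - delta[i])
            r_vir = radius[i - 1] + frac * (radius[i] - radius[i - 1])
            m_vir = mass[i - 1] + frac * (mass[i] - mass[i - 1])
            return m_vir >= min_mass, {"RadiusMpc": r_vir, "TotalMassMsun": m_vir}
    # Threshold never crossed: treat the halo as not virialized.
    return False, {}


def filter_halos(profiles, filter_func, **kwargs):
    """Keep halos whose profile passes filter_func; collect their quantities."""
    kept = {}
    for halo_id, profile in profiles.items():
        passes, quantities = filter_func(profile, **kwargs)
        if passes:
            kept[halo_id] = quantities
    return kept
```

Returning the quantities alongside the pass/fail flag is what lets the same function double as the scalar-extraction step from the feature list.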
Before I commit all this, I'd like a little feedback, mostly on the
migration of the par file parameters to keyword args. I put three files in
the pastebin for people to download. I chose not to submit a diff since the
changes were so sweeping.
The new HaloProfiler.py: http://paste.enzotools.org/show/178/
HaloFilters.py (should go in yt/extensions with HaloProfiler.py)
runHaloProfiler.py (an example script to run the new HaloProfiler)
I will try my best to write up full documentation for this as soon as
possible, perhaps even today.
Please let me know what you think.
I've been working in the evenings (and, today, during the day a little
bit...) on the documentation. Britton, Stephen and Jeff have also
chipped in and written a bunch.
I've uploaded a new build of the docs (which will last until the next
svn commit, at which point it'll be wiped and replaced with whatever
is in SVN currently) to here:
There are two new features I wanted to mention:
* The cookbook ( http://yt.enzotools.org/doc/cookbook/recipes.html )
is programmatically generated from the cookbook repository (
http://hg.enzotools.org/cookbook ) and images are generated
automatically as well. All other locations of examples will be wiped
and either left empty or replaced with links or content
programmatically generated from the cookbook repository. It is 100%
my fault, but we have too many scattered, sometimes broken, sometimes
outdated, undocumented examples all over the place. This is my
attempt to fix that.
* I've added comments to every page in the documentation using
disqus.com. Currently you need an OpenID (gmail supplies this) to
comment. All comments are aggregated on a single page on disqus's
site, or they can be viewed threaded-inline in the docs. This can be
expanded to accept Facebook logins, but I was hesitant about that. What
are y'all's thoughts?
Also, there are still some major blank spaces in the docs. I'm
working on filling them in, as are other people (mentioned above). If
you'd like to help out, pick out a spot, clone the yt-doc repo, add
text or images or whatever, and I'll give you push privs. (Of course,
as soon as you clone a repo, you already have commit privs. :)
Britton, Sam and Stephen have all reported to me at different times
that it seems sometimes one of the processors in a parallel job hangs
for a while then races to catch up at the end. Have any of you ever
successfully done any localization of this problem? Figuring out
where exactly it hangs? I think this would show up in per-processor
profiling: look at which functions take up the most time on each
processor, and at the disparities in that across processors. I'd *really* like
to track this down, as it's now causing us some real problems.
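As a starting point for that per-processor profiling, something like the following could wrap the suspect call on every rank and write one cProfile dump per processor, so the dumps can be compared side by side. It is only a sketch: profile_per_rank and top_functions are hypothetical helpers, not part of yt, and it assumes the rank can be read from OMPI_COMM_WORLD_RANK (set by Open MPI) or PMI_RANK (typically set by MPICH-style launchers).

```python
import cProfile
import io
import os
import pstats


def current_rank():
    # Assumption: the MPI launcher exports the rank in one of these variables.
    for var in ("OMPI_COMM_WORLD_RANK", "PMI_RANK"):
        if var in os.environ:
            return int(os.environ[var])
    return 0


def profile_per_rank(func, *args, outdir=".", **kwargs):
    """Run func under cProfile and dump stats to one file per MPI rank."""
    rank = current_rank()
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    path = os.path.join(outdir, "profile_rank%04d.prof" % rank)
    profiler.dump_stats(path)
    return result, path


def top_functions(path, n=10):
    """Return a text summary of the n most expensive calls by cumulative time."""
    buf = io.StringIO()
    stats = pstats.Stats(path, stream=buf)
    stats.sort_stats("cumulative").print_stats(n)
    return buf.getvalue()
```

Comparing top_functions() for the rank that lags against one that doesn't should show whether the time goes into IO, a communication call, or the analysis itself.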
I've noticed several of you making commits about particles. (John,
Britton, looking at you guys.) In recent versions of Enzo, I stuck in
a dataspace supplement that pre-computes dataspaces for different
particle types in each grid, which should make it really cheap to
read *only* star particles, or particles of a different type. I've
put in support for this in the yt-hg repo, but the biggest problem
with it was that I couldn't figure out the right way to address it --
so I'd like to put a question out to you guys.
In an ideal world, if you have some data object:
what is the best way to be able to address particles of different
types, and only get those particles? (Let's pretend either that IO is
cheap, or we have the dataspace hack.)
data_object.particles[type]["x"] # for instance
or something? What do you think? Any off-the-wall ideas?
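One possible shape for the data_object.particles[type]["x"] idea, as a toy in plain Python. The classes, the 'particle_type' field name and the integer type codes are all assumptions for illustration, not yt or Enzo definitions: indexing .particles by a type returns a view whose field lookups only return that type's particles.

```python
class _TypedFields:
    """Field access restricted to the particles of one type."""

    def __init__(self, data, indices):
        self._data = data
        self._indices = indices

    def __getitem__(self, field):
        values = self._data[field]
        return [values[i] for i in self._indices]


class _ParticleTypes:
    """Implements data_object.particles[ptype]."""

    def __init__(self, data, type_field):
        self._data = data
        self._type_field = type_field

    def __getitem__(self, ptype):
        # In-memory mask; with the dataspace supplement this is where a
        # type-restricted read could happen instead.
        types = self._data[self._type_field]
        indices = [i for i, t in enumerate(types) if t == ptype]
        return _TypedFields(self._data, indices)


class SketchDataObject:
    """Hypothetical data object whose fields live in a plain dict of lists."""

    def __init__(self, fields, type_field="particle_type"):
        self._fields = fields
        self._type_field = type_field

    def __getitem__(self, field):
        return self._fields[field]

    @property
    def particles(self):
        return _ParticleTypes(self._fields, self._type_field)


DM, STAR = 1, 2  # placeholder type codes for this example only
obj = SketchDataObject({"particle_type": [DM, STAR, STAR, DM],
                        "x": [0.1, 0.2, 0.3, 0.4]})
print(obj.particles[STAR]["x"])  # [0.2, 0.3]
```

The appeal of this layout is that obj["x"] keeps meaning "all particles", while the type restriction is an explicit extra step.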
Devin, Sam and Adam Ginsburg have done a lot of really great work on
the colorbar handling. What I'd like to propose is that we take our
existing gigantic routine and split it into a smaller handler function
or class -- maybe something like RavenColorbar or YTColorbar. I'm
willing to do this, but before I do I wanted to clear it with
everyone. The way I see it, the colorbar would be a wrapper
around the matplotlib colorbar class, and it would handle some
YT-specific stuff --
* Setting of datalabels, and knowing about units and so on
* Setting of ticks for the sometimes pathological situations in which
we find ourselves
* Setting of ticks for the completely reasonable situations in which
we also find ourselves
* Intelligently plotting into axes
* Notification handling (currently done via a lambda for MPL 0.91.x
compatibility, which I'd like to discard)
What do you all think? I think having this as a base would make
invalidation of colorbars easier, which is where we're moving. The
ultimate plan is to have a plot object and a colorbar object, and when
certain actions are taken on either, one or both will be marked as
"invalid", which would trigger a replot. This would be things like
resetting the width, changing the limits, etc.
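A rough sketch of that invalidation scheme, with no matplotlib involved: each artist carries a "valid" flag, setters mark things invalid, and refresh() only redraws what was invalidated. The class names are hypothetical (this is not RavenColorbar or YTColorbar), and which setters cascade to the colorbar is an assumption: here changing the limits invalidates both objects, while changing the width only invalidates the plot.

```python
class InvalidatingArtist:
    """Hypothetical base class: something that can be marked invalid and redrawn."""

    def __init__(self):
        self.valid = False  # nothing has been drawn yet
        self.draw_count = 0
        self._dependents = []

    def add_dependent(self, other):
        # 'other' is invalidated whenever this artist cascades an invalidation.
        self._dependents.append(other)

    def invalidate(self, cascade=True):
        self.valid = False
        if cascade:
            for dep in self._dependents:
                dep.invalidate()

    def refresh(self):
        # Redraw only if something marked this artist invalid.
        if not self.valid:
            self.draw()
            self.valid = True

    def draw(self):
        # Stand-in for the real matplotlib drawing.
        self.draw_count += 1


class SketchPlot(InvalidatingArtist):
    def __init__(self):
        super().__init__()
        self.width = 1.0
        self.zlim = None

    def set_width(self, width):
        self.width = width
        self.invalidate(cascade=False)  # colorbar is unaffected

    def set_zlim(self, zmin, zmax):
        self.zlim = (zmin, zmax)
        self.invalidate()  # colorbar ticks depend on the limits


class SketchColorbar(InvalidatingArtist):
    def __init__(self):
        super().__init__()
        self.label = ""

    def set_label(self, label):
        self.label = label
        self.invalidate()
```

Usage: create a SketchPlot and a SketchColorbar, call plot.add_dependent(cbar), and call refresh() on both after any batch of changes; only the invalidated pieces get redrawn.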
As per Sam's request (offline) to get some info about how to use the
VTK interface, I'm writing an email to start a dialogue about this.
Right now, the VTK interface is exposed through the next version of
reason, which is stored inside this mercurial repository:
Here are the requirements:
If you install VTK, be sure to install the Python bindings. During
the cmake phase, you will have to ensure that the Python interpreter
pointed to is the same one you used to install yt. You *may* have to
manually edit the CMakeCache file to ensure that it does *NOT* install
with a "--prefix" command, which will confuse the situation. I will
speak below, briefly, about my feelings on the Raft and how it plays
into all of this. Once you have installed VTK, you should be able to
install ETS (code.enthought.com) with
$ easy_install "ETS[nonets]"
but, if not, do a manual source install. This will involve a couple
more steps -- getting ETSProjectTools, running "ets co ets" and then
doing "ets develop" in the ets directory. Alternatively, the newest
EPD (5.0) is supposedly far more stable and easy to install and get
working on OSX.
Check out the hg repository and update to the branch 'yt'. Now execute the command:
$ python2.5 yt/reason/reason_v2.py
and you will have access to the next-gen GUI. Open up a parameter
file and you then have access to the VTK interface by right clicking
on the parameter file in question. This is a developing project that
I haven't really played with in a few months. However, it can do
isosurfaces, marching cubes, cutting planes, box outlines, camera
paths (save to a file, even, and export to Amira format) and some
other stuff. It's also stereo-enabled. With John Wise's work on a
software-volume renderer (and any future developments on that)
hopefully this will also become a gateway for setting up renderings of
datasets after prototyping. I know that Sam Skillman has also
expressed interest in using this as a platform for rendering.
Okay, so, that was kind of a pain, right? Well, that's where the Raft
came in. I began the Raft project when I saw how the FEMhub guys had
undertaken the task of getting Mayavi2 and its dependencies (Traits,
VTK, etc etc) installed in a simple, cross-platform manner. Right
now, it already installs VTK and Traits (not on OSX, but that should
change by the end of the month, according to Ondrej Certik and Prabhu
Ramachandran) and I am working on getting wxPython to install as well,
as available. If I can get wxPython to go, then we should be 100% set
up for running this next-gen GUI with embedded VTK. However, an
additional and awesome feature of the Raft (that I cannot take any
credit for) is that it includes off-screen rendering, done via Mesa.
If the VTK code I've written can be refactored to have the rendering
independent of the GUI, then this will also become a viable mechanism
for rendering. I've already done this, via monkeypatching and hackery,
with a RAFT notebook -- I made some images of a cosmology
dataset, rendered with Mesa, using VTK widgets, displayed through a
web browser. If anyone is interested I can post that worksheet.
Sam, if you run into any problems, please feel free to reply to this
message so we can figure them out together and move forward. I'm very
excited about all this, and I'd really like to bring more people on
board with developing the VTK interface and the GUI. I've been in
touch with Prabhu from the MayaVi project, and he has sent me a patch
to read in MultiBlock data in MayaVi. Currently we use the
HierarchicalBoxDataSet, which I've modified the patch to support as
well. I'd like to continue working on that project, but I might need
some help in that department from other people. If we can get
our data into MayaVi, that would be a huge step forward in usability.
So I gave my first YT tutorial using the RAFT today. I ran the RAFT
on Triton and accessed it locally; everything "just worked" and we
were very productive.
There are some issues that still need to be investigated and examined,
which I am including here. I'd appreciate it if anyone who has used
this could contribute their thoughts.
* Currently, not all the widgets for interaction inside the RAFT work
with the RAFT alone; some have deeper Sage dependencies.
I've spoken with people at the Sage project, and they are going to be
working on rewriting the notebook engine at the end of this month,
which will fix this issue.
* Printing PDFs of the notebook lets images jut off the edge of the
page, which ends up cropping them.
* MPI jobs can't currently be dispatched. It's not clear to me, but
I personally believe this can be overcome and that we will be able to
launch parallel processes from within the RAFT. Launching them from a
queuing system might not be possible, but on a single machine I think
it'll be possible.
* Currently everything is stored in ~/.spd, but I'd like to change
that to ~/.raft.
* Distributing worksheets and converting down to Python code is a
bit sticky at the moment, but I'm going to investigate ways to make this easier.
* I've asked around a bit about uploading worksheets to a repo and
then downloading them remotely. If this requires little to no effort
to set up on the server end -- i.e., turnkey solution -- I'll write up
something to upload from the notebook interface. (If such a thing
doesn't already exist, and it might!)
Does anyone else have anything to offer? Other issues or sticky bits
that I have overlooked?
As far as releases go, updating the spkgs and builds is pretty easy --
I am wondering if maybe we should just aim for a monthly release
schedule for RAFT, rather than version numbers.
We now have builds at http://raft.enzotools.org/files/ for the
As some of you know, I've been working on a fork/port/rebranding of
the FEMhub (femhub.org) framework. FEMhub itself is a fork of SAGE, which
you can find at sagemath.org. Basically, SAGE and FEMhub are designed
to be fully self-contained, web-based interfaces to finite element
computation systems (FEMhub) and symbolic math systems (SAGE). Both
are pretty amazing, and SAGE itself is a strong competitor with Maple
With that out of the way, I'd like to take a moment to focus on a
couple aspects of this. I saw this talk on FEMhub at SciPy 2009 by
Ondrej Certik (more from Ondrej here:
where he described the development of FEMhub from a project called SPD
(Source Python Distribution). These projects started as stripped-down
versions of SAGE, but now serve as generic Python installation
mechanisms. The upside of this is that these projects can easily
install complete Python distributions, there is a large selection of
existing packages available, and they can even (optionally) present
web-interfaces to the complete python installation. I'll return to
the idea of the web-interface in a moment, because at first I didn't
really care about that but I have since come to. FEMhub actually has
created and includes off-screen rendering mechanisms for VTK and
OpenGL which work through the web notebook, which I think is pretty
neat. However, you don't *have* to use it through the web -- all the
python and everything you're familiar with works out of the box, and
(best yet) it handles all the setting of library-loading and
So here's what's up: I've taken FEMhub and rebranded it with a new
name, RAFT. I've stripped out a bunch of the finite element codes and
added in HDF5, h5py, and yt. In doing so, I believe I have completely
replicated *and supplemented* the installation script, using a
framework that has been externally developed and which continues to be
developed externally. My installation scripts pale in comparison to
the quality control that the SAGE project demands -- their system is
much more robust than what we use, and it includes more packages. The
RAFT project now benefits from this QA effort.
In the past, we've avoided including any code in yt that even *uses*
scipy, simply because scipy is traditionally so difficult to install.
FEMhub/SAGE/RAFT simply *work*, and they come with SciPy.
Additionally, they come with the entire MayaVi distribution, which is
not only able to render, but to render off-screen if the need be. We get
a lot of fun stuff with this, and it works on a ton of different
systems without any hackery on the part of the user.
Now, if I can speak for a moment about the web notebook. This is
something I still haven't figured out my feelings on, but I can tell
you all this: it works seamlessly with yt with no modifications, which
is more than I can say for other web-based interaction mechanisms I
have tried. I was able to import yt, load a data file, add plots, and
then whenever "save" is called they just show up in the web browser.
Additionally, because it binds only to localhost and contains
authentication mechanisms, this is accessible through ssh forwarding
and should be considered feasible for running at supercomputer centers.
Okay. So. I think the modifications all work, and I've rebranded
much of the Notebook. I have a couple more decisions to make on the
back end about package distribution.
* Do we install VTK/MayaVi by default? This adds some compilation
time. But, it also enables us to do 3D rendering more easily. I'm
inclined to say no, because it can be automatically
downloaded/installed by the raft command.
* Do we completely replicate the SAGE spkg distribution mechanism on
enzotools.org? This would enable us to have finer-grained control as
well as to distribute personal packages, but we'd also have to stay up
to date with upstream bug fixes *manually*.
* What role should the notebook play? I was immediately and
extremely opposed to this when I first saw it, but I have since warmed
to it considerably, because of the ease-of-development, inline help,
repeatable analysis, export/import capabilities and PDF output. It
seems likely to me that this would be a very easy mechanism for new
users, and we can distribute tutorial sheets *with* the RAFT, which
users can step through on their own to learn both Python and yt.
* Does anyone else want to help keep up with changes upstream, help
with rebranding, and all-in-all test this out?
SO! On to the testing part.
You can test the RAFT 0.1 alpha distribution by snagging it here:
You should ensure that no YT files are in your PATH or LD_LIBRARY_PATH
-- we're doing this all fresh. Then,
$ tar xvfj raft-0.1a.tar.bz2
$ cd raft-0.1a
and you should get an install, a little bit later. On Triton it was
pretty fast. You can now set your PATH to include
/wherever/raft-0.1a/local/bin and when you run python2.6 you should
get a yt installation. Now, you should be able to run anything you
like. Running the command "raft" inside the raft-0.1a directory will
give you a modified IPython. It might be fun to start that up, and
where [something] is some unused port. I like 7741. On your local
machine, you can then type:
$ ssh -L 8000:localhost:7741 remote-host.org
and when you open up a web browser and point it to localhost:8000,
you'll get your notebook interface. It's authenticated, so you'll
need a u/pw combo, but the first time you start the notebook it'll
prompt you. You can then add plots or whatever and analyze any data
accessible to you on that system. When a new PNG file is added (i.e.,
via the save() command on a PlotCollection) it will get included in
the cell directly. Be sure to Ctrl-C out when you're done. The
"raft" command is pretty powerful, and you should also be able to
install things with it.
Okay, that's a long enough email. I'm going to work on this a little
bit more, but I'd *REALLY* appreciate it if you guys could just try
installing and using it on a machine or two that you have access to.
I intend for this to replace the installation script, but until it's
definitely working, we can't really do that. :)
There seems to be a bug introduced in r1396 when making phase objects
with the plot collection. This was tricky to find because it only
appears if you remove the build directory in yt-trunk-svn/ (and then run
"python setup.py install") and also remove any old .yt files
for datasets you're analyzing. However, if you do so, any
phase_object plots show up empty from r1396 onward. The culprit seems to be
the changes in yt/raven/PlotTypes.py, even though the profile plots are
RavenPlots and not VMPlots. Anyone have any ideas on how a change to
VMPlot could carry over to ProfilePlot?
You can use yt-trunk-svn/test/DD0010/moving7_0010 as a test. Here is
the script I use for making the phase plot:
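from yt.mods import *
pf = EnzoStaticOutput('moving7_0010')
pc = PlotCollection(pf, [0.5]*3)
# Region: center [0.5]*3, left edge [0.0]*3, right edge [1.0]*3
wb = pf.h.region([0.5]*3, [0.0]*3, [1.0]*3)
# Placeholder fields (x, y, binned quantity); substitute your own
pc.add_phase_object(wb, ["Density", "Temperature", "CellMassMsun"], weight=None)
pc.save('moving7_0010')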
Samuel W. Skillman
DOE Computational Science Graduate Fellow
Center for Astrophysics and Space Astronomy
University of Colorado at Boulder
The rewritten hierarchy now passes all the unit tests. You can
inspect this branch in this repository:
in the named branch hierarchy-opt. The Orion hierarchy will need to
be converted (Jeff and I can talk about this offline) but overall I
think the new mechanism for instantiating and creating hierarchies is
now extremely straightforward, and will lend itself better to new AMR
formats. You can see the outline here:
This is the base class, and other hierarchies just have to implement
those functions to be real. The enzo hierarchy instantiation should
be marginally faster and might use a bit less RAM. However, I still
need to work on both reorganizing the DataQueue and the Grid objects,
which will become smaller (eliminating some data members) and faster.
I intend to eliminate at least one dictionary in the grid object.
(Each dictionary removal saves about 1 KB per object -- for the L7, this
would be about 300 MB per dict.)
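For a sense of scale: 300 MB at 1 KB per dict implies roughly 300,000 grid objects in the L7. One standard Python way to drop the per-instance dictionary altogether is __slots__. The sketch below uses hypothetical grid classes and is not necessarily what the hierarchy-opt branch does; it shows the difference and redoes the arithmetic, taking the 1 KB-per-dict figure above as given.

```python
class GridWithDict:
    # Ordinary class: every instance carries its own attribute dictionary.
    def __init__(self, grid_id, level):
        self.id = grid_id
        self.level = level


class GridWithSlots:
    # __slots__ stores attributes in fixed slots; instances have no __dict__.
    __slots__ = ("id", "level")

    def __init__(self, grid_id, level):
        self.id = grid_id
        self.level = level


def savings_mb(n_objects, bytes_per_dict):
    """Memory freed (decimal MB) by dropping one dict from each object."""
    return n_objects * bytes_per_dict / 1e6


print(savings_mb(300_000, 1_000))  # 300.0
```

The catch with __slots__ is that new attributes can no longer be attached to a grid on the fly, so any code that stashes ad hoc data on grid objects would need to change.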
If you'd like to take a look, please do so, and let me know how it
works out. You can do this with:
$ hg clone http://hg.enzotools.org/yt/ ./experimental-yt
$ cd experimental-yt
$ hg up -C hierarchy-opt