As I noted in my other email, there was one major feature we'd talked
about for yt 2.5 that we never even really worked on. Nathan and
others pushed really hard on rethinking and redesigning the way image
plots were made and handled back in the 2.4 series, which eliminated
the main use case for the PlotCollection. However, the last holdouts
keeping PlotCollection around are the 1-D profile and 2-D phase plots.
There's been broad consensus that we need a new method for doing this,
analogous to how PlotWindow serves to replace "add_slice" and
"add_projection" from the PlotCollection. Something that gets out of
the way and lets people modify their plots as they see fit.
A while back I implemented a first pass at this, which you can see in
The idea here is that you create an instance of ProfilePlotter or
PhasePlotter, which then "does the right thing" and creates the
necessary data objects and the like. This object can then either
create its own figure and axes in matplotlib, or draw itself into an
existing axes object. It includes axis objects and plot containers
that know how to plot themselves.
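To make that outline concrete, here is a minimal sketch in plain NumPy and matplotlib. Everything in it is hypothetical: `make_profile` and `ProfilePlot` are illustrative names, not yt API, and the binned weighted average stands in for whatever the real ProfilePlotter computes. The point is the split: the profile is computed once, and the resulting plot container either makes its own figure and axes or draws itself into axes the user already has.

```python
import numpy as np


def make_profile(x, y, weight, bins):
    """Weighted average of y in bins of x: a stand-in for a 1-D profile.

    Hypothetical helper, not yt API. Empty bins come back as NaN.
    """
    edges = np.asarray(bins, dtype="float64")
    # Per-bin sums of weight*y and of weight.
    wy, _ = np.histogram(x, bins=edges, weights=weight * y)
    w, _ = np.histogram(x, bins=edges, weights=weight)
    with np.errstate(invalid="ignore", divide="ignore"):
        avg = np.where(w > 0, wy / w, np.nan)
    centers = 0.5 * (edges[1:] + edges[:-1])
    return centers, avg


class ProfilePlot:
    """Hypothetical plot container that knows how to draw itself.

    It holds only arrays, labels and scale flags, never the dataset, so it
    can be handed to anything that speaks the matplotlib Axes interface.
    """

    def __init__(self, x, y, xlabel="", ylabel="", logx=False, logy=False):
        self.x = np.asarray(x)
        self.y = np.asarray(y)
        self.xlabel, self.ylabel = xlabel, ylabel
        self.logx, self.logy = logx, logy

    def render(self, ax=None):
        """Draw into `ax`, or into a freshly created figure and axes if None."""
        if ax is None:
            import matplotlib.pyplot as plt  # only needed when we own the figure
            _, ax = plt.subplots()
        ax.plot(self.x, self.y)
        ax.set_xlabel(self.xlabel)
        ax.set_ylabel(self.ylabel)
        if self.logx:
            ax.set_xscale("log")
        if self.logy:
            ax.set_yscale("log")
        return ax
```

Usage would be along the lines of `centers, avg = make_profile(density, temperature, cell_mass, edges)` followed by `ProfilePlot(centers, avg, "Density", "Temperature", logx=True, logy=True).render()` for a standalone figure, or `.render(ax=existing_ax)` to drop the same plot into one panel of a larger figure (the variable names are placeholders).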
I don't know that I necessarily like how this is done. It's very
declarative, and step by step, but I think it could be easier. Here's
One thing I think *is* quite nice is that the *plot* object is
independent of the profile itself. This makes it easier to pickle and
unpickle things, and is the reason for the existence of the various
sub-objects off of ProfilePlotter. This is useful for the use case of
making very large datasets into profiles, pickling the resulting
plots, and modifying them later.
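A minimal sketch of why that separation helps with pickling, assuming the plot object holds only arrays, labels and scale choices, with no reference to the dataset or the profiling machinery. `ProfilePlotData`, `save_plot`, `load_plot` and `restyle` are hypothetical names, not part of yt.

```python
import pickle
from dataclasses import dataclass, replace

import numpy as np


@dataclass
class ProfilePlotData:
    """Hypothetical plot state: plain arrays and labels, no dataset handle."""
    bin_centers: np.ndarray
    values: np.ndarray
    xlabel: str = ""
    ylabel: str = ""
    logx: bool = True
    logy: bool = True


def save_plot(plot, path):
    """Pickle the plot state so it can be restyled later without the dataset."""
    with open(path, "wb") as fh:
        pickle.dump(plot, fh)


def load_plot(path):
    """Restore a previously pickled plot state."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def restyle(plot, **changes):
    """Return a copy with some fields changed (labels, log scaling, ...)."""
    return replace(plot, **changes)
```

The expensive step (binning a very large dataset) happens once on the big machine; only the small `ProfilePlotData` is pickled, and later something like `restyle(load_plot("profile.pkl"), ylabel="Temperature", logy=False)` changes the presentation without touching the original data.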
So here are my specific questions:
* Does this rough outline of how the object works seem good? As in,
making a plotter object, giving it a .plot sub-object, and then
dispatching that plot object to various backends? (Probably nearly
* Should we add more convenience operations?
* How should it interface with existing profiles?
* Should implementing this and finalizing the design be a blocker for 2.5?
I will create and update a YTEP with the resulting discussion from
this. Nathan, Jeff and Britton -- I would very much appreciate your
feedback, as I know you have experience with (respectively) plot
windows, matplotlib, and large datasets being profiled.
I've got a problem I'm trying to solve regarding HDF5, and I've found it's specifically related to yt.
This sequence works:
f = h5py.File(filename, "r")
# ... do stuff with f ...
f = h5py.File(filename, "r+")
However, this sequence does not:
pf = load(filename)
# ... do stuff with the pf ...
f = h5py.File(filename, "r+")
It results in this:
NameError Traceback (most recent call last)
<ipython-input-8-186261e4952a> in <module>()
----> 1 f = h5py.File("desktop_stuff/GasSloshing/GasSloshing/sloshing_nomag2_hdf5_plt_cnt_0150", "r+")
NameError: name 'h5py' is not defined
In : import h5py
In : f = h5py.File("desktop_stuff/GasSloshing/GasSloshing/sloshing_nomag2_hdf5_plt_cnt_0150", "r+")
IOError Traceback (most recent call last)
<ipython-input-10-186261e4952a> in <module>()
----> 1 f = h5py.File("desktop_stuff/GasSloshing/GasSloshing/sloshing_nomag2_hdf5_plt_cnt_0150", "r+")
/Users/jzuhone/yt-x86_64/lib/python2.7/site-packages/h5py/_hl/files.pyc in __init__(self, name, mode, driver, libver, **kwds)
149 fapl = make_fapl(driver,libver,**kwds)
--> 150 fid = make_fid(name, mode, fapl)
151 Group.__init__(self, fid)
152 self._shared.lcpl = make_lcpl()
/Users/jzuhone/yt-x86_64/lib/python2.7/site-packages/h5py/_hl/files.pyc in make_fid(name, mode, plist)
45 fid = h5f.open(name, h5f.ACC_RDONLY, fapl=plist)
46 elif mode == 'r+':
---> 47 fid = h5f.open(name, h5f.ACC_RDWR, fapl=plist)
48 elif mode == 'w-':
49 fid = h5f.create(name, h5f.ACC_EXCL, fapl=plist)
/Users/jzuhone/yt-x86_64/lib/python2.7/site-packages/h5py/h5f.so in h5py.h5f.open (h5py/h5f.c:1618)()
IOError: unable to open file (File accessability: Unable to open file)
Anyone have any ideas?
I've been doing some benchmarking of various operations in the Enzo
frontend in yt 2.x. I don't believe other frontends suffer from this,
mainly because they're 64-bit everywhere.
The test dataset is about ten gigs, with a bunch of grids. I'm
extracting a surface, which means from a practical standpoint that I'm
filling ghost zones for every grid inside the region of interest.
There are many places in yt where we either upcast to 64-bit floats or
assume 64 bits. Basically, nearly all yt-defined Cython or C
operations assume 64-bit floats.
There's a large quantity of Enzo data out there that is float32 on
disk, which gets passed into yt, where it gets handed around until it
is upcast. There are two problems here: 1) We have a tendency to use
"astype" instead of "asarray", which means the data is *always*
duplicated. 2) We often do this repeatedly for the same set of grid
data; nowhere is this more true than when generating ghost zones.
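The difference between the two calls is easy to check in isolation (plain NumPy, nothing yt-specific): `astype` returns a new array by default even when the dtype already matches, while `np.asarray` hands back the input unchanged in that case and only allocates when a conversion is actually needed.

```python
import numpy as np

a64 = np.arange(10, dtype="float64")
a32 = np.arange(10, dtype="float32")

# astype copies by default (copy=True), even though no cast is needed.
b = a64.astype("float64")
print(b is a64, np.shares_memory(b, a64))  # False False

# asarray returns the very same object when the dtype already matches...
c = np.asarray(a64, "float64")
print(c is a64)  # True

# ...and only allocates a new float64 array when a real cast is required.
d = np.asarray(a32, "float64")
print(d is a32, d.dtype)  # False float64
```

Recent NumPy versions also accept `astype("float64", copy=False)`, which likewise skips the copy when no cast is needed.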
So for the dataset I've been working on, ghost zones are really
expensive, and the call to .astype("float64") completely dominated
the operation. The cost comes from both copying the data and casting
it. I found two different solutions.
The original code:
g_fields = [grid[field].astype("float64") for field in fields]
This is bad even if you're using float64 data types, since it will
always copy. So it has to go. The total runtime for this dataset was
160s, and the most-expensive function was "astype" at 53 seconds.
So as a first step, I inserted a cast to "float64" inside the Enzo IO
system whenever an array's dtype was "float32". This way, all arrays
were upcast automatically. This gave zero performance improvement. So
I checked further and found that astype always copies, which I hadn't
known. This option:
g_fields = [np.asarray(grid[field], "float64") for field in fields]
is much faster, and saves a bunch of time. But 7 seconds is still
spent inside "np.array", and total runtime is 107.5 seconds. This
option is the fastest:
g_fields = []
for field in fields:
    gf = grid[field]
    # Only cast (and therefore copy) when the data isn't already float64
    if gf.dtype != "float64": gf = gf.astype("float64")
    g_fields.append(gf)
and now total runtime is 95.6 seconds, with the dominant cost *still*
in _get_data_from_grid. At this point I am much happier with the
performance, although still quite disappointed, and next I'll be going
through it line by line to find any more micro-optimizations.
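A rough, self-contained way to reproduce the comparison outside yt, assuming a hypothetical list of many small arrays (`make_grids`) stands in for the per-grid `grid[field]` data. Real timings will depend on grid sizes, the on-disk dtype and yt's own overhead, so no numbers are claimed here.

```python
import timeit

import numpy as np


def with_astype(arrays):
    # Original approach: always copies, even for float64 input.
    return [a.astype("float64") for a in arrays]


def with_asarray(arrays):
    # Copies only when the dtype differs.
    return [np.asarray(a, "float64") for a in arrays]


def with_dtype_check(arrays):
    # Explicit check: skip the cast entirely when the data is already float64.
    out = []
    for a in arrays:
        if a.dtype != "float64":
            a = a.astype("float64")
        out.append(a)
    return out


def make_grids(n_grids=1000, shape=(16, 16, 16), dtype="float32", seed=0):
    # Hypothetical stand-in for per-grid field data read from disk.
    rng = np.random.default_rng(seed)
    return [rng.random(shape).astype(dtype) for _ in range(n_grids)]


if __name__ == "__main__":
    for dtype in ("float32", "float64"):
        grids = make_grids(dtype=dtype)
        for fn in (with_astype, with_asarray, with_dtype_check):
            t = timeit.timeit(lambda: fn(grids), number=5)
            print(f"{dtype:8s} {fn.__name__:18s} {t:.3f} s")
```

For float64 input the last two variants should return the original arrays untouched, so any remaining difference between them is call overhead rather than copying.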
Now, the change to _get_data_from_grid *itself* will greatly impact
performance for 64-bit datasets. But updating io.py to upcast 32-bit
datasets on read will also speed things up considerably for 32-bit
datasets. The downside is that it will be difficult to get back raw,
unmodified 32-bit data from the grids, rather than 32-bit data that
has been cast to 64 bits.
Is this an okay change to make?
A lot of really great work has been merged, or is just about to be
merged, into yt since we did our last release in August. Just a few
highlights:
* Unit tests and improved answer tests with continuous integration
* Numerous PlotWindow fixes and improvements
* Improved Athena support
* Spherical and cylindrical coordinates and vector coordinate transformations
* Particle support for FLASH data
* Limited support for generating GDF initial conditions
* Improved support for 2D FLASH data
* PlotWindow TimeSeries interface
* Numerous bug fixes
The obvious holdup to a release is the number of open issues marked for
2.5. I think the idea was to make the 2.5 release the last in the 2.X
series before moving all development over to yt 3.0. Is that still the
plan? Is it possible to get a 2.5 release out while delaying the end of
2.X development to a 2.6 release?
The reason I bring all this up is mostly for the plot window fixes -
we've improved it a lot since August and I think it's much more usable
now. Perhaps a new release and a blog post about the new plotting
interface will encourage more people to switch over.
I'm curious what everyone thinks about this. I know many of you are
busy with the Enzo 2.2 release and probably don't want to make another
big push so soon.
Try something like this script:
Which produces these images:
Basically I generate particle x and y positions to be fed into the uniform grid field data, and I define a "ParticleNumberDensity" field that histograms this data. The grid has dimensions of (64,64,1).
Notice I defined the bounding box 'bbox' oddly; that's because I wanted all of the cells to be cubical. You could probably get away with defining the bbox as (0,1) along all three axes, but then the cells would be long and rectangular along the z-axis. I'm not sure how yt would handle this (my guess is that it would be fine, but I've seen code that seems to assume cubical cells).
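The script itself is not preserved here. As a rough sketch of the NumPy side of the same idea (not the original script), one could histogram the x and y positions onto a 64 x 64 x 1 array and build a bounding box whose single z cell is as thick as the x and y cells are wide. `particle_number_density` and `cubical_bbox` are hypothetical helpers; the resulting array would then be passed as a field to yt's uniform-grid stream frontend, whose exact call signature depends on the yt version.

```python
import numpy as np


def cubical_bbox(nx=64, ny=64, xlim=(0.0, 1.0), ylim=(0.0, 1.0)):
    """Bounding box for an (nx, ny, 1) grid whose cells are cubes.

    Assumes the x and y cell widths are equal; the z extent is set to one
    cell width, centred on z = 0.
    """
    dx = (xlim[1] - xlim[0]) / nx
    dy = (ylim[1] - ylim[0]) / ny
    if not np.isclose(dx, dy):
        raise ValueError("x and y cell widths differ; cells cannot be cubical")
    return np.array([xlim, ylim, (-0.5 * dx, 0.5 * dx)], dtype="float64")


def particle_number_density(x, y, bbox, nx=64, ny=64):
    """Particle counts per cell divided by the cell volume, shaped (nx, ny, 1)."""
    counts, _, _ = np.histogram2d(x, y, bins=[nx, ny], range=[bbox[0], bbox[1]])
    cell_volume = np.prod((bbox[:, 1] - bbox[:, 0]) / np.array([nx, ny, 1]))
    return (counts / cell_volume)[:, :, np.newaxis]
```

With `x, y = rng.random(N), rng.random(N)` and `bbox = cubical_bbox()`, the returned array has shape (64, 64, 1) and integrates back to N particles over the box, which is a quick sanity check before handing it to yt.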
On Dec 1, 2012, at 6:33 PM, Christopher Moody <cemoody(a)ucsc.edu> wrote:
> Thanks John!
> On Sat, Dec 1, 2012 at 12:18 PM, John ZuHone <jzuhone(a)gmail.com> wrote:
> You could probably use the uniform grid stream for what you want. You could just flatten the particle position along the axis you're projecting along and then define a 2D uniform grid with shape, say, (nx, ny, 1) if you were projecting along Z. Then you could use the normal yt stuff to make plots, but the particles would only be on one grid.
> When I'm able to get back to a real keyboard I will write an example script if this sounds good to you.
> John ZuHone
> Laboratory for High-Energy Astrophysics
> NASA/Goddard Space Flight Center
> 8800 Greenbelt Rd., Code 662
> Greenbelt, MD 20771
> (w) 301-286-2531
> (m) 773-758-0172
> On Dec 1, 2012, at 2:57 PM, Christopher Moody <cemoody(a)ucsc.edu> wrote:
>> Hi everyone,
>> I'd like to implement a particle density plot type, but I'm not familiar enough with the yt/visualization classes. I'm explicitly avoiding particle deposition on the grid (i.e., the stuff John ZuHone just PR'd), because even after the particles are assigned to grids, splitting the array for each grid and then concatenating grid-by-grid is slow. And moving to yt-3.0, we should be able to handle particles as particles, not on a grid/oct (right?).
>> Collecting all of the particles, projecting them, and doing a 2D histogram is straightforward, but my question is where to do it. I was thinking of imitating the ParticlePlot class, but that basically uses the particle callback that draws a dot for every particle. A histogrammed field is defined everywhere, and is probably not appropriate as a callback.
>> How should I structure this? Which files are the relevant ones?
>> yt-dev mailing list
Am I missing something, or is editing the yt website simply a matter of
hand-editing HTML? It's totally fine if that's what it is, I just can't
seem to find any discussion of it on the mailing list. Sorry if this is an
obvious/previously answered question.
Nov. 30, 2012
We are proud to announce the public release of Enzo version 2.2. This
intermediate release contains several new physics modules, reorganized and
updated documentation, revamped answer testing, improvements to code
infrastructure, as well as numerous bug fixes. In a little bit more detail:
1. New physics capabilities — (a) Thermal and chemical feedback from Type
Ia supernovae, with mass loss and luminosity fits provided by K. Nagamine;
(b) H2-regulated star formation, in which the star formation rate is
proportional to the density of molecular hydrogen (determined from
Krumholz, McKee, & Tumlinson 2009), rather than total gas density.
2. Upgrades to the Enzo Answer Testing Framework — Includes both functional
improvements and vastly improved documentation. The results of two test
suites ("quick" and "pull") have been uploaded to an Amazon Cloud instance,
allowing users to compare to an Enzo Community "gold standard".
Alternatively, answer testing can be done by comparing to local
3. Improved Documentation — (a) Descriptions of 175 previously undocumented
parameters; (b) Parameters have been broken up into groups; (c) Brief
explanations of the functionality provided by (most) Enzo source files; (d)
Description of the new Enzo performance timers; (e) Examples of how to
4. Code infrastructure improvement — All Fortran routines have been
modified to explicitly specify the type at compile time, eliminating the
need for compiler flags to ensure 64-bit precision.
5. Timing and performance measurement and plotting tools — Support for
simple, lightweight measurements for the timing and performance of Enzo.
This allows one to examine which functions are using the majority of the
simulation runtime, and how this varies across multiple processors.
6. Bug Fixes - see the Enzo 2.2 Changelog at
With this release we continue our efforts to consolidate The Enzo Project's
web presence, with the goal of lowering the barrier to entry for new users
and easing further code development:
* Enzo's main webpage continues to be http://enzo-project.org. Links and
information can be found here for both developers and general users.
* Both stable (https://bitbucket.org/enzo/enzo-stable) and development (
https://bitbucket.org/enzo/enzo-dev) versions of the Enzo code are now
hosted on Bitbucket.
* For news and updates please follow our google+ page: The Enzo Project (
The Enzo Development Team
Dr. Michael Kuhlen
Theoretical Astrophysics Center, UC Berkeley
B-116 Hearst Field Annex # 3411, Berkeley, CA 94720
email: mqk(a)astro.berkeley.edu
cell phone: (831) 588-1468
skype username: mikekuhlen