New issue 779: Iterating over a time series is substantially slower in unitrefactor
This came up when I noticed time series iteration seemed slow. The script I'm using to test this is here:
It loops over a list of enzo outputs, finds an active particle, and prints out its mass.
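For reference, the script looks roughly like this (an illustrative sketch only, not the attached script; the output names are placeholders and the active particle selection is elided):

from yt.mods import *

# Loop over a set of Enzo outputs and print a particle mass from each.
ts = TimeSeriesData.from_filenames("DD????/DD????")
for pf in ts:
    dd = pf.h.all_data()
    # Placeholder: the real script filters for an active particle here.
    print pf, dd["particle_mass"][0]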
I've attached the raw cProfile traces to this issue and have uploaded pyprof2html visualizations of the data to my website:
It looks like even though field detection is substantially faster using the new field system, the unit subsystem is still so slow that it dominates the total runtime.
I've provisionally assigned this to myself in hopes that I'll be able to optimize things using this test dataset. Let me know if you'd like to take a crack at this before I can get to it.
I want to raise the possibility of removing the PlotCollection and
Reason code from yt before the 3.0 release.
I haven't checked recently, but given the changes in yt since they were
last updated, both features likely do not work correctly right now.
Given the move to notebooks* and PlotWindow-style plotting going
forward, I think now is the time to rip off the bandage and remove them.
It's likely some of the GUI code can be repurposed for a widget, but
that doesn't mean Reason as it is right now needs to be bundled with
the release or kept in the dev repo.
* Interactive widgets are included in the release candidate for
IPython 2.0. Since the widgets communicate with a live kernel, you
unfortunately can't demo them with a static nbviewer page. Look in the
examples folder in the root of the IPython repository.
We are proud to officially announce the upcoming 2014 yt developer
workshop. This three-day event, to be held on the campus of the
University of California, Santa Cruz from Sunday, March 23rd through
Tuesday, March 25th, will bring together a diverse group of students,
researchers, and developers.
While past workshops have been broad in scope, this workshop will be
focused on advancing the goals of yt 3.0 in tandem with the goals of
the AGORA project. The four primary goals of this workshop will be to
advance the state of the code for:
* GPU-based rendering, integrating the PyRGBA toolkit developed at UCSC
* Scalability of particle and octree datasets in yt
* Streamlined and optimized SPH smoothing kernel implementations
* Tighter integration with halo finders and an API for SAM interop
Through UC-HIPACC we are able to offer attendance funding for
participants, including travel and lodging. Funds will be
preferentially distributed to students, and the level of individual
support will depend on demand.
If you are interested in attending, please fill out this very short
form, and we will be in touch shortly with more details:
If you have questions or concerns about the workshop, please feel free
to contact the organizers at workshop2014@yt-project.org.
On behalf of the organizing committee,
I've done a lot of thinking and talking with people about the idea of
merging the units stuff into the mainline yt 3.0 branch.
There are clear advantages to doing this: people who want to use SPH
smoothing would be able to get it from the primary repository, PRs
could be submitted against that repository, and access to new features
would be considerably easier. More public development and review
could happen; while the development already *is* public, it's out of
view in my fork of yt. This is not productive.
But the development of yt is not the point of yt. Using yt to enable
scientific discovery is the point of yt.
In many ways, the units refactor will enable more scientific
discovery. But it's not ready. There are people using yt-3.0
*already* (prime example: http://nickolas1.com/d3test2/ ) to do really
cool science in ways that they can't with 2.x. And they're doing this
with a yt that *mostly* works like the 2.x branch, with the same field
names and units and all of that, so the docs *mostly* apply.
The units refactor, if merged in, would pull the rug *completely* out
from under them. And there's no safety net. There's a web of YTEPs
and PR comments and notebooks posted to mailing lists, but there's no
place they can go and see, "Hey, this worked before, why isn't it
now?" And that's not okay.
I've long put off writing documentation, and honestly, I could come up
with lots more reasons to put it off. But I started on Wednesday
actually writing things down in earnest, and I think that needs to be
the next big push, which I am committed to doing. Yeah, it's not
always fun, especially since things *are* still changing. But it's
not fair -- and it is certainly not in the spirit of *extreme empathy*
-- to just change things.
But I also want new development to continue. And so I want a balance
to be struck. I'd like to enumerate the items that are necessary for
documentation so that we can merge it in. I think these are as follows:
* All notebooks should be ported to the 3.0 docs and unit-refactor style
* API documentation has to be able to be compiled
* At a *bare* minimum, a list of stumbling blocks has to be included
for moving to 3.0. Britton and I have started on this and made very good progress.
* We need a bookmark or tag to be included in the repo *pre*-refactor.
* Cookbook recipes must work (I think they mostly do now)
Things I don't think we need to do before merging:
* Completely update 100% of the narrative docs
* Document how to add smoothing fields, as I believe this API is in flux
* Describe the underlying methods in great, extensive detail
* A full, complete review of the docs like we did in advance of 2.6
As a thought, why don't we treat documentation the way we treat code?
Within the project, it seems we're comfortable committing and
submitting work-in-progress code, but not docs. In the past, perhaps
this was because the PRs and repos were separate. They aren't anymore.
How does this proposal for the merge sound? Please render an opinion,
as I'd like to have this settled before the early part of next week.
New issue 778: Progressbar can be a bit heavy in the notebook
I think the progressbar, particularly for lightweight operations, can be a bit heavy in the notebook. Additionally, because of how widgets work, it can leave a *lot* of data in the notebook itself once it has concluded. To see this in action, the time series calculations in the bootcamp are a good example.
In the past, we've put a minimum threshold on the number of items (`maxval`) that are required before a pbar gets created. We may want to revisit this now, or possibly even just eliminate the progress bar from the hierarchy parsing step for grid geometries like Enzo.
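A rough sketch of that gate (the `maybe_pbar` wrapper and the threshold value here are hypothetical; `get_pbar` is the existing helper in `yt.funcs`):

from yt.funcs import get_pbar

class NoopPBar(object):
    # Stand-in that silently ignores updates for small workloads.
    def update(self, value=None):
        pass
    def finish(self):
        pass

def maybe_pbar(title, maxval, threshold=100):
    # Only create a real progressbar when there are enough items to
    # justify the widget overhead in the notebook.
    if maxval < threshold:
        return NoopPBar()
    return get_pbar(title, maxval)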
I have data which is in spherical polar coordinates, basically rho(r,
theta, phi), and I can't seem to find an appropriate interface in yt to
load the data with. Is there a function that can do that?
If not, is there a function that takes as an argument the (x, y, z)
coordinates of each data point that I want to plot, i.e. a 3D array
covering the full grid? The current interface only seems to take in
the bounding box.
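Something like this is what I'm after, sketched with yt's `load_uniform_grid`; the `geometry` keyword here is a guess at what such an interface might look like, and the values are made up:

import numpy as np
from yt.mods import load_uniform_grid

# Fake rho(r, theta, phi) on a regular spherical grid.
data = {"density": np.random.random((64, 32, 32))}
bbox = np.array([[0.0, 1.0],           # r
                 [0.0, np.pi],         # theta
                 [0.0, 2.0 * np.pi]])  # phi
pf = load_uniform_grid(data, data["density"].shape, 1.0,  # length unit in cm
                       bbox=bbox, geometry="spherical")   # "geometry" is a guess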
New issue 776: Getting things into units is not easy
We should have an easier way to get things into units.
This currently fails (which is an actual bug):
UnitFulObject - 1.0 * UnitFulObject.units
But in general, we need an easier way to manage cases where the units are tricky, or nasty, or whatever. It would be nice to have something *like* dimensionless that is a "passthrough unit." Like an "any_unit." So I should be able to do:
UnitFulObject - 1.0 * any_unit
and the system will see any_unit, convert to the units of the `other_array` and then pass through. This would get around some of the issues that we run into in many instances.
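A toy sketch in plain Python of the desired passthrough semantics; none of these names exist in yt, and the real version would live in the unit-array machinery:

any_unit = object()  # sentinel "passthrough" unit

class Quantity(object):
    # Minimal stand-in for a unit-aware value.
    def __init__(self, value, units):
        self.value, self.units = value, units
    def __sub__(self, other):
        # any_unit resolves to *our* units, so the bare value passes
        # through instead of raising a unit-mismatch error.
        if other.units is any_unit:
            other = Quantity(other.value, self.units)
        assert other.units == self.units, "unit mismatch"
        return Quantity(self.value - other.value, self.units)

q = Quantity(5.0, "g")
print (q - Quantity(1.0, any_unit)).value  # prints 4.0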
I've updated the metallicity units PR:
I ended up deciding that metallicity doesn't really make much sense as
a dimension in the unit system. Instead, I've made it so the
`code_metallicity` and `Zsun` units are dimensionless.
Here's an example run using the tip of the PR which illustrates why
this is a nice choice:
First, load the data:
>>> pf = load('HiResIsolatedGalaxy/DD0044/DD0044')
>>> dd = pf.h.all_data()
Get the mean metal mass for all particles in the simulation:
>>> mean_metal_mass = (dd['metallicity_fraction']*dd['particle_mass']).mean()
>>> print mean_metal_mass
>>> print mean_metal_mass.in_cgs()
If I had run this using a version with a metallicity dimension, I
would have had to coerce the units to be grams, even though code
metallicity is really just metallicity fraction. I think getting a
metal mass or density like this will be an extremely common operation,
and it's something that a metallicity unit needs to handle naturally.
Here's another example where I get the mean metallicity of all
particles in the simulation:
>>> mean_metallicity = dd['metallicity_fraction'].mean()
In principle code_metallicity could have a non-unity conversion factor
to a dimensionless fraction, but Enzo writes out particle metallicities
as mass fractions, so that is unnecessary here.
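Since Zsun is dimensionless in this scheme, converting the fraction to solar units should (if I have this right) just be:

>>> print mean_metallicity.in_units('Zsun')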
As another concrete example of why this way is better, here is the way I
needed to define the metal_density field for a Tipsy SPH dataset:
import numpy as np

@derived_field(name="metal_density", units="g/cm**3")
def metal_density(field, data):
    # Coerce through a bare ndarray, then re-tag with CGS units.
    ret = data['metallicity_fraction'] * data['density']
    ret = np.array(ret)
    return data.pf.arr(ret, input_units='g/cm**3')
If metallicity weren't a base dimension, I wouldn't need to coerce the
units to g/cm**3; the conversion to CGS would handle that automatically.
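Under the dimensionless scheme, I'd expect the same field to collapse to something like this (untested sketch):

@derived_field(name="metal_density", units="g/cm**3")
def metal_density(field, data):
    # Dimensionless metallicity times density already carries
    # density units, so no coercion is needed.
    return data['metallicity_fraction'] * data['density']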
Hope that explains my position clearly. I'd love to hear comments,
concerns, or alternate approaches - this is definitely tricky.
I'm curious if there might be a way to speed up isocontour flux
extraction for my dataset.
Profiling my isocontour extraction script, it seems about 2/3 of the
time is spent generating ghost zones, with the rest in the isocontour
calculation itself.
This dataset might be suboptimal for this sort of thing as there is a
large number of relatively small grids, so any operation that loops
over grids or generates ghost zones is going to be a bit slow.
Would generating a covering grid and then doing isocontour extraction
on the unigrid dataset be faster? My dataset is nested AMR and most
of the structure that I'm interested in is near the maximum refinement
level. Perhaps downsampling the data a bit will also help.
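For concreteness, here's roughly what I have in mind for the covering grid route (a sketch; the level, edges, and isovalue are placeholders, and I'm assuming `extract_isocontours` works the same on a covering grid as on other 3D data objects):

# Build a unigrid copy at a fixed (possibly downsampled) level.
level = 5  # placeholder
cg = pf.h.covering_grid(level, left_edge=pf.domain_left_edge,
                        dims=pf.domain_dimensions * 2**level)
# Extract the isosurface from the unigrid object, avoiding the
# per-grid ghost zone generation of the AMR path.
verts = cg.extract_isocontours("density", 1e-27)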
Is there anything else I might try to speed things up?
Thanks for any advice you might have,