A few of us (Austin Gilbert, Penny Qian, Matt Turk, John Zuhone, and
myself) had a discussion by email about possible avenues for collaboration
between glue and yt, and we have decided it would make more sense to
discuss this in the open.
To get started, I've included copies of the relevant parts of the emails
below (oldest at the top). Please feel free to chime in!
We're also planning to have a Google Hangout to discuss this - I've created
a Doodle, which you can fill out if you want to join the discussion:
In addition to this list, we'll be using the yt project slack site (with a
special channel for glue) to discuss some of these ideas day-to-day, so
just reply to me off-list if you want to be included in the slack channel,
and I'll pass on the requests to the yt team.
I have heard through the yt project network that you are looking into
making it compatible with glueviz. As a big fan of user interfaces for data
interaction, this is really exciting to me, and I would like to be as
helpful as possible. Could you tell me what work is needed in order to make
yt fully functional with glue? Aside from the existing widgets in the
framework, are you planning on adding some that are specific to yt? Let me know,
because I am eager to help and to get this up and running.
Many thanks for this exciting initiative for collaboration! Regarding
your first question, honestly it is not entirely clear to me what exactly
we should do to interface yt and glue, given that Glue is built around a
GUI, while yt is a powerful data analysis and visualization package. Could
we perhaps explore the following possibilities:
- Using Glue to free astronomers from writing Python code: load data
into Glue, do the analysis in yt (developing a GUI panel for the options if
needed), then render the analyzed data with the existing widgets in Glue;
- Extend Glue's data visualization capabilities: yt implements many
sophisticated visualizations. Could we load the data through Glue's GUI
(the data might actually be handled by yt), do the analysis in yt (with a
GUI panel in Glue if needed), render the data in yt, and then present the
result in Glue as a widget?
- If we develop a yt widget in Glue, we could implement the linked-view
functionality in this widget, allowing the users to manipulate the data
visually in Glue and then carry out the data analysis back in yt.
As for the second question, were you talking about adding some specific
widget into the Glue framework, so that Glue can render the data from yt?
Thanks Austin and Penny for getting the conversation started on a yt <->
glue collaboration! :)
One of the places where I think it would be great to collaborate
between yt and glue is in developing an abstract data layer in glue that
better separates data access and computation from the interactive
visualization, leveraging yt as a data access and computation layer.
I'll describe below a little of what I mean by this.
One of the main issues with glue currently is that it is:
- not well suited to dealing with large datasets in general
- not well suited to dealing with non-regular cartesian data (even if the data is not large)
Currently, glue loads data into Data classes, and viewers then access the
data directly and do computations (for example, calculating a histogram of all
values). Determining which sections of datasets fall inside subsets is also
done outside of the data objects, and not in a 'smart' way: all the data has
to be accessed and the entire subset computed up front.
In practice, what this means is that we have a FITS reader that understands
memory mapping, but as soon as you do something like compute a histogram of
all pixel values, all the data has to be read, and you lose the ability to
deal with large files. Similarly, if the user makes a selection in the
cube, often the whole cube has to be read in to determine which pixels are
inside the selection.
A better mechanism would be to develop what I refer to as an abstract
data/computation layer: we would define an API that any data
object needs to implement for data access, which would also cover
operations such as computing fixed resolution buffers or selecting
subsets. The idea is that one could then implement a much wider variety
of data objects - for example, a data object that is powered behind the
scenes by yt, but also a data object that communicates with a
remote computer cluster on which the data is stored.
The interactive visualization part of glue would then not need to worry
about the details of the data access - it would essentially say 'I need a
fixed resolution buffer with these dimensions', or 'I need a histogram',
and this would be delegated to the data object.
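To make the idea concrete, here is a minimal sketch of what such an abstract data/computation layer could look like. The class and method names (DataLayer, histogram, fixed_resolution_buffer) are illustrative assumptions, not an existing glue or yt API; a yt-backed or remote implementation would satisfy the same interface.

```python
from abc import ABC, abstractmethod

import numpy as np


class DataLayer(ABC):
    """Hypothetical abstract data/computation layer for glue.
    Viewers ask for derived products; the backend decides how
    (and how much of) the underlying data gets accessed."""

    @abstractmethod
    def histogram(self, attribute, bins):
        """Return (counts, edges) for the given attribute."""

    @abstractmethod
    def fixed_resolution_buffer(self, bounds, shape):
        """Return a 2D array resampled to the requested shape."""


class InMemoryLayer(DataLayer):
    """Trivial NumPy-backed implementation for illustration only."""

    def __init__(self, array):
        self.array = np.asarray(array)

    def histogram(self, attribute, bins):
        # A smarter backend could histogram chunk-by-chunk instead
        # of touching all values at once.
        return np.histogram(self.array, bins=bins)

    def fixed_resolution_buffer(self, bounds, shape):
        (x0, x1), (y0, y1) = bounds
        sub = self.array[x0:x1, y0:y1]
        # Naive nearest-neighbour resampling, just to show the contract.
        yi = np.linspace(0, sub.shape[0] - 1, shape[0]).astype(int)
        xi = np.linspace(0, sub.shape[1] - 1, shape[1]).astype(int)
        return sub[np.ix_(yi, xi)]
```

A viewer would then only ever call `histogram()` or `fixed_resolution_buffer()`, never index the raw array itself.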
Of course, yt is perfectly suited to this since it *already* provides a
data abstraction layer - so this would be a matter of defining an API for
glue data objects, then writing a wrapper for yt. In the future, one could
even imagine running glue on a laptop, with a data object that
communicates with a cluster that is running yt.
The end result would be that researchers could *load up a large simulation
in glue and be able to do the kind of linked data visualization that glue
can normally do*, which I think would be extremely powerful.
Of course, related to what Penny said, I think there are a couple of other
avenues for collaboration:
- When using the 3D viewers, we could have an 'export to yt' option which
provides a yt script to produce a production-quality 3D visualization (the
VisPy viewers we have look ok but I don't think the static output from
these is anywhere near as nice as what yt can do). This would simply be a
matter of writing a plugin for a yt exporter.
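As a sketch of what such an exporter plugin might emit, here is a hypothetical 'export to yt' function that turns a few pieces of viewer state into a standalone yt script. The captured state (filename, field, resolution) is an assumption about what a real glue viewer would hand to the exporter; the emitted calls follow yt's scene-based volume rendering interface.

```python
def export_to_yt_script(filename, field, resolution=512):
    """Hypothetical 'export to yt' plugin hook: emit a standalone
    yt script that reproduces the current 3D view as a
    production-quality volume rendering."""
    return "\n".join([
        "import yt",
        "",
        f"ds = yt.load({filename!r})",
        f"sc = yt.create_scene(ds, {field!r})",
        f"sc.camera.resolution = ({resolution}, {resolution})",
        "sc.save('rendering.png')",
    ])
```

The user would save the returned string to a `.py` file and run it wherever yt is installed, possibly tweaking transfer functions by hand afterwards.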
- It would be fun to investigate the new yt OpenGL rendering and see how
this compares to what we currently use (VisPy), and potentially develop a
new viewer based on the yt OpenGL renderer.
I think it would be great to discuss all of these ideas, and would like to
suggest that we have a Google Hangout in the short term. I'll send out
another email with a link to a Doodle poll!
I think a google hangout would be a great way to get started and ensure we
have a unified plan. Additionally, we should go ahead and move this to a
public email list. I would also like to recommend a Slack channel for
day-to-day communications; the yt community has been using one for a while
now to great effect.
In regards to the data abstraction layer you have described, I think that
yt is definitely well suited to working with data objects and selecting
regions of data in the case of large datasets. The data objects currently
supported in yt enable smart file reading: when you create a subregion of
the data, only that data is read from disk, so very large datasets are not
entirely unmanageable. Additionally, yt has a wide range of frontends for
different data formats, so incorporating it into data objects could enable a
whole new community to utilize glue. I think yt could accomplish what you
are thinking and glue can accomplish what I'm thinking.
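As a toy illustration of the selection-aware reading described above (this is not yt's actual implementation, just the principle): the dataset is split into chunks, and a spatial selection only 'reads' the chunks that overlap it.

```python
import numpy as np


class ChunkedDataset:
    """Toy model of selection-aware I/O: data lives in fixed-width
    chunks, and a selection touches only the chunks it overlaps.
    We count chunk reads as a stand-in for disk I/O."""

    def __init__(self, n_chunks, chunk_width):
        self.n_chunks = n_chunks
        self.chunk_width = chunk_width
        self.reads = 0

    def _read_chunk(self, i):
        self.reads += 1  # pretend this hits the disk
        start = i * self.chunk_width
        return np.arange(start, start + self.chunk_width)

    def select(self, lo, hi):
        """Return all values in [lo, hi), reading only the chunks
        whose extent intersects the selection."""
        out = []
        for i in range(self.n_chunks):
            c_lo = i * self.chunk_width
            c_hi = c_lo + self.chunk_width
            if c_hi <= lo or c_lo >= hi:
                continue  # chunk lies entirely outside the selection
            data = self._read_chunk(i)
            out.append(data[(data >= lo) & (data < hi)])
        return np.concatenate(out) if out else np.array([])
```

With 10 chunks of width 100, selecting [250, 320) reads only the two chunks covering 200-300 and 300-400, rather than the whole dataset.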
On the user side, I want to make sure that if you incorporate yt, yt users
still get the capabilities of the program they are used to working with,
alongside the linking capabilities and user interface that glue provides.
For me this primarily means ensuring glue has the ability to expose
yt's standard plotting functionality in some form of widget. I also like the
idea of including the OpenGL features that yt can offer.
I will certainly let others in the yt community know about the hangout to
discuss what could happen.
smoothed_covering_grid() seems to be doing an unexpected transformation on the data.
See the attached images, one is created using “covering_grid()” and the other with “smoothed_covering_grid()”; that is the only difference.
This doesn’t make sense to me, but… Here, I’m looking at the data attribute “HII_Density”. I do not have issues with “smoothed_covering_grid()” when I render any other attribute (HI_Density, Metal_Density, Temperature, etc)
Greg and I found a bug involving halo catalog unit handling:
> >>> halo.quantities['particle_position_x']
> 0.495074982741 cm
> >>> halo.quantities['particle_position_x'].in_units('code_length')
> 0.495074982741 code_length
> >>> halo.quantities['particle_position_x'].in_units('cm')
> 0.495074982741 cm
> >>> ds.unit_registry['code_length']
> (9.195880139956267e+25, (length))
> >>> halos_ds.unit_registry['code_length']
> (1.0, (length))
The halos_ds mixes up cm and code_length units when the HaloCatalog object
is created from a saved halo catalog. The halo catalog values are saved in
code_length, but the HaloCatalog object assumes they are in cm.
Here is where the code_length units are written out after halo-finding is
done (this is confirmed with an h5ls).
Here is where the halos_ds is loaded when the HaloCatalog is created. The
length unit is set to cm -- the catalog is assumed to be in cgs.
The HaloCatalog fields
also assume cgs.
In theory, the HaloCatalog could just parse the code_length units of the
halos_ds, but this isn't necessarily known at the time of creation, so the
ideal fix may be to save the halo catalog length units in cm instead of in
code_length. Then the assumptions that are made about length being in cm
when creating a HaloCatalog object from a halo catalog would be correct.
Any thoughts on this approach or other approaches?
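To illustrate the proposed fix in plain numbers, using the code_length value from the unit_registry printout above (everything else here is illustrative, not yt code):

```python
# The two registries from the printout above: the simulation's
# data_ds knows the real code_length, the reloaded halos_ds does not.
SIM_CODE_LENGTH_CM = 9.195880139956267e+25   # data_ds registry
CATALOG_CODE_LENGTH_CM = 1.0                 # halos_ds registry (the bug)

saved = 0.495074982741  # value written to disk in code_length units

# With code_length == 1 cm in the catalog's registry, 'cm' and
# 'code_length' give the same (wrong) number on reload:
assert saved * CATALOG_CODE_LENGTH_CM == saved

# Proposed fix: write the catalog in cm, so the cgs assumption
# made when loading the HaloCatalog is correct by construction.
saved_cm = saved * SIM_CODE_LENGTH_CM        # value as stored on disk
recovered_code = saved_cm / SIM_CODE_LENGTH_CM
assert abs(recovered_code - 0.495074982741) < 1e-12
```

The round trip through cm recovers the original code-unit value exactly, whereas the current scheme silently relabels code_length numbers as cm.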
I'm running into more issues with Halo Catalogs and units. I'm not sure if
the last PR <https://bitbucket.org/yt_analysis/yt/pull-requests/2208/> is
causing this or not. I'm running the latest code from the repo as of right
now (fd8796c8e06d). Here's the error, generated on this data
<https://drive.google.com/open?id=0BwK-7Z3S5X_yMDNyZ1RTcG56SFU> with this script:
> RuntimeWarning: divide by zero encountered in divide
> return super(YTArray, self).__div__(ro)
> Traceback (most recent call last):
> File "/work/03330/tg826294/applications/scripts/findhalos.py", line 65,
> in <module>
> line 335, in create
> self._run(save_halos, save_catalog, njobs=njobs, dynamic=dynamic)
> line 302, in barrierize
> return func(*args, **kwargs)
> line 427, in _run
> line 60, in __call__
> self.function(halo, *self.args, **self.kwargs)
> line 571, in iterative_center_of_mass
> sphere = halo.halo_catalog.data_ds.sphere(center_orig,
> line 649, in __init__
> if radius < self.index.get_smallest_dx():
> line 1095, in __lt__
> return super(YTArray, self).__lt__(oth)
> line 1225, in __array_wrap__
> raise YTUfuncUnitError(context, unit1, unit2)
> yt.utilities.exceptions.YTUfuncUnitError: The NumPy <ufunc 'less'>
> operation is only allowed on objects with identical units. Convert one of
> the arrays to the other's units first. Received units (code_length) and
Has anyone run into this before? yt seems to think these two units aren't
the same -- is it possible the HaloCatalog unit import is being done
incorrectly? At this point, I haven't written anything to the disk, so I'm
not sure what the issue might be.
I am forwarding this conversation to the public list to keep it open for
others to join in and to build up a record that might help others.
Our new student Fabian updated our frontend and implemented particle
readers. Unfortunately, some details of the particle "fields" still cause
some bumps that we cannot get our heads around.
We are now able to read fields and particles, even chunk-wise, and can
work with the data we read in python scripts via the all_data() method.
Nevertheless, using the particle scatter plots such as yt.ParticlePlot
seems not to work. We have some problems with unions / the "all" group
and the fact that we have several particle species (e.g., "hot
electrons", "cold electrons", "helium ions", "nitrogen ions", etc.)
Also, since the last rebase: did someone recently change the unit system
in yt? "T" for Tesla no longer seems to be understood. Is there a
changelog available somewhere? Do we have to describe our data in cgs / mks?
Would someone be interested in a quick heads-up via e.g., Skype /
WebRTC so we can ask specific questions?
Fabian is around on Tuesdays and Fridays; early CA time / late GER time
usually works great (e.g., 9am PDT / 6pm CEST).
Our current branch HEAD can be found here (->yt->frontends->openPMD):
And an example script is attached.
Thanks a lot!