I'm wondering if there is a system for deciding when something belongs in yt.lagos (like the HaloFinders) versus yt.extensions (like the HaloProfiler). For example, I'm thinking of adding a simple bit of code that will calculate the star formation history (Msol/year, for example) for a given set of stars. Would that go in extensions or in lagos? As best I can tell, extensions are more secondary, in that they post-process already-refined data, while lagos handles the raw data and refines it down.
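Roughly what I have in mind, as a plain-NumPy sketch (the argument names are made up, and this isn't the actual code I'd commit):

    import numpy as np

    def star_formation_history(creation_time_yr, particle_mass_msun, n_bins=100):
        # Bin the stars by formation time and sum the stellar mass formed in
        # each bin; dividing by the bin width gives Msol/yr.
        mass_formed, edges = np.histogram(creation_time_yr, bins=n_bins,
                                          weights=particle_mass_msun)
        dt = np.diff(edges)
        bin_centers = 0.5 * (edges[:-1] + edges[1:])
        return bin_centers, mass_formed / dt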
Stephen Skory, Graduate Student
sskory(a)physics.ucsd.edu
http://physics.ucsd.edu/~sskory/
I'd like to propose we change our default colormap from "jet" to
"bds_highcontrast". I've attached sample images of the same
simulation -- I think bds_highcontrast (designed by our very own
Britton, nearly two years ago now) looks better and makes differences
easier to discern.
I've provided a patch that should make this change here:
It applies to trunk. If this goes through, we'll also make our plots
slightly more distinctive, too, because this is a homegrown colormap.
(We also have the "kamae" colormap, created by Tune Kamae, which isn't
in matplotlib and which I think is criminally underused.)
Also, maybe it should be aliased as something in addition to
"bds_highcontrast"? Another name?
I'm trying to figure out if we can use the EPD as a basis for
installation easily -- in the past, the issue has always been HDF5.
Because we compile against HDF5, and EPD has its own that's bundled
*inside* the h5py egg, it has never been clear how to ensure that
HDF5LightReader links against the same HDF5 that h5py does. I emailed
the h5py group about linkages, but it turns out they don't encode the
linkage information anywhere in the Python code.
Andrew from h5py suggested that ctypes might have the capabilities to
figure out the linkage information, but I wasn't able to do this.
But I think there are only a few cases we need to consider -- Linux
and OS X, really -- for linking against the EPD. Both of these systems
have very clear mechanisms for identifying the libraries against which
a given shared library is linked. Presupposing that we know that, we
can manipulate the path as necessary to make a *guess* at the HDF5
location. Things are a bit more complicated if you have a non-fat
HDF5 linked against h5py, but that's something we can deal with later.
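For concreteness, the kind of inspection I mean looks like this (the
paths are illustrative; the compiled module lives wherever EPD put it):

    # Linux
    ldd /path/to/EPD/lib/python2.6/site-packages/h5py/h5.so | grep -i hdf5

    # OS X
    otool -L /path/to/EPD/lib/python2.6/site-packages/h5py/h5.so | grep -i hdf5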
So, I was wondering: does anybody have an installation of EPD that we
could test these mechanisms on? My machine is ... resistant to these,
because I've got libraries scattered about. If you do have the EPD
installed, could you email me off list, and we'll test it out?
This library was mentioned on the scipy mailing list today:
It looks VERY interesting, as it seems to exist somewhere between the
'time series' and 'data series' modes of thought -- which I would
argue is where simulation data lives, too. There was a talk on it at
PyCon earlier this week:
(the talk: http://pycon.blip.tv/file/3261331 )
I haven't watched it yet, but the slides and documentation are very
interesting. It integrates with MPL, NumPy, etc etc, and provides a
number of running functions for things like running medians and
whatnot. I'm going to try running some of my analysis for my current
project in this, and if it works out, I think we should consider it as
a basis for some more extensive time-domain calculations.
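(For the curious, here is a plain-NumPy sketch of the kind of "running
function" I mean -- this is not code from the library itself, and the
window size is arbitrary:)

    import numpy as np

    def running_median(values, window=5):
        values = np.asarray(values, dtype=float)
        half = window // 2
        out = np.empty_like(values)
        for i in range(len(values)):
            lo, hi = max(0, i - half), min(len(values), i + half + 1)
            out[i] = np.median(values[lo:hi])
        return out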
The soft-links and hard-links are compelling, but I recommend not
upgrading, as I believe this again introduces API incompatibilities.
---------- Forwarded message ----------
From: Andrew Collette <andrew.collette(a)gmail.com>
Date: Tue, Feb 23, 2010 at 1:45 PM
Subject: HDF5 for Python (h5py) 1.3.0 beta
HDF5 for Python (h5py) 1.3.0 BETA
I'm pleased to announce that HDF5 for Python 1.3 is now available! This
is a significant release introducing a number of new features, including
support for soft/external links as well as object and region references.
I encourage all interested HDF5/NumPy/Python users to give the beta a try
and to do your best to break it. :) Download, documentation and contact
links are below.
What is h5py?
HDF5 for Python (h5py) is a general-purpose Python interface to the
Hierarchical Data Format library, version 5. HDF5 is a mature scientific
software library originally developed at NCSA, designed for the fast,
flexible storage of enormous amounts of data.
From a Python programmer's perspective, HDF5 provides a robust way to
store data, organized by name in a tree-like fashion. You can create
datasets (arrays on disk) hundreds of gigabytes in size, and perform
random-access I/O on desired sections. Datasets are organized in a
filesystem-like hierarchy using containers called "groups", and
accessed using the traditional POSIX /path/to/resource syntax.
In addition to providing interoperability with existing HDF5 datasets
and platforms, h5py is a convenient way to store and retrieve
arbitrary NumPy data and metadata.
HDF5 datasets and groups are presented as "array-like" and "dictionary-like"
objects in order to make best use of existing experience. For example,
dataset I/O is done with NumPy-style slicing, and group access is via
indexing with string keys. Standard Python exceptions (KeyError, etc) are
raised in response to underlying HDF5 errors.
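For example, a minimal session might look like this (the file and
dataset names are only illustrative):

    import numpy as np
    import h5py

    f = h5py.File("example.h5", "w")
    grp = f.create_group("simulation")                # groups act like dictionaries
    dset = grp.create_dataset("density", data=np.random.rand(64, 64, 64))
    dset.attrs["units"] = "g/cm**3"                   # per-object metadata

    subset = f["simulation/density"][10:20, :, 0]     # NumPy-style slicing does the I/O
    try:
        f["no/such/path"]
    except KeyError:                                  # HDF5 errors map to Python exceptions
        pass
    f.close()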
New features in 1.3
- Full support for soft and external links
- Full support for object and region references, in all contexts (datasets,
attributes, etc). Region references can be created using the standard
NumPy slicing syntax.
- A new get() method for HDF5 groups, which also allows the type of an
object or link to be queried without first opening it.
- Improved locking system which makes h5py faster in both multi-threaded and single-threaded applications
- Automatic creation of missing intermediate groups (HDF5 1.8)
- Anonymous group and dataset creation (HDF5 1.8)
- Option to enable cProfile support for the parts of h5py written in Cython
- Many bug fixes and performance enhancements
- Old-style dictionary methods (listobjects, etc) will now issue
DeprecationWarning, and will be removed in 1.4.
- Dataset .value attribute is deprecated. Use dataset[...] or dataset[()].
- new_vlen(), get_vlen(), new_enum() and get_enum() are deprecated in favor
of the functions h5py.special_dtype() and h5py.check_dtype(), which also
support reference types.
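For example (a short sketch of the replacement API):

    import h5py

    str_dt = h5py.special_dtype(vlen=str)                          # replaces new_vlen(str)
    enum_dt = h5py.special_dtype(enum=("i", {"RED": 0, "BLUE": 1}))
    ref_dt = h5py.special_dtype(ref=h5py.Reference)                # object-reference dtype

    assert h5py.check_dtype(vlen=str_dt) is str                    # replaces get_vlen()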
Where to get it
* Main website, documentation: http://h5py.alfven.org
* Downloads, bug tracker: http://h5py.googlecode.com
* Mailing list (discussion and development): h5py at googlegroups.com
* Contact email: h5py at alfven.org
Requires
* Linux, Mac OS X, or Windows
* Python 2.5 or 2.6
* NumPy 1.0.3 or later
* HDF5 1.6.5 or later (including 1.8); HDF5 is included with the Windows version.
I've parallelized the volume renderer in hg. It will decompose both
grids (but not bricks! -- the distinction is important for large root
grid tiles) across processors, as well as cut the image plane into
pieces. I was getting artifacts at first, but those are now gone.
The scaling hasn't been stellar so far, but it's been reasonably good,
and it should certainly help with memory issues.
To get it going, I added a new object, VolumeRendering, that hangs off
the hierarchy (like other objects). You shouldn't need to do anything
special besides "--parallel" to run it in parallel. The old
direct_ray_cast probably doesn't work, as I had to change the
arguments to the VectorPlane object to allow for non-square image planes.
Here's an example script:
There's some calculation in here of the L vector and whatnot.
Additionally, this uses the new image saving routines Sam put in
yesterday (awesome work, Sam!).
Let me know what you think! I'm going to try testing this out to see
how well it scales for very large tiles as well as very large data.
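For reference, launching it in parallel should look something like the
following (the processor count and script name are placeholders, and
this assumes mpi4py is installed):

    mpirun -np 8 python my_render_script.py --parallel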
I have implemented a new contour finding algorithm. This new
algorithm is based on what I can remember from a paper I discussed
with Dave and Britton some time ago, but since I can't find the paper
anymore, it's not necessarily a full implementation. The strong points
of the new approach are that, while the initial identification *may* be
slower (although actually, I'm *not* sure it is!), you can essentially
scroll over the contours instantly; pulling out level sets at arbitrary
densities is *very* cheap, so dendrograms and other level-set
identification should be basically free -- in fact, pulling out trees
with exact values of density for the splits/joins should be trivial.
(This assumes that there are no duplicate values; in the paper I
recall they suggest jittering by random values of order float-epsilon
to ensure this.)
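(A quick sketch of that jittering trick -- not taken from the paper --
perturbing each value by a few ulps so that no two values compare
exactly equal:)

    import numpy as np

    def jitter(field):
        field = np.asarray(field, dtype="float64")
        eps = np.finfo("float64").eps
        # relative perturbation of a few machine epsilons: physically negligible,
        # but enough to break exact ties almost surely
        return field * (1.0 + 4.0 * eps * np.random.uniform(-1.0, 1.0, field.shape))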
I've placed the diff for the Cython code here:
and the test script that performs the contouring and the verification here:
(you can also download these with yt_lodgeit.py --download=...)
You'll have to re-cythonize after you apply the patch. (I've already
done this on Triton, and the LCA dev group install should have these
changes already.)
Right now it's still slightly raw. Note that the Cython function for
getting the contours out, extract_identified_contours, accepts an
index rather than a density. This will probably change, and I'd also
like to change it such that it will actually pull out a tree rather
than the indices -- which shouldn't be too hard. The index fed in
here is the index in the *sorted* system, because, as you'll note, the
densities (or whatever other field is being used) must be fed in in
sorted order so that the joins proceed outward from the lowest-index
values.
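To make the index convention concrete, here's a rough sketch of mapping
a physical threshold to the index that gets fed in (the commented call
is a guess at usage, not the actual signature):

    import numpy as np

    density = np.random.lognormal(size=100000)    # stand-in for the field being contoured
    sorted_density = np.sort(density)

    threshold = 2.0                               # desired level-set value
    idx = np.searchsorted(sorted_density, threshold)
    # contours = extract_identified_contours(..., idx)   # hypothetical usage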
Let me know what you think. I'm going to conduct some more tests and
wrap it in the familiar contour interface, with additional wrapping
layers for holding the full set of topology rather than the explicitly
pulled-out contours. I believe this algorithm can be parallelized
relatively easily, and that will be a final step after cleaning it up.
Has anyone had this problem installing mpi4py on Linux? This is on
Ubuntu. I have tried everything I can think of, but it still fails
every time I run it:
from mpi4py import MPI
mpi4py/MPI.so: undefined symbol: lt_dlexit
I have reinstalled OpenMPI and mpi4py several times, each with different
library linkages, and it always fails like this.
Thanks for any help...
Over the last little while, there has been a conscious movement in the
Python community away from "setuptools" to "distribute." The upshot
of this is that distribute is maintained by a larger body of people
and is much more active. Additionally, many of the bugs in setuptools
have been fixed. (The homepage is here:
Setuptools provides a couple of things: mainly, and most visibly, it's
the source of 'easy_install' and 'ez_setup.py', both of which are
installed by yt and by the yt install script. I've committed a change
to trunk that will install the distribute packages instead. This
should be a drop-in replacement, and re-running the install script
should update everything correctly. easy_install will still be
provided, but from distribute instead of setuptools. Additionally,
I've added 'pip', which is a newer piece of software that does the
same thing 'easy_install' does. "pip install" is more reliable and
less opaque than "easy_install."
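A couple of illustrative commands (the package name is just an example,
and the install script runs the equivalent of the first step for you):

    python distribute_setup.py       # bootstraps distribute, the drop-in for ez_setup.py
    pip install mercurial            # plays the same role as "easy_install mercurial"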
Anyway, I wanted to give you all a heads up. If you run into problems
with anything following this change, please let me know!
With distribute, I'm going to be adding an "instinfo" command to the
"yt" script, which will hopefully help with getting information about
the current installation and so on. This should alleviate some of the
problems of figuring out where things are installed, and I may even
add an auto-update command.
Just as a note, moving forward.
---------- Forwarded message ----------
From: Brian Granger <ellisonbg(a)gmail.com>
Date: Wed, Feb 10, 2010 at 11:34 AM
Subject: Re: [mpi4py] Python on 10K of cores on BG/P
To: mpi4py <mpi4py(a)googlegroups.com>
> We have been developing an electronic structure simulation software
> GPAW (https://wiki.fysik.dtu.dk/gpaw/).
> The software is written mostly in Python with the core computational
> routines in C-extensions. For parallel
> calculations we use MPI which is called both from C and Python
> (through our own Python interfaces for the
> MPI calls we need).
> We have run the code successfully on different supercomputing
> architectures such as Cray XT5 and Blue Gene,
> however as we are moving to thousands or tens of thousands of processes,
> one limitation of the current approach has become evident: at start-up
> time, the imports of Python modules are starting to take an increasing
> amount of time as a huge number of processes try to read the same
> .py/.pyc files and the filesystem cannot handle this efficiently.
Yes, I can imagine that if the .py files are on a shared filesystem,
things would grind to a halt.
The best way to fix this is to simply install all the .py files on the
local disks of the compute nodes... assuming the compute nodes have
local disks :-).
If they don't have local disks, you are in a really tough situation.
In some cases, it is feasible to think about
saving the state of the python interpreter (along with imported
modules), but in this case, I am doubtful that
will work. If you are importing Python modules that link to
C/C++/Fortran code, this will be very difficult.
Furthermore, if your Python code is calling into MPI, you will also have
to handle the fact that you have a live MPI universe with open sockets
and so on. Separating out the parts that you can/want to send from the
parts you can't/don't want to send will be quite a mess.
AND, even if you are able to serialize the entire state of the Python
interpreter, you will still have to scatter it to all compute nodes
(and deserialize it), which is what the shared filesystem is doing to
begin with. While this scatter-all may take place over a faster
interconnect, you won't be able to get rid of it.
Thus, in my mind, using a local disk is the only reasonable way to go.
I realize it is likely that the local disk
solution is not an option for you. In that case, I think you should
go back to Cray and ask for an upgrade ;-)
> Is it possible to modify the Python interpreter in order to have a
> single process do the import and then
> broadcast the data to the rest of the tasks?
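A rough mpi4py sketch of the single-reader broadcast idea asked about
above (illustrative only; the module/path handling is made up, and no
claim that this is how GPAW ended up solving it):

    from mpi4py import MPI
    import sys
    import types

    def broadcast_import(name, path, comm=MPI.COMM_WORLD):
        source = None
        if comm.Get_rank() == 0:
            with open(path) as f:            # only rank 0 touches the filesystem
                source = f.read()
        source = comm.bcast(source, root=0)  # everyone else receives the text over MPI
        module = types.ModuleType(name)
        exec(compile(source, path, "exec"), module.__dict__)
        sys.modules[name] = module           # a later "import <name>" finds it here
        return module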
> Nichols A. Romero, Ph.D.
> Argonne Leadership Computing Facility
> Argonne, IL 60490
> (630) 252-3441 (O)
> (630) 470-0462 (C)
Brian E. Granger, Ph.D.
Assistant Professor of Physics
Cal Poly State University, San Luis Obispo