I'm wondering if there is a system for deciding when something belongs in yt.lagos (like the HaloFinders) versus yt.extensions (like the HaloProfiler). For example, I'm thinking of adding a simple bit of code that will calculate the star formation history (Msol/year, for example) for a given set of stars. Would that go in extensions or in lagos? As best I can tell, extensions are more secondary, in that they post-process already-refined data, while lagos handles the raw data and refines it down.
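Roughly what I have in mind, as a plain-NumPy sketch (the argument names are made up, and this isn't the actual code I'd commit):

    import numpy as np

    def star_formation_history(creation_time_yr, particle_mass_msun, n_bins=100):
        # Bin the stars by formation time and sum the stellar mass formed in
        # each bin; dividing by the bin width gives Msol/yr.
        mass_formed, edges = np.histogram(creation_time_yr, bins=n_bins,
                                          weights=particle_mass_msun)
        dt = np.diff(edges)
        bin_centers = 0.5 * (edges[:-1] + edges[1:])
        return bin_centers, mass_formed / dt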
Stephen Skory, Graduate Student
sskory(a)physics.ucsd.edu
http://physics.ucsd.edu/~sskory/
I'd like to propose we change our default colormap from "jet" to
"bds_highcontrast". I've attached sample images of the same
simulation -- I think bds_highcontrast (designed by our very own
Britton, nearly two years ago now) looks better and makes differences
easier to discern.
I've provided a patch that should make this change here:
It applies to trunk. If this goes through, we'll also make our plots
slightly more distinctive, too, because this is a homegrown colormap.
(We also have the "kamae" colormap, created by Tune Kamae, which isn't
in matplotlib and which I think is criminally underused.)
Also, maybe it should be aliased as something in addition to
"bds_highcontrast"? Another name?
I'm trying to figure out if we can use the EPD as a basis for
installation easily -- in the past, the issue has always been HDF5.
Because we compile against HDF5, and EPD has its own that's bundled
*inside* the h5py egg, it has never been clear how to ensure that
HDF5LightReader links against the same HDF5 that h5py does. I emailed
the h5py group about linkages, but it turns out they don't encode the
linkage information anywhere in the Python code.
Andrew from h5py suggested that ctypes might have the capabilities to
figure out the linkage information, but I wasn't able to do this.
But I think there are only a few cases we need to consider -- Linux
and OS X, really -- for linking against the EPD. Both of these systems
have very clear mechanisms for identifying the libraries against which
a given shared library is linked. Presupposing that we know that, we
can manipulate the path as necessary to make a *guess* at the HDF5
location. Things are a bit more complicated if you have a non-fat
HDF5 linked against h5py, but that's something we can deal with later.
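For concreteness, the kind of inspection I mean looks like this (the
paths are illustrative; the compiled module lives wherever EPD put it):

    # Linux
    ldd /path/to/EPD/lib/python2.6/site-packages/h5py/h5.so | grep -i hdf5

    # OS X
    otool -L /path/to/EPD/lib/python2.6/site-packages/h5py/h5.so | grep -i hdf5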
So, I was wondering: does anybody have an installation of EPD that we
could test these mechanisms on? My machine is ... resistant to these,
because I've got libraries scattered about. If you do have the EPD
installed, could you email me off list, and we'll test it out?
This library was mentioned on the scipy mailing list today:
It looks VERY interesting, as it seems to exist somewhere between the
'time series' and 'data series' modes of thought -- which I would
argue is where simulation data lives, too. There was a talk on it at
PyCon earlier this week:
(the talk: http://pycon.blip.tv/file/3261331 )
I haven't watched it yet, but the slides and documentation are very
interesting. It integrates with MPL, NumPy, etc etc, and provides a
number of running functions for things like running medians and
whatnot. I'm going to try running some of my analysis for my current
project in this, and if it works out, I think we should consider it as
a basis for some more extensive time-domain calculations.
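(For the curious, here is a plain-NumPy sketch of the kind of "running
function" I mean -- this is not code from the library itself, and the
window size is arbitrary:)

    import numpy as np

    def running_median(values, window=5):
        values = np.asarray(values, dtype=float)
        half = window // 2
        out = np.empty_like(values)
        for i in range(len(values)):
            lo, hi = max(0, i - half), min(len(values), i + half + 1)
            out[i] = np.median(values[lo:hi])
        return out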
The soft-links and hard-links are compelling, but I recommend not
upgrading, as I believe this again introduces API incompatibilities.
---------- Forwarded message ----------
From: Andrew Collette <andrew.collette(a)gmail.com>
Date: Tue, Feb 23, 2010 at 1:45 PM
Subject: HDF5 for Python (h5py) 1.3.0 beta
HDF5 for Python (h5py) 1.3.0 BETA
I'm pleased to announce that HDF5 for Python 1.3 is now available! This
is a significant release introducing a number of new features, including
support for soft/external links as well as object and region references.
I encourage all interested HDF5/NumPy/Python users to give the beta a try
and to do your best to break it. :) Download, documentation and contact
links are below.
What is h5py?
HDF5 for Python (h5py) is a general-purpose Python interface to the
Hierarchical Data Format library, version 5. HDF5 is a mature scientific
software library originally developed at NCSA, designed for the fast,
flexible storage of enormous amounts of data.
From a Python programmer's perspective, HDF5 provides a robust way to
store data, organized by name in a tree-like fashion. You can create
datasets (arrays on disk) hundreds of gigabytes in size, and perform
random-access I/O on desired sections. Datasets are organized in a
filesystem-like hierarchy using containers called "groups", and
accessed using the traditional POSIX /path/to/resource syntax.
In addition to providing interoperability with existing HDF5 datasets
and platforms, h5py is a convenient way to store and retrieve
arbitrary NumPy data and metadata.
HDF5 datasets and groups are presented as "array-like" and "dictionary-like"
objects in order to make best use of existing experience. For example,
dataset I/O is done with NumPy-style slicing, and group access is via
indexing with string keys. Standard Python exceptions (KeyError, etc) are
raised in response to underlying HDF5 errors.
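For example, a minimal session might look like this (the file and
dataset names are only illustrative):

    import numpy as np
    import h5py

    f = h5py.File("example.h5", "w")
    grp = f.create_group("simulation")                # groups act like dictionaries
    dset = grp.create_dataset("density", data=np.random.rand(64, 64, 64))
    dset.attrs["units"] = "g/cm**3"                   # per-object metadata

    subset = f["simulation/density"][10:20, :, 0]     # NumPy-style slicing does the I/O
    try:
        f["no/such/path"]
    except KeyError:                                  # HDF5 errors map to Python exceptions
        pass
    f.close()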
New features in 1.3
- Full support for soft and external links
- Full support for object and region references, in all contexts (datasets,
attributes, etc). Region references can be created using the standard
NumPy slicing syntax.
- A new get() method for HDF5 groups, which also allows the type of an
object or link to be queried without first opening it.
- Improved locking system which makes h5py faster in both multi-threaded and single-threaded applications
- Automatic creation of missing intermediate groups (HDF5 1.8)
- Anonymous group and dataset creation (HDF5 1.8)
- Option to enable cProfile support for the parts of h5py written in Cython
- Many bug fixes and performance enhancements
- Old-style dictionary methods (listobjects, etc) will now issue
DeprecationWarning, and will be removed in 1.4.
- Dataset .value attribute is deprecated. Use dataset[...] or dataset[()].
- new_vlen(), get_vlen(), new_enum() and get_enum() are deprecated in favor
of the functions h5py.special_dtype() and h5py.check_dtype(), which also
support reference types.
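For example (a short sketch of the replacement API):

    import h5py

    str_dt = h5py.special_dtype(vlen=str)                          # replaces new_vlen(str)
    enum_dt = h5py.special_dtype(enum=("i", {"RED": 0, "BLUE": 1}))
    ref_dt = h5py.special_dtype(ref=h5py.Reference)                # object-reference dtype

    assert h5py.check_dtype(vlen=str_dt) is str                    # replaces get_vlen()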
Where to get it
* Main website, documentation: http://h5py.alfven.org
* Downloads, bug tracker: http://h5py.googlecode.com
* Mailing list (discussion and development): h5py at googlegroups.com
* Contact email: h5py at alfven.org
Requires
* Linux, Mac OS X, or Windows
* Python 2.5 or 2.6
* NumPy 1.0.3 or later
* HDF5 1.6.5 or later (including 1.8); HDF5 is included with the Windows version.
I've parallelized the volume renderer in hg. It will decompose both
grids (but not bricks! -- the distinction is important for large root
grid tiles) across processors, as well as cut the image plane into
pieces. I was getting artifacts at first, but those are now gone.
The scaling hasn't been stellar so far, but it's been reasonably good,
and it should certainly help with memory issues.
To get it going, I added a new object, VolumeRendering, that hangs off
the hierarchy (like other objects). You shouldn't need to do anything
special besides "--parallel" to run it in parallel. The old
direct_ray_cast probably doesn't work, as I had to change the
arguments to the VectorPlane object to allow for non-square image planes.
Here's an example script:
There's some calculation in here of the L vector and whatnot.
Additionally, this uses the new image saving routines Sam put in
yesterday (awesome work, Sam!).
Let me know what you think! I'm going to try testing this out to see
how well it scales for very large tiles as well as very large data.
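For reference, launching it in parallel should look something like the
following (the processor count and script name are placeholders, and
this assumes mpi4py is installed):

    mpirun -np 8 python my_render_script.py --parallel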
I have implemented a new contour finding algorithm. This new
algorithm is based on what I can remember from a paper I discussed
with Dave and Britton some time ago, but since I can't find the paper
anymore, it's not necessarily a full implementation. The strong points
of the new approach are that, while the initial identification *may* be
slower (although actually, I'm *not* sure it is!), you can essentially
scroll over the contours instantly; pulling out level sets at arbitrary
densities is *very* cheap, so dendrograms and other level-set
identification should be basically free -- in fact, pulling out trees
with exact values of density for the splits/joins should be trivial.
(This assumes that there are no duplicate values; in the paper I
recall they suggest jittering by random values of order float-epsilon
to ensure this.)
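(A quick sketch of that jittering trick -- not taken from the paper --
perturbing each value by a few ulps so that no two values compare
exactly equal:)

    import numpy as np

    def jitter(field):
        field = np.asarray(field, dtype="float64")
        eps = np.finfo("float64").eps
        # relative perturbation of a few machine epsilons: physically negligible,
        # but enough to break exact ties almost surely
        return field * (1.0 + 4.0 * eps * np.random.uniform(-1.0, 1.0, field.shape))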
I've placed the diff for the Cython code here:
and the test script that performs the contouring and the verification here:
(you can also download these with yt_lodgeit.py --download=...)
You'll have to re-cythonize after you apply the patch. (I've already
done this on Triton, and the LCA dev group install should have these
changes already.)
Right now it's still slightly raw. Note that the Cython function for
getting the contours out, extract_identified_contours, accepts an
index rather than a density. This will probably change, and I'd also
like to change it such that it will actually pull out a tree rather
than the indices -- which shouldn't be too hard. The index fed in
here is the index in the *sorted* system, because, as you'll note, the
densities (or whatever other field is being used) must be fed in in
sorted order so that the joins proceed outward from the lowest-index
values.
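To make the index convention concrete, here's a rough sketch of mapping
a physical threshold to the index that gets fed in (the commented call
is a guess at usage, not the actual signature):

    import numpy as np

    density = np.random.lognormal(size=100000)    # stand-in for the field being contoured
    sorted_density = np.sort(density)

    threshold = 2.0                               # desired level-set value
    idx = np.searchsorted(sorted_density, threshold)
    # contours = extract_identified_contours(..., idx)   # hypothetical usage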
Let me know what you think. I'm going to conduct some more tests and
wrap it in the familiar contour interface, with additional wrapping
layers for holding the full set of topology rather than the explicitly
pulled-out contours. I believe this algorithm can be parallelized
relatively easily, and that will be a final step after cleaning it up.
Has anyone had this problem installing mpi4py on Linux? This is on
Ubuntu. I have tried everything I can think of, but it still fails
every time I run it:
from mpi4py import MPI
mpi4py/MPI.so: undefined symbol: lt_dlexit
I have reinstalled OpenMPI and mpi4py several times, each with different
library linkages, and it always fails like this.
Thanks for any help...
Over the last little while, there has been a conscious movement in the
Python community away from "setuptools" to "distribute." The upshot
of this is that distribute is maintained by a larger body of people
and is much more active. Additionally, many of the bugs in setuptools
have been fixed. (The homepage is here:
Setuptools provides a couple of things: mainly, and most visibly, it's
the source of 'easy_install' and 'ez_setup.py', both of which are
installed by yt and by the yt install script. I've committed a change
to trunk that will install the distribute packages instead. This
should be a drop-in replacement, and re-running the install script
should update everything correctly. easy_install will still be
provided, but from distribute instead of setuptools. Additionally,
I've added 'pip', which is a newer piece of software that does the
same thing 'easy_install' does. "pip install" is more reliable and
less opaque than "easy_install."
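A couple of illustrative commands (the package name is just an example,
and the install script runs the equivalent of the first step for you):

    python distribute_setup.py       # bootstraps distribute, the drop-in for ez_setup.py
    pip install mercurial            # plays the same role as "easy_install mercurial"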
Anyway, I wanted to give you all a heads up. If you run into problems
with anything following this change, please let me know!
With distribute, I'm going to be adding an "instinfo" command to the
"yt" script, which will hopefully help with getting information about
the current installation and so on. This should alleviate some of the
problems of figuring out where things are installed, and I may even
add an auto-update command.
Just as a note, moving forward.
---------- Forwarded message ----------
From: Brian Granger <ellisonbg(a)gmail.com>
Date: Wed, Feb 10, 2010 at 11:34 AM
Subject: Re: [mpi4py] Python on 10K of cores on BG/P
To: mpi4py <mpi4py(a)googlegroups.com>
> We have been developing an electronic structure simulation software
> GPAW (https://wiki.fysik.dtu.dk/gpaw/).
> The software is written mostly in Python with the core computational
> routines in C-extensions. For parallel
> calculations we use MPI which is called both from C and Python
> (through our own Python interfaces for the
> MPI calls we need).
> We have run the code successfully on different supercomputing
> architectures such as Cray XT5 and Blue Gene,
> however as we are moving to thousands or tens of thousands of processes,
> one limitation of the current approach has become evident: at start-up
> time, the imports of Python modules are starting to take an increasing
> amount of time as a huge number of processes try to read the same
> .py/.pyc files and the filesystem cannot handle this efficiently.
Yes, I can imagine that if the .py files are on a shared filesystem,
things would grind to a halt.
The best way to fix this is to simply install all the .py files on the
local disks of the compute nodes... assuming the compute nodes have
local disks :-).
If they don't have local disks, you are in a really tough situation.
In some cases, it is feasible to think about
saving the state of the python interpreter (along with imported
modules), but in this case, I am doubtful that
will work. If you are importing Python modules that link to
C/C++/Fortran code, this will be very difficult.
Furthermore, if your Python code is calling into MPI, you will also have
to handle the fact that you have a live MPI universe with open sockets
and so on. Separating out the parts that you can/want to send from the
parts you can't/don't want to send will be quite a mess.
AND, even if you are able to serialize the entire state of the Python
interpreter, you will still have to scatter it to all compute nodes
(and deserialize it), which is what the shared filesystem is doing to
begin with. While this scatter-all may take place over a faster
interconnect, you won't be able to get rid of it.
Thus, in my mind, using a local disk is the only reasonable way to go.
I realize it is likely that the local disk
solution is not an option for you. In that case, I think you should
go back to Cray and ask for an upgrade ;-)
> Is it possible to modify the Python interpreter in order to have a
> single process do the import and then
> broadcast the data to the rest of the tasks?
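A rough mpi4py sketch of the single-reader broadcast idea asked about
above (illustrative only; the module/path handling is made up, and no
claim that this is how GPAW ended up solving it):

    from mpi4py import MPI
    import sys
    import types

    def broadcast_import(name, path, comm=MPI.COMM_WORLD):
        source = None
        if comm.Get_rank() == 0:
            with open(path) as f:            # only rank 0 touches the filesystem
                source = f.read()
        source = comm.bcast(source, root=0)  # everyone else receives the text over MPI
        module = types.ModuleType(name)
        exec(compile(source, path, "exec"), module.__dict__)
        sys.modules[name] = module           # a later "import <name>" finds it here
        return module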
> Nichols A. Romero, Ph.D.
> Argonne Leadership Computing Facility
> Argonne, IL 60490
> (630) 252-3441 (O)
> (630) 470-0462 (C)
Brian E. Granger, Ph.D.
Assistant Professor of Physics
Cal Poly State University, San Luis Obispo