Very cool: includes natural neighbor interpolation in N dimensions!
---------- Forwarded message ----------
From: Pauli Virtanen <pav(a)iki.fi>
Date: Tue, Aug 31, 2010 at 3:01 PM
Subject: [SciPy-Dev] RFR: N-dimensional interpolation
Sun, 25 Jul 2010 15:35:11 +0000, Pauli Virtanen wrote:
> I took the Qhull by the horns, and wrote a straightforward `griddata`
> implementation for working in N-D:
It's now committed to SVN (as it works-well-for-me).
Post-mortem review and testing is welcome.
What's in there is:

- Delaunay decomposition and some associated low-level N-d geometry routines
- Interpolators:
  1) Linear barycentric interpolation
  2) Cubic spline interpolation (2-D only, C1 continuous)
- A convenience interface (griddata) to the N-d interpolation classes
What could be added:
- More comprehensive interface to other features of Qhull
- Using qhull_restore, qhull_save to store Qhull contexts
instead of copying the relevant data?
- Optimizing the cubic interpolant
- Monotonic cubic interpolation
- Cubic interpolation in 3-d
- Natural neighbour interpolation
import numpy as np

def func(x, y):
    return x*(1-x)*np.cos(4*np.pi*x) * np.sin(4*np.pi*y**2)**2

# Interpolate 1000 scattered samples of func onto a regular 100x200 grid
grid_x, grid_y = np.mgrid[0:1:100j, 0:1:200j]
points = np.random.rand(1000, 2)
values = func(points[:,0], points[:,1])

from scipy.interpolate import griddata
grid_z0 = griddata(points, values, (grid_x, grid_y), method='nearest')
grid_z1 = griddata(points, values, (grid_x, grid_y), method='linear')
grid_z2 = griddata(points, values, (grid_x, grid_y), method='cubic')

# Compare the three methods against the exact function
import matplotlib.pyplot as plt
plt.subplot(221)
plt.imshow(func(grid_x, grid_y).T, extent=(0,1,0,1), origin='lower')
plt.plot(points[:,0], points[:,1], 'k.', ms=1)
plt.title('Original')
plt.subplot(222)
plt.imshow(grid_z0.T, extent=(0,1,0,1), origin='lower')
plt.title('Nearest')
plt.subplot(223)
plt.imshow(grid_z1.T, extent=(0,1,0,1), origin='lower')
plt.title('Linear')
plt.subplot(224)
plt.imshow(grid_z2.T, extent=(0,1,0,1), origin='lower')
plt.title('Cubic')
plt.show()
First, thanks to Jeff for summarizing all the API changes. I'm really
excited about this -- I think it marks an important transition in the
development of the code.
As a follow-up to Jeff's email, I wanted to share a couple more
changes. Because we're increasingly supporting non-Enzo codes, an
effort has been made with this API re-shuffling to remove some
"Enzo-isms." This will likely affect many of you.
Previously, the parameter file was meant to be accessed via a
dictionary-like method -- pf["InitialTime"], for instance. This is
still going to be the case, but only for parameters specific to a
given code. Fiducial values for simulation parameters will now be
stored in attributes that hang off the StaticOutput instance.
Specifically, the following changes have been made:
RefineBy => refine_by
TopGridRank => dimensionality
TopGridDimensions => domain_dimensions
InitialTime => current_time
DomainLeftEdge => domain_left_edge
DomainRightEdge => domain_right_edge
CurrentTimeIdentifier => unique_identifier
CosmologyCurrentRedshift => current_redshift
ComovingCoordinates => cosmological_simulation
CosmologyOmegaMatterNow => omega_matter
CosmologyOmegaLambdaNow => omega_lambda
CosmologyHubbleConstantNow => hubble_constant
So in all of your internal code, instead of accessing the Enzo-ism
pf["InitialTime"], for example, you would access pf.current_time.
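As a quick sketch of the new pattern (the dataset name here is
hypothetical; the attributes are the ones listed above):

from yt.mods import *

pf = load("DD0010/DD0010")          # hypothetical dataset name
t = pf.current_time                 # previously pf["InitialTime"]
dims = pf.domain_dimensions         # previously pf["TopGridDimensions"]
left, right = pf.domain_left_edge, pf.domain_right_edge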
As one last quick comment, I've tried to make these changes throughout
the code base. As testing continues, please feel free to change and
update these. And, if you don't want to just yet, feel free to hg up
to the branch named "stable" which is pre-reorganization. These
parameters are designed to be general across codes, and with time they
will mirror more closely those in the proposed GridDataFormat:
Thanks very much,
After a few days of hard work, Britton, Matt, and I are pleased to
announce that a major reorganization of yt is complete. This is the
biggest part of what will become yt-2.0, and encompasses nearly all of
the codebase. We have pushed it to the tip of the yt branch in the
hg repository today. PLEASE NOTE you MUST reclone (not hg pull!) your
mercurial repository from hg.enzotools.org! Just move your old
repository out of the way and do
hg clone http://hg.enzotools.org/yt
Finally, please note that as of today SVN is read-only. You can still
use the SVN repository for now, but it will no longer receive any new
changes.
We have done a good amount of testing, but there are sure to be places
where things break. Please report anything you find to the issue tracker
(http://yt.enzotools.org/newticket) as soon as you can.
The changes have almost all been organizational, except for the
addition of a new QuadTree method for projections, which Matt pushed
to the yt branch ahead of this. Furthermore, unless you do something
more involved than
from yt.mods import *
*everything* should be exactly the same for all of your scripts. We
would really appreciate it if you could file a ticket
(http://yt.enzotools.org/newticket) if you find that anything which imports
only from yt.mods doesn't work.
The biggest change is that the old Snow Crash-derived module
names (lagos, raven, fido, enki) have been removed, and the code has
been divided up into a new set of modules with descriptive names:
drwxr-xr-x 21 joishi joishi 714 Aug 27 11:49 analysis_modules
drwxr-xr-x 31 joishi joishi 1.0k Aug 28 08:45 data_objects
drwxr-xr-x 15 joishi joishi 510 Aug 26 16:57 frontends
drwxr-xr-x 9 joishi joishi 306 Aug 28 08:45 gui
drwxr-xr-x 46 joishi joishi 1.5k Aug 28 08:45 utilities
drwxr-xr-x 38 joishi joishi 1.3k Aug 28 16:09 visualization
The second biggest change, and perhaps the more important one
(especially to yt developers) is the redesign of how we access
component pieces within and outside of yt. Previously, within yt,
importing modules from one part of the code to another was
accomplished by things like
from lagos import *
in the __init__.py file of another module. This has two negative
effects: first, it pollutes the namespaces with many unnecessary
classes and functions (greatly raising the possibility of a
collision), and second, it creates circular imports which can slow
down import times for yt. In order to solve these two problems, we
emptied out __init__.py of everything except a docstring in every
module and replaced them with api.py files. Each module has an api.py
file that individually imports each class or function within
that module that is exposed to the API. For example, here is the
api.py for the visualization module:
from color_maps import \
from plot_collection import \
from fixed_resolution import \
from image_writer import \
from plot_modifications import \
In order to use a function from the yt/visualization package, you
would add the following import lines to your script:
from yt.mods import * # this hasn't changed
from yt.visualization.api import PlotCollection
pf = load("AwesomeTown0132")
pc = PlotCollection(pf)
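From there the PlotCollection works as it always has; as a hedged
follow-on sketch (the field name and calls below are standard yt usage,
not part of the original announcement):

pc.add_slice("Density", 0)     # add a slice of Density along the x-axis
pc.save("AwesomeTown0132")     # write out the plots with this prefix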
The third major change was to move extensions to analysis_modules.
This will likely affect many people in a purely cosmetic way; for
example, the halo finder is now imported by adding the following line
to your script:
from yt.analysis_modules.halo_finding.api import HaloFinder
a bit wordier, but much cleaner. If you are developing yt, please note
that the api.py file is for EXTERNAL scripts only. Anything INTERNAL
to yt must import any class or function it needs by directly
referencing the file it resides in (e.g. an import from
.fixed_resolution rather than from the api.py), as sketched below.
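A hedged illustration of the EXTERNAL/INTERNAL distinction (the
file/class pairing below is assumed for the example):

# EXTERNAL script: go through the module's api.py
from yt.visualization.api import PlotCollection

# INTERNAL yt code: reference the defining file directly
# (assumed file/class pairing, for illustration only)
from yt.visualization.plot_collection import PlotCollection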
Finally, Volume Rendering has been relocated to the new visualization
module. It should otherwise be unchanged (except for the api.py change, of
course).
Full notes on all these changes can be found on the wiki:
We hope this will lead to a much cleaner, more hackable, more
understandable yt. yt-2.0 will also include a new, simpler plotting
interface. Please feel free to contact Britton, Matt, or me if you
have any questions or any problems with the new yt.
I added the quad tree projection to the tip of the hg repository.
If you could please put it through a few paces, that'd be great. You
can access it as the attribute "quad_proj" on a hierarchy. I've done
some speed testing and the generality necessary to get it to work with
sources and whatnot makes it a bit slower than if it was just fed the
data, but it's still about twice as fast on my datasets as the old
projection method.
It doesn't work in parallel yet, but give it a shot on some big
datasets anyway -- I tested it on a 1024^3 L7 and it worked fine in
serial. But, I know how to parallelize it, and the parallelization
process should also speed up the serial projections. (It will allow
us to better choose the batching of IO.) I think this projection
method should also scale *much* better for parallelism, because we
shouldn't have to do domain decomposition to parallelize, which was
always a killer on nested grid simulations.
I'd like to push forward with this, so if you get a chance to try it,
report back with any issues. I'm sure there will be several...
I've placed a comparison test here:
(But note that disk cache will favor the old projection method the
first time you run, so run at least twice. :)
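A minimal sketch of trying it out (the dataset name is made up, and I'm
assuming quad_proj takes the same axis/field arguments as the existing
projection interface):

from yt.mods import *

pf = load("DD0010/DD0010")             # made-up dataset name
proj = pf.h.quad_proj(0, "Density")    # project Density along the x-axis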
The hg repository has to undergo some major changes (Jeff is composing
an email that will detail the motivations for this) and to that end
we'll need to wipe it and restart it. In short, to connect the *old*
SVN history to the *current* hg history, I had to run "hg convert"
which shuffles all the changeset hashes -- recall that an SHA hash is
used to identify every changeset in hg, and it's created by looking at
the diff, the parents, etc etc, so splicing in the old history has
changed all of the changeset hashes.
I can handle the conversion process with as many changesets as
necessary, but because you will have to re-clone *and* all the
changeset hashes will change, I need to make sure I convert them all
to the new repository.
To that end, please push all unpushed changesets *immediately*. I'm
going to close it to push on Monday morning. Anything else will have
to be transplanted (which is really not a huge deal, keep in mind ;-)
but as of Monday you will have to re-clone.
Sorry for the inconvenience ... I tried really, really hard to avoid
this, but because hg has indelible history it's unavoidable. However,
I am able to store the old changeset IDs in the new repository, and
I'm exploring mechanisms for setting them up as "tags" so that they
will still be accessible and point to the new changeset hashes.
Thanks very much,
For what it's worth, I've been using some of these new features (the
C++ templates) in ramses_reader.pyx, so you will need Cython 0.13 to
re-generate its C code.
The type inference capability is VERY cool.
---------- Forwarded message ----------
From: Craig Citro <craigcitro(a)gmail.com>
Date: Wed, Aug 25, 2010 at 2:00 AM
Subject: [cython-users] ANN: Cython 0.13 released!
To: cython-users(a)googlegroups.com, cython-dev(a)codespeak.net,
It is with *great* pleasure that I email to announce the release of
Cython version 0.13! This release sets another milestone on the path
towards Python compatibility and brings major new features and
improvements for the usability of the Cython language.
Download it here: http://cython.org/release/Cython-0.13.tar.gz
== New Features ==
* Closures are fully supported for Python functions. Cython supports
inner functions and lambda expressions. Generators and generator
expressions are __not__ supported in this release.
* Proper C++ support. Cython knows about C++ classes, templates and
overloaded function signatures, so that Cython code can interact with
them in a straightforward way.
* Type inference is enabled by default for safe C types (e.g. double,
bint, C++ classes) and known extension types. This reduces the need
for explicit type declarations and can improve the performance of
untyped code in some cases. There is also a verbose compile mode for
testing the impact on user code.
* Cython's for-in-loop can iterate over C arrays and sliced pointers.
The type of the loop variable will be inferred automatically in this
case.
* The Py_UNICODE integer type for Unicode code points is fully
supported, including for-loops and 'in' tests on unicode strings. It
coerces from and to single character unicode strings. Note that
untyped for-loop variables will automatically be inferred as
Py_UNICODE when iterating over a unicode string. In most cases, this
will be much more efficient than yielding sliced string objects, but
can also have a negative performance impact when the variable is used
in a Python context multiple times, so that it needs to coerce to a
unicode string object more than once. If this happens, typing the loop
variable as unicode or object will help.
* The built-in functions any(), all(), sum(), list(), set() and
dict() are inlined as plain `for` loops when called on generator
expressions. Note that generator expressions are not generally
supported apart from this feature. Also, tuple(genexpr) is not
currently supported - use tuple([listcomp]) instead.
* More shipped standard library declarations. The python_* and
stdlib/stdio .pxd files have been deprecated in favor of clib.* and
cpython[.*] and may get removed in a future release.
== Python compatibility ==
* Pure Python mode no longer disallows non-Python keywords like
'cdef', 'include' or 'cimport'. It also no longer recognises syntax
extensions like the for-from loop.
* Parsing has improved for Python 3 syntax in Python code, although
not all features are correctly supported. The missing Python 3
features are being worked on for the next release.
* from __future__ import print_function is supported in Python 2.6
and later. Note that there is currently no emulation for earlier
Python versions, so code that uses print() with this future import
will require at least Python 2.6.
* New compiler directive language_level (valid values: 2 or 3) with
corresponding command line options -2 and -3 requests source code
compatibility with Python 2.x or Python 3.x respectively. Language
level 3 currently enforces unicode literals for unprefixed string
literals, enables the print function (requires Python 2.6 or later)
and keeps loop variables in list comprehensions from leaking.
* Loop variables in set/dict comprehensions no longer leak into the
surrounding scope (following Python 2.7). List comprehensions are
unchanged in language level 2.
== Incompatible changes ==
* The availability of type inference by default means that Cython
will also infer the type of pointers on assignments. Previously, code like
cdef char* s = ...
untyped_variable = s
would convert the char* to a Python bytes string and assign that.
This is no longer the case and no coercion will happen in the example
above. The correct way of doing this is through an explicit cast or by
typing the target variable, i.e.
cdef char* s = ...
untyped_variable1 = <bytes>s
untyped_variable2 = <object>s
cdef object py_object = s
cdef bytes bytes_string = s
* bool is no longer a valid type name by default. The problem is that
it's not clear whether bool should refer to the Python type or the C++
type, and expecting one and finding the other has already led to
several hard-to-find bugs. Both types are available for importing: you
can use from cpython cimport bool for the Python bool type, and from
libcpp cimport bool for the C++ type.
== Contributors ==
Many people contributed to this release, including:
* David Barnett
* Stefan Behnel
* Chuck Blake
* Robert Bradshaw
* Craig Citro
* Bryan Cole
* Lisandro Dalcin
* Eric Firing
* Danilo Freitas
* Christoph Gohlke
* Dag Sverre Seljebotn
* Kurt Smith
* Erik Tollerud
* Carl Witty
I wanted to calculate volume filling fractions and mass fractions of gas that meets some criteria -- specifically, the fraction of gas above a certain metallicity. The _Action routine looked promising, but I couldn't figure out how to use it. I remember there was some talk about filters on the list, but when I searched I found that it concerned the halo finder.
I might have reinvented the wheel, but this is how I did it.
It's basically _TotalQuantity with a filter. Also I wanted it to accept multiple limits to avoid re-reading the data for every filter.
Was there an easier way to do this?
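For comparison, here is a rough sketch of one way to get the same
numbers with a plain NumPy mask on a data object rather than a derived
quantity (field names follow yt conventions for Enzo data; the dataset
name and threshold are made up, and this re-reads the data per field):

from yt.mods import *

pf = load("DD0010/DD0010")                  # made-up dataset name
dd = pf.h.all_data()

mask = dd["Metallicity"] > 0.1              # example criterion
vol_frac = dd["CellVolume"][mask].sum() / dd["CellVolume"].sum()
mass_frac = dd["CellMassMsun"][mask].sum() / dd["CellMassMsun"].sum()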
I'm going to top post, which I guess I do more than I ought to anyway,
because I'm going to try to address a number of issues that have been
brought up. I've spent some of the day thinking about this issue, and
what it says about yt as a community and about my level of involvement
in various areas.
So, I'll touch on those at the end, but first I'll hit back on the
issue of parallelism and how to address it.
= Parallelism =
I think what is becoming clear is that the step from serial to
parallel usage needs to be much better documented than it currently is.
As it stands, the section in the manual that
covers parallelism basically says, "These things work, go ahead and
give it a go!" This is my fault, and it's not really sufficient.
More detail has to be given, and rather than a whitelist of actions
that are parallel safe we need to also include a *blacklist*.
The second step we need to take is to provide examples of how to submit a
parallel job -- how much it requires in terms of resources and so on.
Unfortunately, it's not entirely clear to me the best way to organize
the documentation, and I don't even really know where this would go.
Stephen did a really rad job of doing this in the halo finding paper,
and he's done an excellent job with his work on the halo finder as a
whole. (It's just that last 5% toward the user experience, I think.
:) My own work on the parallel projections should be better
documented and the UX there should be improved as well.
The third is to keep an eye on memory usage. Memory profiling is
difficult, but it's something we have tried before and that I believe
needs to be re-examined. Specifically, it seems that both projections
and the parallel halo finder suffer from this problem. As a note,
next week I will be spending some time swapping out the old projection
method for the new quad-tree method. This should improve both speed
and memory usage.
Okay, on to the larger problems that I think this relates to.
= Bugs =
First off, we need a mechanism for handling bugs. I don't want to
use the word "triage" here, but it is becoming clear that we need a
mechanism. Currently, we have a Trac site that really doesn't get
used at all. I've explored a couple of mechanisms for encouraging bug
reports:
* I can enable OpenID login -- this means using something like your
GoogleName to log in and report a bug.
* I've already replicated the .htpasswd between mercurial and the
Trac site, so anyone who has an account there can log in to the Trac
site.
* yt could register a default excepthook that encourages the user to
report a bug. I'm leery of this because I'm not sure I want to muck
about with Python internals that much, but it could be done nicely, I
think (a rough sketch follows this list).
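A minimal sketch of that excepthook idea, using the ticket URL
mentioned above (this is not yt's actual implementation):

import sys, traceback

def _report_hook(exc_type, exc_value, tb):
    # Print the usual traceback, then nudge the user toward the tracker.
    traceback.print_exception(exc_type, exc_value, tb)
    sys.stderr.write("If this looks like a yt bug, please report it at "
                     "http://yt.enzotools.org/newticket\n")

sys.excepthook = _report_hook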
Overall, though, what really needs to happen is some kind of *buy-in*
on the part of the user -- which in this case is anyone who has had
trouble with yt. I have pulled back from yt-users, and I'm really
happy that everyone else has stepped up. But I'm worried that as time
goes on, people will pick up knowledge in ways that aren't indexable
by search engines and then this knowledge keeps getting re-learned.
Public reporting of bugs, particularly as it could relate to
improvements in documentation, is essential. But this can't happen if
it's just driven by one or two people. And if no one else is
motivated to encourage this, then perhaps that's just where we'll
stay. I can't force buy-in, I can only encourage people to see the
benefits to reporting bugs, sharing experiences, and all of that. We
need to have people to read and handle bugs, and then people to whom
they apply. I really would like for this not always to be me.
Anyway, if you have an hg account, you can login:
and then report a bug:
It helps if you paste the traceback with --paste on the command line
of your script.
= Fixing Bugs =
We've had great success with people taking ownership of different bugs
on the mailing list and fixing them. This is a huge success story,
and I thank all the developers that have made this happen. But I
think it's important that we continue to develop this sense of
ownership through the Trac site.
= Major Enhancements =
Adding on major enhancements is unfortunately an open problem. For
instance, I would really like to see the parallelism framework
essentially rewritten to be more modular and to take advantage of
nested MPI communicators. I have a sketch of how this would go, and
I've even written some code. But, I'm not employed to work on yt. I
mostly develop it either as it suits my research interests (and I am
operating under the working assumption that this is true for everyone
else) or as I find it something fun to do in the evenings. I want it
to be used, and to be useful, and I believe that my stewardship of the
project up to this point supports this conclusion.
I *truly* do believe in cross-code simulation analysis, sharing
facilities with other users, and reproducible research. But I am
reaching the limits of what I, alone, can do. So far we've had some
pretty major contributions from a number of developers, but I think
it's important that we communicate to the community that this is still
a volunteer project.
We don't have a team of dedicated software developers, we have a
handful of scientists who are working both to further their own
research interests and to provide the best user experience possible
for an advanced analysis code. And, to be perfectly frank, I think
we're doing a pretty darn good job on both of those fronts. Many
people now use yt on a daily basis to analyze simulation outputs from
several different codes. We've got advanced analysis and viz
functionality, thanks to *you* developers, that has been published in a
dozen papers, been shown at the Adler Planetarium, taken home
third place at the SciDAC visualization "Oscars," and even (ever so
briefly) been on the Discovery channel.
But, still, we have to keep our eye on the prize. And if the prize
the other developers have *their* eye on isn't the prize *you* have
your eye on, unfortunately some responsibility will fall back on to
your shoulders. I honestly wish I could spend more time helping
others use yt, developing yt, and building it to be the tool I really
wish it would be. Don't think that I don't see all the warts and
problems that you all see -- I do. In the docs, the source code, the
functionality, the user experience ... I see the warts too.
But even though developing yt is fun, I'm still developing it because
I'm a scientist who wants to ask questions of his data.
= Building Community =
We've done a good job of this, but it's becoming clear that there's a disjoint:
* We're doing a mediocre job of shepherding users into being
contributing developers. I'd like to help fix this by writing up more
suggestions on how to develop and share your changes. yt will
stagnate if we don't continue to churn the developer list.
* We need to articulate the vision for yt, and I'm not sure my vision
is the one anyone else has.
I'd love to hear suggestions about this aspect.
= Documentation =
Any help anyone can give with documentation would be great.
Organization, notes, suggestions, anything. Report it as a bug.
Commit changes. Email the list.
Anyway, that's basically what I've been thinking about, and what I
wanted to say. I think we have an opportunity with yt to build a real
community of collaboration and sharing of resources. And we've done a
great job with that so far. But it still has to be something of a
jumpstart approach -- jumpstarting development and then encouraging
others to pick up the torch and run with it. Grass roots,
science-driven development is kind of the name of the game here.
And when there *are* problems, I'm sure that lots of people are eager
to jump at helping you fix them. But we have to hear about 'em before
we can. :)
On Thu, Aug 19, 2010 at 1:12 PM, Stephen Skory <stephenskory(a)yahoo.com> wrote:
> Hi Brian & Eric,
>>As you know (since we discussed it off-list), I'm the reason for this being
>>mentioned to you. I had some pretty horrible problems with the various
>>incarnations of HOP in yt being excruciatingly slow and consuming huge amounts
>>of memory for a 1024^3 unigrid dataset, to the point where my grad student and I
>>ended up just using P-GroupFinder, the standalone halo finder that comes with
>>week-of-code enzo. Note that when I say "excruciatingly slow" and "consuming
>>huge amounts of memory", I mean that when we used 256 nodes on Ranger, with 2
>>cores/node (so 512 cores total) for the 1024^3 dataset, it still ran Ranger out
>>of memory, or, alternately, didn't finish in 24 hours.
> A few notes in response:
> - Recently I ran a 2048^3 dataset on 264 cores that took about 2 hours which
> averaged about 8.5GB per task with a peak task of 10 GB. Your job is 1/8 the
> size and should have run, and I don't know why it didn't.
> - If I wasn't trying to graduate I would have had more time to assist when your
> student (Brian) asked me for help. I'm sorry so much of your time was wasted.
> - My tool as a public tool is not any good unless other people can use it too.
> Clearly I need to do some work on that.
> - It *does* use much more memory than it needs to, you are right. I know where
> the problems are, and whoo-boy they are there, but they are not easy to fix.
> - Speed could be better, but some of this has to do with how HOP itself works.
> For example, it needs to run the kD tree twice, unlike FOF which only needs to
> run it once. The final group-building step is a "global" operation, so that's slow as
> well. On 128^3 particles, (normal) HOP takes about 75 seconds, and FOF about 25.
> The C HOP and FOF in yt both use the same kD tree, same data I/O methods, so
> that's a fair ratio of the increased workload.
> sskory(a)physics.ucsd.edu o__ Stephen Skory
> http://physics.ucsd.edu/~sskory/ _.>/ _Graduate Student
On my last push, I got this error. Should I be concerned? The push seems to have
gone through anyway.
[sskory@login-4-41 yt-devel]$ hg push
pushing to http://hg.enzotools.org/yt
searching for changes
http authorization required
realm: hg Repositories
adding file changes
added 1 changesets with 2 changes to 2 files
notify: sending 1 subscribers 1 changes
error: changegroup.notify hook raised an exception: (535, 'Error: authentication
sskory(a)physics.ucsd.edu o__ Stephen Skory
http://physics.ucsd.edu/~sskory/ _.>/ _Graduate Student
As I mentioned in my emails to Jeff and Britton, I've updated the Trac
site somewhat. I now encourage developers to log in here:
with their hg u/p, and then to submit tickets and read over existing
tickets. Tickets should be submitted for individual tasks as well as
overarching goals. All modifications to tickets will get sent to
yt-svn, which I think will be an important method of keeping up with
development.
Each ticket should be assigned a target and a component, and if it's a
bug it should include a traceback.
**We should focus on tickets as a workflow system and a capture system
to ensure bugs don't get forgotten.**
Additionally, I'm going to encourage people to start using the Wiki
both as a staging ground for new, gestating documentation and as a
place for collaborative lists and documents. I'd moved away from the
wiki myself for a while, but I'm coming back to it for more ephemeral
or high-turnover pieces of text, and I'd like to encourage others to do
the same. Here are a few pages recently modified by me:
Additionally, the installation guide, any ideas for the future, etc
etc should be played with on the Wiki.