[Numpy-discussion] huge array calculation speed

Charles R Harris charlesr.harris at gmail.com
Thu Jul 10 17:30:36 EDT 2008


On Thu, Jul 10, 2008 at 2:55 PM, Anne Archibald <peridot.faceted at gmail.com>
wrote:

> 2008/7/10 Dan Lussier <dtlussier at gmail.com>:
>
> > I am relatively new to numpy and am having trouble with the speed of
> > a specific array-based calculation that I'm trying to do.
> >
> > What I'm trying to do is to calculate the total potential
> > energy and coordination number of each atom within a relatively large
> > simulation.  Each atom is at a position (x,y,z) given by a row in a
> > large array (approximately 1e6 by 3) and presently I have no
> > information about its nearest neighbours, so each atom's position must be
> > checked against all others before cutting the list down prior to
> > calculating the energy.
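
The poster's code is not reproduced in this excerpt. For context, here is a
minimal sketch of the naive O(n^2) per-row computation being described;
pair_energy is an illustrative placeholder, not the poster's actual
potential:

    import numpy as np

    def pair_energy(r):
        # Illustrative Lennard-Jones potential in reduced units; the
        # poster's actual potential is not shown in this excerpt.
        return 4.0 * (r**-12 - r**-6)

    def naive_energy_and_coordination(pos, cutoff):
        # O(n^2) reference version: each atom is checked against all others.
        n = len(pos)
        energy = np.zeros(n)
        coord = np.zeros(n, dtype=int)
        for i in range(n):
            d = np.sqrt(((pos - pos[i])**2).sum(axis=1))  # distances to all atoms
            mask = (d > 0.0) & (d < cutoff)               # neighbours inside the cutoff
            coord[i] = mask.sum()
            energy[i] = pair_energy(d[mask]).sum()
        return energy, coord
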
>
> The way you've implemented this goes as the square of the number of
> atoms. This is going to absolutely kill performance, and you can spend
> weeks trying to optimize the code for a factor of two speedup. I would
> look hard for algorithms that do this in less than O(n^2).
>
> This problem of finding the neighbours of an object has seen
> substantial research, and there are a variety of well-established
> techniques covering many possible situations. My knowledge of it is
> far from up-to-date, but since you have only a three-dimensional
> problem, you could try a three-dimensional grid (if your atoms aren't
> too clumpy) or octrees (if they are somewhat clumpy); kd-trees are
> probably overkill (since they're designed for higher-dimensional
> problems).
>
> > My current numpy code is below, and in it I have tried to push as much
> > of the work for this computation as possible into numpy's compiled C code.
> > However it is still taking a very long time at approx 1 second per
> > row.  At this speed even some simple speed-ups like weave won't
> > really help.
> >
> > Are there any other numpy ways that it would be possible to calculate
> > this, or should I start looking at going to C/C++?
> >
> > I am also left wondering if there is something wrong with my
> > installation of numpy (i.e. not making use of LAPACK/ATLAS).  Other
> > than that, if it is relevant: I am running 32-bit x86 CentOS 5.1
> > Linux on a dual Xeon 3.0 GHz Dell tower with 4 GB of memory.
>
> Unfortunately, implementing most of the algorithms in the literature
> within numpy is going to be fairly cumbersome. But I think there are
> better algorithms you could try:
>
> * Put all your atoms in a grid. Ideally the cells would be about the
> size of your cutoff radius, so that for each atom you need to pull out
> all atoms in at most eight cells for checking against the cutoff. I
> think this can be done fairly efficiently in numpy (see the first
> sketch after this list).
>
> * Sort the atoms along a coordinate and use searchsorted to pull out
> those for which that coordinate is within the cutoff. This should get
> you down to reasonably modest lists fairly quickly (see the second
> sketch below).
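
A minimal numpy sketch of the grid approach (function names are
illustrative, not an existing API). It conservatively scans the full 3x3x3
block of 27 cells around each atom; the eight-cell figure above comes from
also checking which octant of its own cell an atom sits in:

    import numpy as np

    def grid_candidates(pos, cutoff):
        # Bin atoms into cutoff-sized cells, then for each atom yield the
        # indices of atoms in the surrounding 3x3x3 block of cells -- the
        # only atoms that can possibly lie within the cutoff.
        cells = np.floor(pos / cutoff).astype(np.int64)
        cells -= cells.min(axis=0)              # shift cell coords to start at 0
        shape = tuple(cells.max(axis=0) + 1)
        flat = np.ravel_multi_index((cells[:, 0], cells[:, 1], cells[:, 2]), shape)
        order = np.argsort(flat)                # group atom indices by cell id
        sorted_ids = flat[order]
        for i in range(len(pos)):
            cand = []
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    for dk in (-1, 0, 1):
                        c = cells[i] + (di, dj, dk)
                        if (c < 0).any() or (c >= shape).any():
                            continue            # neighbouring cell is outside the box
                        cid = np.ravel_multi_index(tuple(c), shape)
                        lo = np.searchsorted(sorted_ids, cid, side='left')
                        hi = np.searchsorted(sorted_ids, cid, side='right')
                        cand.append(order[lo:hi])
            yield i, np.concatenate(cand)
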
>
> There is of course a tradeoff between preprocessing time and lookup time.
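
And a minimal sketch of the second, sort-plus-searchsorted suggestion (this
is also the argsort-followed-by-take pattern Chuck recommends below); the
names are again illustrative:

    import numpy as np

    def slab_neighbours(pos, cutoff):
        # Sort atoms on the x column, then for each atom use searchsorted
        # to pull out only the atoms whose x lies within the cutoff, and
        # test that modest candidate list against the full 3-d cutoff.
        order = np.argsort(pos[:, 0])       # argsort on the x column ...
        p = np.take(pos, order, axis=0)     # ... followed by take
        x = p[:, 0]
        for i in range(len(p)):
            lo = np.searchsorted(x, x[i] - cutoff, side='left')
            hi = np.searchsorted(x, x[i] + cutoff, side='right')
            cand = p[lo:hi]                 # atoms close in x only
            d = np.sqrt(((cand - p[i])**2).sum(axis=1))
            mask = (d > 0.0) & (d < cutoff)
            yield order[i], order[lo:hi][mask]  # original index, neighbour indices
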
>
> We seem to get quite a few posts from people wanting some kind of
> spatial data structure (whether they know it or not). Would it make
> sense to come up with some kind of compiled spatial data structure to
> include in scipy?
>

I think there should be a "computer science" module containing such things
as red-black trees, k-d trees, equivalence relations, and so forth. For this
problem one could probably make a low-resolution cube of list objects and
index atoms into the appropriate list by severe rounding. I have done
similar things for indexing stars in a (fairly) wide-field image and it
worked well. But I think the sorting approach would be a good first try
here. Argsort on the proper column followed by take would be the way to go
for that.
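
A plain-Python sketch of that rounding idea, with a dict of lists standing
in for the low-resolution cube of list objects (names are illustrative):

    import numpy as np
    from collections import defaultdict

    def bin_by_rounding(pos, cutoff):
        # "Severe rounding": key each atom by its integer cell coordinates.
        cells = defaultdict(list)
        for i, key in enumerate(map(tuple, np.floor(pos / cutoff).astype(int))):
            cells[key].append(i)            # index atom into its cell's list
        return cells

    def neighbours_of(i, pos, cells, cutoff):
        # Check atom i only against atoms in its own and the 26 adjacent cells.
        ci, cj, ck = np.floor(pos[i] / cutoff).astype(int)
        cand = np.array([j for di in (-1, 0, 1) for dj in (-1, 0, 1)
                         for dk in (-1, 0, 1)
                         for j in cells.get((ci + di, cj + dj, ck + dk), [])])
        d = np.sqrt(((pos[cand] - pos[i])**2).sum(axis=1))
        return cand[(d > 0.0) & (d < cutoff)]
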

Chuck