[Numpy-discussion] fast way of doing "cross-multiplications" ?

Thu Jul 20 19:35:35 EDT 2006

Fernando Perez wrote:
> On 7/18/06, Tim Hochberg <tim.hochberg at cox.net> wrote:
>> Eric Emsellem wrote:
>> > thanks for the tips. (indeed your "add.reduce" is correct: I just 
>> wrote
>> > this down too quickly, in the script I have a "sum" included).
>> >
>> > And yes you are right for the memory issue, so I may just keep the 
>> loop
>> > in and try to make it work on a fast PC...(or use parallel processes)
>> >
>> > (is "sum" different than "add.reduce"?)
>> >
>> > thanks again to both Bill Baxter and Perry Greenfield for their fast
>> > (and helpful!) answers.
>> >
>> I just wanted to add that there are faster, but considerably complicated
>> ways to attack this class of problems. The one I've looked at in the
>> past was the fast multipole method and I believe there are others. I'm
>> not sure whether these can be implemented efficiently in numpy, but you
>> may want to take a look into this kind of more sophisticated/complicated
>> approach if brute forcing the calculation doesn't work.
>
> Indeed, FMM is the best known method that can turn this O(n^2) problem
> into O(n*log(n)).  As Tim indicates, there is no easy way out of this
> one.  Incidentally, there is a talk about FMM on the scipy'06
> schedule, in case you are going to attend.
>
> An alternative approach to FMM (conceptually similar in some ways) is
> described here:
>
> http://amath.colorado.edu/faculty/fperez/talks/0604_sanum_mwadap.pdf
>
> Unfortunately this isn't exactly a half-hour optimization effort, as
> the required machinery is pretty heavy duty.  And yes, this has all
> been done in python and runs on current numpy/scipy (though it has
> Fortran, C and C++ sprinkled as needed).
I'm interested in hearing in what situations people end up dropping into 
C/C++ or Fortran. It would be interesting to see if general solutions 
could be found for making some of them fast enough at the Python level. 
Here are some situations that I've run into where numpy has trouble 
computing results efficiently.

   1. The value at a given point is the computed one of two (or more)
      ways depending on some condition. The subcomputations are
      expensive, so one does not want to compute them both and then
      chose a result using 'where'.  I think I have a decent way to
      attack this at the Python level using argsort: basically, sort all
      of the operands, find the boundaries between the regions where
      different computations are done, compute the expensive
      subcomputations on the appropriate blocks and then unsort the
      data. It would be interesting to see if this could be automated in
      a clean way, perhaps within numexpr.
   2. Generalized reduce. We want to reduce an array using an arbitrary
      function f(x, y) that is not a standard ufunc. In the particular
      case I've run into, x and y would be vectors or perhaps record
      arrays, and the functions would couple the various records when
      producing the output.  I don't see how this case can be handled
      efficiently at the Python level, short of using something like
      Weave to produce a custom ufunc and calling reduce on it. I don't
      see how something like numexpr could handle it, unfortunately,
      since it gains it's speed by operating on blocks of data.

I'm sure there are other cases that I've run into, but that's all I'm 
coming up with right now.

-tim