[Numpy-discussion] newbie question - large dataset

Fernando Perez fperez.net at gmail.com
Sat Apr 7 16:22:11 EDT 2007


On 4/7/07, Stefan van der Walt <stefan at sun.ac.za> wrote:
> On Sat, Apr 07, 2007 at 02:48:47PM -0400, Anne Archibald wrote:
> > If none of those algorithmic improvements are possible, you can look
> > at other possibilities for speeding things up (though the speedups
> > will be modest). Parallelism is an obvious one - if you've got a
> > multicore machine you may be able to cut your processing time by a
> > factor of the number of cores you have available with minimal effort
> > (for example by replacing a for loop with a simple foreach,
> > implemented as in the attached file).
>
> Would this code speed things up under Python?  I was under the
> impression that there is only one process, irrespective of whether or
> not "threads" are used, and that the global interpreter lock is used
> when swapping between threads to make sure that only one executes at
> any instant in time.

You are correct.  If g, h in the OP's description satisfy:

a) they are bloody expensive, so each call dwarfs the threading
overhead, and

b) they release the GIL internally via the proper C API calls, which
means they promise not to modify any shared Python objects,

then the pure Python threads approach could help *somewhat*.
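For illustration, here is a minimal sketch of that pattern.  It
assumes a NumPy operation (np.dot here) whose BLAS backend releases
the GIL; the array sizes and the chunking scheme are invented, not
taken from the OP's code:

import threading
import numpy as np

def worker(out, a, b, lo, hi):
    # np.dot drops the GIL while the underlying BLAS routine runs,
    # so several of these threads can execute in parallel.
    out[lo:hi] = np.dot(a[lo:hi], b)

a = np.random.rand(4000, 400)
b = np.random.rand(400, 400)
out = np.empty((4000, 400))

nthreads = 4
bounds = np.linspace(0, a.shape[0], nthreads + 1).astype(int)
threads = [threading.Thread(target=worker,
                            args=(out, a, b, bounds[i], bounds[i + 1]))
           for i in range(nthreads)]
for t in threads:
    t.start()
for t in threads:
    t.join()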

But yes, for this kind of distribution problem in Python, a
multi-process approach is probably the better route, if parallelism
is going to be used at all.
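A minimal sketch of that multi-process route, using the standard
multiprocessing module; the function f below is a hypothetical
stand-in for the OP's g/h, not his actual code:

import multiprocessing as mp
import numpy as np

def f(chunk):
    # Hypothetical expensive elementwise computation.  Each worker
    # is a separate interpreter with its own GIL, so the work runs
    # in parallel regardless of what f does with Python objects.
    return np.sqrt(chunk) * np.sin(chunk)

if __name__ == '__main__':
    data = np.random.rand(1000000)
    with mp.Pool() as pool:
        # One chunk per core, farmed out to the worker processes.
        chunks = np.array_split(data, mp.cpu_count())
        results = pool.map(f, chunks)
    out = np.concatenate(results)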

I suspect, however, that trying to lower the quadratic complexity of
the OP's formulation in the first place is probably a better idea.
Distribution lowers the constants, not the asymptotic behavior; as
Anne accurately pointed out, this is much more of an algorithmic
question.
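To make that concrete: if the quadratic cost comes from something
like an all-pairs neighbor search, a spatial data structure changes
the asymptote itself, which no amount of distribution can.  A sketch
under that assumption (scipy.spatial's cKDTree; the radius and the
data are invented):

import numpy as np
from scipy.spatial import cKDTree

points = np.random.rand(100000, 3)

# The naive all-pairs check is O(n**2); a k-d tree answers the same
# fixed-radius query in roughly O(n log n) in low dimensions.
tree = cKDTree(points)
close_pairs = tree.query_pairs(r=0.01)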

Cheers,

f
