[Numpy-discussion] speeding up y[:, i] for y.shape = (big number, small number)

Bruce Southey bsouthey at gmail.com
Wed Oct 4 11:36:09 EDT 2006


Hi,
I think what you are after is the multivariate normal distribution.
Assuming I have it right, it is clearer to read (and probably more
accurate to compute) in log form:

log p(x) = -(d/2)*log(2*PI) - 0.5*log(det(V))
           - 0.5*(transpose of (x-mu))*inverse(V)*(x-mu)

where d is the dimension of x, PI is the math constant, V is the known
variance-covariance matrix, x is the vector of values, and mu is the
known mean vector. (For N independent observations you just sum N such
terms.)
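
For concreteness, here is a minimal numpy sketch of that log density for
a single vector x (the function name is mine, and this is untested; for
large d a Cholesky factorization would be numerically safer):

import numpy as N

def log_mvn_den(x, mu, V):
    # x, mu: length-d vectors; V: (d, d) positive definite covariance
    d    = mu.size
    diff = x - mu
    # quadratic form (x-mu)' * inverse(V) * (x-mu); solve() avoids
    # explicitly forming inverse(V)
    quad = N.dot(diff, N.linalg.solve(V, diff))
    return -0.5 * (d * N.log(2 * N.pi) + N.log(N.linalg.det(V)) + quad)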

If so, then you can vectorize the density calculation. Of course, if V
has a simple structure (your code does not make this clear) and can be
factored out, then the only 'hard' part to compute is the product of
the transposed vector with the vector. If mu and va are scalars for
each calculation, then you can factor them out, so that the only
multidimensional calculations are squares and sums over each frame; a
rough sketch of this is below.
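
As a rough illustration (untested, and the names are mine), in the
diagonal case the whole table of K densities at n frames can be
computed without any Python loop over frames or components, by
broadcasting:

import numpy as N

def diag_gauss_den_all(data, mu, va):
    # data: (n, d) frames; mu, va: (K, d) means and diagonal variances
    d    = data.shape[1]
    inva = 1.0 / va                                     # (K, d)
    # log of the normalizing constant of each component  (K,)
    logfac = -0.5 * d * N.log(2 * N.pi) + 0.5 * N.sum(N.log(inva), 1)
    # scaled squared distances, summed over dimensions -> (K, n)
    diff = data[N.newaxis, :, :] - mu[:, N.newaxis, :]  # (K, n, d)
    quad = N.sum(diff ** 2 * inva[:, N.newaxis, :], 2)
    return N.exp(logfac[:, N.newaxis] - 0.5 * quad).T   # (n, K)

The price is the temporary (K, n, d) array, which for the 30 components
x 1e4 frames x 15 dimensions benchmark below is a few tens of MB; if
that is too much, keep the loop over components but vectorize over
frames, which is essentially what the posted code does.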

Regards
Bruce




On 10/4/06, David Cournapeau <david at ar.media.kyoto-u.ac.jp> wrote:
> Tim Hochberg wrote:
> >>
> >>
> >> I guess the problem comes from the fact that, y being C order, y[:,
> >> i] has to access the data in a non-'linear' (strided) way. Is there
> >> a way to speed this up? I did something like this:
> >>
> >>     y   = N.zeros((K, n))
> >>     for i in range(K):
> >>         y[i] = gauss_den(data, mu[i, :], va[i, :])
> >>     return y.T
> >>
> >> which works, but I don't like it very much.
> > Why not?
> >
> Mainly because those transposes do not really reflect the intention,
> and it does not feel natural.
> >>  Isn't there any other way?
> > That depends on the details of gauss_den.
> >
> > A general note: for this kind of microoptimization puzzle, it's much
> > easier to help if you can post a self-contained example, preferably
> > something fairly simple that still illustrates the speed issue, that we
> > can experiment with.
> >
> Here we are (the difference between the two multiple_gauss_den versions
> may not seem like much, but in my real code _diag_gauss_den is an
> internal function implemented in C, and is much faster... By writing
> this example, I've just realized that _diag_gauss_den itself may be
> slow for exactly the same reasons):
>
>
> #! /usr/bin/env python
> # Last Change: Wed Oct 04 07:00 PM 2006 J
>
> import numpy as N
> from numpy.random import randn
>
> def _diag_gauss_den(x, mu, va):
>     """ This function is the actual implementation
>     of the gaussian pdf in the diagonal case. It assumes all args
>     are conformant, so it should not be used directly.
>
>     Call gauss_den instead."""
>     # Diagonal covariance: product of d univariate gaussian densities
>     d       = mu.size
>     inva    = 1/va[0]
>     fac     = (2*N.pi) ** (-d/2.0) * N.sqrt(inva)
>     y       = (x[:,0] - mu[0]) ** 2 * inva * -0.5
>     for i in range(1, d):
>         inva    = 1/va[i]
>         fac     *= N.sqrt(inva)
>         y       += (x[:,i] - mu[i]) ** 2 * inva * -0.5
>     return fac * N.exp(y)
>
> def multiple_gauss_den1(data, mu, va):
>     """Helper function to generate several Gaussian
>     pdf (different parameters) from the same data: unoptimized version"""
>     K   = mu.shape[0]
>     n   = data.shape[0]
>     d   = data.shape[1]
>
>     y   = N.zeros((n, K))
>     for i in range(K):
>         y[:, i] = _diag_gauss_den(data, mu[i, :], va[i, :])
>
>     return y
>
> def multiple_gauss_den2(data, mu, va):
>     """Helper function to generate several Gaussian
>     pdf (different parameters) from the same data: optimized version"""
>     K   = mu.shape[0]
>     n   = data.shape[0]
>     d   = data.shape[1]
>
>     y   = N.zeros((K, n))
>     for i in range(K):
>         y[i] = _diag_gauss_den(data, mu[i, :], va[i, :])
>
>     return y.T
>
> def bench():
>     #===========================================
>     # GMM of 30 comp, 15 dimension, 1e4 frames
>     #===========================================
>     d       = 15
>     k       = 30
>     nframes = int(1e4)
>     niter   = 10
>     mode    = 'diag'
>
>     mu      = randn(k, d)
>     va      = randn(k, d) ** 2
>     X       = randn(nframes, d)
>
>     print "============================================================="
>     print "(%d dim, %d components) GMM with %d iterations, for %d frames" \
>             % (d, k, niter, nframes)
>
>     for i in range(niter):
>         y1  = multiple_gauss_den1(X, mu, va)
>
>     for i in range(niter):
>         y2  = multiple_gauss_den2(X, mu, va)
>
>     # sanity check: both versions should agree (abs so errors cannot cancel)
>     se  = N.sum(N.abs(y1 - y2))
>
>     print se
>
> if __name__ == '__main__':
>     import hotshot, hotshot.stats
>     profile_file    = 'foo.prof'
>     prof    = hotshot.Profile(profile_file, lineevents=1)
>     prof.runcall(bench)
>     # close the profile before loading its stats
>     prof.close()
>     p = hotshot.stats.load(profile_file)
>     print p.sort_stats('cumulative').print_stats(20)
>
> I am a bit puzzled by all this C vs F storage business, though. In
> Matlab, where the storage is always F as far as I know, I have never
> encountered such differences (eg between y(:, i) and y(i, :)); I don't
> know if this is because I was doing it badly, or because Matlab is much
> cleverer than numpy at handling those cases, or if that is the price to
> pay for the added flexibility of numpy...
>
> David
>
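
One more thought on that last point: numpy lets you choose the memory
layout at allocation time, so a possible middle ground (untested on my
side) is to create the output in Fortran order and keep the y[:, i]
indexing; the column writes are then contiguous:

    y = N.zeros((n, K), order='F')
    for i in range(K):
        y[:, i] = _diag_gauss_den(data, mu[i, :], va[i, :])
    return y

That way the code says what it means and the final transpose goes away.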



