hi all,
in my code, i use the function 'logsumexp' from scipy.maxentropy a
lot. as far as i can tell, this function has no vectorized version
that works on an m-x-n matrix. i might be doing something wrong here,
but i found that this function can run extremely slowly if used as
follows: i have an array of log probability vectors, such that each
column sums to one. i want to simply iterate over each column and
renormalize it, using exp(col - logsumexp(col)). here is the code that
i used to profile this operation:
from scipy import *
from numpy import *
from numpy.random.mtrand import dirichlet
from scipy.maxentropy import logsumexp
import time
# build an array of probability vectors. each column represents a
probability vector.
num_vectors = 1000000
log_prob_vectors = transpose(log(dirichlet([1, 1, 1], num_vectors)))
# now renormalize each column, using logsumexp
norm_prob_vectors = []
t1 = time.time()
for n in range(num_vectors):
norm_p = exp(log_prob_vectors[:, n] - logsumexp(log_prob_vectors[:, n]))
norm_prob_vectors.append(norm_p)
t2 = time.time()
norm_prob_vectors = array(norm_prob_vectors)
print "logsumexp renormalization (%d many times) took %s seconds."
%(num_vectors, str(t2-t1))
i found that even with only 100,000 elements, this code takes about 5 seconds:
logsumexp renormalization (100000 many times) took 5.07085394859 seconds.
with 1 million elements, it becomes prohibitively slow:
logsumexp renormalization (1000000 many times) took 70.7815010548 seconds.
is there a way to speed this up? most vectorized operations that work
on matrices in numpy/scipy are incredibly fast and it seems like a
vectorized version of logsumexp should be near instant on this scale.
is there a way to rewrite the above snippet so that it's faster?
thanks very much for your help.