entropy

John Hunter jdhunter at ace.bsd.uchicago.edu
Mon Mar 15 12:44:51 EST 2004


I am trying to compute the entropy of a time series (e.g.,
http://en.wikipedia.org/wiki/Information_theory) using

S = - sum p_i log2(p_i)
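
By log2 I mean bits throughout.  As a sanity check of the discrete
formula on its own, here is a tiny plain-Python sketch (no Numeric),
where a fair coin should come out to exactly 1 bit:

from math import log

def discrete_entropy(p):
    # entropy in bits of a probability vector p (assumed to sum to 1);
    # terms with p_i == 0 are dropped, i.e. 0*log(0) is taken as 0
    return -sum([p_i*log(p_i)/log(2) for p_i in p if p_i > 0])

print discrete_entropy([0.5, 0.5])   # fair coin -> 1.0
print discrete_entropy([0.25]*4)     # uniform over 4 outcomes -> 2.0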

According to the text I am using, the (differential) entropy of a
Gaussian distribution should be

1/2 log2(2 pi e sigma^2)

so I am using this value to test my algorithm.  Unfortunately, the
computed and analytic values do not agree.
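
Plugging sigma = 2 into that formula, the analytic value I am aiming
for is about 3.05 bits:

from math import log, pi, e

sigma = 2.0
# differential entropy of a Gaussian in bits: 1/2 log2(2 pi e sigma^2)
print 0.5*log(2*pi*e*sigma**2)/log(2)    # roughly 3.05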

Can anyone tell me where I am going wrong?



from Numeric import searchsorted, concatenate, arange, nonzero, log, \
     sum, multiply, sort, greater, take, pi, exp

from MLab import diff, randn

def hist(y, bins):
    # count how many samples of y fall between successive bin edges;
    # the final entry also collects everything at or beyond the last
    # edge, and samples below the first edge are not counted at all
    n = searchsorted(sort(y), bins)
    n = diff(concatenate([n, [len(y)]]))
    return n

# generate some Gaussian random numbers with mean mu and std dev sigma
mu = 0.0
sigma = 2.0
x = mu + sigma*randn(100000)

delta = 0.001
bins = arange(-12.0, 12.0, delta)

n = hist(x, bins)

ind = nonzero(greater(n, 0.0))
n = take(n, ind)         # keep only the occupied (nonzero-count) bins
n = 1.0/len(n)*n         # normalize to probabilities; is this the right normalization?
#n = 1.0/len(bins)*n     # or this? or something else?

Scomputed = -1.0/log(2.0) * sum(multiply(n, log(n)))    # -sum p_i log2(p_i)
Sanalytic = 0.5/log(2.0) * log(2*pi*exp(1.0)*sigma**2)  # 1/2 log2(2 pi e sigma^2)
 
print Scomputed, Sanalytic
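
For comparison, here is a sketch of an alternative I am unsure about
too: normalize the counts by the total number of samples rather than
by the number of bins, and add a log2(delta) term, on the reasoning
that for small delta each p_i is roughly f(x_i)*delta, so the binned
sum approximates the differential entropy only after that bin-width
correction (this reuses hist, x, bins and delta from above):

counts = hist(x, bins)
counts = take(counts, nonzero(greater(counts, 0)))    # occupied bins only
p = counts / float(sum(counts))                       # p_i = count_i / N
Sdiscrete = -1.0/log(2.0) * sum(multiply(p, log(p)))  # binned entropy in bits
Sdiff = Sdiscrete + log(delta)/log(2.0)               # add log2(delta), which is negative
print Sdiff, Sanalytic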



Thanks!
John Hunter



