Re: [SciPy-User] Weighted KDE

13 Jan 2013

      <josef.pktd <at> gmail.com> writes:
...
On Sun, May 13, 2012 at 1:07 PM, Zachary Pincus <zachary.pincus <at> yale.edu> wrote:
...
...
Hello all,
A while ago, someone asked on this list about whether it would be simple to
modify scipy.stats.kde.gaussian_kde to deal with weighted data:
http://mail.scipy.org/pipermail/scipy-user/2008-November/018578.html
Anne and Robert assured the writer that this was pretty simple (modulo
bandwidth selection), though I couldn't find any code that the original
author may have generated based on that advice.
...
...
I've got a problem that could (perhaps) be solved neatly with weighted KDE,
so I'd like to give this a go. I assume that at a minimum, to get basic
gaussian_kde.evaluate() functionality:
...
(1) The covariance calculation would need to be replaced by a weighted-
covariance calculation. (Simple enough.)
...
(2) In evaluate(), the critical part looks like this (and a similar stanza
that loops over the points instead):
...
...
# if there are more points than data, so loop over data
for i in range(self.n):
   diff = self.dataset[:, i, newaxis] - points
   tdiff = dot(self.inv_cov, diff)
   energy = sum(diff*tdiff,axis=0) / 2.0
   result = result + exp(-energy)
I assume that, further, the 'diff' values ought to be scaled by the weights, 
too. Is this all that would need
to be done? (For the integration and resampling, obviously, there would be a 
bit of other work...)
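For step (1), here is a minimal sketch of a weighted covariance, assuming NumPy and weights that act as relative importances rather than repeat counts. The name weighted_covariance is made up for illustration and is not part of scipy. The bias correction is one common choice; it matches what np.cov computes when given its aweights argument.

```python
import numpy as np

def weighted_covariance(dataset, weights):
    """Weighted covariance of a (d, n) dataset; weights has shape (n,)."""
    dataset = np.atleast_2d(np.asarray(dataset, dtype=float))
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                       # normalize so the weights sum to 1
    mean = dataset @ w                    # weighted mean, shape (d,)
    centered = dataset - mean[:, np.newaxis]
    # Bias correction for importance-style weights: 1 / (1 - sum(w**2)).
    # With equal weights this reduces to the usual 1 / (n - 1).
    return (centered * w) @ centered.T / (1.0 - np.sum(w ** 2))
```

With all weights equal, this gives the same matrix as the unweighted np.cov(dataset).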
It looks that way to me, with each dataset point's contribution scaled by its weight.
I don't see what the norm_factor should be:
      self._norm_factor = sqrt(linalg.det(2*pi*self.covariance)) * self.n
there should be the weights somewhere in there, maybe just replace
self.n by sum(weights) given a constant covariance
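The loop quoted above could be weighted roughly as follows. This is a sketch under assumptions, not scipy's implementation: weighted_kde_evaluate is a hypothetical helper, the kernel covariance is passed in already scaled by the bandwidth (bandwidth selection is left out), and self.n in the norm factor is replaced by sum(weights) as suggested. Note that the weight multiplies each point's kernel term exp(-energy) rather than diff itself; scaling diff would change the kernel width instead of the point's contribution.

```python
import numpy as np

def weighted_kde_evaluate(dataset, weights, points, covariance):
    """Evaluate a weighted Gaussian KDE (hypothetical helper).

    dataset: (d, n) data, weights: (n,), points: (d, m),
    covariance: (d, d) kernel covariance, already scaled by the bandwidth.
    """
    dataset = np.atleast_2d(dataset)
    points = np.atleast_2d(points)
    w = np.asarray(weights, dtype=float)
    inv_cov = np.linalg.inv(covariance)
    result = np.zeros(points.shape[1])
    # Loop over the data points, as in the stanza quoted above.
    for i in range(dataset.shape[1]):
        diff = dataset[:, i, np.newaxis] - points
        tdiff = inv_cov @ diff
        energy = np.sum(diff * tdiff, axis=0) / 2.0
        result += w[i] * np.exp(-energy)   # weight the kernel, not diff
    # Same norm factor as before, with n replaced by the total weight.
    norm_factor = np.sqrt(np.linalg.det(2 * np.pi * covariance)) * w.sum()
    return result / norm_factor
```

A quick sanity check on this design: giving a point weight 2 yields the same density as listing that point twice with weight 1, and the result still integrates to 1.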
sampling doesn't look difficult, if we want biased sampling, then
instead of randint, we would need weighted randint (non-uniform)
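The "weighted randint" idea might look like the sketch below, assuming NumPy's Generator API. The function weighted_resample and its arguments are hypothetical; the covariance is again taken as the already-scaled kernel covariance. Data points are drawn with probability proportional to their weights, then perturbed with Gaussian kernel noise.

```python
import numpy as np

def weighted_resample(dataset, weights, covariance, size, rng=None):
    """Draw `size` samples from a weighted Gaussian KDE (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    dataset = np.atleast_2d(dataset)
    d, n = dataset.shape
    w = np.asarray(weights, dtype=float)
    # Non-uniform replacement for randint: pick indices with probability w / sum(w).
    indices = rng.choice(n, size=size, p=w / w.sum())
    # Kernel noise with the KDE covariance, transposed to shape (d, size).
    noise = rng.multivariate_normal(np.zeros(d), covariance, size=size).T
    return dataset[:, indices] + noise
```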
integration might require more work, or not (I never tried to understand them)
(I don't know if kde in statsmodels has weights on the schedule.)
Josef
mostly guessing
...
Thanks,
Zach
Hi,

I am facing the same problem as well, but can't figure out how the weighting 
should be done exactly.

Has anybody successfully completed the modification of the code to allow a 
weighted kde? I am attempting to perform kde on a set of imaging data with X, Y, 
and an additional "temperature" column.

Performing the kde on only the X,Y axes gives a working heatmap showing the 
spatial distribution of the data points, but I would also like to use them to 
see the "temperature" profile (the third axis), much like a geographical heatmap 
showing temperature or rainfall values over an X-Y map.

I found another implementation at 
http://pastebin.com/LNdYCZgw
which allows weighted kde, but when I tried it out with my data, it took much 
longer than the normal kde (more than an hour, versus only about twenty 
seconds for the original code), despite claims that it was faster.

Thanks,
Jackson

Jackson Li