[SciPy-User] Weighted KDE
josef.pktd at gmail.com
josef.pktd at gmail.com
Sun May 13 14:17:30 EDT 2012
On Sun, May 13, 2012 at 1:07 PM, Zachary Pincus <zachary.pincus at yale.edu> wrote:
> Hello all,
>
> A while ago, someone asked on this list about whether it would be simple to modify scipy.stats.kde.gaussian_kde to deal with weighted data:
> http://mail.scipy.org/pipermail/scipy-user/2008-November/018578.html
>
> Anne and Robert assured the writer that this was pretty simple (modulo bandwidth selection), though I couldn't find any code that the original author may have generated based on that advice.
>
> I've got a problem that could (perhaps) be solved neatly with weighed KDE, so I'd like to give this a go. I assume that at a minimum, to get basic gaussian_kde.evaluate() functionality:
>
> (1) The covariance calculation would need to be replaced by a weighted-covariance calculation. (Simple enough.)
>
> (2) In evaluate(), the critical part looks like this (and a similar stanza that loops over the points instead):
> # if there are more points than data, so loop over data
> for i in range(self.n):
> diff = self.dataset[:, i, newaxis] - points
> tdiff = dot(self.inv_cov, diff)
> energy = sum(diff*tdiff,axis=0) / 2.0
> result = result + exp(-energy)
>
> I assume that, further, the 'diff' values ought to be scaled by the weights, too. Is this all that would need to be done? (For the integration and resampling, obviously, there would be a bit of other work...)
it looks to me that way, scaled according to weight by dataset points
I don't see what the norm_factor should be:
self._norm_factor = sqrt(linalg.det(2*pi*self.covariance)) * self.n
there should be the weights somewhere in there, maybe just replace
self.n by sum(weights) given a constant covariance
sampling doesn't look difficult, if we want biased sampling, then
instead of randint, we would need weighted randint (non-uniform)
integration might require more work, or not (I never tried to understand them)
(I don't know if kde in statsmodels has weights on the schedule.)
Josef
mostly guessing
>
> Thanks,
> Zach
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
More information about the SciPy-User
mailing list