[scikit-learn] KernelDensity bandwidth hyper parameter optimization

Andreas Mueller t3kcit at gmail.com
Wed Nov 7 18:05:59 EST 2018



On 11/7/18 4:01 AM, William Heymann wrote:
> Hello,
>
> I am trying to tune the bandwidth for my KernelDensity. I need to find 
> out what optimization goal to use.
>
> I started with
>
> from sklearn.model_selection import GridSearchCV
> grid = GridSearchCV(KernelDensity(),
>                     {'bandwidth': np.linspace(0.1, 1.0, 30)},
>                     cv=20)  # 20-fold cross-validation
> grid.fit(x[:, None])
> print(grid.best_params_)
>
> From 
> https://jakevdp.github.io/blog/2013/12/01/kernel-density-estimation/#Bandwidth-Cross-Validation-in-Scikit-Learn
>
> I have also used RandomizedSearchCV to optimize the parameters.
>
> The problem I have is that neither approach refines the answer, so if I
> don't sample the bandwidth grid densely enough I don't get a good result.
> What I would like to do is use the same objective but plug it into a
> different global optimizer.
>
> I have looked through the code for GridSearchCV and RandomizedSearchCV
> and I have not been able to figure out yet what the actual optimization
> goal is.
>
> Originally I thought the system was using something like
>
> kde_bw = KernelDensity(kernel='gaussian', bandwidth=bw)
> score = max(cross_val_score(kde_bw, data, cv=3))
>
That's basically what it's doing, except that GridSearchCV maximizes the 
mean (rather than the max) of the cross-validated scores, and the "score" 
method of KernelDensity returns the total log-likelihood of the held-out 
data.
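A minimal sketch of that objective, written as a standalone function so it 
can be handed to other optimizers (the helper name cv_objective is just 
made up here; the gaussian kernel and cv=20 match your GridSearchCV call):

import numpy as np
from sklearn.neighbors import KernelDensity
from sklearn.model_selection import cross_val_score

def cv_objective(bandwidth, data, cv=20):
    # Mean cross-validated score: KernelDensity.score is the total
    # log-likelihood of the held-out samples, and GridSearchCV picks
    # the candidate with the highest mean of these values.
    kde = KernelDensity(kernel='gaussian', bandwidth=bandwidth)
    return cross_val_score(kde, data, cv=cv).mean()
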
You could look at scikit-optimize for a more elaborate optimizer, or try 
using any of the scipy ones.
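If you want to hand that same goal to a scipy optimizer instead of a grid, 
something along these lines should work (minimize_scalar minimizes, so the 
score is negated; the bounds are just taken from your grid, not a 
recommendation, and x is your data array from the snippet above):

from scipy.optimize import minimize_scalar

# Maximize the mean cross-validated log-likelihood over the bandwidth
# by minimizing its negative on a bounded interval.
result = minimize_scalar(lambda bw: -cv_objective(bw, x[:, None]),
                         bounds=(0.1, 1.0), method='bounded')
print(result.x)  # refined bandwidth estimate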