[scikit-learn] Search in results for optimal parameter subsets

Mon Oct 17 07:36:32 EDT 2016

Hey, I have a dataframe with test results (10k rows). Each result (row)
has ~6 parameters, plus some output metrics. I would like to find
combinations of the parameters which have reasonable mean, std and
support-count (number of results in the configuration).

E.g. if there are parameters "k" and "c", each in range(100), and the
result metric has a good mean and std for "k in [4..12] and c in
[90..95]" (with half-open ranges and one result per combination, the
support count would be 8*5 = 40), and then maybe for "k in [34..41] and
c in [10..13]" (support count 7*3 = 21), then I would like the
algorithm to return something like

k       c       mean   std    support-count total_score
4..12   90..95  12.1   1.23   40            9.3
34..41  10..13  11.1   1.13   21            6.2
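A minimal sketch of how one row of that table could be computed for a single candidate box, assuming the results sit in a pandas DataFrame with one column per parameter and one column for the metric. The function name `box_stats` and the half-open interval convention are illustrative assumptions, not anything from sklearn or scipy.

```python
import numpy as np
import pandas as pd


def box_stats(df, bounds, metric):
    """Mean, std and support count of `metric` for rows inside an axis-aligned box.

    `bounds` maps a parameter column to a half-open interval (lo, hi),
    i.e. lo <= value < hi, matching the 8*5 = 40 count in the example.
    """
    mask = pd.Series(True, index=df.index)
    for col, (lo, hi) in bounds.items():
        mask &= (df[col] >= lo) & (df[col] < hi)
    sub = df.loc[mask, metric]
    # ddof=0 (population std) so a box with a single result gives 0, not NaN
    return {"mean": sub.mean(), "std": sub.std(ddof=0), "support_count": int(mask.sum())}
```

For example, on a synthetic full grid with one result per (k, c) pair, `box_stats(df, {"k": (4, 12), "c": (90, 95)}, "metric")` reports a support count of 40.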

I understand I will first have to define a function that combines the
mean, std and count into a single total_score. I can do that somehow.
But I don't know what kind of mathematical problem it is to find the
local maxima over subsets of parameter configurations.
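One possible shape for such a scoring function, offered only as a hypothetical sketch: it rewards a high mean, penalises spread, adds a diminishing bonus for larger support, and rejects boxes below a minimum support. The weights `std_weight`, `count_weight` and `min_support` are made-up knobs, not values derived from the data.

```python
import math


def total_score(mean, std, support_count, std_weight=1.0, count_weight=0.5, min_support=5):
    """Hypothetical score: high mean is good, high std is bad,
    and more support helps, but only logarithmically."""
    if support_count < min_support:
        return float("-inf")  # too few results to trust the statistics
    return mean - std_weight * std + count_weight * math.log(support_count)
```

With these made-up weights the two example rows keep their relative order (the first scores higher), though the absolute values differ from the total_score column.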
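For two integer-valued parameters, one straightforward (if brute-force) way to frame this is exhaustive search over all axis-aligned boxes. The sketch below assumes the parameters are already integer codes in `range(n_k)` and `range(n_c)` (other values could be mapped with `np.unique(..., return_inverse=True)`), and uses summed-area tables of count, sum and sum of squares so each box's mean and std cost O(1). The function name `best_boxes` is hypothetical.

```python
import numpy as np


def best_boxes(k, c, y, score, n_k, n_c, top=5):
    """Score every box [k0, k1) x [c0, c1) on an integer grid and return the best ones.

    k, c: integer parameter codes in range(n_k) and range(n_c)
    y: metric value of each result
    score: function (mean, std, count) -> float
    """
    k, c, y = np.asarray(k), np.asarray(c), np.asarray(y, dtype=float)
    cnt = np.zeros((n_k, n_c))
    s1 = np.zeros((n_k, n_c))
    s2 = np.zeros((n_k, n_c))
    # Accumulate per-cell count, sum and sum of squares (handles repeated cells)
    np.add.at(cnt, (k, c), 1)
    np.add.at(s1, (k, c), y)
    np.add.at(s2, (k, c), y * y)

    def sat(a):
        # Summed-area table padded with a leading row/column of zeros
        out = np.zeros((a.shape[0] + 1, a.shape[1] + 1))
        out[1:, 1:] = a.cumsum(0).cumsum(1)
        return out

    C, S1, S2 = sat(cnt), sat(s1), sat(s2)

    def box_sum(T, k0, k1, c0, c1):
        return T[k1, c1] - T[k0, c1] - T[k1, c0] + T[k0, c0]

    results = []
    for k0 in range(n_k):
        for k1 in range(k0 + 1, n_k + 1):
            for c0 in range(n_c):
                for c1 in range(c0 + 1, n_c + 1):
                    n = box_sum(C, k0, k1, c0, c1)
                    if n == 0:
                        continue  # empty box, nothing to score
                    m = box_sum(S1, k0, k1, c0, c1) / n
                    var = max(box_sum(S2, k0, k1, c0, c1) / n - m * m, 0.0)
                    results.append((score(m, var ** 0.5, int(n)), (k0, k1), (c0, c1)))
    results.sort(key=lambda r: r[0], reverse=True)
    return results[:top]
```

The cost is O(n_k² · n_c²): with range(100) on both axes that is roughly 25 million boxes, which is slow in pure Python, so binning the parameters more coarsely first is probably necessary. Also, the top entries will usually be near-duplicates of one box; getting several distinct "local maxima" would need an extra step, e.g. greedily skipping boxes that overlap one already chosen.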

Is this an optimization task? Can you please point me to something in
sklearn or scipy that would give me some direction?
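One heuristic available directly in sklearn, swapped in here as an illustration rather than an exact answer: a regression tree (`DecisionTreeRegressor`) splits the parameter space into axis-aligned boxes (its leaves) whose metric values have low within-box variance, and `min_samples_leaf` acts as a minimum support count. The tree is greedy and its leaves partition the whole space, so it will not necessarily find the best-scoring boxes; this family of problems is also discussed under names like subgroup discovery or bump hunting. The helper `tree_boxes` is hypothetical, and the reported bounds are the observed min/max parameter values inside each leaf.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor


def tree_boxes(df, params, metric, min_support=20, max_leaf_nodes=None, random_state=0):
    """Fit a regression tree and summarise each leaf as a box with mean/std/count."""
    X = df[params].to_numpy()
    y = df[metric].to_numpy()
    tree = DecisionTreeRegressor(
        min_samples_leaf=min_support,  # every box keeps at least this many results
        max_leaf_nodes=max_leaf_nodes,
        random_state=random_state,
    )
    tree.fit(X, y)
    leaf = tree.apply(X)  # id of the leaf (box) each result falls into
    g = df.assign(leaf=leaf).groupby("leaf")
    out = g[params].agg(["min", "max"])  # observed parameter range per box
    out.columns = [f"{p}_{stat}" for p, stat in out.columns]
    out["mean"] = g[metric].mean()
    out["std"] = g[metric].std(ddof=0)
    out["support_count"] = g.size()
    return out.sort_values("mean", ascending=False)
```

The resulting frame has one row per box, so a scoring function of mean, std and support count could then be applied row by row to rank them.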

Cheers,
Tomas