On Wed, Aug 25, 2010 at 7:19 AM, John Hunter <jdh2358@gmail.com> wrote:
On Wed, Aug 25, 2010 at 9:10 AM, Keith Goodman <kwgoodman@gmail.com> wrote:
How about using the percentiles of np.unique(x)? That takes care of the first constraint (no overlap) but ignores the second constraint (min std of cluster size).
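A minimal sketch of the percentile-of-unique-values idea, assuming `x` is a 1-D NumPy array and that we want 5 groups (quintiles). The function name `unique_percentile_bins` is hypothetical. The bin edges come from the distinct values, so every copy of a repeated value lands in the same group and the groups cover non-overlapping value ranges. The group sizes, however, are not controlled.

```python
import numpy as np

def unique_percentile_bins(x, nbins=5):
    """Hypothetical helper: label each element of x with a group 0..nbins-1,
    using bin edges taken from the percentiles of the distinct values of x."""
    x = np.asarray(x)
    u = np.unique(x)
    # Interior edges, e.g. the 20th, 40th, 60th and 80th percentiles for nbins=5.
    qs = np.linspace(0, 100, nbins + 1)[1:-1]
    edges = np.percentile(u, qs)
    # side="right" maps a value v to the number of edges <= v, so equal
    # values always get the same label.
    return np.searchsorted(edges, x, side="right")

x = np.array([1, 1, 1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
labels = unique_percentile_bins(x)
print(labels.tolist())               # [0, 0, 0, 0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
print(np.bincount(labels).tolist())  # [5, 2, 2, 2, 2]
```

The repeated value 1 inflates the first group, which is exactly the second constraint (spread of cluster sizes) that this approach ignores.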
Well, I need the second constraint...
Both can't be hard constraints, so I guess the first step is to define a utility function that quantifies the trade-off between the two. Would it make sense to then start from the percentile(unique(x), ...) solution and come up with a heuristic that moves an item with lots of repeats from a large-length quintile to a short-length quintile, accepting the move if it improves the utility? Or try moving each item to each of the other 4 quintiles and make the move that improves the utility the most, then repeat until the utility doesn't improve. But I guess I'm just stating the obvious, and you are looking for something less obvious and more clever.
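A rough sketch of the second ("best move, repeat until no improvement") heuristic, under several assumptions that are not from the thread. An "item" is a distinct value moved together with all its repeats. "Overlap" is read as two groups whose [min, max] value ranges intersect. The utility is a hypothetical trade-off: minus the sum of squared group sizes (equivalent to minimizing the std of the sizes, since the total is fixed, and exact in floating point for integer counts) minus `penalty` times the number of overlapping group pairs. The names `utility`, `greedy_bins` and `penalty` are all illustrative. Each pass costs O(unique values × groups) utility evaluations, so this is only meant for small inputs.

```python
import numpy as np

def utility(values, counts, labels, nbins, penalty):
    """Hypothetical utility (higher is better): balanced sizes, few overlaps."""
    # Group sizes, counting every repeat of each distinct value.
    sizes = np.bincount(labels, weights=counts, minlength=nbins)
    # [min, max] value range of each non-empty group.
    ranges = [(values[labels == g].min(), values[labels == g].max())
              for g in range(nbins) if np.any(labels == g)]
    overlaps = sum(1 for i in range(len(ranges)) for j in range(i + 1, len(ranges))
                   if ranges[i][0] <= ranges[j][1] and ranges[j][0] <= ranges[i][1])
    return -(sizes ** 2).sum() - penalty * overlaps

def greedy_bins(x, nbins=5, penalty=10.0):
    values, counts = np.unique(x, return_counts=True)
    # Start from the percentile(unique(x)) solution: one label per distinct value.
    edges = np.percentile(values, np.linspace(0, 100, nbins + 1)[1:-1])
    labels = np.searchsorted(edges, values, side="right")
    best = utility(values, counts, labels, nbins, penalty)
    while True:
        best_move = None
        # Try moving each distinct value to each of the other groups.
        for i in range(len(values)):
            old = labels[i]
            for g in range(nbins):
                if g == old:
                    continue
                labels[i] = g
                u = utility(values, counts, labels, nbins, penalty)
                if u > best:  # strict improvement only, so the loop terminates
                    best, best_move = u, (i, g)
            labels[i] = old
        if best_move is None:
            break
        # Apply the single move that improved the utility the most.
        i, g = best_move
        labels[i] = g
    # Map the per-value labels back onto every element of x.
    return labels[np.searchsorted(values, x)]

x = np.array([1, 1, 1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
lab = greedy_bins(x)
print(lab.tolist())               # [0, 0, 0, 0, 1, 1, 1, 2, 2, 3, 3, 4, 4]
print(np.bincount(lab).tolist())  # [4, 3, 2, 2, 2]
```

Here the search moves the value 2 out of the crowded first group and then stops, because no single move reduces the size spread further without creating an overlap. A larger or smaller `penalty` shifts the trade-off between the two constraints.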