On 8/25/10 8:00 AM, John Hunter wrote:
Suppose I have an ordered list/array of numbers, and I want to split them into N chunks such that the intersection of any two chunks is empty and the data is split as evenly as possible (e.g. the std dev of the chunk lengths is minimized, or some other such criterion). Context: I am trying to do a quintile analysis on some data, and np.percentile doesn't behave the way I want because more than 20% of my data equals 1, so 1 falls in both the first and second quintiles. I want to avoid this: I'd rather have uneven counts in my quintiles than have the same value show up in multiple quintiles, but I'd like the counts to be as even as possible.
Here is some sample code that illustrates my problem:
....
John: This is a problem we have quite often analyzing precip data in arid regions - most of the time it just doesn't rain so the distribution has a delta function peak at zero. There is no good way around it. Sometimes people split up the sample into rain and no-rain, and treat the two distributions separately.

-Jeff

--
Jeffrey S. Whitaker
Meteorologist, NOAA/OAR/PSD R/PSD1
325 Broadway, Boulder, CO, USA 80303-3328
Office : Skaggs Research Cntr 1D-113
Phone  : (303)497-6313
FAX    : (303)497-6449
Email  : Jeffrey.S.Whitaker@noaa.gov
Web    : http://tinyurl.com/5telg
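
For illustration only, here is a minimal sketch of one way to get chunks that never share a value while keeping the counts roughly even. It is a greedy heuristic, not a guaranteed minimizer of the std dev of the chunk lengths, and it is not taken from the sample code in the original message. The function name `tie_aware_chunks` is hypothetical. The idea is to allow cuts only where the sorted value changes, and to place each cut as close as possible to an even share of the data that is still unassigned.

```python
import numpy as np

def tie_aware_chunks(data, n_chunks):
    """Greedy sketch: split sorted data so equal values stay in one chunk.

    Hypothetical helper. May return fewer than n_chunks chunks when the
    data has too few distinct values.
    """
    x = np.sort(np.asarray(data))
    # A cut is only allowed at a position where the value changes.
    _, counts = np.unique(x, return_counts=True)
    allowed = np.cumsum(counts)[:-1]

    cuts = []
    start = 0
    for remaining in range(n_chunks, 1, -1):
        candidates = allowed[allowed > start]
        if candidates.size == 0:
            break  # no distinct values left to cut on
        # Ideal cut: an even share of the data not yet assigned.
        target = start + (x.size - start) / remaining
        start = int(candidates[np.argmin(np.abs(candidates - target))])
        cuts.append(start)
    return np.split(x, cuts)

# More than 20% of the values equal 1, as in the question.
sample = [1, 1, 1, 1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
chunks = tie_aware_chunks(sample, 5)
print([len(c) for c in chunks])  # [5, 2, 3, 2, 3]
```

All the 1s land in the first chunk, and the remaining values are spread over the other four. The rain/no-rain split mentioned above is the special case where the spike gets its own chunk and the rest of the data is analyzed separately.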