Hello, I have some XY data. I would like to generate the equations for an upper and lower envelope that excludes a percentage of the data points. I would like to define the slope of the envelope line (say 3) and then have my code find the intercept that fits my requirements (say 5% of data below the lower envelope). This would then give me the equation and I could plot the upper and lower envelopes. I hope this makes sense. Thanks for any help.
Bevan,

You can estimate the intercept and slope using least squares (scipy.optimize.leastsq). Make sure, though, that the errors in X are small compared to the errors in Y; otherwise your slope will be underestimated.

Using the slope, you can write a function lower(b, a, X, Y) that computes y = a*X + b and returns True where Y < y. The ratio of True elements to the total gives you the fraction of points below the line. You can then find b such that this ratio is .05 and .95 using scipy.optimize.fmin.

There are other ways to do this:
- Make a 2D histogram of the data (normed), compute the cumulative sum along Y, and find the histogram bins (along x) where the cumulative histogram is approximately equal to .05 and .95.
- Partition the data into N sets along the x-axis, fit a normal distribution to each set, and compute the quantiles corresponding to .05 and .95 cumulative probability.

David

By the way, anonymous mails from newcomers don't get as much attention as those that are signed. Call it mailing list etiquette.

On Tue, Sep 30, 2008 at 5:06 AM, bevan <bevan07@gmail.com> wrote:
Hello,
I have some XY data. I would like to generate the equations for an upper and lower envelope that excludes a percentage of the data points.
I would like to define the slope of the envelope line (say 3) and then have my code find the intercept that fits my requirements (say 5% of data below the lower envelope). This would then give me the equation and I could plot the upper and lower envelopes.
I hope this makes sense. Thanks for any help.
_______________________________________________ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
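David's first suggestion (count the points below a line of fixed slope, then search for the intercept) could be sketched roughly as below. This is only an illustration, not his code: the names fraction_below and find_intercept are made up, the data are synthetic, and a plain bisection stands in for scipy.optimize.fmin, because the fraction of points below the line is a step function of the intercept and a simplex search may stall on its flat stretches.

```python
import numpy as np


def fraction_below(b, a, x, y):
    """Fraction of points lying strictly below the line y = a*x + b."""
    return np.mean(y < a * x + b)


def find_intercept(a, x, y, target, tol=1e-9):
    """Bisect on the intercept b until fraction_below reaches target.

    Hypothetical helper: fraction_below never decreases as b grows,
    so bisection closes in on the jump that crosses the target.
    """
    resid = y - a * x
    lo, hi = resid.min() - 1.0, resid.max() + 1.0  # fractions 0 and 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if fraction_below(mid, a, x, y) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


rng = np.random.default_rng(0)  # synthetic data, for illustration only
x = rng.random(200)
y = 3.0 * x + 1.5 + rng.random(200)

slope = 3.0  # slope chosen by the user, as in the original question
b_lower = find_intercept(slope, x, y, 0.05)
b_upper = find_intercept(slope, x, y, 0.95)
print(b_lower, fraction_below(b_lower, slope, x, y))
print(b_upper, fraction_below(b_upper, slope, x, y))
```

With n points the achieved fraction can only be a multiple of 1/n, so it will match the requested 5% or 95% to within about one point.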
2008/9/30 bevan <bevan07@gmail.com>:
Hello,
I have some XY data. I would like to generate the equations for an upper and lower envelope that excludes a percentage of the data points.
I would like to define the slope of the envelope line (say 3) and then have my code find the intercept that fits my requirements (say 5% of data below the lower envelope). This would then give me the equation and I could plot the upper and lower envelopes.
I hope this makes sense. Thanks for any help.
For this particular problem - where you know the slope - it's not too hard. If the slope is b, and your points are x and y, compute y-b*x, then sort that array, and choose the 5th and 95th percentile values. Anne
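A minimal sketch of Anne's recipe, assuming synthetic data and using np.percentile (which handles the sorting internally) in place of an explicit sort; the variable names are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(1)  # synthetic data, for illustration only
x = rng.random(1000)
y = 3.0 * x + 1.5 + rng.random(1000)

slope = 3.0                  # the slope is fixed in advance
offsets = y - slope * x      # intercept of the line through each point
b_lower, b_upper = np.percentile(offsets, [5, 95])

# Fraction of points under each envelope line y = slope*x + b
below_lower = np.mean(y < slope * x + b_lower)
below_upper = np.mean(y < slope * x + b_upper)
print(b_lower, b_upper, below_lower, below_upper)
```

The two intercepts give the envelope equations y = slope*x + b_lower and y = slope*x + b_upper directly, with no iterative search needed.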
Anne Archibald <peridot.faceted <at> gmail.com> writes:
2008/9/30 bevan <bevan07 <at> gmail.com>:
Hello,
I have some XY data. I would like to generate the equations for an upper and lower envelope that excludes a percentage of the data points.
I would like to define the slope of the envelope line (say 3) and then have my code find the intercept that fits my requirements (say 5% of data below the lower envelope). This would then give me the equation and I could plot the upper and lower envelopes.
I hope this makes sense. Thanks for any help.
For this particular problem - where you know the slope - it's not too hard. If the slope is b, and your points are x and y, compute y-b*x, then sort that array, and choose the 5th and 95th percentile values.
Anne
David and Anne,

Thanks for your help. I first tried your suggestion, David, but had not got it working by the time Anne's post arrived. I plan on getting optimize.fmin to work, as I can see some cool uses for it in the future. I managed to get Anne's suggestion working relatively quickly (only slowed by the dullard operating the machine...).

Apologies for the lack of signature in my first post, and thanks again. My code (for what it's worth) is as follows:

    import numpy as np
    import pylab
    from scipy import stats

    # If the slope is b, and your points are x and y, compute y-b*x,
    # then sort that array, and choose the 5th and 95th percentile values.
    def Envelope(x, y, slpe, percntExclude):
        ans = y - slpe * x
        ans.sort()
        intercpt = stats.scoreatpercentile(ans, percntExclude)
        return intercpt

    slpeUpper = 1.0
    slpeLower = 3.0
    percntUpper = 95
    percntLower = 5

    x = np.random.rand(100)
    y = x * 3 + 1.5 + np.random.rand(100)

    # Linear regression using stats.linregress
    # Returns: slope, intercept, r, two-tailed prob, stderr-of-the-estimate
    slpe, intercpt, r, tt, stderr = stats.linregress(x, y)
    # np.polyval evaluates slope*x + intercept at each x
    yRegress = np.polyval([slpe, intercpt], x)

    intercptLower = Envelope(x, y, slpeLower, percntLower)
    intercptUpper = Envelope(x, y, slpeUpper, percntUpper)
    yLower = np.polyval([slpeLower, intercptLower], x)
    yUpper = np.polyval([slpeUpper, intercptUpper], x)

    pylab.figure()
    pylab.plot(x, y, 'k.')
    pylab.plot(x, yRegress, 'b-')
    pylab.plot(x, yLower, 'r--')
    pylab.plot(x, yUpper, 'r--')
    pylab.show()

Bevan Jenkins
On Tue, Sep 30, 2008 at 4:37 PM, Anne Archibald <peridot.faceted@gmail.com> wrote:
2008/9/30 bevan <bevan07@gmail.com>:
Hello,
I have some XY data. I would like to generate the equations for an upper and lower envelope that excludes a percentage of the data points.
I would like to define the slope of the envelope line (say 3) and then have my code find the intercept that fits my requirements (say 5% of data below the lower envelope). This would then give me the equation and I could plot the upper and lower envelopes.
I hope this makes sense. Thanks for any help.
For this particular problem - where you know the slope - it's not too hard. If the slope is b, and your points are x and y, compute y-b*x, then sort that array, and choose the 5th and 95th percentile values.
That's a pretty elegant solution. Thanks for sharing, David
Anne
participants (4)
- Anne Archibald
- bevan
- Bevan
- David Huard