Question to Travis: what is rdist about?
Travis,

I wonder if you have a moment and the desire to give a bit of theory/history for the hungry people ;)

In a recent thread (part of it is below the body of this email) dealing with instabilities of rdist, Josef asked what the application domain of the rdist distribution is... He had heard about a relation to correlation; I mentioned that it is related to the distribution of a coordinate of points uniformly distributed on a c-dimensional sphere. But I wonder: what was the original reason for this distribution to appear? Where did you find it, or in other words, what literature source describes it?

Thanks to git I found that you introduced it in

commit 8ce8603696448c171c186ea2aab158cf34e25441
Author: travo <travo@d6536bca-fef9-0310-8506-e4c0a848fbcf>
Date:   Fri Nov 22 09:04:46 2002 +0000

    Changed statistics module to use clasasses.

    git-svn-id: http://svn.scipy.org/svn/scipy/trunk@648 d6536bca-fef9-0310-8506-e4c0a848fbcf

but I can't figure out whether it was really a new distribution or refactored from some other one.

Thank you in advance!

On Fri, 13 Mar 2009, Yaroslav Halchenko wrote:
Google search for "r distribution" is pretty useless, and I have not yet found a reference or an explanation of rdist and its uses.

There was just a single page I ran into that described rdist and plotted sample pdfs, but I can't find it now.

I read somewhere, I don't remember where, that rdist is the distribution of the correlation coefficient, but without more information that's pretty useless.

doh! Sure it is related... hence the name rdist, since Pearson's correlation coefficient is abbreviated as 'r' ;)

http://en.wikipedia.org/wiki/Correlation_coefficient says that "The distribution of the correlation coefficient has been examined by R. A. Fisher[2][3] and A. K. Gayen.[4]"

But those are 100- and 50-year-old books... not sure if we have them online to check whether they were the ones who derived the analytic form.

And it seems that it is related to the 'multidimensional' correlation mentioned in Wikipedia, but it is not clear how the sample size "fits into the equation"... c seems to relate to the dimensions of the data...

Is it possible to trace back who introduced this lovely piece into scipy? ;) Maybe we could ask the author? ;)

-- 
Yaroslav Halchenko
(yoh@|www.)onerussian.com
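Yaroslav's sphere remark can be checked numerically. What follows is a minimal Monte Carlo sketch in Python (standard library only). It assumes the density documented for scipy.stats.rdist, f(x, c) = (1 - x^2)^(c/2 - 1) / B(1/2, c/2) on (-1, 1), and reads "c-dimensional sphere" as the unit sphere S^c embedded in R^(c+1); the helpers rdist_pdf, sphere_coordinate and integrate are hypothetical names for this illustration, not SciPy functions. It draws uniform points on S^4 and compares one coordinate with rdist(c=4).

```python
import math
import random


def rdist_pdf(x, c):
    """Assumed rdist density on (-1, 1): (1 - x^2)^(c/2 - 1) / B(1/2, c/2)."""
    if not -1.0 < x < 1.0:
        return 0.0
    # log B(1/2, c/2) = lgamma(1/2) + lgamma(c/2) - lgamma(1/2 + c/2)
    log_beta = math.lgamma(0.5) + math.lgamma(c / 2) - math.lgamma(0.5 + c / 2)
    return math.exp((c / 2 - 1) * math.log1p(-x * x) - log_beta)


def sphere_coordinate(c, rng):
    """First coordinate of a point drawn uniformly from the unit sphere S^c in R^(c+1)."""
    # A normalised isotropic Gaussian vector is uniformly distributed on the sphere.
    v = [rng.gauss(0.0, 1.0) for _ in range(c + 1)]
    return v[0] / math.sqrt(sum(t * t for t in v))


def integrate(f, a, b, steps=20_000):
    """Midpoint-rule integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


c = 4  # sphere S^4 embedded in R^5 (arbitrary choice for the illustration)
rng = random.Random(0)
xs = [sphere_coordinate(c, rng) for _ in range(100_000)]

# Probability that |x| < 0.5: simulation vs. the assumed rdist density.
empirical = sum(abs(x) < 0.5 for x in xs) / len(xs)
theoretical = integrate(lambda x: rdist_pdf(x, c), -0.5, 0.5)

# Second moment: under this density, rdist(c) has variance 1/(c+1).
second_moment = sum(x * x for x in xs) / len(xs)

print(f"P(|x| < 0.5): simulated {empirical:.4f}, rdist {theoretical:.4f}")
print(f"E[x^2]:       simulated {second_moment:.4f}, 1/(c+1) = {1 / (c + 1):.4f}")
```

With c = 1 the same density becomes 1/(pi*sqrt(1 - x^2)), the arcsine law of cos(theta) for a uniform angle on a circle, which is a handy sanity check on the "c-dimensional" reading.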
Hi,

I presume it refers to the distribution of the correlation coefficient. The pdf is the one given at http://www.xycoon.com/rdis_density.htm, where the scipy.stats c variable is equal to n-2 in that formula. You can find more if you look for "correlation test".

Bruce
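Bruce's identification (c = n - 2) can be checked the same way. Substituting c = n - 2 into the rdist density gives (1 - r^2)^((n - 4)/2) / B(1/2, (n - 2)/2), the null density of Pearson's r for n independent bivariate normal pairs. The sketch below is a minimal simulation assuming that density; the helpers rdist_pdf, pearson_r and integrate are hypothetical names, and n = 10 is an arbitrary choice. It draws pairs of independent standard normals, so the true correlation is zero, and compares the simulated r with rdist(c = n - 2); the variance of r should come out near 1/(n - 1).

```python
import math
import random


def rdist_pdf(x, c):
    """Assumed rdist density on (-1, 1): (1 - x^2)^(c/2 - 1) / B(1/2, c/2)."""
    if not -1.0 < x < 1.0:
        return 0.0
    log_beta = math.lgamma(0.5) + math.lgamma(c / 2) - math.lgamma(0.5 + c / 2)
    return math.exp((c / 2 - 1) * math.log1p(-x * x) - log_beta)


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def integrate(f, a, b, steps=20_000):
    """Midpoint-rule integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


n = 10     # sample size (arbitrary choice for the illustration)
c = n - 2  # rdist shape parameter, following Bruce's c = n - 2
rng = random.Random(1)

rs = []
for _ in range(30_000):
    # Independent standard normals, i.e. a bivariate normal with correlation 0.
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [rng.gauss(0.0, 1.0) for _ in range(n)]
    rs.append(pearson_r(xs, ys))

empirical = sum(abs(r) < 0.3 for r in rs) / len(rs)
theoretical = integrate(lambda x: rdist_pdf(x, c), -0.3, 0.3)
variance = sum(r * r for r in rs) / len(rs)  # E[r] = 0, so this estimates Var(r)

print(f"P(|r| < 0.3): simulated {empirical:.4f}, rdist(c={c}) {theoretical:.4f}")
print(f"Var(r):       simulated {variance:.4f}, 1/(n-1) = {1 / (n - 1):.4f}")
```

This only covers a true correlation of zero; when the population correlation is nonzero, the distribution of r is no longer rdist.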
Thanks for emailing me directly. Unfortunately, I don't get the time to read all of SciPy-dev anymore.

These are the references I used in constructing the distributions (they are comments in the code):

## References::
##   Documentation for ranlib, rv2, cdflib and
##
##   Eric Weisstein's world of mathematics http://mathworld.wolfram.com/
##       http://mathworld.wolfram.com/topics/StatisticalDistributions.html
##
##   Documentation to Regress+ by Michael McLaughlin
##
##   Engineering and Statistics Handbook (NIST)
##       http://www.itl.nist.gov/div898/handbook/index.htm
##
##   Documentation for DATAPLOT from NIST
##       http://www.itl.nist.gov/div898/software/dataplot/distribu.htm
##
##   Norman Johnson, Samuel Kotz, and N. Balakrishnan "Continuous
##       Univariate Distributions", second edition,
##       Volumes I and II, Wiley & Sons, 1994.

The rdist distribution appeared at the same time as a lot of other distributions. It must be referred to in one of the above sources. But here is a decent current source:

http://demonstrations.wolfram.com/TheRDistribution/

From the text (some of the math images disappeared):

The r-distribution with parameter is the distribution of the correlation coefficient of a random sample of size drawn from a bivariate normal distribution with . It can be used to construct tests about the correlation coefficient of bivariate normal data; that is, tests with null hypothesis . The mean of the distribution is always zero, and as the sample size grows, the distribution's mass concentrates more closely about this mean.

Thanks for pushing hard against SciPy; it's the only way to ferret out the problems that exist.

Best regards,

-Travis

-- 
Travis Oliphant
Enthought, Inc.
(512) 536-1057 (office)
(512) 536-1059 (fax)
http://www.enthought.com
oliphant@enthought.com
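Because rdist is the distribution of r when the true correlation is zero, the test mentioned in the quoted Wolfram text can be sketched as a test of the null hypothesis of zero correlation. A minimal sketch, assuming c = n - 2 as Bruce describes: the two-sided p-value is the rdist probability of a sample correlation at least as far from zero as the observed one. The function names rdist_two_sided_p and pearson_r and the data are hypothetical; the substitution x = sin(theta) is just one convenient way to integrate the density numerically with the standard library.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def rdist_two_sided_p(r, n, steps=20_000):
    """P(|R| >= |r|) for R ~ rdist(c = n - 2): two-sided p-value for zero correlation."""
    c = n - 2
    if c < 1:
        raise ValueError("need at least n = 3 observations")
    # B(1/2, c/2) = Gamma(1/2) Gamma(c/2) / Gamma(1/2 + c/2)
    beta = math.exp(math.lgamma(0.5) + math.lgamma(c / 2) - math.lgamma(0.5 + c / 2))
    # With x = sin(theta), (1 - x^2)^(c/2 - 1) dx becomes cos(theta)^(c - 1) d(theta),
    # which stays finite on [asin|r|, pi/2] whenever c >= 1.
    lo, hi = math.asin(min(abs(r), 1.0)), math.pi / 2
    h = (hi - lo) / steps
    tail = h * sum(math.cos(lo + (i + 0.5) * h) ** (c - 1) for i in range(steps))
    return 2.0 * tail / beta


# Hypothetical paired measurements (n = 6).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

r = pearson_r(xs, ys)
p = rdist_two_sided_p(r, len(xs))
print(f"r = {r:.4f}, two-sided p = {p:.3g}")
```

A more common route to the same p-value is the statistic t = r*sqrt(n - 2)/sqrt(1 - r^2), which follows Student's t distribution with n - 2 degrees of freedom under the same null hypothesis; for n = 4 both reduce to p = 1 - |r|.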
participants (3)
- Bruce Southey
- Travis E. Oliphant
- Yaroslav Halchenko