[SciPy-User] scipy.stats.mstats.linregress bug?
Andreas
lists at hilboll.de
Fri Jun 24 12:02:50 EDT 2011
>>> try to rescale, take away the e15, small numerical differences are
>>> possible because of the different way the results are calculated.
>>> There might still be a difference in the definition of the returns,
>>> but I haven't checked recently.
>>
>> Rescaling doesn't change a thing (see below). And we're not talking about
>> small numerical differences here. The problem is the last return value,
>> stderr. It differs by almost a factor of 15!
>>
>> Cheers,
>> Andreas.
>>
>> In [15]: scipy.stats.linregress(x,data/1E15)
>> Out[15]:
>> (0.14916317817857139,
>> 4.8326781674166659,
>> 0.53093100793359616,
>> 0.041709303490157057,
>> 0.066031024254034967)
>>
>> In [16]: scipy.stats.mstats.linregress(x,data/1E15)
>> Out[16]:
>> (0.14916317817857139,
>> 4.8326781674166659,
>> 0.53093100793359627,
>> masked_array(data = 0.0417093034902,
>> mask = False,
>> fill_value = 1e+20)
>> ,
>> 1.0286155756515489)
>>
>>
>
> ma linregress
> sterrest = ma.sqrt(1.-r*r) * y.std()
>
> linregress
> sterrest = np.sqrt((1-r*r)*ssym / ssxm / df)
So why is stderr computed differently in two functions that everyone would
expect to behave identically? What's the mathematical background? What are
ssym, ssxm, and df?
And which one is the better estimate? (In my case, the stats.linregress
result seems a lot more reasonable ...)
Thanks for your insight!
Andreas.
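
For reference, here is a minimal sketch of the two quoted formulas side by
side in plain Python. It rests on assumptions the quoted snippets do not
state: that ssxm and ssym are the biased (divide-by-n) variances of x and y,
that df = n - 2, and that y.std() is the biased standard deviation. The
function name and the sample data are made up for illustration. Under those
assumptions the first expression is the standard error of the slope and the
second is the residual standard deviation, so the two differ by a factor of
sqrt(ssxm * df) rather than by rounding error.

```python
import math


def both_stderr_formulas(x, y):
    """Evaluate the two quoted 'sterrest' expressions on the same data.

    Assumptions (not stated in the quoted snippets): ssxm and ssym are the
    biased variances of x and y, and df = n - 2.
    """
    n = len(x)
    xmean = sum(x) / n
    ymean = sum(y) / n

    # Mean sums of squares and cross-products (divide by n, not n - 1).
    ssxm = sum((xi - xmean) ** 2 for xi in x) / n
    ssym = sum((yi - ymean) ** 2 for yi in y) / n
    ssxym = sum((xi - xmean) * (yi - ymean) for xi, yi in zip(x, y)) / n

    r = ssxym / math.sqrt(ssxm * ssym)  # Pearson correlation coefficient
    df = n - 2                          # assumed degrees of freedom

    # Formula quoted for linregress: standard error of the slope.
    slope_stderr = math.sqrt((1 - r * r) * ssym / ssxm / df)

    # Formula quoted for ma linregress: sqrt(1 - r^2) * std(y), which is
    # the residual standard deviation (with a divide-by-n variance).
    residual_std = math.sqrt(1 - r * r) * math.sqrt(ssym)

    return slope_stderr, residual_std


# Hypothetical sample data, only to show that the two values differ.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 2.9, 5.2, 6.8, 9.1, 11.0]
slope_stderr, residual_std = both_stderr_formulas(x, y)
print(slope_stderr, residual_std, residual_std / slope_stderr)
```

If these assumptions hold, the ratio printed last depends only on x and on
the number of points, not on y, which would be consistent with rescaling the
data leaving the discrepancy unchanged.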