[SciPy-User] scipy.stats.mstats.linregress bug?

Andreas lists at hilboll.de
Fri Jun 24 12:02:50 EDT 2011


>>> Try rescaling (take away the e15). Small numerical differences are
>>> possible because of the different way the results are calculated.
>>> There might still be a difference in the definition of the returns,
>>> but I haven't checked recently.
>>
>> Rescaling doesn't change a thing (see below). And we're not talking about
>> small numerical differences here. The problem is the last return value,
>> stderr. It differs by almost a factor of 15!
>>
>> Cheers,
>> Andreas.
>>
>> In [15]: scipy.stats.linregress(x,data/1E15)
>> Out[15]:
>> (0.14916317817857139,
>>  4.8326781674166659,
>>  0.53093100793359616,
>>  0.041709303490157057,
>>  0.066031024254034967)
>>
>> In [16]: scipy.stats.mstats.linregress(x,data/1E15)
>> Out[16]:
>> (0.14916317817857139,
>>  4.8326781674166659,
>>  0.53093100793359627,
>>  masked_array(data = 0.0417093034902,
>>             mask = False,
>>       fill_value = 1e+20)
>> ,
>>  1.0286155756515489)
>>
>>
>
> ma linregress
> sterrest = ma.sqrt(1.-r*r) * y.std()
>
> linregress
> sterrest = np.sqrt((1-r*r)*ssym / ssxm / df)

So why is the standard error computed differently in two functions that
everyone would expect to behave identically? What's the mathematical
background? What are ssym, ssxm, and df?

And: Which one is a better estimate? (In my case, the stats.linregress one
seems to be a lot more reasonable ...)
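
To make the comparison concrete, here is a rough sketch of the two quoted
formulas in plain Python. It rests on assumptions I have not confirmed against
the SciPy source: that ssxm and ssym are the mean squared deviations of x and
y (population variances), that df = n - 2, and that y.std() uses the
population definition. The function name and the returned keys are made up for
illustration.

```python
import math


def compare_stderr(x, y):
    """Evaluate both quoted 'sterrest' formulas on the same data.

    Assumptions (not checked against the SciPy source): ssxm/ssym are mean
    squared deviations, df = n - 2, and y.std() is the population std.
    """
    n = len(x)
    xmean = sum(x) / n
    ymean = sum(y) / n

    # Mean squared deviations and mean cross product.
    ssxm = sum((xi - xmean) ** 2 for xi in x) / n
    ssym = sum((yi - ymean) ** 2 for yi in y) / n
    ssxym = sum((xi - xmean) * (yi - ymean) for xi, yi in zip(x, y)) / n

    r = ssxym / math.sqrt(ssxm * ssym)
    slope = ssxym / ssxm
    intercept = ymean - slope * xmean
    df = n - 2

    # Formula quoted for linregress.
    slope_stderr = math.sqrt((1 - r * r) * ssym / ssxm / df)

    # Formula quoted for the ma version: sqrt(1 - r^2) * std(y).
    ma_value = math.sqrt(1 - r * r) * math.sqrt(ssym)

    # Textbook standard error of the slope, computed from the residuals.
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    textbook_slope_stderr = math.sqrt(rss / df / (n * ssxm))

    return {
        "slope_stderr": slope_stderr,
        "ma_value": ma_value,
        "textbook_slope_stderr": textbook_slope_stderr,
        "rss": rss,
    }


result = compare_stderr([0, 1, 2, 3, 4, 5], [1.0, 2.9, 5.2, 6.8, 9.1, 11.0])
print(result)
```

Under these assumptions the linregress formula agrees with the textbook
standard error of the slope, while the ma formula equals sqrt(rss / n), the
RMS of the residuals. The two then differ by a factor of std(x) * sqrt(n - 2),
which could account for a gap of the size seen above.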

Thanks for your insight!
Andreas.