[SciPy-User] scipy.stats.mstats.linregress bug?
josef.pktd at gmail.com
Fri Jun 24 11:59:26 EDT 2011
On Fri, Jun 24, 2011 at 12:02 PM, Andreas <lists at hilboll.de> wrote:
>>>> try to rescale, take away the e15, small numerical differences are
>>>> possible because of the different way the results are calculated.
>>>> There might still be a difference in the definition of the returns,
>>>> but I haven't checked recently.
>>>
>>> Rescaling doesn't change a thing (see below). And, we're not talking about
>>> small numerical differences here. The problem is the last return value,
>>> stderr. It differs by almost a factor of 15!
>>>
>>> Cheers,
>>> Andreas.
>>>
>>> In [15]: scipy.stats.linregress(x,data/1E15)
>>> Out[15]:
>>> (0.14916317817857139,
>>> 4.8326781674166659,
>>> 0.53093100793359616,
>>> 0.041709303490157057,
>>> 0.066031024254034967)
>>>
>>> In [16]: scipy.stats.mstats.linregress(x,data/1E15)
>>> Out[16]:
>>> (0.14916317817857139,
>>> 4.8326781674166659,
>>> 0.53093100793359627,
>>> masked_array(data = 0.0417093034902,
>>> mask = False,
>>> fill_value = 1e+20)
>>> ,
>>> 1.0286155756515489)
>>>
>>>
>>
>> ma linregress
>> sterrest = ma.sqrt(1.-r*r) * y.std()
>>
>> linregress
>> sterrest = np.sqrt((1-r*r)*ssym / ssxm / df)
>
> So, why is it treated differently in the two functions that everyone would
> expect to behave identically? What's the mathematical background? What are
> ssym, ssxm, df?
>
> And: Which one is a better estimate? (In my case, the stats.linregress one
> seems to be a lot more reasonable ...)
stats.stats reports the standard error of the estimate of the slope parameter b.
stats.mstats reports the standard error of the regression error (the residual), y - (a + bx).
stats.stats got changed by accident, and mstats didn't follow.
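
To make the difference concrete, here is a minimal sketch of the two quantities using plain NumPy. It is not the library source: the function names slope_stderr and residual_std are made up for illustration, the sample data is invented, and I am assuming that ssxm and ssym are the mean squared deviations of x and y (sum of squares divided by n), that df = n - 2, and that y.std() uses the default ddof=0.

```python
import numpy as np


def slope_stderr(x, y):
    """Standard error of the slope b, per the linregress formula quoted above.

    Assumes ssxm and ssym are mean squared deviations (sum of squares / n)
    and df = n - 2. Hypothetical helper, not the scipy implementation.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    df = x.size - 2                       # degrees of freedom: n minus 2 fitted parameters
    ssxm = np.mean((x - x.mean()) ** 2)   # mean squared deviation of x
    ssym = np.mean((y - y.mean()) ** 2)   # mean squared deviation of y
    r = np.corrcoef(x, y)[0, 1]           # Pearson correlation coefficient
    return np.sqrt((1 - r * r) * ssym / ssxm / df)


def residual_std(x, y):
    """Standard deviation of the residuals y - (a + b*x), per the mstats formula.

    Assumes y.std() is computed with ddof=0. Hypothetical helper.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = np.corrcoef(x, y)[0, 1]
    return np.sqrt(1 - r * r) * y.std()


# Invented sample data, only to exercise the two helpers.
x = np.arange(10.0)
y = 0.15 * x + 4.8 + np.sin(x)

se_slope = slope_stderr(x, y)
sd_resid = residual_std(x, y)

# Under the assumptions above the two values differ by a fixed factor:
# se_slope == sd_resid / (x.std() * sqrt(n - 2))
print(se_slope, sd_resid, sd_resid / (x.std() * np.sqrt(x.size - 2)))
```

Under these assumptions the two numbers are related by the factor x.std() * sqrt(n - 2), so neither is "wrong"; they answer different questions (uncertainty of b versus scatter around the fitted line).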
Josef
>
> Thanks for your insight!
> Andreas.