[SciPy-User] scipy.stats.mstats.linregress bug?

Skipper Seabold jsseabold at gmail.com
Fri Jun 24 12:32:43 EDT 2011


On Fri, Jun 24, 2011 at 12:28 PM,  <josef.pktd at gmail.com> wrote:
> On Fri, Jun 24, 2011 at 12:10 PM, Skipper Seabold <jsseabold at gmail.com> wrote:
>> On Fri, Jun 24, 2011 at 11:59 AM,  <josef.pktd at gmail.com> wrote:
>>> On Fri, Jun 24, 2011 at 12:02 PM, Andreas <lists at hilboll.de> wrote:
>>>>>>> Try to rescale: take away the e15. Small numerical differences are
>>>>>>> possible because of the different way the results are calculated.
>>>>>>> There might still be a difference in the definition of the returns,
>>>>>>> but I haven't checked recently.
>>>>>>
>>>>>> Rescaling doesn't change a thing (see below). And we're not talking about
>>>>>> small numerical differences here. The problem is the last return value,
>>>>>> stderr. It differs by almost a factor of 15!
>>>>>>
>>>>>> Cheers,
>>>>>> Andreas.
>>>>>>
>>>>>> In [15]: scipy.stats.linregress(x,data/1E15)
>>>>>> Out[15]:
>>>>>> (0.14916317817857139,
>>>>>>  4.8326781674166659,
>>>>>>  0.53093100793359616,
>>>>>>  0.041709303490157057,
>>>>>>  0.066031024254034967)
>>>>>>
>>>>>> In [16]: scipy.stats.mstats.linregress(x,data/1E15)
>>>>>> Out[16]:
>>>>>> (0.14916317817857139,
>>>>>>  4.8326781674166659,
>>>>>>  0.53093100793359627,
>>>>>>  masked_array(data = 0.0417093034902,
>>>>>>             mask = False,
>>>>>>       fill_value = 1e+20)
>>>>>> ,
>>>>>>  1.0286155756515489)
>>>>>>
>>>>>>
>>>>>
>>>>> ma linregress
>>>>> sterrest = ma.sqrt(1.-r*r) * y.std()
>>>>>
>>>>> linregress
>>>>> sterrest = np.sqrt((1-r*r)*ssym / ssxm / df)
>>>>
>>>> So, why is it treated differently in the two functions that everyone would
>>>> expect to behave identically? What's the mathematical background? What are
>>>> ssym, ssxm, df?
>>>>
>>>> And: Which one is a better estimate? (In my case, the stats.linregress one
>>>> seems to be a lot more reasonable ...)
>>>
>>> stats.stats reports the standard error of the estimate of the slope parameter b.
>>> stats.mstats reports the standard error of the regression error (residual), y - (a + bx).
>>>
>>
>> It's a biased estimate in mstats as well by the look of it?
>>
>>> stats.stats got changed by accident, and mstats didn't follow.
>>>
>>
>> Either way, the docs need to be fixed at the least.
>
> It might be in last year's stats sprint. Do you know what happened to
> the repo? I lost sight of it.
>

Oh, yeah. Hmm, I don't know whose github account it was under, but I
have a local repo on an external somewhere that I can look for.

Skipper
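
For anyone following along, the two sterrest formulas quoted above can be
compared side by side. This is only a rough sketch, not the scipy source: it
assumes that ssxm and ssym are the biased (divide-by-n) variances of x and y,
that df = n - 2, and that y.std() divides by n. The data are made up for
illustration and are not the data from this thread.

```python
import math

# Made-up illustrative data (NOT the data discussed in this thread).
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
y = [4.9, 5.1, 5.0, 5.6, 5.2, 5.9, 5.5, 6.1]

n = len(x)
xmean = sum(x) / n
ymean = sum(y) / n

# Assumed meaning of the names in the quoted formula: biased (divide-by-n)
# variances and covariance, and residual degrees of freedom n - 2.
ssxm = sum((xi - xmean) ** 2 for xi in x) / n
ssym = sum((yi - ymean) ** 2 for yi in y) / n
ssxym = sum((xi - xmean) * (yi - ymean) for xi, yi in zip(x, y)) / n
df = n - 2

slope = ssxym / ssxm
intercept = ymean - slope * xmean
r = ssxym / math.sqrt(ssxm * ssym)

# Formula quoted for stats.linregress: standard error of the slope b.
slope_stderr = math.sqrt((1 - r * r) * ssym / ssxm / df)

# Formula quoted for mstats.linregress: sqrt(1 - r^2) * y.std(),
# where std() is assumed to divide by n (hence a biased estimate).
resid_std_biased = math.sqrt(1 - r * r) * math.sqrt(ssym)

# Cross-check both against the residuals y - (a + b*x) computed directly.
resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
ssr = sum(e * e for e in resid)
resid_std_unbiased = math.sqrt(ssr / df)  # divides by n - 2 instead of n

print("slope stderr:             ", slope_stderr)
print("residual std (divide by n):", resid_std_biased)
print("residual std (divide by df):", resid_std_unbiased)
```

Under these assumptions the mstats expression equals sqrt(SSR / n), i.e. the
spread of the residuals with no degrees-of-freedom correction, while the
stats expression equals the df-corrected residual standard deviation divided
by sqrt(n * ssxm). They measure different things, so there is no reason for
them to agree.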


