[SciPy-User] An extra parameter to stats.chisquare ?

Mon Aug 3 15:27:17 EDT 2009

On Aug 3, 2009, at 3:10 PM, josef.pktd at gmail.com wrote:
>>> def chisquare(f_obs, f_exp=None, ddof=0):
>>>      ....
>>>      return chisq, chisqprob(chisq, k-1-ddof)
>>>
>>> default is when no parameters are estimated (dof=k-1), e.g. create
>>> random sample and compare to distribution with *given* parameters.
>>
>> Looks cool.
>
> I will prepare the change and check whether chisquare is fully tested

OK, fab'.
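
For the record, here is a minimal sketch of what the proposed `ddof` argument would do. It assumes a SciPy release in which `stats.chisquare` already accepts `ddof`; the counts are made up for illustration and are not taken from the book example.

```python
import numpy as np
from scipy import stats

# Made-up observed counts in k = 5 cells (illustration only).
f_obs = np.array([18, 22, 25, 20, 15])
# Equal expected counts with the same total as f_obs (20 per cell).
f_exp = np.full(f_obs.size, f_obs.sum() / f_obs.size)

# Default: no parameters estimated from the data, so dof = k - 1 = 4.
chisq, p_default = stats.chisquare(f_obs, f_exp)

# One parameter estimated from the data, so dof = k - 1 - ddof = 3.
_, p_ddof1 = stats.chisquare(f_obs, f_exp, ddof=1)

# The p-value is the chi-square survival function at the reduced dof.
p_manual = stats.chi2.sf(chisq, f_obs.size - 1 - 1)

print(chisq, p_default, p_ddof1, p_manual)
```

The statistic is the same in both calls; only the degrees of freedom used for the p-value change.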

>>>
>>> The main point of the function is to do an equal-weight binning, to
>>> maintain a minimum expected frequency in each cell, which is
>>> recommended (>=5 expected observations for the chisquare
>>> distribution to be an appropriate approximation).
>>
>> Bah, this binning doesn't really matter when you wanna use X2 to
>> compare a sample to an actual distribution, does it ?
>
> Yes it does.
> If there is not a minimum number of expected frequency counts in each
> cell, then the chisquare distribution is not a good approximation for
> the distribution of the test statistic.

Well, like I said, I'm no statistician by trade. So OK

> In your book example the expected cell count is around 8. If there
> were fewer observations so that the expected cell count drops below 5,
> then the literature recommends combining cells.
>
> The worst case is discrete distributions with unbounded support, e.g.
> poisson: there will always be integers in the tail(s) without
> observations, so observations have to be binned. Similarly, in the
> continuous distribution case, you cannot compare the pdf/pmf pointwise
> in the chisquare test if the probability of each point is very small.

Dang. More to read. See, this kind of info would be great in the
documentation ;)
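
As a note to self, a rough sketch of the cell-combining idea for the Poisson case. The helper `merge_small_cells` is hypothetical (it is not a SciPy function), the greedy left-to-right merging is only one of several possible schemes, and the sample is simulated with a *given* mu, so `ddof` stays at 0.

```python
import numpy as np
from scipy import stats

def merge_small_cells(f_obs, f_exp, min_exp=5.0):
    """Hypothetical helper: merge adjacent cells, left to right, until
    every merged cell has an expected count of at least min_exp."""
    obs_out, exp_out = [], []
    acc_obs = acc_exp = 0.0
    for o, e in zip(f_obs, f_exp):
        acc_obs += o
        acc_exp += e
        if acc_exp >= min_exp:
            obs_out.append(acc_obs)
            exp_out.append(acc_exp)
            acc_obs = acc_exp = 0.0
    if acc_exp > 0:
        if exp_out:
            # Leftover right tail is too small: fold it into the last cell.
            obs_out[-1] += acc_obs
            exp_out[-1] += acc_exp
        else:
            obs_out.append(acc_obs)
            exp_out.append(acc_exp)
    return np.array(obs_out), np.array(exp_out)

rng = np.random.default_rng(0)
mu, n = 3.0, 100                      # given (not estimated) parameter
sample = rng.poisson(mu, size=n)

kmax = max(int(sample.max()), 10)
support = np.arange(kmax + 1)
f_obs = np.bincount(sample, minlength=kmax + 1).astype(float)

probs = stats.poisson.pmf(support, mu)
probs[-1] = stats.poisson.sf(kmax - 1, mu)   # last cell holds P(X >= kmax)
f_exp = n * probs

g_obs, g_exp = merge_small_cells(f_obs, f_exp)
chisq, p = stats.chisquare(g_obs, g_exp)     # dof = (number of merged cells) - 1
print(len(g_obs), chisq, p)
```

Lumping the open tail into the last cell keeps the expected counts summing to n, which is what the test needs before any merging happens.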