[SciPy-User] What "Array" means
Bruce Southey
bsouthey at gmail.com
Tue Apr 12 13:12:55 EDT 2011
On Mon, Apr 11, 2011 at 3:08 PM, Bruce Southey <bsouthey at gmail.com> wrote:
> On Fri, Apr 8, 2011 at 5:45 AM, <josef.pktd at gmail.com> wrote:
>> On Fri, Apr 8, 2011 at 6:14 AM, Timothy Wu <2huggie at gmail.com> wrote:
>>> Hi I am trying to run Scipy's D'Agostino's normality test as documented here
>>> http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mstats.normaltest.html
>>>
>>> For the array argument I tried something like this
>>> scipy.array([1,2,3])
>>> or
>>> numpy.array([1,2,3])
>>>
>>> and axis ignored.
>>>
>>> But with both methods the test fails:
>>>
>>> File "/usr/lib/python2.6/dist-packages/scipy/stats/mstats_basic.py", line
>>> 1546, in kurtosistest
>>> n = a.count(axis=axis).astype(float)
>>> AttributeError: 'int' object has no attribute 'astype'
>>>
>>> I'm not familiar with numpy or scipy. What exactly should I put in there?
>>
>>
>> It looks like mstats.normaltest only works with 2-dimensional arrays,
>> stats.normaltest works with 1-dimensional arrays.
>>
>> rvs[:,None] in the example below adds an additional axis, so that it
>> is a column array with shape (20,1)
>> If you don't need the masked array version, then you can use stats.normaltest
>>
>> I haven't looked at the source yet, but this looks like a bug to me.
>>
>>>>> rvs = np.random.randn(20)
>>>>> rvs
>> array([ 0.02724005, -0.17836266, 0.40530377, 1.313246 , 0.74069068,
>> -0.69010129, -0.24958557, -2.28311759, 0.10525733, 0.07986322,
>> -0.87282545, -1.41364294, 1.16027037, 0.23541801, -0.06663458,
>> 0.39173207, 0.06979893, 0.4400277 , -1.29361117, -1.71524228])
>>>>> stats.normaltest(rvs)
>> (1.7052869564079727, 0.42628656195988301)
>>>>> stats.mstats.normaltest(rvs[:,None])
>> (masked_array(data = [1.70528695641],
>> mask = [False],
>> fill_value = 1e+20)
>> , masked_array(data = [ 0.42628656],
>> mask = False,
>> fill_value = 1e+20)
>> )
>>>>> stats.mstats.normaltest(rvs)
>>
>> Traceback (most recent call last):
>> File "<pyshell#58>", line 1, in <module>
>> stats.mstats.normaltest(rvs)
>> File "C:\Programs\Python27\lib\site-packages\scipy\stats\mstats_basic.py",
>> line 1642, in normaltest
>> k,_ = kurtosistest(a,axis)
>> File "C:\Programs\Python27\lib\site-packages\scipy\stats\mstats_basic.py",
>> line 1618, in kurtosistest
>> n = a.count(axis=axis).astype(float)
>> AttributeError: 'int' object has no attribute 'astype'
>>
>> Josef
>>
>
> Yes, that is a bug, so can someone create a ticket? (I don't have time today.)
> That occurs because ma.count() returns either an int (which causes the
> bug) or an ndarray. Actually, that '.astype(float)' is probably not
> needed because, as far as I can determine, every use of 'n' as
> an integer should still result in a float.
This is now ticket 1424 with a patch:
http://projects.scipy.org/scipy/ticket/1424
It did require a second change that I commented because the code needs
to index an array.
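
For anyone who wants to see the idea without opening the ticket, here is a
minimal sketch of one way to make the count safe for every value of axis.
This is not the patch attached to ticket 1424; the helper name
count_as_float is made up for illustration, and wrapping the count in
np.asarray is just one option I am assuming would be acceptable.

```python
import numpy as np
import numpy.ma as ma


def count_as_float(a, axis=None):
    """Hypothetical helper: number of unmasked values as a float ndarray."""
    a = ma.asarray(a)
    # np.asarray turns a plain int (the axis=None case) into a 0-d array
    # and converts an ndarray of counts to float, so no .astype() call
    # is ever made on a bare int.
    return np.asarray(a.count(axis=axis), dtype=float)


rvs = np.random.randn(20)
n_all = count_as_float(rvs)                    # 0-d array, axis=None
n_cols = count_as_float(rvs[:, None], axis=0)  # shape (1,), one column
```

Both calls go through the same code path, so the 1-dimensional input from
the original report no longer needs the rvs[:,None] workaround just to get
a count.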
>
> There is also a second 'bug' because n must be greater than 3. I was
> looking for that because estimating kurtosis needs more than 3
> observations:
> "This bias-corrected formula requires that X contain at least four elements."
> http://www.mathworks.com/help/toolbox/stats/kurtosis.html
>
> This is a different ticket because we need to catch the cases where only
> one particular 'column' has fewer than 4 observations but the others are fine.
>
>
>>>> rvs = np.random.randn(20,10)
>>>> stats.mstats.normaltest(rvs, axis=0)
> (masked_array(data = [0.713606808604 0.132722315345 7.78660833457
> 5.38597554393 0.725711290319
> 0.172342343314 4.02320908322 1.46363950653 3.79550214574 0.293759931912],
> mask = [False False False False False False False False
> False False],
> fill_value = 1e+20)
> , masked_array(data = [ 0.69991008 0.93579283 0.0203779 0.06767843
> 0.69568685 0.91743718
> 0.13377386 0.48103283 0.14990537 0.86339761],
> mask = False,
> fill_value = 1e+20)
> )
>>>> stats.mstats.normaltest(rvs, axis=1)
> (masked_array(data = [0.314582042621 0.4436261479 2.98149400163
> 2.02242070422 3.46138431999
> 9.94304440942 0.026055683609 5.7060731383 1.03808026381 0.169589515995
> 10.5681767508 1.28212296678 3.7013014714 0.43713740004 3.62659584833
> 0.289410600885 1.46353531025 0.745198884215 1.51022347547 0.00707268228071],
> mask = [False False False False False False False False
> False False False False
> False False False False False False False False],
> fill_value = 1e+20)
> , masked_array(data = [ 0.85445536 0.80106509 0.22520436 0.36377841
> 0.17716174 0.00693259
> 0.98705665 0.05766894 0.59509148 0.91870082 0.00507165 0.52673301
> 0.15713488 0.80366827 0.16311531 0.86527725 0.48105789 0.68894114
> 0.4699581 0.9964699 ],
> mask = False,
> fill_value = 1e+20)
> )
>>>> stats.mstats.normaltest(rvs, axis=None)
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> File "/usr/lib64/python2.7/site-packages/scipy/stats/mstats_basic.py",
> line 1649, in normaltest
> k,_ = kurtosistest(a,axis)
> File "/usr/lib64/python2.7/site-packages/scipy/stats/mstats_basic.py",
> line 1625, in kurtosistest
> n = a.count(axis=axis).astype(float)
> AttributeError: 'int' object has no attribute 'astype'
>>>>
>
> That is because:
>>>> mrvs=rvs.view(ma.MaskedArray)
>>>> type(mrvs)
> <class 'numpy.ma.core.MaskedArray'>
>>>> type(mrvs.count(axis=0))
> <type 'numpy.ndarray'>
>>>> type(mrvs.count(axis=1))
> <type 'numpy.ndarray'>
>>>> type(mrvs.count(axis=None))
> <type 'int'>
>
>
> Bruce
>
This is now ticket 1425 with patches:
http://projects.scipy.org/scipy/ticket/1425
However, the patch for mstats_basic.py does need some work. Basically,
only those specific cases with fewer than 4 observations should be 0,
not all cases.
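
To make the intended behaviour concrete, here is a rough sketch of a
per-column check. It is not the patch from ticket 1425: the helper name
zero_small_samples and the min_n parameter are hypothetical, the statistic
values are placeholders rather than real test output, and I am assuming
that "should be 0" means zeroing the result for only the affected columns.

```python
import numpy as np
import numpy.ma as ma


def zero_small_samples(stat, n, min_n=4):
    """Hypothetical helper: zero stat only where the count n is below min_n."""
    stat = ma.asarray(stat, dtype=float)
    n = np.asarray(n)
    # Element-wise choice: columns with enough observations keep their value.
    return ma.where(n < min_n, 0.0, stat)


# Column 1 has only 3 unmasked observations; columns 0 and 2 have 5 each.
data = ma.masked_array(np.arange(15, dtype=float).reshape(5, 3))
data[:2, 1] = ma.masked
n = data.count(axis=0)

stat = ma.array([1.5, 2.5, 3.5])  # placeholder values, not real statistics
result = zero_small_samples(stat, n)
```

Here only the middle entry of result is set to 0, and the other two
columns pass through unchanged.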
Bruce