[SciPy-User] Are the scores normally distributed?
Bakary N'tji Diallo
diallobakary4 at gmail.com
Fri Dec 21 03:55:41 EST 2018
I am reading this: "With large enough sample sizes (> 30 or 40), the
violation of the normality assumption should not cause major problems (4);
this implies that we can use parametric procedures even when the data are
not normally distributed (8)."
from this paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3693611/
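The n > 30 rule of thumb quoted above is about the sampling distribution of statistics such as the mean (the central limit theorem). It says nothing about the distribution of the raw values. The minimal sketch below makes the difference visible. It assumes simulated exponential data, not the scores discussed in this thread.

```python
import numpy as np
from scipy import stats

# Hypothetical skewed "population": exponential values (not the thread's data).
rng = np.random.default_rng(0)
raw = rng.exponential(scale=1.0, size=(20000, 40))  # 20000 samples, each of size n=40

# Skewness of the individual values: about 2 for an exponential distribution.
raw_skew = stats.skew(raw.ravel())

# Skewness of the 20000 sample means: about 2 / sqrt(40) ~ 0.32, much closer to 0.
means = raw.mean(axis=1)
mean_skew = stats.skew(means)

print(f"skewness of the raw values:   {raw_skew:.2f}")
print(f"skewness of the sample means: {mean_skew:.2f}")
```

A larger n does not make the individual scores any more normal. It only makes statistics computed from them, such as the mean, closer to normally distributed.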
Can I then use the normalization procedure, given the large sample size? The
normalization simply computes the z-score, as in scipy.stats.zscore
(<https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.zscore.html>),
using the mean and standard deviation.
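Below is a minimal sketch of what scipy.stats.zscore computes. It uses simulated gamma-distributed scores as a stand-in for the real data, which is an assumption. The sketch also shows that a z-score is an affine transformation, so it leaves the shape of the distribution unchanged and cannot turn non-normal data into normal data.

```python
import numpy as np
from scipy import stats

# Hypothetical right-skewed scores standing in for the real data.
rng = np.random.default_rng(1)
x = rng.gamma(shape=2.0, scale=1.5, size=1444)

# scipy.stats.zscore uses the population standard deviation (ddof=0) by default.
z = stats.zscore(x)
z_manual = (x - x.mean()) / x.std()  # np.std also defaults to ddof=0
assert np.allclose(z, z_manual)

# Subtracting a constant and dividing by a positive constant does not change
# skewness or kurtosis, so a normality test gives the same answer before and after.
print("skewness before/after:", stats.skew(x), stats.skew(z))
print("normaltest p before/after:", stats.normaltest(x).pvalue, stats.normaltest(z).pvalue)
```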
On Sat, Dec 8, 2018 at 06:52, Bakary N'tji Diallo <diallobakary4 at gmail.com>
wrote:
> Thank you for your replies.
> About the large sample size, just to clarify: this is not a sample;
> these are all the scores.
> Should I do random sampling?
> Another approach I tried was to normalize the data as follows:
> import numpy as np
> x = x - 2*x  # equivalent to x = -x: flips the sign so the scores become positive
> log_data = np.log(x)  # np.log needs strictly positive values
> log_data was also found not to be normally distributed.
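Below is a minimal sketch of the sign-flip-then-log transformation with a guard against non-positive values. The scores are hypothetical negative values, generated from a log-normal distribution. That choice is an assumption made so that the log transform happens to work here. For the real scores, the thread reports that it did not.

```python
import numpy as np
from scipy import stats

# Hypothetical negative scores; the real data are not reproduced here.
rng = np.random.default_rng(2)
x = -rng.lognormal(mean=1.0, sigma=0.5, size=1444)

# x - 2*x is simply -x: it flips the sign so that negative scores become positive.
flipped = x - 2 * x
assert np.allclose(flipped, -x)

# np.log returns nan (with a RuntimeWarning) for negative input and -inf for 0,
# so check the sign explicitly before taking the log.
if np.any(flipped <= 0):
    raise ValueError("log transform needs strictly positive values")
log_data = np.log(flipped)

print("normaltest p, flipped:", stats.normaltest(flipped).pvalue)
print("normaltest p, log:    ", stats.normaltest(log_data).pvalue)
```

A log transform only helps when the data are roughly log-normal. If any score is zero or positive before the sign flip, the flipped values include non-positive entries and the log is undefined.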
>
> On Fri, Dec 7, 2018 at 22:05, <josef.pktd at gmail.com> wrote:
>
>>
>>
>> On Fri, Dec 7, 2018 at 12:08 PM Paul Hobson <pmhobson at gmail.com> wrote:
>>
>>> I think you misunderstand the null hypothesis.
>>>
>>> The null hypothesis for this test is that the data are *not* normally
>>> distributed.
>>>
>>
>> That's not correct. The null hypothesis is the data come from a normal
>> distribution.
>>
>> My guess is that, because of the relatively large sample size, the power
>> is quite high and the test detects relatively small deviations from
>> normality.
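Josef's point about power can be checked by simulation. The minimal sketch below uses data drawn from a t distribution with 10 degrees of freedom as a hypothetical stand-in. That distribution is close to normal but has slightly heavier tails (excess kurtosis 6 / (10 - 4) = 1). The deviation from normality stays the same at every n. Only the test's ability to detect it changes.

```python
import numpy as np
from scipy import stats

# Hypothetical data: t distribution with df=10, i.e. mildly heavy-tailed.
rng = np.random.default_rng(3)

pvalues = {}
for n in (50, 500, 5000, 50000):
    sample = rng.standard_t(10, size=n)
    # H0 of normaltest: the sample comes from a normal distribution.
    pvalues[n] = stats.normaltest(sample).pvalue
    print(f"n={n:6d}  p-value={pvalues[n]:.3g}")
```

At large n the p-value becomes tiny and may underflow to 0. For n around 1444, a significant result can therefore correspond to a deviation that is small in practical terms.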
>>
>> len(x)
>> Out[8]: 1444
>>
>> stats.skewtest(x)
>> Out[9]: SkewtestResult(statistic=1.79241121722139,
>> pvalue=0.073067119279312559)
>>
>> stats.kurtosistest(x)
>> Out[10]: KurtosistestResult(statistic=3.5348152259352097,
>> pvalue=0.00040806039300234271)
>>
>> According to the two separate tests that are combined in normaltest,
>> the data have heavier tails (larger kurtosis) than the normal
>> distribution.
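The combination Josef refers to is the D'Agostino-Pearson K² statistic. scipy.stats.normaltest squares the z-statistics from skewtest and kurtosistest, adds them, and compares the sum with a chi-squared distribution with 2 degrees of freedom. The minimal sketch below checks this on a simulated sample, which is an assumption, and then applies the same arithmetic to the two statistics reported above.

```python
import numpy as np
from scipy import stats

# Hypothetical sample with the same size as the thread's data (len(x) == 1444).
rng = np.random.default_rng(4)
x = rng.standard_t(8, size=1444)

z_skew = stats.skewtest(x).statistic
z_kurt = stats.kurtosistest(x).statistic

# D'Agostino-Pearson: K^2 = z_skew^2 + z_kurt^2, referred to chi-squared with 2 df.
k2 = z_skew**2 + z_kurt**2
p = stats.chi2.sf(k2, df=2)

result = stats.normaltest(x)
assert np.isclose(result.statistic, k2)
assert np.isclose(result.pvalue, p)

# The same arithmetic on the skewtest/kurtosistest statistics reported in the thread.
k2_thread = 1.79241121722139**2 + 3.5348152259352097**2
p_thread = stats.chi2.sf(k2_thread, df=2)
print(k2_thread, p_thread)  # roughly 15.7 and 3.9e-4
```

Almost all of K² (about 12.5 out of 15.7) comes from the kurtosis term. This matches Josef's reading that the rejection is driven by heavier tails rather than by skewness.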
>>
>> (Using kstest as a distance measure, however, shows that the normal
>> distribution matches the data better than a t distribution with smaller df.
>> Note that the kstest p-values don't apply because loc and scale are estimated.)
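Below is a minimal sketch of using kstest purely as a distance measure, as Josef describes. The sample is simulated, and the fixed small df of 3 for the t fit is a hypothetical choice. In scipy's fit method, `f0` holds the first shape parameter (here df) fixed while loc and scale are estimated.

```python
import numpy as np
from scipy import stats

# Hypothetical heavy-tailed sample; replace with the real scores.
rng = np.random.default_rng(5)
x = rng.standard_t(8, size=1444)

# Normal fit: loc and scale (mean and ddof=0 std) estimated from the same data.
loc, scale = stats.norm.fit(x)
d_norm = stats.kstest(x, "norm", args=(loc, scale)).statistic

# t fit with the degrees of freedom held fixed at a small value.
df_fixed = 3
_, t_loc, t_scale = stats.t.fit(x, f0=df_fixed)
d_t = stats.kstest(x, "t", args=(df_fixed, t_loc, t_scale)).statistic

# Only the KS distances are compared. The p-values kstest reports would be
# too large here because the parameters were estimated from the same data.
print(f"KS distance, normal:    {d_norm:.4f}")
print(f"KS distance, t(df={df_fixed}):   {d_t:.4f}")
```

A valid p-value with estimated parameters needs a Lilliefors-type correction (for example statsmodels.stats.diagnostic.lilliefors) or a parametric bootstrap.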
>>
>> Josef
>>
>>
>>
>>
>>>
>>> Since the p-value in your example is 0.0003 (i.e., less than 0.001),
>>> you can reject the null hypothesis, suggesting that your data are normally
>>> distributed.
>>> -Paul
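How normaltest p-values behave when the null hypothesis of normality is true can be checked by simulation. The minimal sketch below uses synthetic normal samples, which is an assumption. About 5% of truly normal samples are rejected at alpha = 0.05, so a small p-value counts as evidence against normality, not for it.

```python
import numpy as np
from scipy import stats

# Hypothetical check: many samples that really are normal.
rng = np.random.default_rng(6)
samples = rng.normal(size=(2000, 500))  # 2000 samples, 500 values each

# normaltest's H0 is "the data come from a normal distribution"; test every row at once.
pvalues = stats.normaltest(samples, axis=1).pvalue

# Under H0 the p-values are roughly uniform, so about 5% fall below 0.05.
rejection_rate = np.mean(pvalues < 0.05)
print(f"fraction of normal samples rejected at alpha=0.05: {rejection_rate:.3f}")
```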
>>>
>>> On Fri, Dec 7, 2018 at 8:54 AM Bakary N'tji Diallo <
>>> diallobakary4 at gmail.com> wrote:
>>>
>>>> Dear all,
>>>> Hope you are doing very well.
>>>>
>>>> I am trying to apply a statistical normalization which requires the
>>>> values to be normally distributed.
>>>> I have prepared a short notebook with all details.
>>>>
>>>> https://nbviewer.jupyter.org/github/diallobakary4/bioinformatics/blob/master/Normatily_test.ipynb
>>>>
>>>> It would be great if someone could help me out.
>>>>
>>>> Thanks
>>>> Best regards
>>>> --
>>>>
>>>> Bakary N’tji DIALLO
>>>>
>>>> PhD Student (Bioinformatics) <http://linkedin.com/in/bakarydiallo>, Research
>>>> Unit in Bioinformatics (RUBi) <https://rubi.ru.ac.za/>
>>>>
>>>> Mail: diallobakary4 at gmail.com | Skype: diallobakary4
>>>>
>>>> Tel: +27798233845 | +223 74 56 57 22 | +223 97 39 77 14
>>>>
>>>> _______________________________________________
>>>> SciPy-User mailing list
>>>> SciPy-User at python.org
>>>> https://mail.python.org/mailman/listinfo/scipy-user
>>>>
>>
>