[Tutor] Statistics with python

Sat Oct 13 15:22:20 EDT 2018

On Sat, 13 Oct 2018 at 11:23, Mariam Haji <mariamhaji01 at gmail.com> wrote:
>
> Hi guys,

Hi Mariam

> the question is as:
> If a sample of 50 patients is taken from a dataset what is the probability
> that we will get a patient above the age of 56?

I can think of several ways of interpreting this:

(a): You have a dataset consisting of 50 patients. You want to know
the probability that a patient chosen from that sample will be above
the age of 56.

(b): You have a dataset consisting of 50 patients. You consider it to
be representative of a larger population of people. You would like to
use your dataset to estimate the probability that a patient chosen
from the larger population will be above the age of 56.

(c): You have a larger dataset consisting of more than 50 patients.
You want to know the probability that a sample of 50 patients chosen
from the larger dataset will contain at least (or exactly?) one person
above the age of 56.

(d): You have a larger dataset, but you will only analyse a sample of
50 patients from it. You want to use statistics on that sample to
estimate the probability that a patient chosen from the larger dataset
will be above the age of 56.

I can list more interpretations but I think it would be better to wait
for you to clarify.
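To show how different (a) and (c) are in practice, here is a rough sketch. It assumes "above the age of 56" means strictly older than 56 and that the sample in (c) is drawn uniformly at random without replacement, which makes it a hypergeometric calculation. The function names are made up for this example, and so are both datasets. For (b) and (d), the proportion from (a) would serve as a simple point estimate of the population probability.

```python
from math import comb

def proportion_above(ages, threshold=56):
    """Interpretation (a): fraction of the patients in `ages` older than `threshold`."""
    return sum(1 for age in ages if age > threshold) / len(ages)

def prob_at_least_one_above(population_ages, sample_size=50, threshold=56):
    """Interpretation (c): probability that a random sample of `sample_size`
    patients, drawn without replacement from `population_ages`, contains at
    least one patient older than `threshold` (hypergeometric distribution)."""
    total = len(population_ages)
    if sample_size > total:
        raise ValueError("sample is larger than the dataset")
    older = sum(1 for age in population_ages if age > threshold)
    younger = total - older
    # P(nobody in the sample is older) = C(younger, n) / C(total, n);
    # math.comb returns 0 when there are fewer than n younger patients.
    p_none = comb(younger, sample_size) / comb(total, sample_size)
    return 1 - p_none

# Made-up data, purely for illustration.
small_dataset = [30, 45, 50, 57, 60, 62, 41, 38, 55, 70]
print(proportion_above(small_dataset))            # 0.4

large_dataset = [40] * 95 + [60] * 5              # 100 patients, 5 of them over 56
print(prob_at_least_one_above(large_dataset))     # roughly 0.97
```

If you have scipy, scipy.stats.hypergeom.sf(0, 100, 5, 50) should give the same answer for the second case.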

> So I know my sample mean is 50

Do you mean that you separately know that the sample mean is 50? Or do
you mean that you know it's 50 because of what you stated above?

Above you stated that you have a sample *size* of 50 and that doesn't
imply that the sample *mean* is 50.
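As a quick illustration of the difference, the size is just how many values you have and the mean is their average. The ages below are invented for the example.

```python
from statistics import mean

# Hypothetical ages, made up purely for illustration (not real data).
ages = [34, 41, 47, 52, 58, 63, 29, 55, 61, 44]

sample_size = len(ages)    # how many patients there are
sample_mean = mean(ages)   # their average age
print(sample_size, sample_mean)
```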

> and my no is 56
> to get std I manually did 50/√56 (that's 50/square root of 56) I got the
> answer as 6.66

It's possible that you are just not using the correct terminology here,
but otherwise this isn't correct. If you had a sample *standard
deviation* of 50 and a sample *size* of 56 then 50/sqrt(56) would give
you the standard error. I'm not sure that's what you actually wanted
to do, though.
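To spell out the difference: the standard deviation describes how spread out individual values are, while the standard error (the SD divided by the square root of the sample size) describes how precisely the sample *mean* is estimated. For the age of a single patient it is the former that matters. A minimal sketch using only the standard library follows. `standard_error` is a hypothetical helper, not a library function, and the example list is made up. Note also that 50/sqrt(56) works out to about 6.68 rather than 6.66.

```python
from math import sqrt
from statistics import stdev

def standard_error(values):
    """Standard error of the mean: sample SD divided by sqrt(sample size)."""
    return stdev(values) / sqrt(len(values))

# The arithmetic from the quoted message: an SD of 50 and a sample size of 56.
se_from_message = 50 / sqrt(56)
print(round(se_from_message, 2))   # 6.68

# Standard error computed from a small made-up list of values.
example = [2, 4, 4, 4, 5, 5, 7, 9]
print(standard_error(example))
```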

> So my mean is 50 and std is 6.66

I'm not sure that this is correct...

> Then I did the below to get the z score and probability using scipy.stats
> as st
>
> m=50
> s=6.66
>
> z1 = (56-m)/s
>
> p1 = st.norm.sf(z1)
> print ('Probability of patient above the age of 56 is:', (p1))
>
> Probability of patient above the age of 56 is: 0.183820506093897

The part above looks correct if we assume we are choosing a single
patient whose age is normally distributed with mean 50 and standard
deviation 6.66. I'm not sure these assumptions are correct though.
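For reference, here is a self-contained version of the calculation quoted above. It uses the standard library's statistics.NormalDist (Python 3.8+) instead of scipy so that it runs without extra packages; with scipy, st.norm.sf(56, loc=50, scale=6.66) should give the same number. It still bakes in the assumptions questioned above, namely that ages are normally distributed with mean 50 and standard deviation 6.66. `prob_above` is just a name chosen for this example.

```python
from statistics import NormalDist

def prob_above(threshold, mean, sd):
    """P(X > threshold) for X ~ Normal(mean, sd), computed via the z score."""
    z = (threshold - mean) / sd        # same z score as z1 in the quoted code
    return 1 - NormalDist().cdf(z)     # survival function of the standard normal

m, s = 50, 6.66                        # the assumed mean and SD from the quoted message
p1 = prob_above(56, m, s)
print('Probability of patient above the age of 56 is:', p1)

# Same result without computing z explicitly.
p1_direct = 1 - NormalDist(mu=m, sigma=s).cdf(56)
print(p1_direct)
```

If you actually have the individual ages, counting the fraction above 56 directly (interpretation (a)) avoids the normality assumption altogether.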

--
Oscar