2 sample chi-square test
Peter Pearson
pkpearson at nowhere.invalid
Tue Dec 29 11:47:45 EST 2020
On Tue, 29 Dec 2020 02:52:15 -0800 (PST), Priya Singh wrote:
[snip]
> I have two spectra with wavelength, flux, and error on flux. I want to
> find out the variability of these two spectra based on the 2 sample
> Chi-square test. I am using following code:
>
> def compute_chi2_var(file1,file2,zemi,vmin,vmax):
> w1,f1,e1,c1,vel1 = get_spec_vel(dir_data+file1,zemi)
> id1 = np.where(np.logical_and(vel1 >= vmin, vel1 < vmax))[0]
> w2,f2,e2,c2,vel2 = get_spec_vel(dir_data+file2,zemi)
> id2 = np.where(np.logical_and(vel2 >= vmin, vel2 < vmax))[0]
> f_int = interp1d(w1[id1], f1[id1]/c1[id1], kind='cubic')
> e_int = interp1d(w1[id1], e1[id1]/c1[id1], kind='cubic')
> f_obs,e_obs = f_int(w2[id2]), e_int(w2[id2])
> f_exp, e_exp = f2[id2]/c2[id2], e2[id2]/c2[id2]
> e_net = e_obs**2 + e_exp**2
> chi_square = np.sum( (f_obs**2 - f_exp**2)/e_net )
> dof = len(f_obs) - 1
> pval = 1 - stats.chi2.cdf( chi_square, dof)
> print('%.10E' % pval)
>
> NN = 320
> compute_chi2_var(file7[NN],file14[NN],zemi[NN],vmin[NN],vmax[NN])
>
>
> I am running this code on many files, and I want to grab those pair of
> spectra where, the p-value of chi-squa is less than 10^(-8), for the
> change to be unlikely due to a random occurrence.
>
> Is my code right concept-wise? Because the chi-squ value is coming out
> to be very large (positive and negative), such that my p-value is
> always between 1 and 0 which I know from other's results not correct.
>
> Can anyone suggest me is the concept of 2-sample chi-squ applied by me
> is correct or not?
1. This is not really a Python question, is it?
2. Recommendation: test your chi-squared code on simpler sample data.
3. Observation: P-values *are* normally between 0 and 1.
4. Observation: chi-squared values are never negative.
5. Recommendation: Learn a little about the chi-squared distribution
(but not on a Python newsgroup). The chi-squared distribution with
N degrees of freedom is the distribution expected for a quantity
that is the sum of the squares of N normally distributed random
variables with mean 0 and standard deviation 1. If you expect
f_obs to equal f_exp plus some normally distributed noise with
mean 0 and standard deviation sigma, then (f_obs-f_exp)/sigma
should be normally distributed with mean 0 and standard deviation 1.
6. Observation: (f_obs**2 - f_exp**2)/e_net is probably not what
you want, since it can be negative. You probably want something
like (f_obs-f_exp)**2/e_net. But don't take my word for it.
Good luck.
--
To email me, substitute nowhere->runbox, invalid->com.
More information about the Python-list
mailing list