[SciPy-User] Unit testing of Bayesian estimator
Tom K.
tpk at kraussfamily.org
Sat Nov 7 10:09:27 EST 2009
Hi Anne, interesting question.
I'm not really sure what a Bayesian hypothesis tester is, but I expect that
it results in a random variable. For a given input prior and measurement
distribution, and choice of hypothesis (signal present or signal absent),
can you know the distribution of this random variable? If so, it could come
down to a test that this random variable - or a function of it, such as its
mean or the probability that it exceeds some value - behaves as expected.
How do you create a unit test that a random variable generator is working?
If the random variables were all i.i.d. normal, you could average a bunch and
test that the sample mean is close to the mean of the distribution. A pass
can never be guaranteed, since there is a non-zero probability that the
sample mean comes out large, but in practice such a test will almost always
pass. You will probably want to pin down the seeds and make sure the
probability of failure is really small, so that changing seeds (and hence
the underlying sequence of "random" inputs) is unlikely to cause a failure.
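For what it's worth, here's a minimal sketch of that kind of seeded test in
Python (numpy); the 4-standard-error tolerance is an arbitrary choice that
keeps the chance of a spurious failure tiny:

    import numpy as np

    def test_sample_mean():
        # Fix the seed so the "random" inputs are reproducible and the
        # test can't fail intermittently on a different stream.
        rng = np.random.RandomState(42)
        n = 10000
        x = rng.normal(loc=0.0, scale=1.0, size=n)
        # The sample mean is ~ N(0, 1/n); allowing 4 standard errors
        # means a correct generator fails with probability ~6e-5.
        assert abs(x.mean()) < 4.0 / np.sqrt(n)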
Is there anything in scipy's stats module to test that a series of random
variables has a given distribution? Maybe scipy.stats.kstest? Who the heck
are Kolmogorov and Smirnov anyway :-)?
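scipy.stats.kstest does compare the empirical CDF of a sample against a
reference CDF using the Kolmogorov-Smirnov statistic. A quick sketch, again
with a pinned seed so the outcome is deterministic:

    import numpy as np
    from scipy import stats

    rng = np.random.RandomState(0)
    sample = rng.normal(size=1000)

    # Test the sample against the standard normal CDF; a large p-value
    # means no evidence that the sample departs from N(0, 1).
    d, p = stats.kstest(sample, 'norm')
    assert p > 0.01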
Anne Archibald-2 wrote:
>
> Hi,
>
> I have implemented a simple Bayesian regression program (it takes
> events modulo one and returns a posterior probability that the data is
> phase-invariant plus a posterior distribution for two parameters
> (modulation fraction and phase) in case there is modulation). I'm
> rather new at this, so I'd like to construct some unit tests. Does
> anyone have any suggestions on how to go about this?
>
> For a frequentist periodicity detector, the return value is a
> probability that, given the null hypothesis is true, the statistic
> would be this extreme. So I can construct a strong unit test by
> generating a collection of data sets given the null hypothesis,
> evaluating the statistic, and seeing whether the number that claim to
> be significant at the 5% level is really 5%. (In fact I can use the
> binomial distribution to get limits on the number of false positives.)
> This gives me a unit test that is completely orthogonal to my
> implementation, and that passes if and only if the code works. For a
> Bayesian hypothesis testing setup, I don't really see how to do
> something analogous.
>
> I can generate non-modulated data sets and confirm that my code
> returns a high probability that the data is not modulated, but how
> high should I expect the probability to be? I can generate data sets
> with models with known parameters and check that the best-fit
> parameters are close to the known parameters - but how close? Even if
> I do it many times, is the posterior mean unbiased? What about the
> posterior mode or median? I can even generate models and then data
> sets that are drawn from the prior distribution, but what should I
> expect from the code output on such a data set? I feel sure there's
> some test that verifies a statistical property of Bayesian
> estimators/hypothesis testers, but I can't quite put my finger on it.
>
> Suggestions welcome.
>
> Thanks,
> Anne
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>
>
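P.S. The frequentist check described above translates almost directly into
code. A rough sketch, where make_null_data and p_value are placeholders for
whatever the real detector provides:

    import numpy as np
    from scipy import stats

    def test_false_positive_rate(make_null_data, p_value,
                                 n_trials=1000, alpha=0.05):
        # Generate data sets under the null hypothesis and count how
        # many are declared significant at level alpha.
        rng = np.random.RandomState(1234)
        hits = sum(p_value(make_null_data(rng)) < alpha
                   for _ in range(n_trials))
        # Under the null, hits ~ Binomial(n_trials, alpha); accept any
        # count inside a generous central interval of that distribution.
        lo = stats.binom.ppf(1e-4, n_trials, alpha)
        hi = stats.binom.ppf(1 - 1e-4, n_trials, alpha)
        assert lo <= hits <= hi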
--
View this message in context: http://old.nabble.com/Unit-testing-of-Bayesian-estimator-tp26240654p26241135.html
Sent from the Scipy-User mailing list archive at Nabble.com.