Probabilistic unit tests?
duncan smith
buzzard at invalid.invalid
Sat Jan 12 19:08:33 CET 2013
On 12/01/13 08:07, alex23 wrote:
> On 11 Jan, 13:34, Steven D'Aprano <steve
> +comp.lang.pyt... at pearwood.info> wrote:
>> Well, that's not really a task for unit testing. Unit tests, like most
>> tests, are well suited to deterministic tests, but not really to
>> probabilistic testing. As far as I know, there aren't really any good
>> frameworks for probabilistic testing, so you're stuck with inventing your
>> own. (Possibly on top of unittest.)
>
> One approach I've had success with is providing a seed to the RNG, so
> that the random results are deterministic.
>
My ex-boss once instructed me to do the same thing to test functions for
generating random variates. I used a statistical approach instead.
There are often several ways of generating data that follow a particular
distribution. If you use a given seed so that you get a deterministic
sequence of uniform random variates you will get deterministic outputs
for a specific implementation. But if you change the implementation the
tests are likely to fail. For example, to generate a negative exponential
variate, either -ln(U)/lambda or -ln(1-U)/lambda will do the job correctly,
but tests written for one implementation would fail with the other. So each
time you changed the implementation you'd need to change the tests.
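
To make that concrete, here is a minimal sketch (standard library only) of
the two formulas above. The function names (uniform_open, exp_variate_a,
exp_variate_b, seeded_sample) are made up for illustration. Both generators
are valid, yet with the same seed they return different values, so expected
outputs recorded from one would fail against the other.

```python
import math
import random


def uniform_open(rng):
    """Return a uniform variate in the open interval (0, 1)."""
    u = rng.random()  # random() can return exactly 0.0, which log() rejects
    while u == 0.0:
        u = rng.random()
    return u


def exp_variate_a(rng, lam):
    """Negative exponential variate via -ln(U)/lambda."""
    return -math.log(uniform_open(rng)) / lam


def exp_variate_b(rng, lam):
    """Negative exponential variate via -ln(1-U)/lambda."""
    return -math.log(1.0 - uniform_open(rng)) / lam


def seeded_sample(gen, seed, lam=2.0, n=5):
    """Draw n variates from gen using a freshly seeded RNG."""
    rng = random.Random(seed)
    return [gen(rng, lam) for _ in range(n)]


# Same seed, same implementation: identical output every time.
same = seeded_sample(exp_variate_a, 42) == seeded_sample(exp_variate_a, 42)

# Same seed, different (equally correct) implementation: different output.
differs = seeded_sample(exp_variate_a, 42) != seeded_sample(exp_variate_b, 42)
```

A seeded test built from the output of exp_variate_a would therefore flag
exp_variate_b as broken, even though both follow the same distribution.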
I think my boss had in mind that I would write the code, seed the RNG,
call the function a few times, then use the generated values in the
test. That would not even have tested the original implementation: I
would have had a test that only checked whether the implementation had
changed, which I would argue is worse than no test at all. If
I'd gone to the trouble of manually calculating the expected outputs so
that I got valid tests for the original implementation, then I would
have had a test that would effectively just serve as a reminder to go
through the whole manual calculation process again for any changed
implementation.
A reasonably general statistical approach is possible. Any hypothesis
about generated data that lends itself to statistical testing can be
used to generate a sequence of p-values (one for each set of generated
values) that can be checked (statistically) for uniformity, since p-values
from a continuous test statistic are uniformly distributed on [0, 1] when
the hypothesis is true. This effectively tests the distribution of the
test statistic, so it is better than simply checking whether tests on
generated data pass, say, 95% of the time (for a chosen 5% Type I error
rate). Cheers.
Duncan
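
P.S. A rough sketch of the p-value approach, under some assumptions of my
choosing: each set of variates is pushed through the exponential CDF and
tested for uniformity with a chi-square test on 11 equiprobable bins (10
degrees of freedom, so the tail probability has a closed form and no
third-party library is needed), and the resulting p-values are then tested
for uniformity the same way. All names, the bin count, and the sample sizes
are illustrative, and the chi-square p-values are only approximately uniform.

```python
import math
import random


def uniform_open(rng):
    """Return a uniform variate in the open interval (0, 1)."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u


def exp_variate_a(rng, lam):
    return -math.log(uniform_open(rng)) / lam


def exp_variate_b(rng, lam):
    return -math.log(1.0 - uniform_open(rng)) / lam


def exp_variate_broken(rng, lam):
    # Deliberately wrong: the rate is doubled, so the mean is halved.
    return -math.log(uniform_open(rng)) / (2.0 * lam)


def chi2_sf_even(x, df):
    """P(X > x) for a chi-square variable with even df (closed form)."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def uniformity_pvalue(values, bins=11):
    """Chi-square goodness-of-fit p-value for values claimed to be U(0, 1)."""
    counts = [0] * bins
    for v in values:
        counts[min(int(v * bins), bins - 1)] += 1
    expected = len(values) / bins
    stat = sum((c - expected) ** 2 / expected for c in counts)
    return chi2_sf_even(stat, bins - 1)


def meta_pvalue(gen, lam, rng, n_sets=200, set_size=500):
    """One p-value per generated set, then test those p-values for uniformity."""
    pvals = []
    for _ in range(n_sets):
        # If gen is correct, F(x) = 1 - exp(-lam * x) is uniform on (0, 1).
        transformed = [-math.expm1(-lam * gen(rng, lam)) for _ in range(set_size)]
        pvals.append(uniformity_pvalue(transformed))
    return uniformity_pvalue(pvals)
```

A unit test would then assert that meta_pvalue(...) exceeds some small
threshold (say 1e-4). Both exp_variate_a and exp_variate_b should pass
without the test being touched, while exp_variate_broken should fail. Being
a statistical test, it will still reject a correct generator at roughly the
chosen threshold rate.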