Voting Project Needs Python People

Mon Jul 28 19:13:56 EDT 2003

"Alan Dechert" <adechert at earthlink.net> wrote in message news:<TEUUa.23927$Mc.1868971 at newsread1.prod.itd.earthlink.net>...
> Thanks, Ian.  I could not quite figure out if your binomial distribution
> calculator would be applicable.
> 
> Here's what Bill Buck ( billbuck at cox.net ) sent me.  It turns out there is a
> BINOMDIST function in Excel.  I think it might be what I want.
> 
> http://home.earthlink.net/~adechert/LOTAcceptanceCalculator.xls
> 
> Let me try to clarify what I'm after.  The paper record should always match
> the electronic record.  So the allowable defects is zero.  If there is a
> mismatch found in the sample, we don't publish the electronic tally: we take
> all the paper ballots and check them.
> 
> We are talking about an election conducted with computer-generated paper
> ballots.  The paper ballots represent the actual vote since these ballots
> are what voters actually saw, verified, and cast.  We will have an
> electronic record obtained from the computers which should match each paper
> ballot generated.  We want to use the electronic record since it will give
> us an instant result -- but we have to check it against the paper ballots to
> be sure the election result is correct.  So, in this scenario, the
> electronic count is a prediction (or preliminary tally).
> 
> So, if by the preliminary electronic tally a candidate won a race by 1
> percent, I want to know how many ballots we have to check (random sample) to
> be certain that the result predicted is true.
> 
> When I put one million into this Confidence Level Calculator, and Acceptable
> Quality Level of .01, a sample of 10,000 shows a confidence level of "1."  A
> sample of 2000 give C.L of 0.999999998  Presumably, "1" is really
> .9999999999+ more 9s.  Can we get more decimal places?
> 
> So I guess the Lot Fraction Defective is analgous to the predicted victory
> margin.  Is that right?
> 
> I would still like a standalone calculator that doesn't require Excel.
> 
> Alan Dechert

The BINOMDIST function in Excel calculates either the cumulative
probability function or the probability mass function for the binomial
distribution. BINOMDIST is usually accurate when it works but is known
to fail with large numbers of trials, small event probabilities etc.
The functions used by the calculators in
http://members.aol.com/iandjmsmith/EXAMPLES.HTM are somewhat more
robust and more accurate than BINOMDIST.

I gather what you want is to choose a sample size based on how may
mismatches there would have to be to change the result. If any
mismatches are found then you take all the ballot papers and check
them. This would mean that the closer the result was, the bigger the
sample size would have to be. In the case where the electronic count
is a dead heat all ballot papers would have to be checked!

If so then the spreadsheet at
http://home.earthlink.net/~adechert/LOTAcceptanceCalculator.xls is
doing roughly what you want. There appears to be a slight error in
there where it uses $C$4 instead of $D$4 in cells C15 to V42, although
it is irrelevant to you since the value in $C$4 is 0 anyway. If it
used =BINOMDIST() instead of =1-BINOMDIST() in cells C15 to V42 then
you would be able to see how small the probability of not seeing a
mismatch is (i.e. you don't really need more decimal places). What I'm
not sure of is whether "The Lot Fraction Defective" is analagous to
the predicted victory margin or to twice the predicted victory margin.
This would depend on whether mismatches were ignored or counted the
other way round.

The only real problem is that since the sample size could be large
relative to the population size, the binomial distribution may not
provide an adequate approximation to the Hypergeometric distribution
(I assume you will be using sampling without replacement - the
binomial is strictly only correct for sampling with replacement).

Using the Hypergeometric calculator, if we have 1 million ballot
papers and 100 mismatches could cause the result to be changed then
the probability of observing 0 mismatches will depend on the sample
size as follows

sample size	probability of failing to detect a mismatch when it might
matter
100		0.99
1000		0.90
10000		0.37
1000000		2.65e-5
2000000		2.03e-10

If we have 1 million ballot papers and 10000 mismatches could cause
the result to be changed then the probability of observing 0
mismatches will depend on the sample size as follows

sample size	probability of failing to detect a mismatch when it might
matter
100		0.37
1000		4.30e-5
10000		1.35e-44

While the calculators at
http://members.aol.com/iandjmsmith/EXAMPLES.HTM are useful in their
own right, they are really supposed to be a demonstration of how to
use the functions in myfunctions.js so people can then use the
functions in their own software! In this case what we really want is
to choose the sample size required given a total number of ballots,
the critical number of mismatches and the probability that we do not
detect a mismatch. The calculations are fairly simple and you can
build your own calculator for this quickly and easily (see
http://members.aol.com/iandjmsmith/reqss.htm).

With this calculator, we can see that if we have a million ballot
papers and the probability of failure to detect a mismatch is 1e-6,
then the minimum sample size is related to the critical number of
mismatches as follows:-

Critical 
number of 
mismatches	Sample size required
1		999999
10		748808
100		129031
1000		13714
10000		1374
100000		132
200000		62
300000		39
500000		20
900000		6

Ian Smith