[Spambayes] Re: Collecting word lists.. - BUMMER

Meyer, Tony T.A.Meyer at massey.ac.nz
Mon May 26 12:28:08 EDT 2003


> And now, here's a collection of "test" data, which I assume came from
> the same test dataset, but seems strangely low common word count too.

They came from two datasets, although one was much bigger than the
other.  As I indicated (in the offlist mail I sent), I realised after
I did this that I should have chosen only one from each dataset,
since there should be a pretty big overlap here.

> Loaded 10 wordlists with 232531 distinct words out of 235138 

Given that half of these were from the same dataset, there should be a
*lot* fewer distinct words than this.

I can't say what the problem is, but the result is definitely not
correct.
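
To put a number on it: if "out of 235138" is the sum of the sizes of the
individual wordlists (an assumption on my part), then only
235138 - 232531 = 2607 entries, roughly 1.1%, appeared in more than one
list.  A minimal sketch of the kind of sanity check that would show the
expected overlap follows.  The function name overlap_stats and the toy
wordlists are hypothetical and are not taken from the script that
produced the output quoted above.

```python
def overlap_stats(wordlists):
    """Return (total, distinct) for an iterable of word collections.

    total    -- sum of the number of unique words in each list
    distinct -- number of unique words across all lists combined
    """
    total = 0
    distinct = set()
    for words in wordlists:
        unique = set(words)      # ignore repeats within a single list
        total += len(unique)
        distinct |= unique       # union into the combined vocabulary
    return total, len(distinct)


# Toy data, made up for illustration: a and b stand in for two lists
# drawn from the same dataset, c for a list from a different one.
a = ["free", "money", "viagra", "click"]
b = ["free", "money", "viagra", "offer"]
c = ["meeting", "python", "patch", "click"]

total, distinct = overlap_stats([a, b, c])
print("Loaded 3 wordlists with %d distinct words out of %d" % (distinct, total))
```

With lists that share a dataset, distinct should come out well below
total, as it does for the toy data here.  A distinct count within about
1% of the total suggests the words are not being matched across lists
at all.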

=Tony Meyer


