[Spambayes] Re: Collecting word lists.. - BUMMER
Meyer, Tony
T.A.Meyer at massey.ac.nz
Mon May 26 12:28:08 EDT 2003
> And now, here's a collection of "test" data, which I assume came
> from the same test dataset, but seems strangely low common word
> count too.
They came from two datasets, although one was much bigger than the
other. As I indicated (in the offlist mail I sent), I realised
after I did this that I should have chosen only one from each dataset,
since there should be a pretty big overlap here.
> Loaded 10 wordlists with 232531 distinct words out of 235138
Given that half of these were from the same dataset, there should be a
*lot* fewer distinct words than this.
I can't say what the problem is, but the result is definitely not
correct.
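
For a rough sanity check: if the "out of" figure is the sum of the
per-list word counts (that is my reading of the output, not something
I have confirmed), then 235138 - 232531 = 2607 repeated entries, or
about 1.1% overlap across 10 lists. Here is a minimal sketch of how I
would expect those two numbers to be computed; count_distinct and the
sample lists are made up for illustration and are not the loader's
actual code.

```python
def count_distinct(wordlists):
    """Return (distinct, total) for an iterable of word collections.

    total    = sum of the unique-word counts of each list
    distinct = size of the union of all lists
    """
    seen = set()
    total = 0
    for words in wordlists:
        unique = set(words)      # ignore repeats within a single list
        total += len(unique)
        seen |= unique           # accumulate the union across lists
    return len(seen), total


if __name__ == "__main__":
    # Two hypothetical lists drawn from the same dataset: heavy overlap.
    list_a = ["free", "money", "click", "python"]
    list_b = ["free", "money", "click", "list"]
    distinct, total = count_distinct([list_a, list_b])
    print("Loaded 2 wordlists with %d distinct words out of %d"
          % (distinct, total))
```

With lists that overlap this heavily, the distinct count comes out
well below the total, which is what I would expect from the real data.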
=Tony Meyer