[Spambayes] Re: Another software in the field
T. Alexander Popiel
Fri Nov 15 16:50:10 2002
In message: <firstname.lastname@example.org>
Michael Hudson <email@example.com> writes:
>Matt Sergeant <firstname.lastname@example.org> writes:
>> Anthony Baxter said the following on 15/11/02 11:01:
>> > One thing that surprises me is that there's a seemingly endless list of
>> > projects all implementing Graham's approach exactly as he originally
>> > described it - almost no-one else is doing the basic testing and
>> > research that this sort of approach would seem to cry out for.
>> Why would anyone else want to, when you guys are doing such an amazing
>> job of it? ;-)
>But isn't that the point? The people are implementing Graham's
>original algorithm, not the soupa-doupa wizzy one that lives in
None of us have written a nice, concise, and easily understood
_English_ description of the algorithm we're actually using.
Further, that nonexistent concise description hasn't been
slashdotted. Until that happens, spambayes will be a footnote.
Personally, I think the classifier algorithm is mature enough
now to support such a description. (Yes, I know Gary has written
essays on part of it, but they're not comprehensive, and I don't
think anything has been written about our use of chi-square.)
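For readers wondering what "our use of chi-square" refers to, here is a minimal Python sketch of one way to combine per-word spam probabilities with the chi-square distribution: Fisher's method, applied once to measure spamminess and once to measure hamminess. It is an assumed reconstruction for illustration, not the spambayes classifier code. The function names chi2q and chi2_combine are hypothetical, the per-word probabilities are taken as given (each strictly between 0 and 1), and any step that selects only the strongest clues is omitted.

```python
import math


def chi2q(x2: float, v: int) -> float:
    """Return Q(x2 | v), the probability that a chi-square variable with v
    degrees of freedom is >= x2.

    Uses the closed form valid only for even v:
        Q = exp(-m) * sum(m**i / i! for i in 0 .. v/2 - 1),  m = x2 / 2
    """
    assert v % 2 == 0, "closed form requires even degrees of freedom"
    m = x2 / 2.0
    term = math.exp(-m)   # the i = 0 term
    total = term
    for i in range(1, v // 2):
        term *= m / i     # builds m**i / i! incrementally
        total += term
    return min(total, 1.0)  # rounding can push the sum slightly above 1


def chi2_combine(probs: list[float]) -> float:
    """Combine per-word spam probabilities (each strictly between 0 and 1)
    into one score: near 1.0 = spam, near 0.0 = ham, near 0.5 = unsure.
    """
    n = len(probs)
    if n == 0:
        return 0.5  # no evidence either way (assumed convention)
    # Fisher's method: -2 * sum(ln p_i) is chi-square distributed with 2n
    # degrees of freedom when the p_i are independent and uniform.
    s = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    h = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    # s is large when many words look spammy, h when many look hammy.
    return (s - h + 1.0) / 2.0


print(round(chi2_combine([0.99, 0.98, 0.97]), 3))  # -> 1.0 (spammy words)
print(round(chi2_combine([0.01, 0.02, 0.03]), 3))  # -> 0.0 (hammy words)
print(round(chi2_combine([0.99, 0.01]), 3))        # -> 0.5 (conflicting)
```

One property worth noting: strong evidence in both directions yields a score near 0.5 instead of being forced toward 0 or 1, which leaves room for an "unsure" category.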
The classifier _implementation_ is still fluctuating a fair
amount as people consider splitting the database, removing
access counts, etc.
The tokenizer algorithm we've got is neither mature enough to
support such a description nor amenable to a concise one. On
the other hand, I suspect that in a screed
about the classifier, we could get away with a very brief
description of the basics (words counted once per message,
split-on-whitespace, ignore most headers, etc.) and leave the
fine tuning as a reference to the tokenizer implementation.
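The "very brief description of the basics" could be as small as the following Python sketch. It only illustrates the three points listed (split on whitespace, count each word once per message, ignore most headers) and is not the spambayes tokenizer, which does a great deal more. The function name tokenize, the choice to keep only the Subject header, and the "subject:" prefix on header words are all hypothetical choices made for the example.

```python
import email

# Hypothetical choice for illustration: keep only the Subject header and
# ignore all others ("ignore most headers"). Which headers survive is a guess.
KEPT_HEADERS = ("subject",)


def tokenize(raw_message: str) -> set[str]:
    """Return the set of tokens in a message.

    Returning a set means each word counts at most once per message,
    however often it appears.
    """
    msg = email.message_from_string(raw_message)
    tokens: set[str] = set()

    # Header words get a hypothetical "name:" prefix so that a word in the
    # Subject is a different token from the same word in the body.
    for name in KEPT_HEADERS:
        value = msg.get(name)
        if value:
            tokens.update(f"{name}:{word}" for word in value.split())

    # Body: split every text/* part on whitespace.
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        payload = part.get_payload(decode=True)  # bytes, transfer-decoded
        if payload is None:
            continue
        charset = part.get_content_charset() or "latin-1"
        tokens.update(payload.decode(charset, errors="replace").split())
    return tokens


raw = "From: a@example.com\nSubject: Free money\n\nclick here free free money\n"
print(sorted(tokenize(raw)))
# -> ['click', 'free', 'here', 'money', 'subject:Free', 'subject:money']
```

Everything beyond these basics (case handling, header selection, URL and HTML treatment, and so on) is exactly the fine tuning that would be left as a reference to the real tokenizer implementation.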
Unfortunately, I'm not a good technical writer. If I were,
I'd try my hand at writing up a description of the classifier,
and then post it somewhere with a lot of bandwidth. As it is,
I can only whine about it not existing.