[Spambayes] Re: Another software in the field

T. Alexander Popiel popiel@wolfskeep.com
Fri Nov 15 16:50:10 2002

In message:  <2mheejgiqe.fsf@starship.python.net>
             Michael Hudson <mwh@python.net> writes:
>Matt Sergeant <msergeant@startechgroup.co.uk> writes:
>> Anthony Baxter said the following on 15/11/02 11:01:
>> > 
>> > One thing that surprises me is that there's a seemingly endless list of 
>> > projects all implementing Graham's approach exactly as he originally
>> > described it - almost no-one else is doing the basic testing and
>> > research that this sort of approach would seem to cry out for.
>> Why would anyone else want to, when you guys are doing such an amazing 
>> job of it? ;-)
>But isn't that the point?  The people are implementing Graham's
>original algorithm, not the soupa-doupa wizzy one that lives in

None of us have written a nice, concise, and easily understood
_English_ description of the algorithm we're actually using.
Further, that nonexistent concise description hasn't been
slashdotted.  Until that happens, spambayes will be a footnote.

Personally, I think the classifier algorithm is mature enough
now to support such a description.  (Yes, I know Gary has written
essays on part of it, but they're not comprehensive, and I don't
think anything has been written about our use of chi-square.)
The classifier _implementation_ is still fluctuating a fair
amount as people consider splitting the database, removing
access counts, etc.
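
For readers who have not seen it, here is a rough sketch of what
chi-square combining of per-word spam probabilities can look like.
It assumes "our use of chi-square" means Fisher-style combining;
the names chi2q and combine are illustrative, not taken from the
spambayes source, and the inputs are assumed to lie strictly
between 0 and 1.

```python
import math

def chi2q(x2, dof):
    """Return P(chi-square with `dof` degrees of freedom >= x2).

    Uses the closed-form series that is valid only for even `dof`.
    """
    assert dof % 2 == 0
    m = x2 / 2.0
    total = term = math.exp(-m)
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    # Rounding can push the sum slightly above 1.0.
    return min(total, 1.0)

def combine(probs):
    """Combine per-word spam probabilities into one score in [0, 1].

    Assumption: every value in `probs` is strictly between 0 and 1.
    """
    n = len(probs)
    if n == 0:
        return 0.5  # no evidence either way
    ln_not_spam = sum(math.log(1.0 - p) for p in probs)
    ln_spam = sum(math.log(p) for p in probs)
    # Under the null hypothesis, -2 * sum(ln p) follows a chi-square
    # distribution with 2n degrees of freedom.
    spam_evidence = 1.0 - chi2q(-2.0 * ln_not_spam, 2 * n)
    ham_evidence = 1.0 - chi2q(-2.0 * ln_spam, 2 * n)
    return (spam_evidence - ham_evidence + 1.0) / 2.0
```

Scores near 1 suggest spam, scores near 0 suggest ham, and scores
near 0.5 mean the evidence is absent or conflicting.  With a very
large number of words math.exp can underflow, so a real
implementation would need more care than this sketch takes.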

The tokenizer algorithm we've got is neither mature enough to
support such a description nor amenable to concise
description.  On the other hand, I suspect that in a screed
about the classifier, we could get away with a very brief
description of the basics (words counted once per message,
split-on-whitespace, ignore most headers, etc.) and leave the
fine tuning as a reference to the tokenizer implementation.
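
The basics listed above can be sketched in a few lines.  This is
only an illustration of those three rules, not the actual
tokenizer: the function name, the choice of Subject as the one
header kept, and the "subject:" prefix on header words are all
assumptions made for the example.

```python
def tokenize(raw_message, kept_headers=("subject",)):
    """Return the set of distinct tokens in a raw mail message."""
    # Headers end at the first blank line; the rest is the body.
    normalized = raw_message.replace("\r\n", "\n")
    head, _, body = normalized.partition("\n\n")

    # Split on whitespace; the set counts each word once per message.
    tokens = set(body.split())

    # Ignore most headers; keep only the ones named in kept_headers.
    for line in head.split("\n"):
        name, sep, value = line.partition(":")
        key = name.strip().lower()
        if sep and key in kept_headers:
            # Hypothetical choice: tag header words so they stay
            # distinct from the same words in the body.
            tokens.update(f"{key}:{word}" for word in value.split())
    return tokens
```

For example, a message with "Subject: Cheap pills" and the body
"buy cheap pills buy now" yields four body tokens plus two tagged
subject tokens.  Folded (continuation) header lines are simply
skipped here.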

Unfortunately, I'm not a good technical writer.  If I were,
I'd try my hand at writing up a description of the classifier,
and then post it somewhere with a lot of bandwidth.  As it is,
I can only whine about it not existing.

- Alex
