None of us have written a nice, concise, and easily understood
_English_ description of the algorithm we're actually using.
Further, that nonexistent concise description hasn't been
slashdotted.  Until that happens, spambayes will be a footnote.

Personally, I think the classifier algorithm is mature enough
now to support such a description.  (Yes, I know Gary has written
essays on part of it, but they're not comprehensive, and I don't
think anything has been written about our use of chi-square.)
The classifier _implementation_ is still fluctuating a fair
amount as people consider splitting the database, removing
access counts, etc.

The tokenizer algorithm we've got is neither mature enough to
support such a description nor amenable to a concise one.
On the other hand, I suspect that in a screed
about the classifier, we could get away with a very brief
description of the basics (words counted once per message,
split-on-whitespace, ignore most headers, etc.) and point readers
to the tokenizer implementation for the fine tuning.
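
To give a feel for how brief "the basics" could be, here is a
rough sketch of just those three rules.  It is not the real
tokenizer: the function name basic_tokens, the KEPT_HEADERS tuple,
and the choice to keep only Subject are all assumptions made up
for illustration.

```python
import email

# Assumption for illustration only: "ignore most headers" is modelled
# by keeping just the Subject line. The real tokenizer's choices differ.
KEPT_HEADERS = ("subject",)


def basic_tokens(raw_message):
    """Return the set of whitespace-separated words in a message.

    Using a set means each word is counted at most once per message.
    """
    msg = email.message_from_string(raw_message)
    tokens = set()

    # Look only at the few headers we decided to keep.
    for name in KEPT_HEADERS:
        tokens.update(msg.get(name, "").split())

    # Plain split() breaks on any run of whitespace.
    body = msg.get_payload()
    if isinstance(body, str):  # skip multipart bodies in this sketch
        tokens.update(body.split())

    return tokens
```

Everything past this (case handling, multipart messages, URLs, and
so on) is the fine tuning that would stay in the implementation.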

Unfortunately, I'm not a good technical writer.  If I were,
I'd try my hand at writing up a description of the classifier,
and then post it somewhere with a lot of bandwidth.  As it is,
I can only whine about it not existing.

