[Spambayes] Where are we heading?
T. Alexander Popiel
popiel@wolfskeep.com
Fri Oct 25 22:58:35 2002
It seems like all the work for the last week or so has been
on integration of the classifier with end-user deployments
(clients, mailing list filters, what have you). Have we reached
the point where we're no longer interested in this as a research
project, but instead as a useful tool?
If so, I suggest that we may want to rewrite the whole thing
from scratch, after actually deciding on a usage model or two.
Choosing the algorithms to use (gary-combining or chi-square?)
would be good, too. What we've got now is a decent prototype,
but it lacks quite a bit as a finished tool... there are a lot
of issues with database storage (what should be in it, how it
should be stored, etc.) and options management, just to name
two of the hotspots.
Personally, I'm still interested in the research aspects;
once I get another two free hours to rub together, I'm going
to see if I can deal with some of the mail decoding issues in
the tokenizer (the unencoded mailing-list footer appended to
a base64 body, to be specific). There are also a few experiments
I'd like to see revisited: the time of delivery stuff might be
interesting to test on multiple corpora, as an example (since
my spam does not seem to be evenly spread throughout the day,
unlike the original experimenter's spam).
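
To make the footer problem concrete, here is a minimal sketch of one way to
separate a base64 body from plain text that a list manager appended after it.
It is only an illustration: the name split_base64_body and the line-matching
heuristic are hypothetical and are not taken from the Spambayes tokenizer. It
assumes the footer starts at the first line that is not pure base64, or right
after the first padded line.

```python
import base64
import binascii
import re

# A line made up only of base64 alphabet characters, with optional padding.
_B64_LINE = re.compile(r"^[A-Za-z0-9+/]+={0,2}$")


def split_base64_body(body):
    """Split a message body into (decoded_bytes, trailing_text).

    Heuristic (an assumption of this sketch): leading lines that look like
    base64 are the encoded payload; everything after them is treated as an
    unencoded footer.
    """
    lines = body.splitlines()
    encoded = []
    rest_start = len(lines)
    for index, line in enumerate(lines):
        candidate = line.strip()
        if not _B64_LINE.match(candidate):
            # First line that cannot be base64: the footer starts here.
            rest_start = index
            break
        encoded.append(candidate)
        if candidate.endswith("="):
            # Padding marks the end of the encoded data.
            rest_start = index + 1
            break
    try:
        decoded = base64.b64decode("".join(encoded), validate=True)
    except binascii.Error:
        # The guess was wrong (e.g. a footer word looked like base64);
        # give up and hand back the body untouched.
        return b"", body
    return decoded, "\n".join(lines[rest_start:])
```

A footer line made up only of letters and digits would still be mistaken for
encoded data; in that case the decode usually fails on bad padding and the
function falls back to returning the body unchanged.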
- Alex