[Mailman-Developers] spambayes integration
Simone Piunno
pioppo at ferrara.linux.it
Mon Apr 7 01:24:14 EDT 2003
Hi,
Over the last few days I've played with Barry's patch for spambayes
integration, and after a few small tweaks (patches available on both the
Mailman and spambayes SF bug trackers) it worked very well. Now I'm planning
some enhancements:
- optionally, use a "continuous training" model, where the filter is
trained automatically on each incoming message categorized
as either ham or spam (unsure messages won't be used for
automatic training). In this case the "train on this message"
option in admindb will become "re-train on this message",
because we'll have to unlearn the previous training before
doing the new one.
This is almost done.
- interface for training on leaked spam (messages that got categorized as
ham or unsure and therefore delivered to the list members). Currently
I have to log on to the server and run a script from the shell to
load the spam message, because non-spam doesn't get held in admindb.
This is not acceptable.
What I'm thinking now is that each message delivered to the list could
be saved somewhere in its pristine state (e.g. before CookHeaders,
probably in SpamDetect itself) so that at a later time I (the list admin)
could say "that was spam, please train on it", maybe referring to it by
Message-ID.
This buffer of pristine messages should be cleaned periodically
(number of days configurable?)
I also thought of different schemes, but they all have problems:
- forward the received message to listname-train at server with
the list password somewhere in the headers. Even if I use
MIME-forward to keep the message intact, it's not the same
message that was examined by SpamDetect. We have a dozen
headers added or munged.
- upload through the web: same problems, and we'd also have to
force the user to save in a common format, e.g. unix mbox.
This would be a nightmare for Windows users.
- stats where you can see how well the filter is performing, plus a
list of all tokens learnt with ham/spam counters and different
colors (green for ham indicators, red for spam indicators,
yellow for neutral ones).
This is probably related to the more general "Should be able to
gather statistics, such as deliveries/day, performance, number of
subscribers over time, etc." in the TODO page.
--
Simone Piunno -- http://members.ferrara.linux.it/pioppo
.------- Adde parvum parvo magnus acervus erit -------.
Ferrara Linux Users Group - http://www.ferrara.linux.it
Deep Space 6, IPv6 on Linux - http://www.deepspace6.net
GNU Mailman, Mailing List Manager - http://www.list.org
`-------------------------------------------------------'