[Mailman-Developers] spambayes integration

Mon Apr 7 01:24:14 EDT 2003

Hi,

For the last few days I've been playing with Barry's patch for spambayes 
integration, and after a few small tweaks (patches available on both the 
Mailman and spambayes SF bug trackers) it worked very well.  Now I'm 
planning some enhancements:

 - optionally, use a "continuous training" model, where the filter is 
   trained automatically on each incoming message categorized
   as either ham or spam (unsure messages won't be used for 
   automatic training).  In this case the "train on this message"
   option in admindb will become "re-train on this message", 
   because we'll have to unlearn the previous training before 
   doing the new one.
   This is almost done.
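
To make the unlearn-then-learn step concrete, here is a rough sketch.
ToyFilter is only a hypothetical stand-in that counts tokens (it is not
spambayes, although it borrows the learn/unlearn naming); ContinuousTrainer,
auto_train and retrain are names made up for this example, and the
Message-ID is assumed to be a good enough key for remembering how a
message was last trained.

```python
from collections import Counter


class ToyFilter:
    """Hypothetical stand-in for a trainable filter; it only counts tokens."""

    def __init__(self):
        # Token counters keyed by is_spam (True = spam, False = ham).
        self.counts = {True: Counter(), False: Counter()}

    def learn(self, tokens, is_spam):
        self.counts[is_spam].update(set(tokens))

    def unlearn(self, tokens, is_spam):
        bucket = self.counts[is_spam]
        for tok in set(tokens):
            bucket[tok] -= 1
            if bucket[tok] <= 0:
                del bucket[tok]


class ContinuousTrainer:
    """Remembers how each message was trained so it can be re-trained."""

    def __init__(self, filt):
        self.filt = filt
        self.trained_as = {}  # Message-ID -> is_spam used for the last training

    def auto_train(self, msgid, tokens, verdict):
        """verdict is 'ham', 'spam' or 'unsure'; unsure messages are skipped."""
        if verdict == "unsure":
            return False
        self.retrain(msgid, tokens, verdict == "spam")
        return True

    def retrain(self, msgid, tokens, is_spam):
        """Undo any previous training on this message, then train again."""
        previous = self.trained_as.get(msgid)
        if previous == is_spam:
            return  # already trained this way, nothing to do
        if previous is not None:
            self.filt.unlearn(tokens, previous)
        self.filt.learn(tokens, is_spam)
        self.trained_as[msgid] = is_spam
```

With this shape, the admindb "re-train on this message" action would just
call retrain() with the admin's verdict, and a message never trained before
falls through to a plain learn.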

 - an interface for training on leaked spam (messages that got categorized 
   as ham or unsure and were therefore delivered to the list members).  
   Currently I have to log in to the server and run a script from the shell 
   to load the spam message, because non-spam doesn't get held in admindb.  
   This is not acceptable.
   What I'm thinking now is that each message delivered to the list could 
   be saved somewhere in its pristine state (e.g. before CookHeaders, 
   probably in SpamDetect itself) so that at a later time I (the list admin) 
   could say "that was spam, please train on it", maybe referring to it by 
   Message-ID.
   This buffer of pristine messages should be cleaned periodically
   (number of days configurable?)
   I also thought about other schemes, but they all have problems:
     - forward the received message to listname-train at server with
       the list password somewhere in the headers.  Even if I use
       MIME-forward to keep the message intact, it's not the same 
       message that was examined by SpamDetect.  We have a dozen 
       headers added or munged.
     - upload through the web: same problems, and we'd also have to 
       force the user to save in a common format, e.g. unix mbox.
       This would be a nightmare for windows users.
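
The pristine-message buffer could look roughly like the sketch below.
Everything here is an assumption made for illustration: the PristineStore
name, one file per message, hashing the Message-ID to get a safe filename,
and a max_age_days setting for the periodic cleanup.  It uses only the
Python standard library and is not tied to Mailman's own handler API.

```python
import hashlib
import os
import time
from email import message_from_bytes


class PristineStore:
    """Keeps untouched copies of delivered messages, keyed by Message-ID."""

    def __init__(self, directory, max_age_days=7):
        self.directory = directory
        self.max_age = max_age_days * 86400  # seconds
        os.makedirs(directory, exist_ok=True)

    def _path(self, msgid):
        # Hash the Message-ID so that it is always safe as a filename.
        digest = hashlib.sha1(msgid.encode("utf-8")).hexdigest()
        return os.path.join(self.directory, digest + ".eml")

    def save(self, raw_bytes):
        """Store the raw message; return its Message-ID, or None if missing."""
        msgid = message_from_bytes(raw_bytes)["Message-ID"]
        if msgid is None:
            return None
        msgid = str(msgid).strip()
        with open(self._path(msgid), "wb") as fp:
            fp.write(raw_bytes)
        return msgid

    def load(self, msgid):
        """Return the raw bytes saved for msgid, or None if expired/unknown."""
        try:
            with open(self._path(msgid.strip()), "rb") as fp:
                return fp.read()
        except FileNotFoundError:
            return None

    def expire(self, now=None):
        """Delete copies older than max_age; return how many were removed."""
        now = time.time() if now is None else now
        removed = 0
        for name in os.listdir(self.directory):
            path = os.path.join(self.directory, name)
            if name.endswith(".eml") and now - os.path.getmtime(path) > self.max_age:
                os.remove(path)
                removed += 1
        return removed
```

The admin would then only need to supply a Message-ID: load() returns
exactly the bytes the filter saw, and expire() can be run from cron.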

 - stats where you can see how well the filter is performing, with a 
   list of all tokens learnt with ham/spam counters and different
   colors (green for ham indicators, red for spam indicators, 
   yellow for neutral ones).
   This is probably related to the more general "Should be able to
   gather statistics, such as deliveries/day, performance, number of
   subscribers over time, etc." in the TODO page.
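
For the coloured token list, something along these lines might do.  Note
that the probability used here is a plain ratio of ham and spam
frequencies, not the formula spambayes really uses, and the 0.4/0.6
cut-offs as well as the token_colour and token_report names are arbitrary
choices for the example.

```python
def token_colour(ham_count, spam_count, nham, nspam, low=0.4, high=0.6):
    """Return (probability, colour) for one token.

    nham and nspam are the numbers of ham and spam messages trained on.
    """
    ham_ratio = ham_count / nham if nham else 0.0
    spam_ratio = spam_count / nspam if nspam else 0.0
    if ham_ratio + spam_ratio == 0:
        return 0.5, "yellow"  # never seen: treat as neutral
    prob = spam_ratio / (ham_ratio + spam_ratio)
    if prob <= low:
        colour = "green"   # ham indicator
    elif prob >= high:
        colour = "red"     # spam indicator
    else:
        colour = "yellow"  # neutral
    return prob, colour


def token_report(counts, nham, nspam):
    """counts maps token -> (ham_count, spam_count); rows are sorted by token."""
    rows = []
    for token, (ham, spam) in sorted(counts.items()):
        prob, colour = token_colour(ham, spam, nham, nspam)
        rows.append((token, ham, spam, round(prob, 2), colour))
    return rows
```

Each row carries the counters and the colour, so the web page only has to
render them.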

-- 
 Simone Piunno -- http://members.ferrara.linux.it/pioppo 
.-------  Adde parvum parvo magnus acervus erit  -------.
 Ferrara Linux Users Group - http://www.ferrara.linux.it 
 Deep Space 6, IPv6 on Linux - http://www.deepspace6.net 
 GNU Mailman, Mailing List Manager - http://www.list.org 
`-------------------------------------------------------'