[Spambayes] bayesian research

Anthony Baxter anthony at interlink.com.au
Mon Dec 30 18:28:26 EST 2002


>>> "alan" wrote
> Hi,
> I have been given the task of researching Bayesian mail filters for my 
> final year University dissertation.
> I have been hitting brick walls at every turn.
> I know Paul Graham is great, and just about everyone talks about his plan for
> spam, but I need a starting place.
> I have set my system to allow relays and have 1000's of spam examples.
> Any ideas where I should start?

Look at the 'background' page on our website, for starters. Note that 
you don't just need a collection of spam - you also want some of the 
"real email" (we call it 'ham') that arrived alongside the spam. You can
start with ham and spam from different sources, but then you have a
problem with false clues: the classifier can learn to tell the two mail
systems apart (e.g. from their different 'Received' header lines)
rather than telling spam from ham.
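
To make the false-clue problem concrete, here is a rough sketch in Python. It is not taken from the Spambayes source: the sample messages, the host names and the helper names (header_tokens, one_sided_tokens) are all made up for illustration. It turns each 'Received' line into a token and lists the tokens that show up in only one of the two corpora.

```python
from email import message_from_string

# Hypothetical sample data: the ham comes from one mail system, the spam
# from another (e.g. an open relay set up to collect spam).
HAM = [
    "Received: from mail.university.example by inbox.university.example\n"
    "Subject: seminar notes\n\nSee you at the seminar on Tuesday.\n",
    "Received: from mail.university.example by inbox.university.example\n"
    "Subject: lunch\n\nAre you free for lunch tomorrow?\n",
]
SPAM = [
    "Received: from relay.trap.example by relay.trap.example\n"
    "Subject: buy now\n\nCheap pills, buy now.\n",
    "Received: from relay.trap.example by relay.trap.example\n"
    "Subject: you won\n\nClaim your prize today.\n",
]


def header_tokens(raw_message):
    """Return one token per 'Received' header, built from the sending host."""
    msg = message_from_string(raw_message)
    tokens = set()
    for value in msg.get_all("Received", []):
        words = value.split()
        # A typical line looks like 'from <host> by <host> ...'.
        if len(words) >= 2 and words[0].lower() == "from":
            tokens.add("received:" + words[1].lower())
    return tokens


def one_sided_tokens(ham, spam):
    """Return the header tokens that occur in exactly one of the corpora."""
    ham_tokens = set()
    for raw in ham:
        ham_tokens |= header_tokens(raw)
    spam_tokens = set()
    for raw in spam:
        spam_tokens |= header_tokens(raw)
    # Symmetric difference: seen in ham only, or in spam only.
    return sorted(ham_tokens ^ spam_tokens)


if __name__ == "__main__":
    for token in one_sided_tokens(HAM, SPAM):
        print(token)
```

With this made-up data every header token is one-sided, so a classifier trained on it could score perfectly without looking at a single word of the message body. If ham and spam are collected from the same mail system, such tokens appear on both sides and stop being decisive.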

For further info, download the code and read the source - it's heavily
commented, and there's a whooole pile of nice information in there.

It's probably also worth noting that this project has pretty much
tossed out the Graham algorithm. 

Anthony
-- 
Anthony Baxter     <anthony at interlink.com.au>   
It's never too late to have a happy childhood.



