[Spambayes] SpamBayes and TREC

Tony Meyer tameyer at ihug.co.nz
Sun Nov 20 00:25:53 CET 2005


> How did SpamBayes perform in the TREC 2005 testing?  Do you have any
> numbers?

For a start, you can see the information here:
http://plg.uwaterloo.ca/~gvcormac/trecspamtrack05/

At some point during the registration process TREC latched on to  
"Massey University" (where I was working at the time, but completely  
uninvolved with SpamBayes) as my 'organisation name', so you may see  
that in some of the results.  Just substitute "SpamBayes" for "Massey  
University" wherever you see it.

I'll make my notebook paper available when I have a chance, and (once  
it's done) my proceedings paper.

In brief, SpamBayes did better than I expected (towards the bottom of
the top ten). That is considering that it is designed to classify as
ham/unsure/spam rather than ham/spam, that I made no special effort to
change options, etc. (in fact, the best variant of SpamBayes seems to
have been the out-of-the-box one), and that I put no effort into
determining what the single cutoff should be.
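
To make the ham/unsure/spam versus ham/spam distinction concrete, here is a minimal Python sketch (not SpamBayes code). It assumes each message gets a score in [0, 1] where higher means spammier. The values 0.20 and 0.90 are illustrative assumptions standing in for SpamBayes's ham_cutoff and spam_cutoff options, and 0.5 is an arbitrary single cutoff. With only two outcomes, every score in the "unsure" band has to land on one side of that single cutoff.

```python
# Illustrative cutoffs (assumed values, not taken from any TREC run).
HAM_CUTOFF = 0.20
SPAM_CUTOFF = 0.90


def classify_three_way(score, ham_cutoff=HAM_CUTOFF, spam_cutoff=SPAM_CUTOFF):
    """Map a spam score in [0, 1] to 'ham', 'unsure' or 'spam'."""
    if score < ham_cutoff:
        return "ham"
    if score >= spam_cutoff:
        return "spam"
    # Scores between the two cutoffs are left for a human to decide.
    return "unsure"


def classify_binary(score, cutoff=0.5):
    """Map the same score to 'ham' or 'spam' using one (assumed) cutoff."""
    return "spam" if score >= cutoff else "ham"


if __name__ == "__main__":
    for s in (0.05, 0.5, 0.95):
        print(s, classify_three_way(s), classify_binary(s))
```

The boundary convention (strict versus inclusive comparisons) is also an assumption; the point is only that a middle band which a three-way filter would call "unsure" must be forced into ham or spam.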

What surprised me the most was that the train-on-everything variant  
seems to have performed the best.  I'm still looking into this; I  
hope to have more details by the time the proceedings paper is finished.
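
As a rough illustration of what "train-on-everything" means in an on-line setting like TREC's, where the correct label of each message is revealed after it has been classified, here is a hypothetical Python sketch. ToyFilter is a deliberately crude word-count scorer, not SpamBayes's classifier. Train-on-error is included only as a common point of comparison and is not a claim about which variants were submitted.

```python
from collections import Counter


class ToyFilter:
    """Hypothetical word-count filter, used only to illustrate training policy."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()
        self.updates = 0  # number of times train() has been called

    def score(self, words):
        # Crude spam score: share of word hits that came from spam training.
        s = sum(self.spam[w] for w in words)
        h = sum(self.ham[w] for w in words)
        return 0.5 if s + h == 0 else s / (s + h)

    def train(self, words, is_spam):
        (self.spam if is_spam else self.ham).update(words)
        self.updates += 1


def run(stream, policy, cutoff=0.5):
    """Classify each message, then learn from its true label per the policy.

    policy "everything": train on every message after classifying it.
    policy "error": train only on messages that were misclassified.
    Returns (number of errors, number of training updates).
    """
    f = ToyFilter()
    errors = 0
    for words, is_spam in stream:
        wrong = (f.score(words) >= cutoff) != is_spam
        errors += wrong
        if policy == "everything" or (policy == "error" and wrong):
            f.train(words, is_spam)
    return errors, f.updates
```

On a tiny made-up stream of two spam and two ham messages (["cheap", "pills"], ["meeting", "notes"], ["cheap", "pills", "now"], ["meeting", "tomorrow"]), both policies make one mistake. Train-on-everything performs four training updates, while train-on-error performs one, because it also learns from the messages it already got right.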

=Tony.Meyer

-- 
Please always include the list (spambayes at python.org) in your replies
(reply-all), and please don't send me personal mail about SpamBayes.
http://www.massey.ac.nz/~tameyer/writing/reply_all.html explains this.