RES: [Spambayes] RE: solution for the "spam of the future"?

Tiago Estill de Noronha TiagoTiago at Globo.com
Wed Dec 17 11:28:59 EST 2003


Using Kenny's idea, I came up with the following:
You would have a slider (or a value or whatever in the non-plugin version)
that would set the weight of the new-word metatoken.
The slider would control how many points are added for each 1% of new
words in the email.
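
As a rough illustration of the slider variant, here is a minimal Python sketch. It assumes the metatoken's points are simply the slider value multiplied by the percentage of new words in the message. The names `new_word_percent`, `slider_points`, `known_words`, and `points_per_percent` are hypothetical and are not part of SpamBayes.

```python
def new_word_percent(tokens, known_words):
    """Return the percentage (0-100) of tokens not seen during training.

    `known_words` is assumed to be a set of every token already in the
    training database (a hypothetical stand-in for the real database).
    """
    tokens = list(tokens)
    if not tokens:
        return 0.0
    unknown = sum(1 for tok in tokens if tok not in known_words)
    return 100.0 * unknown / len(tokens)


def slider_points(percent, points_per_percent):
    """Points for the new-word metatoken: slider value times percentage."""
    return percent * points_per_percent
```

For example, a message where half of the tokens are unknown, with the slider set to 0.01 points per 1%, would contribute about 0.5 points.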


Or you could have SpamBayes set the value itself, learning as you train
it. It would go like this:
It would get the average percentage of new words in your ham mail and the
average in your spam mail.
From that it would take the average of both values as a threshold, and
would interpolate the message's percentage so that from 0% to the threshold
percentage the points for the metatoken go from 0 to .5, and from the
threshold to 100% they go from .5 to 1.

I think the formula would be something like this:
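Code
==========
' averagepercent is the threshold: the midpoint of the ham and spam averages
averagepercent = (hamnewwordsaverage + spamnewwordsaverage) / 2
If msgnewwordpercent < averagepercent Then
 ' 0% up to the threshold maps to 0 up to .5
 newwordsmetatokenpoints = .5 / averagepercent * msgnewwordpercent
Else
 ' the threshold up to 100% maps to .5 up to 1
 newwordsmetatokenpoints = .5 / (100 - averagepercent) * (msgnewwordpercent - averagepercent) + .5
End If
====
End of the code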

Sorry that it is in BASIC; it is the only programming language I know well
enough to write something simple in without consulting any books, help
files, or tutorials.
But I think it is easy to understand what it is meant to do.
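
For readers who prefer Python, the language SpamBayes itself is written in, the same idea can be sketched roughly as follows. This is only an illustration under stated assumptions: the threshold is the midpoint of the ham and spam averages, the score rises linearly from 0 to .5 below the threshold and from .5 to 1 above it, and out-of-range percentages are clamped. The function names `threshold_percent` and `metatoken_points` are hypothetical and do not exist in SpamBayes.

```python
def threshold_percent(ham_average, spam_average):
    """Midpoint of the average new-word percentages for ham and spam."""
    return (ham_average + spam_average) / 2.0


def metatoken_points(msg_percent, threshold):
    """Map a message's new-word percentage (0-100) to a score in [0, 1].

    0%..threshold maps linearly onto 0..0.5, and threshold..100% maps
    linearly onto 0.5..1, so a message sitting exactly on the threshold
    scores 0.5.
    """
    if not 0.0 < threshold < 100.0:
        # Assumption: a threshold of exactly 0 or 100 would divide by zero.
        raise ValueError("threshold must be strictly between 0 and 100")
    # Assumption: clamp the input so the score never leaves [0, 1].
    msg_percent = min(max(msg_percent, 0.0), 100.0)
    if msg_percent < threshold:
        return 0.5 * msg_percent / threshold
    return 0.5 + 0.5 * (msg_percent - threshold) / (100.0 - threshold)
```

With a ham average of 10% and a spam average of 30%, the threshold is 20%, so a message with 10% new words would score 0.25 and one with 60% new words would score 0.75.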

 
 
*********************
Tiago Estill de Noronha
TiagoTiago at Globo.com


-=> -----Original message-----
-=> From: spambayes-bounces at python.org 
-=> [mailto:spambayes-bounces at python.org] On behalf of Kenny Pitt
-=> Sent: Tuesday, December 16, 2003 19:27
-=> To: 'Coe, Bob'; spambayes at Python.org
-=> Subject: RE: [Spambayes] RE: solution for the "spam of the future"?
-=> 
-=> 
-=> Coe, Bob wrote:
-=> > Don't start generating the "Missing: N" token until the database is
-=> > large enough for it to make sense.
-=> 
-=> If this works at all, it also seems like the *percentage* 
-=> of unknown word tokens in the message would work better 
-=> than a log()'d count.  A very large newsletter is pretty 
-=> much guaranteed to have a higher *count* of unknown tokens 
-=> than a short mailing list message, but that's because it 
-=> has more total tokens and not because it's any spammier.
-=> 
-=> -- 
-=> Kenny Pitt
-=> 
-=> 
-=> _______________________________________________
-=> Spambayes at python.org
-=> http://mail.python.org/mailman/listinfo/spambayes
-=> Check the FAQ before asking: http://spambayes.sf.net/faq.html
-=> 
-=> ---
-=> Incoming mail is certified Virus Free.
-=> Checked by AVG anti-virus system (http://www.grisoft.com).
-=> Version: 6.0.545 / Virus Database: 339 - Release Date: 27/11/2003
-=>  
-=> 




