# RES: [Spambayes] RE: solution for the "spam of the future"?

Tiago Estill de Noronha TiagoTiago at Globo.com
Wed Dec 17 11:28:59 EST 2003

```Using Kenny's idea, I came up with the following:
You would have a slider (or a value or whatever in the non-plugin version) that
would set the weight of the new-word metatoken.
The slider would control how many points are added for each 1% of new
words in the email.

Or you could have SpamBayes set the value itself, learning as you train
it. It would go like this:
It would get the average percentage of new words in your ham mail, and the
same for your spam mail. From that it would take the average of both values
(the threshold), and would interpolate the message's percentage so that
from 0% to the threshold percentage the points for the metatoken go from 0
to .5, and from the threshold to 100% they go from .5 to 1.

I think the formula would be something like this:
Code
==========
' averagepercent is the threshold: the average of the ham and spam averages
If msgnewwordpercent < averagepercent Then
    newwordsmetatokenpoints = .5 / averagepercent * msgnewwordpercent
Else
    newwordsmetatokenpoints = .5 / (100 - averagepercent) * (msgnewwordpercent - averagepercent) + .5
End If
====
End of the code

Sorry that it is in BASIC; it is the only programming language I know well
enough to write something simple in without consulting any books, help files,
or tutorials.
But I think it is easy to understand what it is meant to do.

*********************
Tiago Estill de Noronha
TiagoTiago at Globo.com

-=> -----Original Message-----
-=> From: spambayes-bounces at python.org
-=> [mailto:spambayes-bounces at python.org] On behalf of Kenny Pitt
-=> Sent: Tuesday, December 16, 2003 19:27
-=> To: 'Coe, Bob'; spambayes at Python.org
-=> Subject: RE: [Spambayes] RE: solution for the "spam of the future"?
-=>
-=>
-=> Coe, Bob wrote:
-=> > Don't start generating the "Missing: N" token until the database
-=> > is large enough for it to make sense.
-=>
-=> If this works at all, it also seems like the *percentage*
-=> of unknown word tokens in the message would work better
-=> than a log()'d count.  A very large newsletter is pretty
-=> much guaranteed to have a higher *count* of unknown tokens
-=> than a short mailing list message, but that's because it
-=> has more total tokens and not because it's any spammier.
-=>
-=> --
-=> Kenny Pitt

---
Outgoing mail is certified Virus Free.
Checked by AVG anti-virus system (http://www.grisoft.com).
Version: 6.0.545 / Virus Database: 339 - Release Date: 27/11/2003

```
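The piecewise interpolation described in the message can be sketched in Python as follows. This is only an illustration of the idea as posted, not anything from the SpamBayes code base: the function name `new_word_metatoken_points` and its parameters are hypothetical, the threshold is assumed to be the plain average of the ham and spam new-word percentages, and the clamping and the guard for a threshold of 100% are additions the original post does not discuss.

```python
def new_word_metatoken_points(msg_pct, ham_avg_pct, spam_avg_pct):
    """Map a message's percentage of new words to a score in [0, 1].

    Hypothetical helper: 0% maps to 0, the threshold maps to 0.5,
    and 100% maps to 1, with linear interpolation in between.
    """
    # Assumption: the threshold is the mean of the two training averages.
    threshold = (ham_avg_pct + spam_avg_pct) / 2.0

    # Assumption: out-of-range inputs are clamped to 0..100.
    msg_pct = min(max(msg_pct, 0.0), 100.0)

    if msg_pct < threshold:
        # Lower segment: 0% .. threshold maps to 0 .. 0.5.
        # threshold > msg_pct >= 0 here, so the division is safe.
        return 0.5 * msg_pct / threshold

    if threshold >= 100.0:
        # Degenerate case: the upper segment has zero width.
        return 0.5

    # Upper segment: threshold .. 100% maps to 0.5 .. 1.
    return 0.5 + 0.5 * (msg_pct - threshold) / (100.0 - threshold)


if __name__ == "__main__":
    # Example values are made up for illustration only.
    for pct in (0, 10, 20, 60, 100):
        print(pct, new_word_metatoken_points(pct, ham_avg_pct=10, spam_avg_pct=30))
```

Note that the upper segment measures the distance from the threshold (`msg_pct - threshold`), so the score rises from .5 to 1 as the share of new words grows, which matches the prose description in the message.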