[Spambayes] RE Spam

Amedee Van Gasse amedee at amedee.be
Wed May 24 11:21:23 CEST 2006


On Tue, May 23, 2006 19:04, skip at pobox.com said:
>
>     Amedee> I have noticed that a lot of spam contains disclaimer-ish
>     Amedee> text.  If I train spambayes with "disclaimed" ham, I fear
>     Amedee> this will "pollute" the sb database.  The result might be
>     Amedee> that any email with a disclaimer-ish text will get a
>     Amedee> relatively high ham score.  At the moment, I don't see a
>     Amedee> solution for this possible problem.  I *could* not train on
>     Amedee> disclaimed ham, but if most of my correspondents have such
>     Amedee> boilerplates, training spambayes won't be very efficient.
>
> That depends.  Most common English words (most of the words in
> disclaimers are probably pretty common) should probably score around
> 0.5 and thus not be used in ranking messages, e.g.:

Interesting.
However, English is not my native language, and most of my correspondence
is in Dutch.
As a consequence, most common English words are quite uncommon in my ham. The
result is that common English words will score a bit above 0.5. Perhaps
not much, but enough to become significant after a while.
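
To make the arithmetic concrete, here is a rough sketch of how a single
word's score could shift when it is rarer in ham than in spam. It assumes
a Robinson-style smoothed estimate with a prior of 0.5, a prior strength
of 0.45, and a cutoff that ignores words closer than 0.1 to 0.5. I believe
those match the SpamBayes defaults, but treat them as assumptions. The
function names and all message counts are made up for illustration.

```python
def word_spam_prob(spam_count, ham_count, n_spam, n_ham,
                   unknown_prob=0.5, unknown_strength=0.45):
    """Smoothed spam probability for one word (hypothetical helper).

    spam_count / ham_count: trained messages of each kind containing the word.
    n_spam / n_ham: total trained messages of each kind.
    unknown_prob / unknown_strength: assumed prior and its weight.
    """
    spam_ratio = spam_count / n_spam if n_spam else 0.0
    ham_ratio = ham_count / n_ham if n_ham else 0.0
    if spam_ratio + ham_ratio == 0:
        # Never seen in training: fall back to the prior.
        return unknown_prob
    raw = spam_ratio / (spam_ratio + ham_ratio)
    n = spam_count + ham_count
    # Pull the raw estimate toward the prior; the pull fades as n grows.
    return (unknown_strength * unknown_prob + n * raw) / (unknown_strength + n)


def is_discriminator(prob, min_strength=0.1):
    """True if the word is far enough from 0.5 to be used (assumed cutoff)."""
    return abs(prob - 0.5) >= min_strength


# Invented counts: 1000 trained spam and 1000 trained ham messages.
equally_common = word_spam_prob(300, 300, 1000, 1000)   # same rate in both
slightly_rarer = word_spam_prob(300, 250, 1000, 1000)   # a bit rarer in ham
much_rarer = word_spam_prob(300, 20, 1000, 1000)        # mostly-Dutch ham

for label, p in [("equally common", equally_common),
                 ("slightly rarer in ham", slightly_rarer),
                 ("much rarer in ham", much_rarer)]:
    print(f"{label}: {p:.3f} used={is_discriminator(p)}")
```

With these invented numbers, the "slightly rarer in ham" word lands a bit
above 0.5 and is still ignored, while the "much rarer in ham" word ends up
far enough from 0.5 to count. How far real English words drift depends
entirely on the actual training counts.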

-- 
Disclaimer:
By sending an email to ANY of my addresses you are agreeing that:

   1. I am by definition, "the intended recipient"
   2. All information in the email is mine to do with as I see fit and
      make such financial profit, political mileage, or good joke as it
      lends itself to. In particular, I may quote it on usenet.
   3. I may take the contents as representing the views of your company.
   4. This overrides any disclaimer or statement of confidentiality that
      may be included on your message.


