[Spambayes] Spam Clues ????? ??????

Tue Apr 29 13:18:52 CEST 2008

On Mon, April 28, 2008 21:46, alan falk wrote:
> How about an option IN SpamBayes to do this automatically?
> Choose the number of iterations?
>
> Cheers!
> Alan Falk
> Raleigh, NC
> +af
> www.plusaf.com

How would you automate this?
My procedure relies on user input: the user has to decide what is spam and
what is not. Can you automate user input? That would defeat the purpose of
SpamBayes in the first place...

It's not much work: 15 minutes the first time, and at most 5 minutes a
week for additional training. And it's fun, because you can actually *see*
SpamBayes improving. I wouldn't automate this for all the money in the
world (you can give me some anyway ;-) )
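
To make the point concrete, here is a rough sketch of the procedure quoted below as a loop. It is not SpamBayes code: `score`, `is_spam` and `train` are hypothetical callbacks standing in for "rescore", "ask the user" and "train as spam/ham". Everything except `is_spam` could be automated, but that one call is exactly the human judgment.

```python
from typing import Callable, Dict, List


def train_on_extremes(
    messages: List[str],
    score: Callable[[str], float],       # hypothetical: current spam score, 0.0 to 1.0
    is_spam: Callable[[str], bool],      # hypothetical: the human decision
    train: Callable[[str, bool], None],  # hypothetical: train one message as spam/ham
    rounds: int = 10,
) -> List[str]:
    """Alternately train on the lowest-scoring spam and the highest-scoring ham."""
    labels: Dict[str, bool] = {}  # remember answers so the user is asked only once

    def label(msg: str) -> bool:
        if msg not in labels:
            labels[msg] = is_spam(msg)
        return labels[msg]

    trained: List[str] = []
    for step in range(rounds):
        want_spam = step % 2 == 0
        # "Rescore": spam is searched from the lowest score up,
        # ham from the highest score down.
        ranked = sorted(messages, key=score, reverse=not want_spam)
        pick = next(
            (m for m in ranked if m not in trained and label(m) == want_spam),
            None,
        )
        if pick is None:
            break  # nothing of this kind left to train on
        train(pick, want_spam)
        trained.append(pick)
    return trained
```

The loop returns the messages in the order they were trained on, so you can check that it really alternates between the worst-scored spam and the worst-scored ham.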

> -----Original Message-----
> From: spambayes-bounces+plusaf=plusaf.com at python.org
> [mailto:spambayes-bounces+plusaf=plusaf.com at python.org] On Behalf Of
> Amedee
> Van Gasse
> Sent: Monday, April 28, 2008 8:33 AM
> To: spambayes at python.org
> Subject: Re: [Spambayes] Spam Clues ????? ??????
>
>
> On Wed, April 16, 2008 19:34, David wrote:
>>
>> Am getting loads of spam with Cyrillic characters and would like to know
>> if Spambayes can automatically delete anything with these characters in
>> their headers. Below is score info for a typical one. If you need it, I
>> could send you the config file if you can tell me where to find it.
>>
>> Kindest regards
>> David Kanareck
>>
>> Combined Score: 57% (0.567348)
>>
>> Internal ham score (*H*): 0.285187
>> Internal spam score (*S*): 0.419882
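
The three numbers above fit together in a simple way: the combined score is consistent with taking the midpoint-shifted difference of the two internal scores, (S - H + 1) / 2. A small sketch to check the arithmetic (the function name is made up for illustration, it is not a SpamBayes API):

```python
def combined_score(ham_score: float, spam_score: float) -> float:
    """Combine internal ham (*H*) and spam (*S*) scores into one 0..1 score."""
    return (spam_score - ham_score + 1.0) / 2.0


# Values from the score info above; the result is roughly 0.5673, i.e. 57%.
print(combined_score(0.285187, 0.419882))
```

With H and S this close together the result sits near the middle, which is why this message ends up as "unsure" rather than clearly spam.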
>>
>
>
>> # ham trained on: 39
>> # spam trained on: 76
>
> That is not much training. In my experience, SpamBayes gets *extremely*
> accurate after about 100 hams and 100 spams. Your mileage may vary.
> With the Outlook plugin, I add a column that shows the spam score (see
> FAQ/wiki for details) and sort on it. I look at the bottom and find the
> spam with the lowest score. Train as spam. Rescore inbox. Now I look at
> the top and find the ham with the highest score. Train as ham, rescore.
> Back to the lowest spam, rescore. Highest ham, rescore. Lather, rinse,
> repeat. Very quickly you will see that all spam scores above 99% and all
> ham scores below 1%.
>
> This method of training is so kewl that I have actually considered
> installing Outlook on Linux, just so that I could train Spambayes this
> way.
>
>> 'message.'                          0.310872           15     13
>> 'date:'                             0.325631           14     13
>> 'checked'                           0.341867           13     13
>> 'database:'                         0.341867           13     13
>> 'incoming'                          0.341867           13     13
>> 'version:'                          0.341867           13     13
>> 'virus'                             0.35698            14     15
>> 'release'                           0.358294           13     14
>> 'avg.'                              0.359817           12     13
>> 'skip:2 10'                         0.359817           12     13
>> 'found'                             0.385564           14     17
>
> These are generic tokens added by your virus scanner. After more training
> they will score around .5 which means they will neither increase nor
> decrease the global spam score of a message.
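
A small sketch of what "neutral" means here. It assumes that clues closer to 0.5 than some minimum strength are simply left out of the scoring; the 0.1 threshold and the function name are assumptions for illustration, not taken from this thread.

```python
from typing import Dict


def significant_clues(clues: Dict[str, float], min_strength: float = 0.1) -> Dict[str, float]:
    """Keep only clues whose probability is at least min_strength away from 0.5."""
    return {tok: p for tok, p in clues.items() if abs(p - 0.5) >= min_strength}


clues = {
    "virus": 0.35698,                 # from the list above, still leaning ham
    "from:charset:koi8-r": 0.908163,  # from the list below, strongly spam
    "well-trained-token": 0.5,        # made-up example of a neutral token
}
print(sorted(significant_clues(clues)))
```

Once the virus scanner tokens have been seen about equally often in ham and spam, they end up in the same position as the made-up neutral token and drop out.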
>
>> 'to:no real name:2**0'              0.750084           10     59
>>
>> 'header:Received:1'                 0.893006            1     18
>
> Interesting tokens...
>
>> 'from:charset:koi8-r'               0.908163            0      2
>>
>> 'subjectcharset:koi8-r'             0.908163            0      2
>
> And those last two are *really* interesting tokens!
> Keep on training, I can already see that your Spambayes is improving.
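
For the curious, the clue scores in this message can be reproduced from the counts in the last two columns (number of hams and spams containing the token) and the training totals (39 ham, 76 spam). The sketch below is a Robinson-style estimate that pulls tokens with little evidence towards 0.5. The function name and the constants `s=0.45` and `x=0.5` are assumptions; they were chosen because they match the numbers shown in this thread.

```python
def token_spam_probability(
    ham_count: int,
    spam_count: int,
    total_ham: int,
    total_spam: int,
    s: float = 0.45,  # assumed weight given to the prior
    x: float = 0.5,   # assumed prior probability for an unknown token
) -> float:
    """Estimate how spammy a token is, shrinking rare tokens towards x."""
    n = ham_count + spam_count
    if n == 0:
        return x  # never seen: fall back to the prior
    ham_ratio = ham_count / total_ham
    spam_ratio = spam_count / total_spam
    raw = spam_ratio / (ham_ratio + spam_ratio)
    return (s * x + n * raw) / (s + n)


# 'from:charset:koi8-r': seen in 0 hams and 2 spams
print(token_spam_probability(0, 2, 39, 76))
# 'header:Received:1': seen in 1 ham and 18 spams
print(token_spam_probability(1, 18, 39, 76))
```

This also shows why the koi8-r tokens are "only" at about 0.91: two spams is very little evidence, so the estimate is still held back by the prior. Every further Cyrillic spam you train on pushes it closer to 1.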
>
>
> --
> Amedee Van Gasse
> amedee at amedee.be
>
> Disclaimer:
> By sending an email to ANY of my addresses you are agreeing that:
>
>    1. I am by definition, "the intended recipient"
>    2. All information in the email is mine to do with as I see fit and
> make such financial profit, political mileage, or good joke as it lends
> itself to. In particular, I may quote it on usenet.
>    3. I may take the contents as representing the views of your company.
>    4. This overrides any disclaimer or statement of confidentiality that
> may be included on your message.
>

-- 
Amedee Van Gasse
amedee at amedee.be