[Spambayes] Re: Training oddity/confusion

Tony Meyer tameyer at ihug.co.nz
Mon Jan 17 04:15:23 CET 2005

[Tony Meyer]
> I'm trying to get things rearranged a little for 1.1 so 
> that it's easier to try out different training regimes
> (including tte) with the various apps, so hopefully that'll
> help.

[Mathew Hendry]
> Ah, that sounds good. You mean by making it easier to tweak 
> the training code, or by exposing more options in the user interface?

Easier to tweak the code.  I'd like to have a central place to hook into
training.  I've got some code that does this, but it still needs more work
before I check it in.  If I manage to get it done, I'll include algorithms
for train-on-everything, train-on-nothing, nonedge, and tte.  I'd like it to
be reasonably easy for someone to add a new style, and reasonably easy
for people to switch between them without having to alter their use
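
To give a rough idea of what a central hook could look like, here is a
minimal sketch.  It is not the code mentioned above and none of these
names exist in SpamBayes: REGIMES, regime(), should_train() and the
"edge" parameter are all hypothetical, and the reading of "nonedge" (skip
messages already scored at the correct extreme) is an assumption.  tte is
left out because it is an iterative procedure rather than a per-message
decision.

```python
# Hypothetical sketch: none of these names come from the SpamBayes code base.
from typing import Callable, Dict

# A regime decides whether to train on a message, given the score the
# classifier gave it (0.0 = ham, 1.0 = spam) and its true class.
Regime = Callable[[float, bool], bool]

REGIMES: Dict[str, Regime] = {}


def regime(name: str) -> Callable[[Regime], Regime]:
    """Register a regime under a name so an app can select it by string."""
    def register(func: Regime) -> Regime:
        REGIMES[name] = func
        return func
    return register


@regime("everything")
def train_on_everything(score: float, is_spam: bool) -> bool:
    return True


@regime("nothing")
def train_on_nothing(score: float, is_spam: bool) -> bool:
    return False


@regime("nonedge")
def train_on_nonedge(score: float, is_spam: bool, edge: float = 0.05) -> bool:
    # Assumed meaning: skip messages already scored at the correct extreme.
    if is_spam:
        return score < 1.0 - edge
    return score > edge


def should_train(regime_name: str, score: float, is_spam: bool) -> bool:
    """Single entry point an app would call before training on a message."""
    return REGIMES[regime_name](score, is_spam)
```

With something like this, adding a new style is one decorated function, and
switching styles is a change to the name passed to should_train().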

> I don't think I'd need to go that far. Most of the unsure 
> spam I get is in the 70-90% range.

I'd imagine that 70% would be safe enough as a spam cutoff.
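
As an illustration of what moving the cutoff does, here is a minimal
sketch of a three-way ham/unsure/spam decision.  The function name, the
default values and the handling of scores exactly on a boundary are
assumptions for the example, not taken from the SpamBayes source.

```python
def classify(score: float,
             ham_cutoff: float = 0.20,
             spam_cutoff: float = 0.90) -> str:
    """Map a score in [0.0, 1.0] to 'ham', 'unsure' or 'spam'.

    The default cutoffs and the boundary handling are illustrative
    assumptions.
    """
    if score >= spam_cutoff:
        return "spam"
    if score <= ham_cutoff:
        return "ham"
    return "unsure"


# A message scoring 0.80 is unsure with a 0.90 spam cutoff, but is
# classified as spam once the cutoff is lowered to 0.70.
before = classify(0.80)
after = classify(0.80, spam_cutoff=0.70)
```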

> BTW, I'm also seeing better results since finally re-enabling 
> my SpamCopAndAssassin patch and retraining (was running 
> vanilla 1.0.1 before). The URL blacklist support 
> (http://surbl.org) recently added to SpamAssassin seems to 
> make for particularly good spam clues; from the most recent 
> spam to come in:
> 'sa_rule:3.0:DRUGS_ERECTILE'        0.84212             4     23

Hmmm.  I won't ask about the four ham with that clue <wink>.

> Some spammers have now resorted to removing explicit links 
> from their spam and asking recipients to cut and paste an 
> address into their browser, apparently to avoid their URLs 
> automatically being picked up and added to these blacklists.

You mean it's just http://spambayes.org instead of <a
href="http://spambayes.org">my site</a>?  We'd still generate the
appropriate URL tokens for that, I believe.  (And Outlook still treats
the first as a link.)  Or is this something more complex?
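
To show why the two forms can end up with the same clue, here is a
minimal sketch that pulls URLs out of raw text with a regular expression.
This is not the SpamBayes tokenizer: URL_RE, url_tokens() and the
"url:host" token format are made up for the example.

```python
import re

# Hypothetical pattern: match http/https URLs up to whitespace, quotes
# or angle brackets, so it works on plain text and inside href="...".
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def url_tokens(text: str) -> set:
    """Return one illustrative 'url:<host>' token per URL found in text."""
    tokens = set()
    for url in URL_RE.findall(text):
        # Keep only the host part: drop the scheme and any path.
        host = url.split("://", 1)[1].split("/", 1)[0].lower()
        tokens.add("url:" + host)
    return tokens


plain = "visit http://spambayes.org today"
html = '<a href="http://spambayes.org">my site</a>'

# Both forms yield the same token set, since the URL text itself is
# present in the message either way.
same = url_tokens(plain) == url_tokens(html)
```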


