[Spambayes] RE: [spambayes-dev] FW: Spam Clues: Do you remember me ?

Tony Meyer tameyer at ihug.co.nz
Wed Jan 14 19:20:22 EST 2004


> Here's an interesting approach, that flew right under my 
> spambayes, despite a 5 MB db.

It is interesting, although there are some issues here:

> (And just to be clear, few or none of the words in my database are in
> German.)

But what about these?
> 'alle'                              0.0505618           4      0
> 'auf'                               0.155172            1      0
> 'das'                               0.155172            1      0
> 'hier'                              0.155172            1      0
> 'der'                               0.192406            5      2
> 'und'                               0.278497            3      2
> 'sie'                               0.374974            1      1
> 'haben'                             0.765605            0      1
> 'ihr'                               0.765605            0      1
> 'ihren'                             0.765605            0      1

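Overall, the German looks more hammy than spammy: seven of these ten words have turned up at least as often in your trained ham as in your trained spam.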

> # ham trained on: 1491
> # spam trained on: 2924

Hmm.  This isn't badly imbalanced (roughly two spam for every ham), but it would be interesting to know what the score would have been if you had trained on only 1491 of the spam.  You don't have the (now dead) ham_spam_imbalanced_adjustment on, do you?  (Which version of the Outlook plugin are you using?)
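To see why the training counts matter, here is a minimal Python sketch of a Robinson-style per-token spam probability, the kind of calculation behind the clue table above. It makes several assumptions. The unknown-word strength is s = 0.45 and the unknown-word prior is x = 0.5. The `imbalance_adjust` branch is a hypothetical reconstruction of an imbalance adjustment that discounts counts from the larger corpus. The function and parameter names are illustrative and are not the SpamBayes API.

```python
def token_spamprob(hamcount, spamcount, nham, nspam,
                   s=0.45, x=0.5, imbalance_adjust=False):
    """Sketch of a Robinson-style spam probability for one token.

    s (strength) and x (unknown-word prior) are assumed values;
    imbalance_adjust is a hypothetical reconstruction, not SpamBayes code.
    """
    if hamcount + spamcount == 0:
        return x  # never seen: fall back to the prior

    # Normalise each count by its corpus size before comparing them.
    hamratio = hamcount / nham
    spamratio = spamcount / nspam
    prob = spamratio / (hamratio + spamratio)

    if imbalance_adjust:
        # Assumed form: shrink evidence from the over-represented class.
        ham_weight = min(nspam / nham, 1.0)
        spam_weight = min(nham / nspam, 1.0)
    else:
        ham_weight = spam_weight = 1.0
    n = hamcount * ham_weight + spamcount * spam_weight

    # Pull rarely seen tokens towards x; well-attested ones keep prob.
    return (s * x + n * prob) / (s + n)


if __name__ == "__main__":
    # (ham count, spam count) pairs taken from the quoted clue table.
    clues = {"alle": (4, 0), "der": (5, 2), "und": (3, 2), "haben": (0, 1)}
    for word, (h, sp) in clues.items():
        plain = token_spamprob(h, sp, 1491, 2924)
        adjusted = token_spamprob(h, sp, 1491, 2924, imbalance_adjust=True)
        print(f"{word:6} plain={plain:.6f} adjusted={adjusted:.6f}")
```

With nham = 1491 and nspam = 2924, the adjusted branch gives 0.765605 for 'haben' (0 ham, 1 spam) and 0.192406 for 'der' (5 ham, 2 spam), which agree with the table. Without the adjustment, 'haben' comes out at about 0.845.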

It also looks like you get (well, train on) quite a bit of mail about
music and music tools.  Since this spam was mostly about music and music tools, its clues
were hammy; that's just the way things go with this sort of classifier.
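The combining step shows how a set of hammy clues can pull a message's score down. Below is a minimal sketch of chi-squared (Fisher-style) combining, which, as far as I know, is the scheme SpamBayes uses by default. This sketch omits one part of the real classifier: in practice it first keeps only the strongest clues before combining them. The function names are illustrative, and the code assumes every probability lies strictly between 0 and 1.

```python
import math


def chi2q(x2, v):
    """Survival function P(chi^2 >= x2) for an even number of degrees v."""
    assert v % 2 == 0
    m = x2 / 2.0
    total = term = math.exp(-m)
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def chi2_combine(probs):
    """Combine per-token spam probabilities (each in (0, 1)) into one score."""
    n = len(probs)
    if n == 0:
        return 0.5  # no evidence either way
    # s is near 1 when the clues are collectively spammy.
    s = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    # h is near 1 when the clues are collectively hammy.
    h = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return (s - h + 1.0) / 2.0


if __name__ == "__main__":
    # Only the ten German clues quoted above, not the full clue list.
    german = [0.0505618, 0.155172, 0.155172, 0.155172, 0.192406,
              0.278497, 0.374974, 0.765605, 0.765605, 0.765605]
    print(chi2_combine(german))
```

When the function is given only the ten German clues quoted above, the result is below 0.5. It says nothing about the message's real score, which depends on all of its clues.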

=Tony Meyer