[Spambayes] Web interface statistics

skip at pobox.com
Thu May 10 14:17:39 CEST 2007


    Dave> There really is something very fishy going on.  I actually added
    Dave> instrumentation code to watch my training script train particular
    Dave> words multiple times as ham or spam, but when I query those words
    Dave> using the sb_imapfilter web interface, they always are shown as
    Dave> having been trained 0 or 1 times, with one of two corresponding
    Dave> probabilities.

    Dave> I do a wildcard query with a single letter and returning 1000
    Dave> results, and there's not a single number over 1 in the #spam or
    Dave> #ham columns.

    Dave> What could be going on?

I've no idea.  It seems to be working for me.  I have lots of singletons(*),
which is to be expected, but also lots of multiples:

    % spamcounts -r spam
    token,nspam,nham,spam prob
    "spam,",2,1,0.5
    spam.,2,0,0.908163265306
    to:addr:spambayes,3,3,0.390338438268
    "spamcop,",1,0,0.844827586207
    email name:spambayes-dev,2,0,0.908163265306
    to:addr:spambayes-dev,2,0,0.908163265306
    spamabyes,0,1,0.155172413793
    spamming?,1,0,0.844827586207
    email name:spambayes,4,3,0.5
    subject:spambayes,2,0,0.908163265306
    spam,0,3,0.0652173913043
    "spamassasin,",0,1,0.155172413793
    message-id:@no.spam.plz,0,1,0.155172413793
    sender:addr:spambayes-bounces+skip=pobox.com,0,1,0.155172413793
    cc:addr:spambayes,1,0,0.844827586207
    from:addr:nospam.org,0,1,0.155172413793
    from:addr:no.spam.plz,0,1,0.155172413793
    spammer,1,0,0.844827586207
    spambayes,0,3,0.0652173913043
    subject:spam,1,1,0.5
    spammed,2,0,0.908163265306
    from:addr:spamgourmet.com,0,1,0.155172413793
    spammers,1,1,0.5
    to:name:spambayes,1,0,0.844827586207
    sender:addr:spambayes-dev-bounces,2,0,0.908163265306
    subject:spam.,1,0,0.844827586207
    url:spambayes-dev,2,0,0.908163265306
    spamming,1,0,0.844827586207
    spambayes.,0,1,0.155172413793
    sender:addr:spambayes-bounces,4,2,0.5
    url:spammer_id,1,0,0.844827586207
    url:spambayes,4,3,0.5
    anti-spam,0,1,0.155172413793
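
The rows above for tokens seen in only one class are consistent with a
Robinson-style adjustment of the raw probability toward a prior.  Below is
a minimal sketch that reproduces those rows, assuming a prior strength of
0.45 and a prior probability of 0.5.  Both constants and the function name
are assumptions chosen for illustration, not values read from this
database's configuration, and the sketch deliberately ignores tokens seen
in both classes, whose value also depends on the total message counts.

```python
# Assumed prior settings; not read from any real configuration.
UNKNOWN_WORD_STRENGTH = 0.45  # S: how strongly the prior pulls
UNKNOWN_WORD_PROB = 0.5       # X: probability assigned to an unseen token


def one_sided_spamprob(nspam, nham):
    """Adjusted spam probability for a token seen in exactly one class.

    Hypothetical helper: for such tokens the raw probability is 1.0
    (spam only) or 0.0 (ham only), whatever the corpus totals are.
    """
    if (nspam == 0) == (nham == 0):
        raise ValueError("expects a token seen in exactly one class")
    raw = 1.0 if nspam else 0.0
    n = nspam + nham
    s, x = UNKNOWN_WORD_STRENGTH, UNKNOWN_WORD_PROB
    # Blend the prior (weight s) with the observed evidence (weight n).
    return (s * x + n * raw) / (s + n)


for counts in [(1, 0), (2, 0), (0, 1), (0, 3)]:
    print(counts, one_sided_spamprob(*counts))
```

Under these assumptions a token trained only 0 or 1 times can take just
two values, about 0.845 and 0.155, which would fit the "one of two
corresponding probabilities" Dave describes.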

Skip

(*) Linguists call such singletons "hapax legomena".  I guess they were
trying to be snooty when they came up with that term.
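
One way to check for the symptom Dave describes is to split the CSV
output into singletons and multiples.  The sketch below assumes the output
has been captured as text with the same four-column header shown above;
the sample rows are copied from that listing, and the function name is
made up for illustration.

```python
import csv
import io

# A few rows copied from the listing above, header included.
SAMPLE = '''token,nspam,nham,spam prob
"spam,",2,1,0.5
spam.,2,0,0.908163265306
to:addr:spambayes,3,3,0.390338438268
"spamcop,",1,0,0.844827586207
spamabyes,0,1,0.155172413793
'''


def split_by_count(text):
    """Return (singletons, multiples) as lists of token strings.

    A token counts as a multiple if either its spam or its ham count
    is greater than 1; otherwise it is a singleton.
    """
    singles, multiples = [], []
    for row in csv.DictReader(io.StringIO(text)):
        top = max(int(row["nspam"]), int(row["nham"]))
        (multiples if top > 1 else singles).append(row["token"])
    return singles, multiples


singles, multiples = split_by_count(SAMPLE)
print(len(singles), "tokens never trained more than once per class")
print(len(multiples), "tokens trained more than once")
```

If the second list comes back empty for a large wildcard query, the
database really does hold only counts of 0 or 1.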

