[Spambayes] ok, i'm confused

Skip Montanaro skip at pobox.com
Fri Mar 7 15:11:56 EST 2003


Here are the original X-Spambayes headers for the full-o'-spaces message:

  X-Spambayes-Debug: '*H*': 0.56; '*S*': 0.47; 'subject:none': 0.05;
          'charset:us-ascii': 0.17; 'header:Message-ID:1': 0.35; 'cc:2**2': 0.62;
          'header:Mime-Version:1': 0.65; 'skip:1 10': 0.77; 'header:Received:3': 0.90
  X-Spambayes-Classification: unsure; 0.46

After my latest tweak to the tokenizer (ratio of spaces to total number of
characters, after deleting leading and trailing whitespace on each line) and
complete retraining (11k+ ham, 7k+ spam), I get:

  X-Spambayes-Debug: '*H*': 0.56; '*S*': 0.47; 'subject:none': 0.05;
          'charset:us-ascii': 0.17; 'header:Message-ID:1': 0.35;
          'cc:2**2': 0.62;        'header:Mime-Version:1': 0.65; 'skip:1 10': 0.77;
          'header:Received:3': 0.90
  X-Spambayes-Classification: spam; 0.95
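
As a sanity check on the two headers above, here is a small sketch of how
the final score relates to the '*H*' and '*S*' clues.  It assumes the
chi-squared combining scheme, in which the score is (S - H + 1) / 2.  The
function name combined_score is made up for illustration and is not part of
the Spambayes code.

```python
def combined_score(h, s):
    """Fold ham evidence h ('*H*') and spam evidence s ('*S*') into one score.

    Assumes chi-squared combining: 0.0 is certain ham, 1.0 is certain spam.
    """
    return (s - h + 1.0) / 2.0


# Values copied from the X-Spambayes-Debug headers (identical in both).
score = combined_score(0.56, 0.47)
print("score from displayed clues: %.3f" % score)

# S - H would have to be about this large to reach a score of 0.95.
needed_gap = 2 * 0.95 - 1.0
print("S - H needed for 0.95: %.2f" % needed_gap)
```

With the displayed values this gives roughly 0.455, which agrees with the
first header's 0.46 once you allow for H and S being rounded to two places.
Under the same assumption, a score of 0.95 would need S - H to be about
0.90, which the displayed H and S cannot supply.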

I've done nothing to adjust the values displayed in the X-Spambayes-Debug
header, so all generated tokens should be displayed, and as you can see, all
displayed tokens are the same, before and after.  My space ratio token isn't
displayed (if I insert a print before the relevant yield statement I see it
has a value of 'space ratio: 0.9').  Why is the message now classified as
spam when before it was solidly in the middle of unsure?
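
For reference, the space ratio token described above can be sketched roughly
as follows.  This is a minimal illustration, not the actual tokenizer change:
the function name space_ratio_tokens, the choice to ignore newlines when
counting characters, and the rounding to one decimal place are all
assumptions.

```python
def space_ratio_tokens(text):
    """Yield one token giving the ratio of spaces to total characters.

    Leading and trailing whitespace is stripped from each line first.
    Newlines are not counted (an assumption made for this sketch).
    """
    lines = [line.strip() for line in text.splitlines()]
    total = sum(len(line) for line in lines)
    if total == 0:
        return  # nothing to measure, so generate no token
    spaces = sum(line.count(" ") for line in lines)
    ratio = spaces / float(total)
    # Bucket to one decimal so similar messages share the same token.
    yield "space ratio: %.1f" % ratio


# A line that is mostly interior spaces, padded with whitespace to strip.
sample = "   x" + " " * 18 + "x   \n"
print(list(space_ratio_tokens(sample)))
```

For the sample line, 18 of the 20 characters left after stripping are spaces,
so the generated token is 'space ratio: 0.9'.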

Skip



