[Spambayes] Does SB tokenize the subject?

Mon Dec 27 18:13:49 CET 2004

Good point, which raises the question: how do I tell SB that the word
'prescription' in the subject line is a stronger spam clue than the same
word in the body of the message? AFAIK, there is no such mechanism.

Amir 

-----Original Message-----
From: Seth Goodman [mailto:sethg at GoodmanAssociates.com] 
Sent: Monday, December 27, 2004 19:09
To: spambayes at python.org
Subject: RE: [Spambayes] Does SB tokenize the subject?

> From: Skip Montanaro
> Sent: Monday, December 27, 2004 10:47 AM
>
>
>
>     Amir> With the token being 'Subject:prescription', does it
>     Amir> mean that SB
>     Amir> treats tokens from the subject and/or 'from:' (or any other
>     Amir> header, for that purpose) differently than those in the message
>     Amir> body?
>
> Nope.  It emits many so-called "synthetic" tokens, tokens which are
> synthesized from clues in the message but which don't exactly
> appear in the email.  Once emitted though, they are treated exactly
> the same as "natural" tokens.

Just to amplify Skip's answer, they are treated the same as any other token,
but they are kept track of separately.  That is,

    'subject:prescription'
    'prescription'

are two different tokens that each have their own score.  For different
people's mail streams, the word 'prescription' in the subject line might be
a stronger or weaker spam clue than the same word in the body of the
message.  For some people, one or the other might be a strong ham clue, so
it is helpful to score them separately.
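
To make the idea concrete, here is a minimal sketch of separate per-token
scoring.  It is not SpamBayes' actual tokenizer or classifier: the names
tokenize, TokenCounts and spam_prob are hypothetical, the whitespace
splitting is a simplification, and the score is a plain frequency ratio
rather than the statistics SpamBayes really uses.  It only illustrates that
a prefixed subject token and the bare body token accumulate their own counts.

```python
from collections import Counter


def tokenize(subject, body):
    """Yield 'subject:'-prefixed tokens for subject words, bare tokens for body words."""
    for word in subject.lower().split():
        yield "subject:" + word
    for word in body.lower().split():
        yield word


class TokenCounts:
    """Hypothetical per-token counts; each distinct token string is tracked separately."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()
        self.nspam = 0
        self.nham = 0

    def train(self, subject, body, is_spam):
        # Count each token at most once per message.
        tokens = set(tokenize(subject, body))
        if is_spam:
            self.spam.update(tokens)
            self.nspam += 1
        else:
            self.ham.update(tokens)
            self.nham += 1

    def spam_prob(self, token):
        # Simplified score: spam frequency over (spam + ham frequency).
        # Unseen tokens get a neutral 0.5.
        s = self.spam[token] / self.nspam if self.nspam else 0.0
        h = self.ham[token] / self.nham if self.nham else 0.0
        if s + h == 0:
            return 0.5
        return s / (s + h)


counts = TokenCounts()
counts.train("cheap prescription drugs", "order now", is_spam=True)
counts.train("lunch tomorrow", "pick up my prescription on the way", is_spam=False)

subject_score = counts.spam_prob("subject:prescription")
body_score = counts.spam_prob("prescription")
```

With this toy training data the subject token leans toward spam and the body
token toward ham, so the same word ends up with two independent scores, one
for each place it was seen.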

--

Seth Goodman

_______________________________________________
Spambayes at python.org
http://mail.python.org/mailman/listinfo/spambayes
Check the FAQ before asking: http://spambayes.sf.net/faq.html