[spambayes-dev] RE: [Spambayes-checkins] spambayes/Outlook2000/sandbox find_dupe_props.py, NONE, 1.1

Tim Peters tim.one at comcast.net
Mon Jul 28 22:18:53 EDT 2003


[Mark Hammond]
> Update of /cvsroot/spambayes/spambayes/Outlook2000/sandbox
> In directory sc8-pr-cvs1:/tmp/cvs-serv14316
>
> Added Files:
> 	find_dupe_props.py
> Log Message:
> Little tool to find messages with duplicate property values.  Useful
> to find messages that SpamBayes will consider duplicates.
>
> Example, to find messages with the same PR_SEARCH_KEY property in 2
> folders:
> ...

That nailed it!  It found 3 pairs of duplicates in my training ham here,
accounting for the 3 "missing ham" in the #-of-ham-trained-on report.  Now
that I see which they were, I recall that at least two of the duplicate
pairs were intentional:  they scored "too far" at the wrong end of the scale
at the time (very early in training), so I put copies in the training ham
hoping to boost their ham scores.  That may not be an effective strategy
<wink -- but I never noticed the duplicates had no effect, and they score
fine now (after more training data was added) without an artificial boost>.
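
For anyone curious how this kind of check works, here is a minimal sketch
of the idea.  It is not Mark's find_dupe_props.py, and it does not touch
MAPI at all:  plain dicts stand in for Outlook messages, and the
find_duplicates helper, the "subject" field, and the sample data are all
made up for illustration.  It groups messages by one property value and
reports any value shared by more than one message.

```python
from collections import defaultdict

def find_duplicates(messages, prop="PR_SEARCH_KEY"):
    """Return {property value: [subjects]} for values seen on 2+ messages.

    `messages` is a list of plain dicts standing in for Outlook messages;
    this is a hypothetical stand-in, not the real MAPI object model.
    """
    groups = defaultdict(list)
    for msg in messages:
        value = msg.get(prop)
        if value is not None:  # skip messages that lack the property
            groups[value].append(msg["subject"])
    return {value: subjects
            for value, subjects in groups.items()
            if len(subjects) > 1}

# Made-up training ham: the third message shares a key with the first.
ham = [
    {"subject": "lunch?", "PR_SEARCH_KEY": b"\x01"},
    {"subject": "build report", "PR_SEARCH_KEY": b"\x02"},
    {"subject": "lunch? (copy)", "PR_SEARCH_KEY": b"\x01"},
]

dupes = find_duplicates(ham)

# Messages that would go "missing" from a count of distinct keys.
missing = len(ham) - len({m["PR_SEARCH_KEY"] for m in ham})
```

With that sample data, dupes holds one group (the two "lunch?" messages)
and missing is 1, which is the same sort of shortfall as the 3 "missing
ham" above.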



