[spambayes-dev] RE: [Spambayes-checkins] spambayes/Outlook2000/sandbox find_dupe_props.py, NONE, 1.1
Tim Peters
tim.one at comcast.net
Mon Jul 28 22:18:53 EDT 2003
[Mark Hammond]
> Update of /cvsroot/spambayes/spambayes/Outlook2000/sandbox
> In directory sc8-pr-cvs1:/tmp/cvs-serv14316
>
> Added Files:
> find_dupe_props.py
> Log Message:
> Little tool to find messages with duplicate property values. Useful
> to find messages that SpamBayes will consider duplicates.
>
> Example, to find messages with the same PR_SEARCH_KEY property in 2
> folders:
> ...
That nailed it! It found 3 pairs of duplicates in my training ham here,
accounting for the 3 "missing ham" in the #-of-ham-trained-on report. Now
that I see which they were, I recall that at least two of the duplicate
pairs were intentional: they scored "too far" at the wrong end of the scale
at the time (very early in training), so I put copies in the training ham
hoping to boost their ham scores. That may not be an effective strategy
<wink -- but I never noticed the duplicates had no effect, and they score
fine now (after more training data was added) without an artificial boost>.
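
For anyone wanting to run the same kind of check, here is a rough sketch of
the idea, not the code in find_dupe_props.py. It assumes messages are plain
dicts rather than real MAPI objects; find_duplicates, the "subject" field,
and the sample data are all made up for illustration. It also assumes the
trainer counts each distinct key only once, which is what the 3 pairs
matching the 3 "missing ham" suggests.

```python
from collections import defaultdict


def find_duplicates(messages, prop="PR_SEARCH_KEY"):
    """Group messages by the value of `prop` and keep groups of 2 or more.

    `messages` is an iterable of dicts (a stand-in for real MAPI messages).
    Returns {property value: [subjects sharing that value]}.
    """
    by_value = defaultdict(list)
    for msg in messages:
        by_value[msg[prop]].append(msg["subject"])
    return {value: subjects
            for value, subjects in by_value.items()
            if len(subjects) > 1}


# Hypothetical training-ham folder: the first two entries share a key.
messages = [
    {"subject": "Lunch?", "PR_SEARCH_KEY": b"\x01\xaa"},
    {"subject": "Lunch? (copy)", "PR_SEARCH_KEY": b"\x01\xaa"},
    {"subject": "Build report", "PR_SEARCH_KEY": b"\x02\xbb"},
]

dupes = find_duplicates(messages)

# Under the counted-once assumption, a group of n messages with the same
# key explains n - 1 messages missing from the trained-on count.
missing = sum(len(subjects) - 1 for subjects in dupes.values())

for value, subjects in dupes.items():
    print(value.hex(), subjects)
print("unaccounted-for messages explained:", missing)
```

Grouping on the key first and filtering afterwards keeps it to one pass over
the folder, and the same function works for any other property name passed
as prop.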