[Python-Dev] FW: Your message to Python-Dev awaits moderator
Tue, 27 Aug 2002 23:04:41 -0500
Tim> FYI, here's the closest thing to a real false positive I've seen so
I have much smaller spam and ham corpora (currently about 400 msgs each),
but both consist only of messages sent to me in the past couple weeks
(though not all messages sent during that interval), so some of the header
clues which skewed Tim's tests shouldn't be present. Using my currently
undeleted Python mail as the "unknown" set (which doesn't actually contain
any spam), I saw two false positives. One had an attached GIF image. The other
was a one-line text+html message whose "words" were thus dominated by the
HTML tags in the second part.
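
To illustrate why a one-line text+html message ends up with "words" that are
mostly markup, here is a rough sketch. It is not the tokenizer Tim's code
uses; naive_tokens, tag_fraction and the sample message are hypothetical
names and data made up for the illustration.

```python
import re

def naive_tokens(text):
    """Return each whole HTML tag, or each run of non-space text, as a token.

    Hypothetical stand-in tokenizer, not the one used in the actual tests.
    """
    return re.findall(r"<[^>]+>|[^\s<>]+", text.lower())

def tag_fraction(tokens):
    """Fraction of tokens that look like HTML tags."""
    if not tokens:
        return 0.0
    tags = sum(1 for t in tokens if t.startswith("<") and t.endswith(">"))
    return tags / len(tokens)

# Made-up one-line message: the same sentence as plain text and as HTML.
plain_part = "See you at lunch tomorrow."
html_part = ("<html><body><p><font size=2>See you at lunch tomorrow."
             "</font></p></body></html>")

plain_tokens = naive_tokens(plain_part)
html_tokens = naive_tokens(html_part)
print(len(plain_tokens), tag_fraction(plain_tokens))
print(len(html_tokens), tag_fraction(html_tokens))
```

With so little real text, the tags in the HTML part outnumber the words, so
any scoring based on token counts is driven mostly by the markup.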
Once my spam and ham corpora grow to something more like 2000 messages each, I will try Tim's
technique of splitting them into smaller chunks, training on one chunk, then
testing against the remaining chunks.
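
A minimal sketch of that chunking scheme, as I understand it from the
description above. The function names are mine, and train and is_spam are
placeholders for whatever classifier is being tested; this is not Tim's
actual test driver. It assumes the ham and spam corpora are split into the
same number of chunks.

```python
import random

def split_into_chunks(msgs, n_chunks, seed=0):
    """Shuffle msgs and deal them into n_chunks roughly equal lists."""
    shuffled = list(msgs)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n_chunks] for i in range(n_chunks)]

def train_one_test_rest(ham_chunks, spam_chunks, train, is_spam):
    """Train on one chunk pair at a time and test on all the other chunks.

    train(ham, spam) -> model and is_spam(model, msg) -> bool are
    caller-supplied placeholders (assumptions, not a real library API).
    Yields (train_index, false_positives, false_negatives).
    """
    for i in range(len(ham_chunks)):
        model = train(ham_chunks[i], spam_chunks[i])
        fp = fn = 0
        for j in range(len(ham_chunks)):
            if j == i:
                continue  # never test on the chunk used for training
            fp += sum(1 for m in ham_chunks[j] if is_spam(model, m))
            fn += sum(1 for m in spam_chunks[j] if not is_spam(model, m))
        yield i, fp, fn
```

Something like list(train_one_test_rest(ham_chunks, spam_chunks, train,
is_spam)) would then give one (index, false positives, false negatives)
tuple per training chunk, which makes it easy to see how much the error
rates vary from chunk to chunk.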