[spambayes-dev] spammy subject lines

Tim Peters tim.one at comcast.net
Mon Oct 13 20:29:58 EDT 2003

>> ...
>> That is much higher than the unsure rate I usually get (I wasn't paying
>> enough attention, or I would have noticed that).

BTW, this reminds me why I stopped running tests with my Outlook data.
Because I've fallen into almost purely mistake-based and unsure-based
training, the ham and spam in my training data are now, almost by
definition, difficult to predict from the other ham and spam in my training data.
That has little to do with how well the classifier does on typical new
messages (it does extremely well on those).  Measuring how well a change
helps hard cases predict other hard cases wouldn't say anything clear
about its effect on predicting new messages, very few of which are "hard".
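A rough sketch of the selection effect described above, under stated assumptions: with mistake-based and unsure-based training, a message is added to the training data only when the current classifier scores it as unsure or gets it wrong. Messages it already handles well never enter the training set. The cutoff values and the `score`/`train` callables here are hypothetical placeholders, not SpamBayes's actual API.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical cutoffs for illustration only: scores below HAM_CUTOFF are
# called ham, scores at or above SPAM_CUTOFF are called spam, and anything
# in between is "unsure".
HAM_CUTOFF = 0.20
SPAM_CUTOFF = 0.90


def needs_training(score: float, is_spam: bool,
                   ham_cutoff: float = HAM_CUTOFF,
                   spam_cutoff: float = SPAM_CUTOFF) -> bool:
    """Return True if the message is unsure or misclassified."""
    if ham_cutoff <= score < spam_cutoff:      # unsure
        return True
    if is_spam and score < ham_cutoff:         # spam scored as ham
        return True
    if not is_spam and score >= spam_cutoff:   # ham scored as spam
        return True
    return False                               # confidently correct: skip


def mistake_based_training(stream: Iterable[Tuple[str, bool]],
                           score: Callable[[str], float],
                           train: Callable[[str, bool], None]
                           ) -> List[Tuple[str, bool]]:
    """Train only on hard messages; return the ones that were trained on.

    `score` and `train` are hypothetical stand-ins for a real classifier.
    """
    trained = []
    for msg, is_spam in stream:
        if needs_training(score(msg), is_spam):
            train(msg, is_spam)
            trained.append((msg, is_spam))
    return trained
```

Every message that survives this filter is one the classifier found hard, so a test built from this training data mostly asks whether hard cases predict other hard cases.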

Luckily <wink> I don't have any time for anything anymore, or I'd have to
get a fresh set of corpora to run tests against (my old test sets are too
stale now -- time to delete 'em).

