[Spambayes] Slice o' life

Tim Peters tim.one@comcast.net
Wed Oct 16 01:11:01 2002


---------------------- multipart/mixed attachment
In the background, I've been a guinea pig for Sean True's and Mark Hammond's
experiments hooking our code up to Outlook 2000.  Toward that end, over the
last week I've just been shuffling most of the spam I normally get, and the
truly *hard* ham, into special folders.  By "truly hard ham" I mean assorted
HTML newsletters, PayPal announcements, company newsletters in odd formats,
order/shipping confirmations, and conference announcements.

In all, that's 696 spam and 86 truly hard ham so far.  Then I added in about
100 "typical" msgs from assorted work sources and friends.  This has been my
first chance to play with mining the headers for real:

"""
[Tokenizer]
mine_received_headers: True
basic_header_tokenize: True

[Classifier]
use_chi_squared_combining: True
"""
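
For anyone wondering what the chi-squared option does to the scores, here is a
minimal sketch.  It assumes the option refers to Fisher-style chi-squared
combining of per-word spam probabilities; the names chi2q and chi_combine are
made up for illustration and this is not the project's actual code.

```python
import math

def chi2q(x2, dof):
    """P(chi-squared with `dof` degrees of freedom >= x2), for even dof only."""
    assert dof > 0 and dof % 2 == 0
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    # Rounding can push the sum a hair above 1.0.
    return min(total, 1.0)

def chi_combine(probs):
    """Combine per-word spam probabilities (each strictly between 0 and 1)."""
    n = len(probs)
    if n == 0:
        return 0.5  # no evidence either way
    ln_p = sum(math.log(p) for p in probs)
    ln_q = sum(math.log(1.0 - p) for p in probs)
    # s is the evidence for spam, h the evidence for ham.
    s = 1.0 - chi2q(-2.0 * ln_q, 2 * n)
    h = 1.0 - chi2q(-2.0 * ln_p, 2 * n)
    return (s - h + 1.0) / 2.0

spammy = chi_combine([0.99] * 10)   # many strong spam clues
hammy = chi_combine([0.01] * 10)    # many strong ham clues
unsure = chi_combine([0.5] * 10)    # clues that say nothing
```

With a handful of strong clues pointing the same way, the combined score
lands very close to 0.0 or 1.0, and clues that cancel out leave it at 0.5.
The sketch does not guard against underflow on very long msgs.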

The performance on my real-life email is nothing short of amazing!  The code
adds a "Hammie" field to Outlook msgs, and I fiddled my Outlook views to
show the new field, and to color msgs with a hammie score > 0.05 bold green.
I'll attach a jpeg with a view of the tail end of today's email so far.
That view is in chronological order, and the mix of 0.0 and 1.0 is typical.
There are 523 pending msgs in my inbox right now that haven't been trained
on, and the highest-scoring non-spam is 0.03 (a personal email from someone
I didn't train on yet).  There's also one with a score of 0.01.  All the rest
of the non-spam score 0.00 or -0.00 in the display (yes, I should fix that
<wink>).  All the spam score 1.00.  I suppose it helps that one of my email
accounts automagically puts "Spam:" at the front of suspected-spam msg
Subject lines, but I suspect it wouldn't matter a bit if it didn't.
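
On the -0.00 display: one way to get rid of it is sketched below.  This
assumes the field is shown via a two-decimal float format, which is a guess
about the Outlook code rather than something checked against it; format_score
is a hypothetical helper.

```python
def format_score(score):
    """Format a score to two decimals without ever showing '-0.00'."""
    rounded = round(score, 2)
    if rounded == 0:
        # -0.0 compares equal to 0, so this also catches negative zero.
        rounded = 0.0
    return "%.2f" % rounded

shown = [format_score(x) for x in (-0.001, -0.0, 0.03, 1.0)]
```

A tiny negative value rounds to negative zero, and "%.2f" keeps the sign, so
swapping in a plain 0.0 after rounding is enough.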

I didn't realize it before, but this stuff is cool <wink>!

---------------------- multipart/mixed attachment
A non-text attachment was scrubbed...
Name: hammie.jpg
Type: image/jpeg
Size: 82792 bytes
Desc: not available
Url : http://mail.python.org/pipermail-21/spambayes/attachments/20021015/6cf680fd/hammie.jpeg

---------------------- multipart/mixed attachment--