[Spambayes-checkins]
spambayes/Outlook2000 README.txt,1.7,1.8 about.html,1.4,1.5
addin.py,1.38,1.39 filter.py,1.13,1.14 manager.py,1.35,1.36
Mark Hammond
mhammond@users.sourceforge.net
Sun Nov 24 22:43:46 2002
Update of /cvsroot/spambayes/spambayes/Outlook2000
In directory sc8-pr-cvs1:/tmp/cvs-serv29193
Modified Files:
README.txt about.html addin.py filter.py manager.py
Log Message:
Use a percentage for the SpamScore - this is so we can play nicely
with Outlook's UserProperty API.
NOTE: Does require some user intervention - please see
http://mail.python.org/pipermail/spambayes/2002-November/002170.html
for details.
Index: README.txt
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/README.txt,v
retrieving revision 1.7
retrieving revision 1.8
diff -C2 -d -r1.7 -r1.8
*** README.txt 19 Nov 2002 22:52:25 -0000 1.7
--- README.txt 24 Nov 2002 22:43:43 -0000 1.8
***************
*** 4,11 ****
you *must* have win32all-149 or later.
! CDO is no longer needed :)
!
! See below for a list of known problems (particularly that you must manually
! create an Outlook property before you can see the Spam scores)
Outlook Addin
--- 4,8 ----
you *must* have win32all-149 or later.
! See below for a list of known problems.
Outlook Addin
Index: about.html
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/about.html,v
retrieving revision 1.4
retrieving revision 1.5
diff -C2 -d -r1.4 -r1.5
*** about.html 2 Nov 2002 07:01:21 -0000 1.4
--- about.html 24 Nov 2002 22:43:43 -0000 1.5
***************
*** 5,99 ****
</head>
<body>
! <span style="font-style: italic;">NOTE: This is very very early code. If
! you are looking this, you have probably been told about it against our better
! judgement &lt;wink&gt;. Stuff doesnt work correctly. Fields are
! funny. If you want something known to work well today for alot of people,
! this is not for you.<br>
</span><br style="font-style: italic;">
! The source code is maintained at <a
href="http://spambayes.sourceforge.net">SourceForge</a>.<br>
<br>
! This spam filter uses Bayesian analysis to filter spam. Unlike other
! spam detection systems, Bayesian systems actually "learn" about what you
! consider spam, and continually adapt as both your regular email and spam
! patterns change.<br>
!
! <h2>Training</h2>
! Due to the nature of the system, it must be trained before it can be effective.
! Although the system does learn over time, when first installed it has
! no knowledge of either spam or good email.<br>
!
<h3>Initial Training</h3>
When first installed, it is recommended you perform the following steps:<br>
<ul>
<li>Create two folders - one for "Spam", and one for "Possible Spam"</li>
! <li>Go through your Inbox and Deleted Items, and move as much spam as you
! can find to the "Spam" folder. Try and get as much Spam out of your
! inbox as possible.</li>
! <li>Select the <span style="font-style: italic;">Training</span> dialog.
! Nominate your Spam folder for spam, and your Inbox for good messages,
! and start training.</li>
</ul>
To see how effective your Inbox cleanup was, you may like to try:<br>
<ul>
! <li>Go to the <span style="font-style: italic;">Filter Now</span> dialog.</li>
<li>Select your Inbox as the folder to filter.</li>
! <li>Select <span style="font-style: italic;">Score messages, but dont perform
! filter action</span>.</li>
<li>Clear both checkboxes so all messages will be scored.</li>
<li>Start the score operation.</li>
</ul>
! You can then look at and sort by the Spam field in your Inbox - this is likely
! to find hidden spam that you missed from your inbox cleanup.
!
<h3>Incremental Training</h3>
! When you drag a message to your Spam folder, it will be automatically trained
! as spam. Thus, as the classifier misses spam (or is unsure about them),
! it learns as you correct it.<br>
! If messages are dropped back into the Inbox, they are trained as good - thus,
! the system learns what good messages look like should it incorrectly classify
! it as spam or possible spam.<br>
!
! <h2>Creating a Spam Score Field</h2>
! A custom property named "Spam" is added to all Outlook messages scored.
! This is an integer in 0 (ham) through 100 (spam) inclusive.
! You can teach Outlook to display this field as a column in any table view,
! like the standard Messages view.
! <p>
! This takes some work, and has to be done again for every folder in which
! you want to display a Spam column:
<ul>
! <li>While looking at an Outlook table view (like Messages), right-click
! on the line with column headers (From, Subject, To, Received, ...).
! In the context menu that pops up, click on Field Chooser. A box
! with title <i>Field Chooser</i> pops up.
<li>In the lower left corner of the <i>Field Chooser</i> box, click
! <i>New...</i>. A box with title <i>New Field</i> pops up.
! <li>In the <i>Name:</i> box, type Spam.
! <li>In the <i>Type:</i> dropdown list, select <i>Integer</i>. This is the
! last choice in the dropdown list.
! Do not select <i>Number</i> -- it won't work.
! <li>The <i>Format:</i> dropdown list should display "1,234" now. Leave it alone.
! <li>Click OK in the <i>New Field</i> box. Now you're back in the
! <i>Field Chooser</i> box.
! <li>The dropdown list at the top of the <i>Field Chooser</i> box should say
! <i>User-defined fields in FOLDER</i> now, where FOLDER is the name of the
! folder you're currently looking at (like Inbox). Below that, you
! should see a new rectangular button with a Spam label.
! <li>Use your mouse to drag the Spam button to the column header position
! where you want to see the Spam column. You don't have to be precise
! here -- you can rearrange or resize the column later just by dragging
! it around.
! <li>You're done! Close the <i>Field Chooser</i> box.
</ul>
! Outlook's standard Automatic Formatting features can also be taught how to
! access the value of this field; for example, you could tell Outlook to display
! rows with suspected spam messages in green italic. However, for whatever reason,
! the Outlook Rules Wizard does not allow creating rules based on user-defined
! fields. That's why this addin supplies its own filtering rules.
!
! <p>
! Contributions to this documentation are welcome!<br>
<br>
</body>
</html>
--- 5,117 ----
</head>
<body>
! <h1>SpamBayes Outlook Plugin<br>
! </h1>
! <span style="font-style: italic;">NOTE: This is very very early code.
! If you are looking at this, you have probably been told about it
! against our better judgement &lt;wink&gt;. Stuff doesnt work
! correctly. If you want something known to work well today for alot
! of people, this is not for you.</span> That said, this plug-in
! works amazingly well! So welcome aboard.<span
! style="font-style: italic;"><br>
</span><br style="font-style: italic;">
! This spam filter uses Bayesian analysis to filter spam. Unlike
! other spam detection systems, Bayesian systems actually "learn" about
! what you consider spam, and continually adapt as both your regular email
! and spam patterns change. The source code is maintained at <a
href="http://spambayes.sourceforge.net">SourceForge</a>.<br>
<br>
! Here you can find information on:<br>
! <div style="margin-left: 40px;"><a href="#Training">Training</a><br>
! <a href="#Field">Viewing the Spam Score field</a><br>
! </div>
! <h2><a name="Training"></a>Training</h2>
! Due to the nature of the system, it must be trained before it can be
! effective. Although the system does learn over time, when first
! installed it has no knowledge of either spam or good email.<br>
<h3>Initial Training</h3>
When first installed, it is recommended you perform the following steps:<br>
<ul>
<li>Create two folders - one for "Spam", and one for "Possible Spam"</li>
! <li>Go through your Inbox and Deleted Items, and move as much spam as
! you can find to the "Spam" folder. Try and get as much Spam out of
! your inbox as possible.</li>
! <li>Select the <span style="font-style: italic;">Training</span>
! dialog. Nominate your Spam folder for spam, and your Inbox for
! good messages, and start training.</li>
</ul>
To see how effective your Inbox cleanup was, you may like to try:<br>
<ul>
! <li>Go to the <span style="font-style: italic;">Filter Now</span>
! dialog.</li>
<li>Select your Inbox as the folder to filter.</li>
! <li>Select <span style="font-style: italic;">Score messages, but
! dont perform filter action</span>.</li>
<li>Clear both checkboxes so all messages will be scored.</li>
<li>Start the score operation.</li>
</ul>
! You can then look at and sort by the Spam field in your Inbox - this is
! likely to find hidden spam that you missed from your inbox cleanup.
<h3>Incremental Training</h3>
! When you drag a message to your Spam folder, it will be automatically
! trained as spam. Thus, as the classifier misses spam (or is unsure
! about them), it learns as you correct it.<br>
! If messages are dropped back into the Inbox, they are trained as good -
! thus, the system learns what good messages look like should it
! incorrectly classify it as spam or possible spam.<br>
! You will also notice a "Delete as Spam" button (in all folders except
! the Spam folder) and a "Recover from Spam" button in the Spam and Unsure
! folders. These buttons have the same effect as the drags above.
! (Note that currently the "Recover from Spam" option will move the
! item to the Inbox - this is a bug - it should restore the message to
! the folder it was originally filtered from in the first place)<br>
! <h2><a name="Field"></a>Viewing the Spam Score Field</h2>
! A custom property named <span style="font-style: italic;">Spam</span>
! is added to all Outlook messages scored. This is a percentage indicating
! the likelihood of the message being spam (ie, 0% is "certain" ham; 100%
! if "certain" spam). You can teach Outlook to display this field as a
! column in any table view, like the standard Messages view.
! <p> This takes some work, and has to be done again for every folder in
! which you want to display a Spam column: </p>
<ul>
! <li>While looking at an Outlook table view (like Messages),
! right-click on the line with column headers (From, Subject, To,
! Received, ...). In the context menu that pops up, click on Field
! Chooser. A box with title <i>Field Chooser</i> pops up.</li>
! <li>In the drop-down list at the top of the <span
! style="font-style: italic;">Field Chooser</span> window, select <span
! style="font-style: italic;">User Defined Fields</span></li>
! <li>Below the drop-down, you should see a rectangular button
! with a <span style="font-style: italic;">Spam</span> label . This<span
! style="font-style: italic;"></span> should be automatically created for
! all folders managed by the system, but if it does not appear, you will
! need to add it yourself. To do this, perform the following steps</li>
! <ul>
<li>In the lower left corner of the <i>Field Chooser</i> box, click
! <i>New...</i>. A box with title <i>New Field</i> pops up. </li>
! <li>In the <i>Name:</i> box, type Spam. </li>
! <li>In the <i>Type:</i> dropdown list, select <i>Percent</i>.
! This is the third choice in the dropdown list. Do not
! select any other format -- it won't work. </li>
! <li>The <i>Format:</i> select the first entry in the list -
! "Rounded"</li>
! <li>Click OK in the <i>New Field</i> box. Now you're back in the <i>Field
! Chooser</i> box, with a new <span style="font-style: italic;">Spam</span>
! button shown. </li>
! </ul>
! <li>Use your mouse to drag the <span style="font-style: italic;">Spam</span>
! button to the column header position where you want to see the
! Spam column. You don't have to be precise here -- you can
! rearrange or resize the column later just by dragging it around. </li>
! <li>You're done! Close the <i>Field Chooser</i> box. </li>
</ul>
! Outlook's standard Automatic Formatting features can also be taught how
! to access the value of this field; for example, you could tell Outlook
! to display rows with suspected spam messages in green italic. However,
! for whatever reason, the Outlook Rules Wizard does not allow creating
! rules based on user-defined fields. That's why this addin supplies its
! own filtering rules.
! <p> Contributions to this documentation are welcome!<br>
<br>
+ </p>
</body>
</html>
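
The revised about.html text describes the Spam property as a percentage running from 0% (ham) to 100% (spam). A minimal sketch of how a raw score in 0.0 to 1.0 would read as a whole-number percentage follows. The helper name `as_percent_label` is hypothetical, and Python's own percent formatting is only a stand-in for Outlook's "Rounded" display format, whose exact rounding rule is not stated in this commit.

```python
def as_percent_label(score):
    """Render a 0.0-1.0 spam score as a whole-number percentage string.

    Hypothetical helper for illustration only; Outlook does its own
    formatting when the property type is a percentage.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be within 0.0 and 1.0")
    # The "%" presentation type multiplies by 100 and appends a percent sign.
    return "{:.0%}".format(score)


if __name__ == "__main__":
    for s in (0.0, 0.873, 1.0):
        print(s, "->", as_percent_label(s))
```

The point of the sketch is that the stored value stays a fraction; only the display is a percentage.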
Index: addin.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/addin.py,v
retrieving revision 1.38
retrieving revision 1.39
diff -C2 -d -r1.38 -r1.39
*** addin.py 23 Nov 2002 10:47:10 -0000 1.38
--- addin.py 24 Nov 2002 22:43:43 -0000 1.39
***************
*** 199,203 ****
import train
trained_as_good = train.been_trained_as_ham(msgstore_message, self.manager)
! if self.manager.config.filter.spam_threshold > prop or \
trained_as_good:
subject = item.Subject.encode("mbcs", "replace")
--- 199,203 ----
import train
trained_as_good = train.been_trained_as_ham(msgstore_message, self.manager)
! if self.manager.config.filter.spam_threshold > prop * 100 or \
trained_as_good:
subject = item.Subject.encode("mbcs", "replace")
***************
*** 222,226 ****
item = msgstore_message.GetOutlookItem()
! score, clues = mgr.score(msgstore_message, evidence=True, scale=False)
new_msg = app.CreateItem(0)
# NOTE: Silly Outlook always switches the message editor back to RTF
--- 222,226 ----
item = msgstore_message.GetOutlookItem()
! score, clues = mgr.score(msgstore_message, evidence=True)
new_msg = app.CreateItem(0)
# NOTE: Silly Outlook always switches the message editor back to RTF
Index: filter.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/filter.py,v
retrieving revision 1.13
retrieving revision 1.14
diff -C2 -d -r1.13 -r1.14
*** filter.py 7 Nov 2002 22:30:09 -0000 1.13
--- filter.py 24 Nov 2002 22:43:43 -0000 1.14
***************
*** 14,21 ****
config = mgr.config.filter
prob = mgr.score(msg)
! if prob >= config.spam_threshold:
disposition = "Yes"
attr_prefix = "spam"
! elif prob >= config.unsure_threshold:
disposition = "Unsure"
attr_prefix = "unsure"
--- 14,22 ----
config = mgr.config.filter
prob = mgr.score(msg)
! prob_perc = prob * 100
! if prob_perc >= config.spam_threshold:
disposition = "Yes"
attr_prefix = "spam"
! elif prob_perc >= config.unsure_threshold:
disposition = "Unsure"
attr_prefix = "unsure"
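
The filter.py hunk above scales the raw score by 100 before comparing it with the configured thresholds. A standalone sketch of that comparison follows. The function name `classify`, the example threshold values, and the "No" label for the remaining case are assumptions made for illustration; the hunk itself only shows the "Yes" and "Unsure" branches.

```python
def classify(prob, spam_threshold, unsure_threshold):
    """Map a 0.0-1.0 score onto a disposition using 0-100 thresholds."""
    prob_perc = prob * 100  # thresholds are expressed as percentages
    if prob_perc >= spam_threshold:
        return "Yes"
    elif prob_perc >= unsure_threshold:
        return "Unsure"
    # Assumed label for everything below the unsure threshold;
    # this branch is not visible in the diff.
    return "No"


if __name__ == "__main__":
    # Hypothetical thresholds: 90 for spam, 15 for unsure.
    for p in (0.95, 0.50, 0.01):
        print(p, classify(p, spam_threshold=90, unsure_threshold=15))
```

Keeping the thresholds on the 0-100 scale means existing configuration values do not need to change when the score itself becomes a fraction.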
Index: manager.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/manager.py,v
retrieving revision 1.35
retrieving revision 1.36
diff -C2 -d -r1.35 -r1.36
*** manager.py 23 Nov 2002 10:32:48 -0000 1.35
--- manager.py 24 Nov 2002 22:43:43 -0000 1.36
***************
*** 96,99 ****
--- 96,105 ----
# So until we know better, use Outlook to hack this in.
# Should be called once per folder you are watching/filtering etc
+ #
+ # Oh the tribulations of our property grail
+ # We originally wanted to use the "Integer" Outlook field,
+ # but it seems this property type alone is not exposed via the Object
+ # model. So we resort to olPercent, and live with the % sign
+ # (which really is OK!)
assert self.outlook is not None, "I need outlook :("
ol = self.outlook
***************
*** 107,113 ****
if item is not None:
ups = item.UserProperties
- # Display format is documented as being the 1-based index in
- # the combo box in the outlook UI for the given data type.
- # 1 is the first - "all digits", which seems fine.
# *sigh* - need to search by int index
for i in range(ups.Count):
--- 113,116 ----
***************
*** 117,133 ****
else: # for not broken
try:
ups.Add(self.config.field_score_name,
! # "Integer" from the UI doesn't exist!
! # 'olNumber' doesn't seem to work with PT_INT*
! win32com.client.constants.olCombination,
! True) # Add to folder
item.Save()
if self.verbose > 1:
print "Created the UserProperty!"
! except pythoncom.com_error:
! pass # We know, we know...
! ## import traceback
! ## print "Failed to create the field"
! ## traceback.print_exc()
# else no items in this folder - not much worth doing!
if include_sub:
--- 120,142 ----
else: # for not broken
try:
+ # Display format is documented as being the 1-based index in
+ # the combo box in the outlook UI for the given data type.
+ # 1 is the first - "Rounded", which seems fine.
+ format = 1
ups.Add(self.config.field_score_name,
! win32com.client.constants.olPercent,
! True, # Add to folder
! format)
item.Save()
if self.verbose > 1:
print "Created the UserProperty!"
! except pythoncom.com_error, details:
! print "Warning: failed to create the Outlook " \
! "user-property in folder '%s'" \
! % (folder.Name.encode("mbcs", "replace"),)
! print "", details
! print " This is probably because the code has recently"\
! " been changed, but it will"
! print " have no effect on the filtering or scoring."
# else no items in this folder - not much worth doing!
if include_sub:
***************
*** 251,255 ****
self.outlook = None
! def score(self, msg, evidence=False, scale=True):
"""Score a msg.
--- 260,264 ----
self.outlook = None
! def score(self, msg, evidence=False):
"""Score a msg.
***************
*** 261,280 ****
where clues is a list of the (word, spamprob(word)) pairs that
went into determining the score. Else just the score is returned.
-
- If optional arg scale is specified and false, the score is a float
- in 0.0 (ham) thru 1.0 (spam). Else (the default), the score is
- scaled into an integer from 0 (ham) thru 100 (spam).
"""
-
email = msg.GetEmailPackageObject()
result = self.bayes.spamprob(bayes_tokenize(email), evidence)
- if not scale:
- return result
- # For sister-friendliness, multiply score by 100 and round to an int.
if evidence:
score, the_evidence = result
else:
score = result
- score = int(round(score * 100.0))
if evidence:
return score, the_evidence
--- 270,280 ----
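
The manager.py change looks for an existing user property by integer index and, if none is found, adds one with a percent type and display format 1. A rough sketch of that "add only if missing" pattern follows, run against an in-memory stand-in so that it works without Outlook. `FakeUserProperties`, `FakeProperty`, and `ensure_score_property` are hypothetical names, and the fake models only the `Count`, `Item`, and `Add` members used here. In the real add-in the type argument would be `win32com.client.constants.olPercent`; the placeholder string below is not a real Outlook value.

```python
class FakeProperty(object):
    """Stand-in for a single Outlook user property."""
    def __init__(self, name, prop_type, display_format):
        self.Name = name
        self.Type = prop_type
        self.DisplayFormat = display_format


class FakeUserProperties(object):
    """In-memory stand-in for an Outlook UserProperties collection."""
    def __init__(self):
        self._props = []

    @property
    def Count(self):
        return len(self._props)

    def Item(self, index):
        return self._props[index - 1]  # the collection is 1-based

    def Add(self, name, prop_type, add_to_folder, display_format):
        prop = FakeProperty(name, prop_type, display_format)
        self._props.append(prop)
        return prop


def ensure_score_property(ups, name, prop_type, display_format=1):
    """Add the score property unless one with that name already exists.

    Returns True if the property was created, False if it was present.
    """
    for i in range(1, ups.Count + 1):  # search by integer index
        if ups.Item(i).Name == name:
            return False
    ups.Add(name, prop_type, True, display_format)  # True: add to folder
    return True


if __name__ == "__main__":
    ups = FakeUserProperties()
    percent_type = "olPercent-placeholder"  # placeholder, not a real constant
    print(ensure_score_property(ups, "Spam", percent_type))  # created
    print(ensure_score_property(ups, "Spam", percent_type))  # already there
```

Returning a flag rather than printing lets the caller decide how loudly to report a newly created or missing property.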