[Spambayes-checkins] spambayes/Outlook2000 README.txt,1.7,1.8 about.html,1.4,1.5 addin.py,1.38,1.39 filter.py,1.13,1.14 manager.py,1.35,1.36

Mark Hammond mhammond@users.sourceforge.net
Sun Nov 24 22:43:46 2002


Update of /cvsroot/spambayes/spambayes/Outlook2000
In directory sc8-pr-cvs1:/tmp/cvs-serv29193

Modified Files:
	README.txt about.html addin.py filter.py manager.py 
Log Message:
Use a percentage for the SpamScore - this is so we can play nicely
with Outlook's UserProperty API.

NOTE: This does require some user intervention - please see
http://mail.python.org/pipermail/spambayes/2002-November/002170.html
for details.
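
As a rough illustration of the change described in this log message (a
sketch, not the committed code): the classifier score now stays a float
in 0.0 to 1.0, and each call site multiplies it by 100 before comparing
it against the percentage-based thresholds. The function name `classify`,
the threshold values, and the "No" disposition for ham are assumptions
made for this example.

```python
def classify(prob, spam_threshold, unsure_threshold):
    """Map a 0.0-1.0 spam probability to a disposition string.

    Thresholds are percentages (0-100), as in the filter config.
    """
    prob_perc = prob * 100  # scale at the call site, as filter.py now does
    if prob_perc >= spam_threshold:
        return "Yes"
    elif prob_perc >= unsure_threshold:
        return "Unsure"
    return "No"  # assumed label for ham; not shown in the diff


# Hypothetical thresholds, chosen only for illustration.
for p in (0.97, 0.40, 0.02):
    print(p, classify(p, spam_threshold=90, unsure_threshold=15))
```

Keeping the raw probability in one place and scaling only where a
threshold is compared means the value stored in the Outlook property and
the value used for filtering come from the same float.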



Index: README.txt
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/README.txt,v
retrieving revision 1.7
retrieving revision 1.8
diff -C2 -d -r1.7 -r1.8
*** README.txt	19 Nov 2002 22:52:25 -0000	1.7
--- README.txt	24 Nov 2002 22:43:43 -0000	1.8
***************
*** 4,11 ****
  you *must* have win32all-149 or later.
  
! CDO is no longer needed :)
! 
! See below for a list of known problems (particularly that you must manually
! create an Outlook property before you can see the Spam scores)
  
  Outlook Addin
--- 4,8 ----
  you *must* have win32all-149 or later.
  
! See below for a list of known problems.
  
  Outlook Addin

Index: about.html
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/about.html,v
retrieving revision 1.4
retrieving revision 1.5
diff -C2 -d -r1.4 -r1.5
*** about.html	2 Nov 2002 07:01:21 -0000	1.4
--- about.html	24 Nov 2002 22:43:43 -0000	1.5
***************
*** 5,99 ****
  </head>
  <body>
! <span style="font-style: italic;">NOTE: This is very very early code. &nbsp;If
! you are looking this, you have probably been told about it against our better
! judgement &lt;wink&gt;. &nbsp;Stuff doesnt work correctly. &nbsp;Fields are
! funny. &nbsp;If you want something known to work well today for alot of people,
! this is not for you.<br>
  </span><br style="font-style: italic;">
! The source code is maintained at <a
   href="http://spambayes.sourceforge.net">SourceForge</a>.<br>
  <br>
! This spam filter uses Bayesian analysis to filter spam. &nbsp;Unlike other
! spam detection systems, Bayesian systems actually "learn" about what you
! consider spam, and continually adapt as both your regular email and spam
! patterns change.<br>
! 
! <h2>Training</h2>
! Due to the nature of the system, it must be trained before it can be effective.
! &nbsp;Although the system does learn over time, when first installed it has
! no knowledge of either spam or good email.<br>
! 
  <h3>Initial Training</h3>
  When first installed, it is recommended you perform the following steps:<br>
  <ul>
    <li>Create two folders - one for "Spam", and one for "Possible Spam"</li>
!   <li>Go through your Inbox and Deleted Items, and move as much spam as you
! can find to the "Spam" folder. &nbsp;Try and get as much Spam out of your
! inbox as possible.</li>
!   <li>Select the <span style="font-style: italic;">Training</span> dialog.
! &nbsp;Nominate your Spam folder for spam, and your Inbox for good messages,
! and start training.</li>
  </ul>
  To see how effective your Inbox cleanup was, you may like to try:<br>
  <ul>
!   <li>Go to the <span style="font-style: italic;">Filter Now</span> dialog.</li>
    <li>Select your Inbox as the folder to filter.</li>
!   <li>Select <span style="font-style: italic;">Score messages, but dont perform
! filter action</span>.</li>
    <li>Clear both checkboxes so all messages will be scored.</li>
    <li>Start the score operation.</li>
  </ul>
! You can then look at and sort by the Spam field in your Inbox - this is likely
! to find hidden spam that you missed from your inbox cleanup.
! 
  <h3>Incremental Training</h3>
! When you drag a message to your Spam folder, it will be automatically trained
! as spam. &nbsp;Thus, as the classifier misses spam (or is unsure about them),
! it learns as you correct it.<br>
! If messages are dropped back into the Inbox, they are trained as good - thus,
! the system learns what good messages look like should it incorrectly classify
! it as spam or possible spam.<br>
! 
! <h2>Creating a Spam Score Field</h2>
! A custom property named "Spam" is added to all Outlook messages scored.
! This is an integer in 0 (ham) through 100 (spam) inclusive.
! You can teach Outlook to display this field as a column in any table view,
! like the standard Messages view.
! <p>
! This takes some work, and has to be done again for every folder in which
! you want to display a Spam column:
  <ul>
!     <li>While looking at an Outlook table view (like Messages), right-click
!         on the line with column headers (From, Subject, To, Received, ...).
!         In the context menu that pops up, click on Field Chooser.  A box
!         with title <i>Field Chooser</i> pops up.
      <li>In the lower left corner of the <i>Field Chooser</i> box, click
!         <i>New...</i>.  A box with title <i>New Field</i> pops up.
!     <li>In the <i>Name:</i> box, type Spam.
!     <li>In the <i>Type:</i> dropdown list, select <i>Integer</i>.  This is the
!         last choice in the dropdown list.
!         Do not select <i>Number</i> -- it won't work.
!     <li>The <i>Format:</i> dropdown list should display "1,234" now.  Leave it alone.
!     <li>Click OK in the <i>New Field</i> box.  Now you're back in the
!         <i>Field Chooser</i> box.
!     <li>The dropdown list at the top of the <i>Field Chooser</i> box should say
!         <i>User-defined fields in FOLDER</i> now, where FOLDER is the name of the
!         folder you're currently looking at (like Inbox).  Below that, you
!         should see a new rectangular button with a Spam label.
!     <li>Use your mouse to drag the Spam button to the column header position
!         where you want to see the Spam column.  You don't have to be precise
!         here -- you can rearrange or resize the column later just by dragging
!         it around.
!     <li>You're done!  Close the <i>Field Chooser</i> box.
  </ul>
! Outlook's standard Automatic Formatting features can also be taught how to
! access the value of this field; for example, you could tell Outlook to display
! rows with suspected spam messages in green italic.  However, for whatever reason,
! the Outlook Rules Wizard does not allow creating rules based on user-defined
! fields.  That's why this addin supplies its own filtering rules.
! 
! <p>
! Contributions to this documentation are welcome!<br>
  <br>
  </body>
  </html>
--- 5,117 ----
  </head>
  <body>
! <h1>SpamBayes Outlook Plugin<br>
! </h1>
! <span style="font-style: italic;">NOTE: This is very very early code.
! &nbsp;If you are looking at this, you have probably been told about it
! against our better judgement &lt;wink&gt;. &nbsp;Stuff doesnt work
! correctly. &nbsp;If you want something known to work well today for alot
! of people, this is not for you.</span> &nbsp;That said, this plug-in
! works amazingly well! So welcome aboard.<span
!  style="font-style: italic;"><br>
  </span><br style="font-style: italic;">
! This spam filter uses Bayesian analysis to filter spam. &nbsp;Unlike
! other spam detection systems, Bayesian systems actually "learn" about
! what you consider spam, and continually adapt as both your regular email
! and spam patterns change. The source code is maintained at <a
   href="http://spambayes.sourceforge.net">SourceForge</a>.<br>
  <br>
! Here you can find information on:<br>
! <div style="margin-left: 40px;"><a href="#Training">Training</a><br>
! <a href="#Field">Viewing the Spam Score field</a><br>
! </div>
! <h2><a name="Training"></a>Training</h2>
! Due to the nature of the system, it must be trained before it can be
! effective. &nbsp;Although the system does learn over time, when first
! installed it has no knowledge of either spam or good email.<br>
  <h3>Initial Training</h3>
  When first installed, it is recommended you perform the following steps:<br>
  <ul>
    <li>Create two folders - one for "Spam", and one for "Possible Spam"</li>
!   <li>Go through your Inbox and Deleted Items, and move as much spam as
! you can find to the "Spam" folder. &nbsp;Try and get as much Spam out of
! your inbox as possible.</li>
!   <li>Select the <span style="font-style: italic;">Training</span>
! dialog. &nbsp;Nominate your Spam folder for spam, and your Inbox for
! good messages, and start training.</li>
  </ul>
  To see how effective your Inbox cleanup was, you may like to try:<br>
  <ul>
!   <li>Go to the <span style="font-style: italic;">Filter Now</span>
! dialog.</li>
    <li>Select your Inbox as the folder to filter.</li>
!   <li>Select <span style="font-style: italic;">Score messages, but
! dont perform filter action</span>.</li>
    <li>Clear both checkboxes so all messages will be scored.</li>
    <li>Start the score operation.</li>
  </ul>
! You can then look at and sort by the Spam field in your Inbox - this is
! likely to find hidden spam that you missed from your inbox cleanup.
  <h3>Incremental Training</h3>
! When you drag a message to your Spam folder, it will be automatically
! trained as spam. &nbsp;Thus, as the classifier misses spam (or is unsure
! about them), it learns as you correct it.<br>
! If messages are dropped back into the Inbox, they are trained as good -
! thus, the system learns what good messages look like should it
! incorrectly classify it as spam or possible spam.<br>
! You will also notice a "Delete as Spam" button (in all folders except
! the Spam folder) and a "Recover from Spam" button in the Spam and Unsure
! folders. &nbsp;These buttons have the same effect as the drags above.
! &nbsp;(Note that currently the "Recover from Spam" option will move the
! item to the Inbox - this is a bug - it should restore the message to
! the folder it was originally filtered from in the first place)<br>
! <h2><a name="Field"></a>Viewing the Spam Score Field</h2>
! A custom property named <span style="font-style: italic;">Spam</span>
! is added to all Outlook messages scored. This is a percentage indicating
! the likelihood of the message being spam (ie, 0% is "certain" ham; 100%
! if "certain" spam). You can teach Outlook to display this field as a
! column in any table view, like the standard Messages view.
! <p> This takes some work, and has to be done again for every folder in
! which you want to display a Spam column: </p>
  <ul>
!   <li>While looking at an Outlook table view (like Messages),
! right-click on the line with column headers (From, Subject, To,
! Received, ...).         In the context menu that pops up, click on Field
! Chooser.  A box         with title <i>Field Chooser</i> pops up.</li>
!   <li>In the drop-down list at the top of the <span
!  style="font-style: italic;">Field Chooser</span> window, select <span
!  style="font-style: italic;">User Defined Fields</span></li>
!   <li>Below the drop-down, you         should see a rectangular button
! with a <span style="font-style: italic;">Spam</span> label . This<span
!  style="font-style: italic;"></span> should be automatically created for
! all folders managed by the system, but if it does not appear, you will
! need to add it yourself. &nbsp;To do this, perform the following steps</li>
!   <ul>
      <li>In the lower left corner of the <i>Field Chooser</i> box, click
!  <i>New...</i>.  A box with title <i>New Field</i> pops up. </li>
!     <li>In the <i>Name:</i> box, type Spam. </li>
!     <li>In the <i>Type:</i> dropdown list, select <i>Percent</i>.
! This is the         third choice in the dropdown list.         Do not
! select any other format -- it won't work. </li>
!     <li>The <i>Format:</i> select the first entry in the list -
! "Rounded"</li>
!     <li>Click OK in the <i>New Field</i> box.  Now you're back in the <i>Field
! Chooser</i> box, with a new <span style="font-style: italic;">Spam</span>
! button shown. </li>
!   </ul>
!   <li>Use your mouse to drag the <span style="font-style: italic;">Spam</span>
! button to the column header position         where you want to see the
! Spam column.  You don't have to be precise         here -- you can
! rearrange or resize the column later just by dragging         it around. </li>
!   <li>You're done!  Close the <i>Field Chooser</i> box. </li>
  </ul>
! Outlook's standard Automatic Formatting features can also be taught how
! to access the value of this field; for example, you could tell Outlook
! to display rows with suspected spam messages in green italic.  However,
! for whatever reason, the Outlook Rules Wizard does not allow creating
! rules based on user-defined fields.  That's why this addin supplies its
! own filtering rules.
! <p> Contributions to this documentation are welcome!<br>
  <br>
+ </p>
  </body>
  </html>

Index: addin.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/addin.py,v
retrieving revision 1.38
retrieving revision 1.39
diff -C2 -d -r1.38 -r1.39
*** addin.py	23 Nov 2002 10:47:10 -0000	1.38
--- addin.py	24 Nov 2002 22:43:43 -0000	1.39
***************
*** 199,203 ****
              import train
              trained_as_good = train.been_trained_as_ham(msgstore_message, self.manager)
!             if self.manager.config.filter.spam_threshold > prop or \
                 trained_as_good:
                  subject = item.Subject.encode("mbcs", "replace")
--- 199,203 ----
              import train
              trained_as_good = train.been_trained_as_ham(msgstore_message, self.manager)
!             if self.manager.config.filter.spam_threshold > prop * 100 or \
                 trained_as_good:
                  subject = item.Subject.encode("mbcs", "replace")
***************
*** 222,226 ****
  
      item = msgstore_message.GetOutlookItem()
!     score, clues = mgr.score(msgstore_message, evidence=True, scale=False)
      new_msg = app.CreateItem(0)
      # NOTE: Silly Outlook always switches the message editor back to RTF
--- 222,226 ----
  
      item = msgstore_message.GetOutlookItem()
!     score, clues = mgr.score(msgstore_message, evidence=True)
      new_msg = app.CreateItem(0)
      # NOTE: Silly Outlook always switches the message editor back to RTF

Index: filter.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/filter.py,v
retrieving revision 1.13
retrieving revision 1.14
diff -C2 -d -r1.13 -r1.14
*** filter.py	7 Nov 2002 22:30:09 -0000	1.13
--- filter.py	24 Nov 2002 22:43:43 -0000	1.14
***************
*** 14,21 ****
      config = mgr.config.filter
      prob = mgr.score(msg)
!     if prob >= config.spam_threshold:
          disposition = "Yes"
          attr_prefix = "spam"
!     elif prob >= config.unsure_threshold:
          disposition = "Unsure"
          attr_prefix = "unsure"
--- 14,22 ----
      config = mgr.config.filter
      prob = mgr.score(msg)
!     prob_perc = prob * 100
!     if prob_perc >= config.spam_threshold:
          disposition = "Yes"
          attr_prefix = "spam"
!     elif prob_perc >= config.unsure_threshold:
          disposition = "Unsure"
          attr_prefix = "unsure"

Index: manager.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/manager.py,v
retrieving revision 1.35
retrieving revision 1.36
diff -C2 -d -r1.35 -r1.36
*** manager.py	23 Nov 2002 10:32:48 -0000	1.35
--- manager.py	24 Nov 2002 22:43:43 -0000	1.36
***************
*** 96,99 ****
--- 96,105 ----
          # So until we know better, use Outlook to hack this in.
          # Should be called once per folder you are watching/filtering etc
+         #
+         # Oh the tribulations of our property grail
+         # We originally wanted to use the "Integer" Outlook field,
+         # but it seems this property type alone is not expose via the Object
+         # model.  So we resort to olPercent, and live with the % sign
+         # (which really is OK!)
          assert self.outlook is not None, "I need outlook :("
          ol = self.outlook
***************
*** 107,113 ****
          if item is not None:
              ups = item.UserProperties
-             # Display format is documented as being the 1-based index in
-             # the combo box in the outlook UI for the given data type.
-             # 1 is the first - "all digits", which seems fine.
              # *sigh* - need to search by int index
              for i in range(ups.Count):
--- 113,116 ----
***************
*** 117,133 ****
              else: # for not broken
                  try:
                      ups.Add(self.config.field_score_name,
!                            # "Integer" from the UI doesn't exist!
!                            # 'olNumber' doesn't seem to work with PT_INT*
!                            win32com.client.constants.olCombination,
!                            True) # Add to folder
                      item.Save()
                      if self.verbose > 1:
                          print "Created the UserProperty!"
!                 except pythoncom.com_error:
!                     pass # We know, we know...
! ##                    import traceback
! ##                    print "Failed to create the field"
! ##                    traceback.print_exc()
          # else no items in this folder - not much worth doing!
          if include_sub:
--- 120,142 ----
              else: # for not broken
                  try:
+                     # Display format is documented as being the 1-based index in
+                     # the combo box in the outlook UI for the given data type.
+                     # 1 is the first - "Rounded", which seems fine.
+                     format = 1
                      ups.Add(self.config.field_score_name,
!                            win32com.client.constants.olPercent,
!                            True, # Add to folder
!                            format)
                      item.Save()
                      if self.verbose > 1:
                          print "Created the UserProperty!"
!                 except pythoncom.com_error, details:
!                     print "Warning: failed to create the Outlook " \
!                           "user-property in folder '%s'" \
!                           % (folder.Name.encode("mbcs", "replace"),)
!                     print "", details
!                     print " This is probably because the code has recently"\
!                           " been changed, but it will"
!                     print " have no effect on the filtering or scoring."
          # else no items in this folder - not much worth doing!
          if include_sub:
***************
*** 251,255 ****
          self.outlook = None
  
!     def score(self, msg, evidence=False, scale=True):
          """Score a msg.
  
--- 260,264 ----
          self.outlook = None
  
!     def score(self, msg, evidence=False):
          """Score a msg.
  
***************
*** 261,280 ****
          where clues is a list of the (word, spamprob(word)) pairs that
          went into determining the score.  Else just the score is returned.
- 
-         If optional arg scale is specified and false, the score is a float
-         in 0.0 (ham) thru 1.0 (spam).  Else (the default), the score is
-         scaled into an integer from 0 (ham) thru 100 (spam).
          """
- 
          email = msg.GetEmailPackageObject()
          result = self.bayes.spamprob(bayes_tokenize(email), evidence)
-         if not scale:
-             return result
-         # For sister-friendliness, multiply score by 100 and round to an int.
          if evidence:
              score, the_evidence = result
          else:
              score = result
-         score = int(round(score * 100.0))
          if evidence:
              return score, the_evidence
--- 270,280 ----
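
One side effect worth noting, sketched here under assumptions: the
removed `scale=True` path rounded the score to an integer before any
threshold comparison, whereas the new call sites compare the unrounded
percentage. Scores just below a threshold can therefore be classified
differently before and after this change. The helper names and the
threshold of 90 are hypothetical.

```python
def old_scaled_score(prob):
    """Mimic the removed default path of score(): integer 0..100."""
    return int(round(prob * 100.0))


def new_percentage(prob):
    """Mimic the new call sites: unrounded float percentage."""
    return prob * 100


SPAM_THRESHOLD = 90  # hypothetical value, for illustration only

prob = 0.896
print(old_scaled_score(prob) >= SPAM_THRESHOLD)  # True: 90 >= 90
print(new_percentage(prob) >= SPAM_THRESHOLD)    # False: about 89.6 < 90
```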




