[Spambayes-checkins] spambayes/spambayes Options.py, 1.63, 1.64 OptionsClass.py, 1.10, 1.11 hammie.py, 1.5, 1.6 message.py, 1.31, 1.32 storage.py, 1.19, 1.20

Tony Meyer anadelonbrin at users.sourceforge.net
Mon Aug 25 03:00:40 EDT 2003


Update of /cvsroot/spambayes/spambayes/spambayes
In directory sc8-pr-cvs1:/tmp/cvs-serv2977/spambayes

Modified Files:
	Options.py OptionsClass.py hammie.py message.py storage.py 
Log Message:
Fix some old option names.
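OptionsClass.py keeps old names working through a `conversion_table`. Before a value is looked up, the table maps an old (section, option) pair to its new one; see the context line `sect, opt = self.conversion_table[sect, opt]` in the OptionsClass.py hunk. Below is a minimal Python 3 sketch of that lookup. It assumes a plain dict of values, and the `OptionStore` class and its single table entry are illustrative, not the real OptionsClass API:

```python
class OptionStore:
    """Hypothetical, simplified stand-in for the OptionsClass lookup."""

    def __init__(self, values, conversion_table):
        # values: {(section, option): value}, keyed by current names
        self._values = dict(values)
        # conversion_table: {(old_section, old_option): (section, option)}
        self.conversion_table = dict(conversion_table)

    def __getitem__(self, key):
        sect, opt = key
        # Translate an old name to its current one, if a mapping exists.
        if (sect, opt) in self.conversion_table:
            sect, opt = self.conversion_table[sect, opt]
        return self._values[sect, opt]


store = OptionStore(
    {("Headers", "header_spam_string"): "spam"},
    {("Hammie", "header_spam_string"): ("Headers", "header_spam_string")},
)
print(store["Hammie", "header_spam_string"])  # prints "spam"
```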

Change the notate_to and notate_subject options to tuples.  If empty, do nothing
(like the old False); otherwise, if the classification is in the option value,
mutate the header (the old True applied only to spam messages).
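The following Python 3 sketch shows the new notation rule using the standard library's `email.message.Message`. The `notate` helper is hypothetical. It only mirrors the To:/Subject: logic in the message.py hunk: prefix "disposition," to the header, or set the header outright when it is missing.

```python
from email.message import Message


def notate(msg, disposition, notate_to=(), notate_subject=()):
    """Prefix To: and/or Subject: with the disposition when it is listed.

    An empty tuple disables notation (the old False); ("spam",) matches the
    old True, which notated spam only.
    """
    for header, wanted in (("To", notate_to), ("Subject", notate_subject)):
        if disposition not in wanted:
            continue
        try:
            # replace_header raises KeyError if the header is absent.
            msg.replace_header(header, "%s,%s" % (disposition, msg[header]))
        except KeyError:
            msg[header] = disposition
    return msg


m = Message()
m["To"] = "bob@example.com"
m["Subject"] = "hello"
notate(m, "spam", notate_to=("spam",), notate_subject=("spam", "unsure"))
```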

Add no_cache_large_messages option.  If messages are bigger than this, don't
cache them (to avoid caching messages with massive attachments that are
already correctly classified).
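The size check reduces to a simple predicate. This sketch assumes the limit is compared against the length of the raw message text, because the option's docstring does not state a unit. The name `should_cache` is hypothetical.

```python
def should_cache(message_text, cache_messages=True, no_cache_large_messages=0):
    """Return True if a message should be written to the message cache.

    A limit of 0 disables the size check, as the option's docstring says.
    """
    if not cache_messages:
        return False
    if no_cache_large_messages and len(message_text) > no_cache_large_messages:
        return False
    return True


big = b"x" * 2048
print(should_cache(big, no_cache_large_messages=1024))  # prints False
```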

Make pop3proxy and hammie use the storage.open_storage function.

Index: Options.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/Options.py,v
retrieving revision 1.63
retrieving revision 1.64
diff -C2 -d -r1.63 -r1.64
*** Options.py	25 Aug 2003 00:40:44 -0000	1.63
--- Options.py	25 Aug 2003 09:00:36 -0000	1.64
***************
*** 19,22 ****
--- 19,23 ----
     modules are changed over.
  """
+ 
  import os
  
***************
*** 552,561 ****
  
      # The three disposition names are added to the header as the following
!     # Three words:
      ("header_spam_string", "Spam disposition name", "spam",
       """The header that Spambayes inserts into each email has a name,
!      (Header Name, above), and a value.  If the classifier determines
!      that this email is probably spam, it places a header named as
!      above with a value as specified by this string.  The default
       value should work just fine, but you may change it to anything
       that you wish.""",
--- 553,562 ----
  
      # The three disposition names are added to the header as the following
!     # three words:
      ("header_spam_string", "Spam disposition name", "spam",
       """The header that Spambayes inserts into each email has a name,
!      (Classification header name, above), and a value.  If the classifier
!      determines that this email is probably spam, it places a header named
!      as above with a value as specified by this string.  The default
       value should work just fine, but you may change it to anything
       that you wish.""",
***************
*** 563,568 ****
  
      ("header_ham_string", "Ham disposition name", "ham",
!      """As for Spam Designation, but for emails classified as
!      Ham.""",
       HEADER_VALUE, RESTORE),
  
--- 564,568 ----
  
      ("header_ham_string", "Ham disposition name", "ham",
!      """As for Spam Designation, but for emails classified as Ham.""",
       HEADER_VALUE, RESTORE),
  
***************
*** 575,579 ****
  
      ("header_score_digits", "Accuracy of reported score", 2,
!      """Accuracy of the score in the header in decimal digits""",
       INTEGER, RESTORE),
  
--- 575,579 ----
  
      ("header_score_digits", "Accuracy of reported score", 2,
!      """Accuracy of the score in the header in decimal digits.""",
       INTEGER, RESTORE),
  
***************
*** 588,592 ****
       probability into each mail.  If you can view headers with your
       mailer, then you can see this information, which can be interesting
!      and even instructive if you're a serious Spambayes junkie.""",
       BOOLEAN, RESTORE),
  
--- 588,592 ----
       probability into each mail.  If you can view headers with your
       mailer, then you can see this information, which can be interesting
!      and even instructive if you're a serious SpamBayes junkie.""",
       BOOLEAN, RESTORE),
  
***************
*** 636,644 ****
       showclue to a lower value, such as 0.1""",
       REAL, RESTORE),
    ),
  
!   # pop3proxy settings - pop3proxy also respects the options in the Hammie
!   # section, with the exception of the extra header details at the moment.
!   # The only mandatory option is pop3proxy_servers, eg.
    # "pop3.my-isp.com:110", or a comma-separated list of those.  The ":110"
    # is optional.  If you specify more than one server in pop3proxy_servers,
--- 636,649 ----
       showclue to a lower value, such as 0.1""",
       REAL, RESTORE),
+ 
+     ("add_unique_id", "Add unique spambayes id", True,
+      """If you wish to be able to find a specific message (via the 'find'
+      box on the home page), or use the SMTP proxy to train using cached
+      messages, you will need to know the unique id of each message.  This
+      option adds this information to a header added to each message.""",
+      BOOLEAN, RESTORE),
    ),
  
!   # pop3proxy settings: The only mandatory option is pop3proxy_servers, eg.
    # "pop3.my-isp.com:110", or a comma-separated list of those.  The ":110"
    # is optional.  If you specify more than one server in pop3proxy_servers,
***************
*** 646,650 ****
    "pop3proxy" : (
      ("remote_servers", "Remote Servers", (),
!      """The Spambayes POP3 proxy intercepts incoming email and classifies
       it before sending it on to your email client.  You need to specify
       which POP3 server(s) you wish it to intercept - a POP3 server
--- 651,655 ----
    "pop3proxy" : (
      ("remote_servers", "Remote Servers", (),
!      """The SpamBayes POP3 proxy intercepts incoming email and classifies
       it before sending it on to your email client.  You need to specify
       which POP3 server(s) you wish it to intercept - a POP3 server
***************
*** 653,665 ****
       these server names from your existing email configuration, or from
       your ISP or system administrator.  If you are using Web-based email,
!      you can't use the Spambayes POP3 proxy (sorry!).  In your email
       client's configuration, where you would normally put your POP3 server
       address, you should now put the address of the machine running
!      Spambayes.""",
       SERVER, DO_NOT_RESTORE),
  
      ("listen_ports", "SpamBayes Ports", (),
       """Each POP3 server that is being monitored must be assigned to a
!      'port' in the Spambayes POP3 proxy.  This port must be different for
       each monitored server, and there must be a port for
       each monitored server.  Again, you need to configure your email
--- 658,670 ----
       these server names from your existing email configuration, or from
       your ISP or system administrator.  If you are using Web-based email,
!      you can't use the SpamBayes POP3 proxy (sorry!).  In your email
       client's configuration, where you would normally put your POP3 server
       address, you should now put the address of the machine running
!      SpamBayes.""",
       SERVER, DO_NOT_RESTORE),
  
      ("listen_ports", "SpamBayes Ports", (),
       """Each POP3 server that is being monitored must be assigned to a
!      'port' in the SpamBayes POP3 proxy.  This port must be different for
       each monitored server, and there must be a port for
       each monitored server.  Again, you need to configure your email
***************
*** 688,709 ****
       PATH, DO_NOT_RESTORE),
  
!     ("notate_to", "Notate to", False,
!      """Some email clients (Outlook Express, for example) can only
!      set up filtering rules on a limited set of headers.  These
!      clients cannot test for the existence/value of an arbitrary
!      header and filter mail based on that information.  To
!      accomodate these kind of mail clients, the Notate To: can be
!      checked, which will add "spam", "ham", or "unsure" to the
!      recipient list.  A filter rule can then use this to see if
!      one of these words (followed by a comma) is in the recipient
!      list, and route the mail to an appropriate folder, or take
!      whatever other action is supported and appropriate for the
!      mail classification.""",
!      BOOLEAN, RESTORE),
  
!     ("notate_subject", "Classify in subject: header", False,
       """This option will add the same information as 'Notate To',
       but to the start of the mail subject line.""",
!      BOOLEAN, RESTORE),
  
      ("cache_messages", "Cache messages", True,
--- 693,716 ----
       PATH, DO_NOT_RESTORE),
  
!     ("notate_to", "Notate to", (),
!      """Some email clients (Outlook Express, for example) can only set up
!      filtering rules on a limited set of headers.  These clients cannot
!      test for the existence/value of an arbitrary header and filter mail
!      based on that information.  To accommodate these kinds of mail clients,
!      you can add "spam", "ham", or "unsure" to the recipient list.  A
!      filter rule can then use this to see if one of these words (followed
!      by a comma) is in the recipient list, and route the mail to an
!      appropriate folder, or take whatever other action is supported and
!      appropriate for the mail classification.
  
!      As it interferes with replying, you may only wish to do this for
!      spam messages; simply tick the boxes of the classifications that
!      should be identified in this fashion.""",
!      ("ham", "spam", "unsure"), RESTORE),
! 
!     ("notate_subject", "Classify in subject: header", (),
       """This option will add the same information as 'Notate To',
       but to the start of the mail subject line.""",
!      ("ham", "spam", "unsure"), RESTORE),
  
      ("cache_messages", "Cache messages", True,
***************
*** 727,751 ****
       BOOLEAN, RESTORE),
  
!     ("add_mailid_to", "Add unique spambayes id", (),
!      """If you wish to be able to find a specific message (via the 'find'
!      box on the home page), or use the SMTP proxy to
!      train, you will need to know the unique id of each message.  If your
!      mailer allows you to view all message headers, and includes all these
!      headers in forwarded/bounced mail, then the best place for this id
!      is in the headers of incoming mail.  Unfortunately, some mail clients
!      do not offer these capabilities.  For these clients, you will need to
!      have the id added to the body of the message.  If you are not sure,
!      the safest option is to use both.""",
!      ("header", "body"), True),
! 
!     ("strip_incoming_mailids", "Strip incoming spambayes ids", False,
!      """If you receive messages from other spambayes users, you might
!      find that incoming mail (generally replies) already has an id,
!      particularly if they have set the id to appear in the body (see
!      above).  This might confuse the SMTP proxy when it tries to identify
!      the message to train, and make it difficult for you to identify
!      the correct id to find a message.  This option strips all spambayes
!      ids from incoming mail.""",
!      BOOLEAN, RESTORE),
    ),
  
--- 734,744 ----
       BOOLEAN, RESTORE),
  
!     ("no_cache_large_messages", "Maximum size of cached messages", 0,
!      """Where message caching is enabled, this option suppresses caching
!      of messages which are larger than this value.  If you receive a lot
!      of messages that include large attachments (and are correctly
!      classified), you may not wish to cache these.  If you set this to
!      zero (0), then this option will have no effect.""",
!      INTEGER, RESTORE),
    ),
  
***************
*** 792,795 ****
--- 785,802 ----
       spam at nowhere.nothing.""",
       EMAIL_ADDRESS, RESTORE),
+ 
+     ("use_cached_message", "Lookup message in cache", False,
+      """If this option is set, then the smtpproxy will attempt to
+      look up the messages sent to it (for training) in the POP3 proxy cache
+      or IMAP filter folders, and use that message as the training data.
+      This avoids any problems where your mail client might change the
+      message when forwarding, contaminating your training data.  If you can
+      be sure that this won't occur, then the id-lookup can be avoided.
+ 
+      Note that Outlook Express users cannot use the lookup option (because
+      of the way messages are forwarded), and so if they wish to use the
+      SMTP proxy they must enable this option (but as messages are altered,
+      they may not get the best results, and this is not recommended).""",
+      BOOLEAN, RESTORE),
    ),
  
***************
*** 975,978 ****
  if not optionsPathname:
      optionsPathname = os.path.abspath('bayescustomize.ini')
- # Set verbosity of this options instance to an option value!
- options.verbose = options["globals", "verbose"]
--- 982,983 ----
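The notate_to and notate_subject options now list their allowed values as the tuple ("ham", "spam", "unsure"). Previously the type was BOOLEAN. The default is now an empty tuple. This diff does not show how OptionsClass.py validates such values. Below is a minimal sketch of checking a multiple-choice value; the names are hypothetical:

```python
NOTATE_CHOICES = ("ham", "spam", "unsure")


def validate_multichoice(value, allowed):
    """Return value as a tuple, rejecting any element not in allowed."""
    if isinstance(value, str):
        # A bare string is one choice, not a sequence of characters.
        value = (value,)
    value = tuple(value)
    bad = [v for v in value if v not in allowed]
    if bad:
        raise ValueError("invalid choice(s) %r; allowed: %r" % (bad, allowed))
    return value
```

An empty result means "do nothing", which matches the new default of ().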

Index: OptionsClass.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/OptionsClass.py,v
retrieving revision 1.10
retrieving revision 1.11
diff -C2 -d -r1.10 -r1.11
*** OptionsClass.py	21 Aug 2003 13:06:59 -0000	1.10
--- OptionsClass.py	25 Aug 2003 09:00:36 -0000	1.11
***************
*** 587,590 ****
--- 587,591 ----
              sect, opt = self.conversion_table[sect, opt]
          return self._options[sect, opt]
+ 
      def get(self, sect, opt):
          '''Get an option value.'''
***************
*** 695,699 ****
  FILE = r"[\S]+"
  FILE_WITH_PATH = PATH
! IP_LIST = r"\*|localhost|((\*|[01]?\d\d?|2[04]\d|25[0-5])\.(\*|[01]?\d\d?|2[04]\d|25[0-5])\.(\*|[01]?\d\d?|2[04]\d|25[0-5])\.(\*|[01]?\d\d?|2[04]\d|25[0-5]),?)+"
  # IMAP seems to allow any character at all in a folder name,
  # but we want to use the comma as a delimiter for lists, so
--- 696,702 ----
  FILE = r"[\S]+"
  FILE_WITH_PATH = PATH
! IP_LIST = r"\*|localhost|((\*|[01]?\d\d?|2[04]\d|25[0-5])\.(\*|[01]?\d" \
!           r"\d?|2[04]\d|25[0-5])\.(\*|[01]?\d\d?|2[04]\d|25[0-5])\.(\*" \
!           r"|[01]?\d\d?|2[04]\d|25[0-5]),?)+"
  # IMAP seems to allow any character at all in a folder name,
  # but we want to use the comma as a delimiter for lists, so
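Python joins adjacent string literals at compile time, so splitting IP_LIST over three lines does not change the pattern. The Python 3 sketch below checks that. It also shows that the octet alternative `2[04]\d` covers only 200 to 209 and 240 to 249, so an address such as 213.1.1.1 does not fully match. `IP_LIST_FIXED`, written with `2[0-4]\d`, is a hypothetical variant and not part of this checkin.

```python
import re

# Single-line pattern from OptionsClass.py 1.10.
IP_LIST_OLD = r"\*|localhost|((\*|[01]?\d\d?|2[04]\d|25[0-5])\.(\*|[01]?\d\d?|2[04]\d|25[0-5])\.(\*|[01]?\d\d?|2[04]\d|25[0-5])\.(\*|[01]?\d\d?|2[04]\d|25[0-5]),?)+"

# Split form from 1.11; the three raw literals are concatenated.
IP_LIST_NEW = (r"\*|localhost|((\*|[01]?\d\d?|2[04]\d|25[0-5])\.(\*|[01]?\d"
               r"\d?|2[04]\d|25[0-5])\.(\*|[01]?\d\d?|2[04]\d|25[0-5])\.(\*"
               r"|[01]?\d\d?|2[04]\d|25[0-5]),?)+")

# Hypothetical variant: 2[0-4]\d covers 200-249.
OCTET = r"(\*|[01]?\d\d?|2[0-4]\d|25[0-5])"
IP_LIST_FIXED = r"\*|localhost|(" + r"\.".join([OCTET] * 4) + r",?)+"

for addr in ("127.0.0.1", "213.1.1.1", "10.0.0.*,192.168.1.1"):
    print(addr, bool(re.fullmatch(IP_LIST_OLD, addr)),
          bool(re.fullmatch(IP_LIST_FIXED, addr)))
```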

Index: hammie.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/hammie.py,v
retrieving revision 1.5
retrieving revision 1.6
diff -C2 -d -r1.5 -r1.6
*** hammie.py	29 Jan 2003 03:23:34 -0000	1.5
--- hammie.py	25 Aug 2003 09:00:36 -0000	1.6
***************
*** 247,251 ****
  
  
! def open(filename, usedb=True, mode='r'):
      """Open a file, returning a Hammie instance.
  
--- 247,251 ----
  
  
! def open(filename, useDB=True, mode='r'):
      """Open a file, returning a Hammie instance.
  
***************
*** 254,265 ****
      used as the flag to open DBDict objects.  'c' for read-write (create
      if needed), 'r' for read-only, 'w' for read-write.
- 
      """
! 
!     if usedb:
!         b = storage.DBDictClassifier(filename, mode)
!     else:
!         b = storage.PickledClassifier(filename)
!     return Hammie(b)
  
  
--- 254,259 ----
      used as the flag to open DBDict objects.  'c' for read-write (create
      if needed), 'r' for read-only, 'w' for read-write.
      """
!     return Hammie(storage.open_storage((filename, mode), useDB))
  
  

Index: message.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/message.py,v
retrieving revision 1.31
retrieving revision 1.32
diff -C2 -d -r1.31 -r1.32
*** message.py	18 Jul 2003 21:19:58 -0000	1.31
--- message.py	25 Aug 2003 09:00:37 -0000	1.32
***************
*** 71,75 ****
      """
  
! # This module is part of the spambayes project, which is Copyright 2002
  # The Python Software Foundation and is covered by the Python Software
  # Foundation license.
--- 71,75 ----
      """
  
! # This module is part of the spambayes project, which is Copyright 2002-3
  # The Python Software Foundation and is covered by the Python Software
  # Foundation license.
***************
*** 111,131 ****
          self.mode = mode
          self.db_name = db_name
!         self.dbm = dbmstorage.open(self.db_name, self.mode)
!         self.db = shelve.Shelf(self.dbm)
  
      def store(self):
!         self.db.sync()
  
      def _getState(self, msg):
!         try:
!             (msg.c, msg.t) = self.db[msg.getId()]
!         except KeyError:
!             pass
  
      def _setState(self, msg):
!         self.db[msg.getId()] = (msg.c, msg.t)
  
      def _delState(self, msg):
!         del self.db[msg.getId()]
  
  # This should come from a Mark Hammond idea of a master db
--- 111,144 ----
          self.mode = mode
          self.db_name = db_name
!         try:
!             self.dbm = dbmstorage.open(self.db_name, self.mode)
!         except dbmstorage.error:
!             # This probably means that we don't have a dbm module
!             # available.  Print out a warning, and continue on
!             # (not persisting any of this data).
!             if options["globals", "verbose"]:
!                 print "Warning: no dbm modules available for MessageInfoDB"
!             self.dbm = self.db = None
!         if self.dbm:
!             self.db = shelve.Shelf(self.dbm)
  
      def store(self):
!         if self.db:
!             self.db.sync()
  
      def _getState(self, msg):
!         if self.db:
!             try:
!                 (msg.c, msg.t) = self.db[msg.getId()]
!             except KeyError:
!                 pass
  
      def _setState(self, msg):
!         if self.db:
!             self.db[msg.getId()] = (msg.c, msg.t)
  
      def _delState(self, msg):
!         if self.db:
!             del self.db[msg.getId()]
  
  # This should come from a Mark Hammond idea of a master db
***************
*** 205,214 ****
      def GetClassification(self):
          if self.c == 's':
!             return options['Hammie','header_spam_string']
!         if self.c == 'h':
!             return options['Hammie','header_ham_string']
!         if self.c == 'u':
!             return options['Hammie','header_unsure_string']
! 
          return None
  
--- 218,226 ----
      def GetClassification(self):
          if self.c == 's':
!             return options['Headers','header_spam_string']
!         elif self.c == 'h':
!             return options['Headers','header_ham_string']
!         elif self.c == 'u':
!             return options['Headers','header_unsure_string']
          return None
  
***************
*** 217,230 ****
          # may change, which would really screw this database up
  
!         if cls == options['Hammie','header_spam_string']:
              self.c = 's'
!         elif cls == options['Hammie','header_ham_string']:
              self.c = 'h'
!         elif cls == options['Hammie','header_unsure_string']:
              self.c = 'u'
          else:
              raise ValueError, \
                    "Classification must match header strings in options"
- 
          self.modified()
  
--- 229,241 ----
          # may change, which would really screw this database up
  
!         if cls == options['Headers','header_spam_string']:
              self.c = 's'
!         elif cls == options['Headers','header_ham_string']:
              self.c = 'h'
!         elif cls == options['Headers','header_unsure_string']:
              self.c = 'u'
          else:
              raise ValueError, \
                    "Classification must match header strings in options"
          self.modified()
  
***************
*** 311,317 ****
          # allow filtering in 'stripped down' mailers like Outlook Express,
          # so for the moment, they stay in.
!         if options["pop3proxy", "notate_to"] \
!            and disposition == options["Headers", "header_spam_string"]:
!             # add 'spam' as recip only if spam
              try:
                  self.replace_header("To", "%s,%s" % (disposition,
--- 322,326 ----
          # allow filtering in 'stripped down' mailers like Outlook Express,
          # so for the moment, they stay in.
!         if disposition in options["pop3proxy", "notate_to"]:
              try:
                  self.replace_header("To", "%s,%s" % (disposition,
***************
*** 320,326 ****
                  self["To"] = disposition
  
!         if options["pop3proxy", "notate_subject"] \
!            and disposition == options["Hammie", "header_spam_string"]:
!             # add 'spam' to subject if spam
              try:
                  self.replace_header("Subject", "%s,%s" % (disposition,
--- 329,333 ----
                  self["To"] = disposition
  
!         if disposition in options["pop3proxy", "notate_subject"]:
              try:
                  self.replace_header("Subject", "%s,%s" % (disposition,
***************
*** 329,349 ****
                  self["Subject"] = disposition
  
!         if "header" in options['pop3proxy','add_mailid_to']:
              self[options['pop3proxy','mailid_header_name']] = self.id
  
- # This won't work for now, because email.Message does not isolate message body
- # This is also not consistent with the function of this method...
- #        if options.pop3proxy_add_mailid_to.find("body") != -1:
- #            body = body[:len(body)-3] + \
- #                   options.pop3proxy_mailid_header_name + ": " \
- #                    + messageName + "\r\n.\r\n"
- 
      def delSBHeaders(self):
!         del self[options['Hammie','header_name']]
!         del self[options['pop3proxy','mailid_header_name']]
!         del self[options['Hammie','header_name'] + "-ID"]  # test mode header
!         del self[options['pop3proxy','prob_header_name']]
!         del self[options['pop3proxy','thermostat_header_name']]
!         del self[options['pop3proxy','evidence_header_name']]
  
  # These perform similar functions to email.message_from_string()
--- 336,351 ----
                  self["Subject"] = disposition
  
!         if options['Headers','add_unique_id']:
              self[options['pop3proxy','mailid_header_name']] = self.id
  
      def delSBHeaders(self):
!         del self[options['Headers','classification_header_name']]
!         del self[options['Headers','mailid_header_name']]
!         del self[options['Headers','classification_header_name'] + "-ID"]  # test mode header
!         del self[options['Headers','prob_header_name']]
!         del self[options['Headers','thermostat_header_name']]
!         del self[options['Headers','evidence_header_name']]
!         del self[options['Headers','score_header_name']]
!         del self[options['Headers','trained_header_name']]
  
  # These perform similar functions to email.message_from_string()
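MessageInfoDB now falls back to a no-op store when no dbm module is available. The Python 3 sketch below shows the same pattern. It assumes a caller-supplied opener that raises OSError on failure, whereas the real code catches dbmstorage.error. The class and parameter names are hypothetical. The sketch tests `self.db is not None` rather than `if self.db:`. A Shelf is a mapping, so an empty one has length 0 and tests false.

```python
import shelve


class MessageInfoStore:
    """Hypothetical stand-in for MessageInfoDB's optional persistence."""

    def __init__(self, open_dbm):
        # open_dbm: zero-argument callable returning a dict-like dbm object,
        # or raising OSError when no backend is available.
        try:
            self.dbm = open_dbm()
        except OSError:
            # No backend: keep running, but persist nothing.
            self.dbm = None
        self.db = shelve.Shelf(self.dbm) if self.dbm is not None else None

    def set_state(self, msg_id, state):
        if self.db is not None:
            self.db[msg_id] = state

    def get_state(self, msg_id, default=None):
        if self.db is None:
            return default
        try:
            return self.db[msg_id]
        except KeyError:
            return default

    def store(self):
        if self.db is not None:
            self.db.sync()


def no_dbm():
    raise OSError("no dbm module available")


persistent = MessageInfoStore(dict)   # a plain dict stands in for a dbm file
persistent.set_state("msg-1", ("s", 1))
transient = MessageInfoStore(no_dbm)
transient.set_state("msg-1", ("h", 0))
```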

Index: storage.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/storage.py,v
retrieving revision 1.19
retrieving revision 1.20
diff -C2 -d -r1.19 -r1.20
*** storage.py	25 Aug 2003 08:48:56 -0000	1.19
--- storage.py	25 Aug 2003 09:00:37 -0000	1.20
***************
*** 607,610 ****
--- 607,612 ----
          klass = PickledClassifier
      try:
+         if isinstance(data_source_name, type(())):
+             return klass(*data_source_name)
          return klass(data_source_name)
      except dbmstorage.error, e:
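open_storage now unpacks a tuple data source into positional arguments. This is what lets hammie.open pass (filename, mode). Below is a self-contained Python 3 sketch. The two classes are stand-ins for storage.DBDictClassifier and storage.PickledClassifier. Choosing between them on useDB is an assumption based on the context lines, not the full function.

```python
class DBDictClassifier:
    """Stand-in: takes a database name and a dbm open flag."""
    def __init__(self, db_name, mode="c"):
        self.db_name = db_name
        self.mode = mode


class PickledClassifier:
    """Stand-in: takes only a file name."""
    def __init__(self, db_name):
        self.db_name = db_name


def open_storage(data_source_name, useDB=True):
    klass = DBDictClassifier if useDB else PickledClassifier
    # A tuple such as (filename, mode) is spread into positional arguments;
    # anything else is passed as the single db_name argument.
    if isinstance(data_source_name, tuple):
        return klass(*data_source_name)
    return klass(data_source_name)


db = open_storage(("hammie.db", "r"))
pickled = open_storage("hammie.pck", useDB=False)
```

With these stand-ins, open_storage(("hammie.pck", "r"), useDB=False) raises TypeError, because the pickled class accepts only one argument. A caller that passes a tuple has to match the arity of the class that gets selected.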