Removing pickle support from Outlook? (was RE: [Spambayes] Lost database)

Tim Stone - Four Stones Expressions tim at fourstonesExpressions.com
Tue Apr 8 23:06:33 EDT 2003


4/8/2003 7:53:34 PM, "Mark Hammond" <mhammond at skippinet.com.au> wrote:

>It looks quite good, and is the basis of something Outlook could use.

This is good news to me <grin>.

>
>Some first thoughts after a quick look:
>
>* It would be cool if we could store the database in the same file as the
>word database.  bsddb supports this, and it seems to make a whole lot of
>sense.  One file for all databases we come up with.  Apps could even add
>their own specific databases to this file.

Sounds very good.  I don't know how to do this.  Also, the msginfo database 
name is hard coded at this point, clearly not desirable.

>
>* setIdFromPayload(), addSBHeaders() and delSBHeaders() look suspect for the
>base class.  If you intend splitting it later, you should consider doing it
>earlier - it will force you to face certain decisions.

Where would you suggest we put these?  As functions somewhere?

>
>* The distinction between "set" and "change" will escape most people, and
>doesn't seem to serve much purpose except forcing people to call "change".
>Indeed, maybe "set" should check if self.id is already set, and if so,
>remove that ID from the database, or assert if that ID is still there, or
>some such.

Tony and I have struggled with this a bit.  There are too many id setters for 
my taste.  This is a typical key change kind of problem.  I'm kinda not sure 
it really matters much.
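
For illustration only, a minimal sketch of the single-setter idea Mark describes, assuming a plain dict stands in for the msginfo database.  The names Message, setId and _db are hypothetical and are not the actual spambayes code:

```python
class Message:
    """Hypothetical message whose id doubles as its database key."""

    def __init__(self, db):
        self._db = db      # a dict standing in for the msginfo database
        self.id = None

    def setId(self, new_id):
        # One setter covers both "set" and "change": if an id is already
        # present, move the stored record to the new key so that no
        # orphaned entry is left behind under the old key.
        if new_id is None:
            raise ValueError("id must not be None")
        if self.id is not None and self.id != new_id:
            record = self._db.pop(self.id, None)
        else:
            record = self._db.get(new_id)
        self.id = new_id
        self._db[new_id] = record if record is not None else {}
```

Whether the old record should be moved, deleted, or asserted against is exactly the open question; the sketch simply picks "move".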

>
>* should copy() do something with the id, such as reset it?

Presumably, the message that is being copied into already has a unique id 
set, or will at some point.  This is more like a clone operation, an 
adaptation for the imap filter, which cannot simply modify a message.  It must 
create a new message, with a new id, and store it, then delete the old one.  A 
copy (clone, whatever) operation facilitates that process.

>
>* modified() is probably a bad name for what you are asking.  It seems the
>method means "HaveID()".  Oh - I think you are using it as an "event"?  Not
>sure.

Yes, it's an event.  It causes the object to be persisted.  Ugly, but I kinda 
lifted the code from the dbdictclassifier.  I could use some help here too.
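
Roughly, the pattern is "write the record back whenever an attribute changes".  A minimal sketch, assuming a dict-like store keyed by message id; PersistentMessage, _changed and the c/t attributes are hypothetical names for illustration, not the real class:

```python
class PersistentMessage:
    """Hypothetical message that saves itself whenever its state changes."""

    def __init__(self, db, msg_id):
        self._db = db          # dict-like store (assumption)
        self.id = msg_id
        self.c = None          # classification, None until set
        self.t = None          # training flag, None until set
        if msg_id in db:
            # Restore any state persisted by an earlier run.
            self.c, self.t = db[msg_id]

    def _changed(self):
        # The "event": called after every state change, it persists
        # the current state under the message id.
        self._db[self.id] = (self.c, self.t)

    def rememberClassification(self, c):
        self.c = c
        self._changed()

    def rememberTrained(self, t):
        self.t = t
        self._changed()
```

A leading underscore and a name like _changed might make it clearer that this is an internal notification rather than a query.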

>
>* All the "isCls" and "clsfy" methods, and training versions are suspect.
>If they are implementation specific, put an underscore, but to me they look
>like noise.  Why not just:
>
>  GetSpamClassification(self):
>    return None, True or False
>  RememberSpamClassification(self, isSpam):
>    void
>  GetSpamTrained(self):
>    return None, True or False
>  RememberSpamTrained(self, isSpam):
>    void
>
>You have about 80 lines expressing what I believe can be done in 10 (or so
><wink>)

I'm not wedded to this portion of the interface by any means, but...

Here we might have a bit of a problem.  The notesfilter really does have to 
know if a message has ever been classified, not just whether or not it has 
been classified as spam.  This is because there is no way to know if you've 
ever looked at a message or not.  Each time you run the filter, you look at 
every single stinking message in the entire notes database.  So... 
classification has to be more than a binary flag.  I gotta know if it's spam, 
ham, unsure, or never classified.  I'll grant you that perhaps that should go 
in a notesmessage subclass, but then persistence gets to be a problem...
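
To make the difference concrete, here is a minimal sketch of a four-state classification next to the None/True/False view Mark proposed.  The Classification enum and the MessageInfo class are hypothetical, written in present-day Python purely for illustration:

```python
from enum import Enum


class Classification(Enum):
    NEVER = "never classified"
    SPAM = "spam"
    HAM = "ham"
    UNSURE = "unsure"


class MessageInfo:
    """Hypothetical holder for a message's classification state."""

    def __init__(self):
        self._classification = Classification.NEVER

    def GetClassification(self):
        return self._classification

    def RememberClassification(self, classification):
        if not isinstance(classification, Classification):
            raise TypeError("expected a Classification member")
        self._classification = classification

    def GetSpamClassification(self):
        # Three-valued view: True for spam, False for ham, None otherwise.
        # UNSURE and NEVER both collapse to None here, which is the
        # information the notesfilter cannot afford to lose.
        if self._classification is Classification.SPAM:
            return True
        if self._classification is Classification.HAM:
            return False
        return None
```

With only the three-valued getter, a message scored "unsure" on a previous run looks the same as one the filter has never seen.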

>
>If we can agree on most of this, I may even help ;)

Your help would be MOST welcome.

c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org

There are 10 kinds of people in the world:
  those who understand binary,
  and those who don't.




