[Spambayes] __del__ in DBDictClassifier?

Tim Stone - Four Stones Expressions tim at fourstonesExpressions.com
Tue Mar 25 07:46:45 EST 2003


3/24/2003 10:22:24 PM, Skip Montanaro <skip at pobox.com> wrote:

>You're suggesting that there's a good chance a DBDictClassifier instance
>will be involved in a cycle?  Looking at the code briefly I didn't see any
>instance attributes which looked like they would refer to other objects
>which would (possibly indirectly) refer back to the instance.  It's a common
>Python idiom to call an object's close() method in its __del__ method. 

Quoting your mail of 11/14/2002:

From: Skip Montanaro <skip at pobox.com>
Date: Thu, 14 Nov 2002 10:49:28 -0600
To: spambayes at python.org
Subject: [Spambayes] read-only DBDict in hammie?

I'd like to share the anydbm file between several accounts on my machine.
Before I fiddle hammie.py so it opens the file in read-only mode, is there
any reason when classifying (not training) it actually needs to update the
file?  There's a __del__ method in PersistentBayes which does this:

    def __del__(self):
        #super.__del__(self)
        self.save_state()

    def save_state(self):
        self.wordinfo[self.statekey] = (self.nham, self.nspam)

When classifying there's no reason that nham or nspam would change, right?

Skip

Quoting an exchange between Neale and Richie dated 11/18/2002:

From: Richie Hindle <richie at entrian.com>
To: Neale Pickett <neale at woozle.org>
Subject: Re: [Spambayes] Hammiefilter doesn't write out the pickle
Date: Mon, 18 Nov 2002 18:02:07 +0000
Cc: spambayes at python.org

Hi Neale,

> Neale thinks this is the right way to do it.  If the Bayes.* classes
> write out their state on destruction, we can treat them all the same.
> That's easy enough, just have them call self.store() in the __del__
> method.

Richie thinks this is a bad move.  Here's a minor rant I sent to Tim Stone
when he did exactly this in his Bayes module:

--------------------------------------------------------------------------

PersistentBayes.__del__() calls store() - this seems like a bad thing for
three reasons.  One is that I might not want to save my changes to the
database - pop3proxy has explicit "Save & Shutdown" and "Shutdown"
buttons to give the user control over whether the database is saved or not
(to let you do speculative training and discard the results, for instance).
[This is the least important of the three reasons.  Four, four reasons!]
Also, the pop3proxy self-test uses an in-memory bayes instance that it
never wants to write to disk.  Secondly, it's unpredictable when __del__
will be called, or even *whether* it will be called - this:

class A:
    def __del__(self):
        print "A.__del__"

class B:
    def __del__(self):
        print "B.__del__"

a = A()
b = B()
a.b = b
b.a = a
print "Exiting..."

won't call either __del__ method in the current CPython implementation.
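
[For readers on a current interpreter: the snippet above is Python 2, and
the claim describes CPython as it was in 2002.  What follows is a minimal
sketch in Python 3 syntax, assuming CPython 3.4 or later, where the cyclic
garbage collector is able to finalize cycles like this one.  The `calls`
list and the `make_cycle` helper are illustrative names, not part of the
original mail.  It shows that reference counting alone never runs either
__del__; only a later collector pass does, so the timing is still not
something to hang a database write on.]

```python
import gc

calls = []  # records which finalizers have run


class A:
    def __del__(self):
        calls.append("A.__del__")


class B:
    def __del__(self):
        calls.append("B.__del__")


def make_cycle():
    # Build the same a <-> b cycle as in the mail, then drop the local names.
    a = A()
    b = B()
    a.b = b
    b.a = a


was_enabled = gc.isenabled()
gc.disable()  # keep the collector from running behind our back

make_cycle()
# The cycle keeps both refcounts above zero, so nothing has been finalized.
calls_before_gc = list(calls)

gc.collect()  # an explicit collector pass finds and finalizes the cycle
calls_after_gc = sorted(calls)

if was_enabled:
    gc.enable()

print("before collect:", calls_before_gc)
print("after collect: ", calls_after_gc)
```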

Thirdly, if users of PersistentBayes explicitly call store() - which seems
like the right thing to do - the database will be written out twice.  [And
that can take *a long time*.]

[snip]

I've found another reason why PersistentBayes.__del__() is a bad thing -
self.db_name isn't set in the case where a PickledBayes is created using a
filename that doesn't exist (which is done by the pop3proxy self-test) -
that was leading to exceptions being thrown from __del__, which is a
notoriously hard problem to track down.

--------------------------------------------------------------------------

I'd much rather have an explicit store() method and document the fact that
storage may be pre-empted by certain implementations.  Relying on __del__
is nasty.
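
[A minimal sketch of the explicit-store() approach argued for above, in
Python 3 syntax.  The `Classifier` class, its `saved` attribute, and the
`train_and_store` helper are hypothetical stand-ins for illustration; they
are not the Spambayes classes, and `saved` merely plays the role of the
on-disk database.  The point is that the caller, not the destructor,
decides whether state is written, which is what makes speculative training
and in-memory test instances possible.]

```python
class Classifier:
    """Hypothetical stand-in for a persistent classifier."""

    def __init__(self):
        self.nham = 0
        self.nspam = 0
        self.saved = None  # stands in for the on-disk database

    def learn(self, is_spam):
        if is_spam:
            self.nspam += 1
        else:
            self.nham += 1

    def store(self):
        # Written only when the caller asks for it; there is no __del__.
        self.saved = (self.nham, self.nspam)


def train_and_store(classifier, labels):
    for is_spam in labels:
        classifier.learn(is_spam)
    classifier.store()  # explicit, and called exactly once


# Normal training: the caller chooses to save.
kept = Classifier()
train_and_store(kept, [True, False, True])

# Speculative training: no store() call, so nothing is written.
speculative = Classifier()
for is_spam in [True, True]:
    speculative.learn(is_spam)

print(kept.saved, speculative.saved)
```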

--
Richie Hindle
richie at entrian.com



As you can tell, I had coded the __del__ originally, and it was removed
because of the objections that you and Richie raised.

c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org

There are 10 kinds of people in the world:
  those who understand binary,
  and those who don't.




