[spambayes-bugs] [ spambayes-Patches-858564 ] Save last set and use times in database

SourceForge.net noreply at sourceforge.net
Thu Dec 11 16:58:18 EST 2003


Patches item #858564, was opened at 2003-12-11 15:58
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498105&aid=858564&group_id=61702

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Skip Montanaro (montanaro)
Assigned to: Tony Meyer (anadelonbrin)
Summary: Save last set and use times in database

Initial Comment:
The attached patch adds a new format version, 
PICKLE_DATE_VERSION, for the database (that is, non-pickle) 
files.  In addition to spamcount and hamcount, each record 
gains settime and usetime fields.

It seems to work for me, but I'm uploading a patch instead 
of checking something in for several reasons:

1. I don't know if this is the correct way to do it (probably 
not, since it can't be used by the picklefile database).

2. Applications which sidestep the usual mechanisms for 
reading the database will need to be fixed (part of the 
uploaded context diff is a change to contrib/spamcounts.py 
which demonstrates what such applications need to do).  
spamcounts.py is the only one I've modified so far.  I'm not 
sure whether there are others or, if so, what they are.  
Conversely, some applications that currently use the regular 
database access mechanisms may need to be changed to 
sidestep them so as not to corrupt the usetime fields.

3. It's not obvious there's a huge demand for this, but it 
should be helpful for people experimenting with token aging.

4. The database balloons, because the settime and usetime 
fields are stored as datetime objects.

5. It's bound to be slower, since the database is always 
opened for writing.

6. Because of #5, you will have to be careful to always lock 
the database, even when using nominally read-only 
applications like sb_filter.py.
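
As a rough illustration of point 2, an application that reads 
records directly could accept both the older two-field records 
and the new four-field ones.  The sketch below is not taken from 
the patch: TokenRecord and unpack_record are hypothetical names, 
and the field order (spamcount, hamcount, settime, usetime) is an 
assumption inferred from the description above.

```python
from datetime import datetime
from typing import NamedTuple, Optional, Sequence


class TokenRecord(NamedTuple):
    """Hypothetical in-memory view of one token's database value."""
    spamcount: int
    hamcount: int
    settime: Optional[datetime]  # None for records without date fields
    usetime: Optional[datetime]  # None for records without date fields


def unpack_record(value: Sequence) -> TokenRecord:
    """Accept either a 2-field or a 4-field record (assumed layouts)."""
    if len(value) == 2:
        spamcount, hamcount = value
        return TokenRecord(spamcount, hamcount, None, None)
    if len(value) == 4:
        return TokenRecord(*value)
    raise ValueError("unexpected record length: %d" % len(value))


old = unpack_record((3, 1))
new = unpack_record((3, 1, datetime(2003, 12, 11, 15, 58),
                     datetime(2003, 12, 11, 16, 20)))
```

Dispatching on the record length keeps such a reader working 
against databases written before and after the format change.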

That said, after running with such a database for a little 
while (no more than 30 minutes or so), it seems to be doing 
the right thing (and is not yet corrupt :-):

>>> len([k for k in db if len(db[k]) == 4 and db[k][3] > db[k][2]])
896
>>> len(db)
14605
... time passes ...
>>> len([k for k in db if len(db[k]) == 4 and db[k][3] > db[k][2]])
965
>>> len(db)
14605
... more time passes ...
>>> len([k for k in db if len(db[k]) == 4 and db[k][3] > db[k][2]])
985
>>> len(db)
14605
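
The check above counts four-field records whose usetime is later 
than their settime.  A minimal self-contained version of the same 
test, using a plain dict as a stand-in for the real database, 
might look like this (count_used_since_set and the sample data 
are made up for illustration, and the index positions are assumed 
to match the session above):

```python
from datetime import datetime, timedelta


def count_used_since_set(db):
    """Count 4-field records where usetime (index 3) > settime (index 2)."""
    return len([k for k in db
                if len(db[k]) == 4 and db[k][3] > db[k][2]])


t0 = datetime(2003, 12, 11, 15, 0)
sample_db = {
    "never-used": (1, 0, t0, t0),                         # usetime == settime
    "used-later": (0, 2, t0, t0 + timedelta(minutes=5)),  # counted
    "old-format": (4, 1),                                 # 2 fields, skipped
}
used = count_used_since_set(sample_db)  # 1 for this sample
```

A count that grows over time while len(db) stays constant, as in 
the session above, suggests that usetime is being updated on 
lookups without new tokens being added.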

Assigning to Tony since his SF id is at the top of the list. ;-)

