[spambayes-bugs] [ spambayes-Patches-858564 ] Save last set and use
times in database
SourceForge.net
noreply at sourceforge.net
Thu Dec 11 16:58:18 EST 2003
Patches item #858564, was opened at 2003-12-11 15:58
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=498105&aid=858564&group_id=61702
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Skip Montanaro (montanaro)
Assigned to: Tony Meyer (anadelonbrin)
Summary: Save last set and use times in database
Initial Comment:
The attached patch adds a new format, PICKLE_DATE_VERSION, for
the database (that is, non-pickle) files. In addition to
spamcount and hamcount, it stores settime and usetime fields
for each token.
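As a rough illustration only (this is not the code in the patch), a
four-field record could look like the sketch below. The helper names
make_record and touch_record, the plain dict standing in for the
database, and the field order (spamcount, hamcount, settime, usetime)
are all assumptions; the order is inferred from the db[k][2] and
db[k][3] comparison further down.

```python
import pickle
from datetime import datetime

# Hypothetical helpers; the actual patch may lay records out differently.
def make_record(spamcount, hamcount, now=None):
    """Build a new record; settime and usetime both start at 'now'."""
    now = now or datetime.now()
    return (spamcount, hamcount, now, now)

def touch_record(record, now=None):
    """Return a copy of the record with only usetime refreshed."""
    spamcount, hamcount, settime, _ = record
    return (spamcount, hamcount, settime, now or datetime.now())

# A plain dict stands in for the on-disk database in this sketch.
db = {}
db["viagra"] = pickle.dumps(make_record(3, 0, datetime(2003, 12, 11, 15, 58)))

# Looking the token up later updates usetime, which means a write.
rec = pickle.loads(db["viagra"])
rec = touch_record(rec, datetime(2003, 12, 11, 16, 30))
db["viagra"] = pickle.dumps(rec)
```

Note that even a lookup ends with a store, which is why the database
has to be writable all the time.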
It seems to work for me, but I'm uploading a patch instead
of checking something in for several reasons:
1. I don't know if this is the correct way to do it (probably
not, since it can't be used by the picklefile database).
2. Applications which sidestep the usual mechanisms for
reading the database will need to be fixed (part of the
uploaded context diff is a change to contrib/spamcounts.py
which demonstrates what such applications need to do).
spamcounts.py is the only one I've modified so far. I'm not
sure if there are others, and if so, what they are. Also,
there may be other applications which use the regular
database access mechanisms which will need to be changed
to sidestep them so as not to corrupt the usetime fields.
3. It's not obvious there's a huge demand for this, but it
should be helpful for people experimenting with token aging.
4. The database balloons. The settime and usetime fields
are datetime objects.
5. It's bound to be slower. The database is always opened
for writing.
6. Because of #5, you will have to be careful to always lock
the database, even when using nominally read-only
applications like sb_filter.py.
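For point 6, one simple way to serialize access is a lock file created
with O_EXCL. The sketch below is only an assumption about how a caller
might do it; db_lock and the lock file name are hypothetical and are
not part of SpamBayes or of the patch.

```python
import os
import tempfile
import time
from contextlib import contextmanager

@contextmanager
def db_lock(lock_path, timeout=10.0, poll=0.05):
    """Hold an exclusive lock file for the duration of the with-block."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if another process holds the lock.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError("could not lock %s" % lock_path)
            time.sleep(poll)
    try:
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)

# Usage: wrap every database access, including lookups.
lock_path = os.path.join(tempfile.mkdtemp(), "hammie.db.lock")
with db_lock(lock_path):
    held = os.path.exists(lock_path)      # lock file exists while held
released = not os.path.exists(lock_path)  # and is removed afterwards
```

A process that dies while holding the lock leaves the file behind, so
a real deployment would also need some way to clear stale locks.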
That said, after running with such a database for a little
while (no more than 30 minutes or so), it seems to be doing
the right thing (and is not yet corrupt :-):
>>> len([k for k in db if len(db[k]) == 4 and db[k][3] > db[k][2]])
896
>>> len(db)
14605
... time passes ...
>>> len([k for k in db if len(db[k]) == 4 and db[k][3] > db[k][2]])
965
>>> len(db)
14605
... more time passes ...
>>> len([k for k in db if len(db[k]) == 4 and db[k][3] > db[k][2]])
985
>>> len(db)
14605
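The check used in the session above can be wrapped in a small helper.
This is a sketch under the same assumption as the interactive
expression (index 2 is settime, index 3 is usetime); the function name
and the sample data are made up for illustration.

```python
from datetime import datetime, timedelta

def count_used_since_set(db):
    """Count four-field records whose usetime is later than settime."""
    return sum(1 for k in db if len(db[k]) == 4 and db[k][3] > db[k][2])

# Made-up sample data: a mapping of token -> record tuple.
t0 = datetime(2003, 12, 11, 15, 58)
sample = {
    "used":      (1, 0, t0, t0 + timedelta(minutes=5)),  # used after being set
    "untouched": (0, 2, t0, t0),                         # never used since set
    "oldformat": (4, 1),                                 # record without times
}
used = count_used_since_set(sample)
```

With this sample only the "used" token is counted, while len(sample)
stays the same no matter how many tokens have been looked up.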
Assigning to Tony since his SF id is at the top of the list. ;-)
----------------------------------------------------------------------