[spambayes-bugs] [ spambayes-Bugs-1163862 ] spambayes *very* slow

SourceForge.net noreply at sourceforge.net
Wed Mar 16 23:38:53 CET 2005


Bugs item #1163862, was opened at 2005-03-16 05:38
Message generated for change (Comment added) made by anadelonbrin
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=1163862&group_id=61702

Category: pop3proxy
Group: Source code - CVS
Status: Open
Resolution: None
Priority: 5
Submitted By: Michal Vitecek (fufsource)
Assigned to: Nobody/Anonymous (nobody)
Summary: spambayes *very* slow

Initial Comment:
hello,

recently i upgraded spambayes from 1.0a07 to 1.0.3 and
later to the cvs version (1.1a0 as of 2005-03-15).
unfortunately both 1.0.3 and 1.1a0 are very slow (about
20x slower, by my estimate).

i'm running the sb_server.py script with cPickle
database storage. i'm also constantly uploading
messages trained as spam/ham via sb_upload.py.
currently it's trained on 835 spam and 293 ham messages.

when i was running 1.0a07 (on bsddb database storage),
mail retrieval was blazingly fast. with both 1.0.3
(also running on bsddb) and the cvs version, a single
mail takes about 1-2 secs to be retrieved.

i'll be happy to supply any additional data - i just
don't know which would be interesting for the developers.

thank you

----------------------------------------------------------------------

>Comment By: Tony Meyer (anadelonbrin)
Date: 2005-03-17 11:38

Message:
Logged In: YES 
user_id=552329

Fixed the CVS bug; thanks.

sb_dbexpimp.py converts between bsddb and pickle (and csv,
and any of the other storage types in spambayes/storage.py).

Using a pickle for the message info db will mean that it has
to write the whole thing to disk every time it's updated,
which includes any time that a message is classified or
trained.  I'll leave this open and play around with using
pickle and see what I can come up with.
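
As a rough illustration of the cost described above (a sketch only,
not the actual SpamBayes storage code): the function names, record
layout and file names below are hypothetical, and the standard-library
shelve module stands in for bsddb. Re-pickling the whole mapping after
every update writes a number of bytes that grows with the size of the
database, whereas a keyed store only writes the record that changed.

```python
import os
import pickle
import shelve
import tempfile


def pickle_bytes_written(n_messages):
    """Re-pickle the whole mapping after every update.

    Returns (total bytes written over all saves, size of the final file).
    The record layout is made up for illustration.
    """
    info = {}
    total = 0
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "messageinfo.pck")
        for i in range(n_messages):
            info["msg-%d" % i] = {"classification": "ham", "trained": False}
            # The entire dict is serialised again, not just the new record.
            with open(path, "wb") as f:
                pickle.dump(info, f, pickle.HIGHEST_PROTOCOL)
            total += os.path.getsize(path)
        final = os.path.getsize(path)
    return total, final


def shelf_updates(n_messages):
    """Store one record per update in a dbm-backed shelf.

    Each assignment writes only that key; existing records are untouched.
    Returns the number of records stored.
    """
    with tempfile.TemporaryDirectory() as tmp:
        with shelve.open(os.path.join(tmp, "messageinfo")) as db:
            for i in range(n_messages):
                db["msg-%d" % i] = {"classification": "ham", "trained": False}
                db.sync()
            return len(db)


if __name__ == "__main__":
    total, final = pickle_bytes_written(500)
    print("pickle: %d bytes written for a %d byte database" % (total, final))
    print("shelf: %d records stored" % shelf_updates(500))
```

With a message info database of a megabyte or more, every classify
or train operation in the pickle case rewrites the whole file, which
would be consistent with the constant CPU load reported here.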

----------------------------------------------------------------------

Comment By: Michal Vitecek (fufsource)
Date: 2005-03-17 02:31

Message:
Logged In: YES 
user_id=698198

[i had to downgrade to 1.0.3 because there were some
problems in the cvs version (self.gzipCache undefined in
spambayes/UserInterface.py, i don't use cache at all).]

so 1.0.3 running on bsddb runs fast for me. but when the
same code is using the pickle version of hammie.db, the
spambayes.messageinfo.db is constantly being rebuilt,
causing the cpu usage to be 100% most of the time.

i have both the hammie.db and spambayes.messageinfo.db which
seem to cause this. shall i upload them? they are 2480916
bytes for hammie.db and 1678535 bytes for messageinfo.

[btw: some utility to convert bsddb<=>pickle storage would
be great for experimenting]

----------------------------------------------------------------------

Comment By: Tony Meyer (anadelonbrin)
Date: 2005-03-16 14:48

Message:
Logged In: YES 
user_id=552329

Can you check to see whether this is just the change from
bsddb to pickle?

----------------------------------------------------------------------

Comment By: Michal Vitecek (fufsource)
Date: 2005-03-16 06:52

Message:
Logged In: YES 
user_id=698198

some additions to the above (sorry for not mentioning them
earlier):
this is on linux, kernel 2.4.26, python 2.3.5, db-4.1.25-NC,
pybsddb3-4.3.0

----------------------------------------------------------------------
