[Spambayes] Training via web interface in 1.1a6 doesn't work?
Carl Colijn
c.colijn at twologs.com
Thu Oct 14 11:34:11 CEST 2010
Hi all,
I've used the ThunderBayes ThunderBird plugin with SpamBayes 1.0.4 in
the past, and made a copy of every mail I trained the filter with in a
Training-Spam and Training-Ham folder. This allowed me to get the
training back to where I left off after a re-install or database
corruption. I just had to go to the SpamBayes web interface, select a
Thunderbird training folder file and press the "train as xxx" button.
The original ThunderBayes extension doesn't work anymore with
ThunderBird 3.x, so I decided to use a plain SpamBayes installation
after installing ThunderBird 3.x (can't live without SpamBayes anymore
;) ). I set up SpamBayes 1.1a6 separately and connected ThunderBird to
it. When I then started to let SpamBayes train on my training folders
it didn't work. No errors in the web interface, training seemed to go
OK (uploaded ok, Training... Saving... Done!) but the statistics on the
main page ("Total emails trained") didn't reflect the newly trained
mails (neither ham nor spam).
Searching a bit more, I found that the original ThunderBayes plugin (now
abandoned) has been continued as the ThunderBayes++ plugin, and it even
includes the latest SpamBayes version instead of the ancient 1.0.4 :)
So I've now set everything up again using the ThunderBayes++ plugin
(after first uninstalling the separate SpamBayes version), and started
training again, hoping this would fix it. But it didn't: I'm still stuck
with the same situation. Training seems to go OK, but the trained-on
mails don't arrive in the database.
Does anyone have an idea what could be going wrong? I assume it's some
silly configuration issue, but I've already tweaked it for quite a few
hours now and can't get it right. I've attached a clean config file set
after training on 1 ham message for anyone willing to give it a go as well.
Some observations:
- I run Windows XP SP3 en-us with the SpamBayes 1.1a6 version shipped
with ThunderBayes++ - the databases are of the pickle type
- My training folders contain roughly 250 ham and 6000 spam messages
- When I start clean (close ThunderBird/SpamBayes, delete the cache &
training databases) it re-creates them OK when restarted again
- After a restart it claims there are 0 trained messages (of course)
- When I upload the Thunderbird ham training folder file it seems to
process it correctly but after it's done the counter still remains at "0
trained messages"
- hammie.db doesn't grow either (56 bytes after a clean database
recreation, still 56 bytes after training)
- There's no error in the log
- I've enabled caching messages (ThunderBayes by default has it off I
think), and the uploaded messages do get extracted as separate messages
in the cache - messageinfo.db indeed also grows
- "Review messages" sometimes shows the uploaded messages, but not
consistently - they did appear a few times after I tweaked and restarted
and such
- Copy/pasting a separate mail with headers and training on that has the
same effect
- When I let it train on my Spam folder (with 6000+ mails in it) it is
seriously busy - CPU at 100% for more than 10 minutes - so it must think
it's doing something
- Subsequently letting it train on the small Ham folder (250 messages)
now takes far more time - the 6000+ spam messages it processed earlier
must have influenced something
- When I look at the "More statistics" page, the uploaded messages
_do_ get reflected in the "Unsures trained as good" and "Unsures trained
as spam" statistics
- Training via the ThunderBayes buttons in ThunderBird _does_ raise the
"trained on" counters - what does it do that I cannot do?
- There are no SMTP proxy details specified in the settings - I
assume ThunderBayes++ passes the ham/spam training via the web interface
as well?
- Starting from scratch again (delete the databases, clear the email
cache) and selecting "bsddb" as db type didn't change a thing
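To make the observations above easier to reproduce, here is a minimal sketch of how the numbers could be checked outside the web interface. It assumes the Thunderbird training folder file is a plain mbox file (which is how Thunderbird normally stores local folders) and uses only Python's standard library. The function names and any paths passed to them are hypothetical and not part of SpamBayes or ThunderBayes++.

```python
import mailbox
import os


def count_messages(mbox_path):
    """Return the number of messages in an mbox-format folder file."""
    # create=False: raise an error instead of silently creating an empty file
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return len(box)
    finally:
        box.close()


def file_sizes(paths):
    """Map each path to its size in bytes, or None if the file is missing."""
    return {
        path: os.path.getsize(path) if os.path.exists(path) else None
        for path in paths
    }
```

Calling `count_messages` on the Training-Ham folder file gives the number of messages the "Total emails trained" counter should have gone up by, and calling `file_sizes` on the database files before and after a training run shows which of them actually changed on disk.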
Here's the spambayes.ini file I use:
[Headers]
include_score:True
notate_subject:
[Storage]
persistent_use_database:pickle
persistent_storage_file:databases/hammie.db
cache_expiry_days:2
cache_messages:True
no_cache_bulk_ham:False
messageinfo_storage_file:databases/messageinfo.db
ham_cache:cache/ham
spam_cache:cache/spam
unknown_cache:cache/unsure
[html_ui]
default_spam_action:defer
display_score:True
[pop3proxy]
use_ssl:automatic
listen_ports:53100,53101,53102
remote_servers:xxx.xxx.com:110,xxx.xxx.com:995,xxx.xxx.nl:110
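One thing that may be worth ruling out is that the relative paths in the [Storage] section above end up pointing at different files depending on where SpamBayes is started from. The sketch below reads the ini file with Python's standard `configparser` (which accepts `:` as a key/value delimiter) and prints absolute paths for the two database options. Resolving the paths against the directory of the ini file is an assumption made for illustration only; SpamBayes itself may resolve them against a different directory. The function name `storage_paths` is hypothetical.

```python
import configparser
import os


def storage_paths(ini_path):
    """Return absolute paths for the database files named in [Storage]."""
    parser = configparser.ConfigParser(interpolation=None)
    with open(ini_path, encoding="utf-8") as handle:
        parser.read_file(handle)

    # Assumption: relative paths are taken relative to the ini file's folder.
    base = os.path.dirname(os.path.abspath(ini_path))
    result = {}
    for option in ("persistent_storage_file", "messageinfo_storage_file"):
        value = parser.get("Storage", option)
        result[option] = os.path.normpath(os.path.join(base, value))
    return result


if __name__ == "__main__":
    ini = "spambayes.ini"  # hypothetical location, adjust as needed
    if os.path.exists(ini):
        for option, path in storage_paths(ini).items():
            print(option, "->", path, "(exists)" if os.path.exists(path) else "(missing)")
```

If the path printed for `persistent_storage_file` is not the hammie.db file whose size is being watched, training data may be going somewhere else.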
--
Kind regards,
Carl Colijn
TwoLogs - IT Services and Product Development
A natural choice!
http://www.twologs.com
TimeTraces: the powerful and versatile time registration system!
http://timetraces.twologs.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: files.zip
Type: application/octet-stream
Size: 11175 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/spambayes/attachments/20101014/bc1a0e2a/attachment.obj>