At the beginning of the year, after 15 years of using MS Mail/Outlook, I moved
my e-mail system to a server-based setup using Dovecot IMAP with Roundcube
Webmail and/or the Apple Mail client. The server runs Debian Stable.
Three months ago I installed SpamBayes on the server with MySQL as the storage
backend. Here are some remarks and questions.
- The Debian stable package, 1.0.4, is really old. Because of an error in
this version's MySQL connect code, SpamBayes cannot work with MySQL.
- When I looked into Debian Unstable I found that it contains the same old
version of SpamBayes. That is when the question first came up:
Is SpamBayes on Debian still maintained? Is the maintainer here on the
list? The last entry in the Debian changelog is a 'non-maintainer upload'.
- After I fixed the MySQL connect bug and ran the initial training, the
next problem came up:
    assert spamcount <= nspam, "Token seen in more spam than spam trained."
- After that I decided to use newer code. Building a Debian package from
the 1.1a4 code took around 15 minutes. The code from that package
ran very stably for around 8 weeks.
- But last week the error came back (see below). It seems this bug is
known to the developers; is it fixed in a newer version?
- Next problem: building a Debian package from the SVN trunk fails, while it
works fine in the same environment for 1.1a4. Does anybody understand what
is causing this error?
Thanks
Marko
MARKO VON OPPEN - TECHNISCHE SOFTWARE
Ringstr. 3, 71691 Freiberg am Neckar, Germany
fon +49 (0)7141 6080813, fax +49 (0)7141 6080814
e-mail marko(a)von-oppen.com web http://www.von-oppen.com/
=== error report 1 ===
marko@mvob:~$ sb_filter.py -f <spammail2.txt
Loading state from spambayes database
spambayes is an existing database, with 21827 spam and 26807 ham
Created new database in host:mysql.von-oppen.com user:*** pass:*** dbname=spambayes
Loading state from spambayes database
spambayes is an existing database, with 21827 spam and 26807 ham
Traceback (most recent call last):
  File "/usr/bin/sb_filter.py", line 283, in <module>
    main()
  File "/usr/bin/sb_filter.py", line 274, in main
    action(msg)
  File "/usr/bin/sb_filter.py", line 192, in filter
    return self.h.filter(msg)
  File "/usr/lib/python2.5/site-packages/spambayes/hammie.py", line 155, in filter
    debug, train)
  File "/usr/lib/python2.5/site-packages/spambayes/hammie.py", line 109, in score_and_filter
    prob, clues = self._scoremsg(msg, True)
  File "/usr/lib/python2.5/site-packages/spambayes/hammie.py", line 38, in _scoremsg
    return self.bayes.spamprob(tokenize(msg), evidence)
  File "/usr/lib/python2.5/site-packages/spambayes/classifier.py", line 196, in chi2_spamprob
    clues = self._getclues(wordstream)
  File "/usr/lib/python2.5/site-packages/spambayes/classifier.py", line 499, in _getclues
    tup = self._worddistanceget(word)
  File "/usr/lib/python2.5/site-packages/spambayes/classifier.py", line 514, in _worddistanceget
    prob = self.probability(record)
  File "/usr/lib/python2.5/site-packages/spambayes/classifier.py", line 317, in probability
    assert spamcount <= nspam, "Token seen in more spam than spam trained."
AssertionError: Token seen in more spam than spam trained.
mysql> select * from bayes order by nspam desc limit 10;
+----------------------+-------+-------+
| word                 | nspam | nham  |
+----------------------+-------+-------+
| header:From:1        | 21835 | 26555 |
| header:Subject:1     | 21830 | 26666 |
| saved state          | 21827 | 26807 |
| header:To:1          | 21803 | 26422 |
| header:Return-Path:1 | 21794 | 19516 |
<snip>
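The query output already shows where the assertion comes from: two tokens have an nspam count higher than the 21827 spam messages the database reports as trained. Here is a minimal sketch of the same comparison in Python, using only the five rows listed above. It assumes that the "saved state" row holds the trained totals (its numbers match the "21827 spam and 26807 ham" line in the log); the function name find_inconsistent_tokens is made up for this example and is not part of SpamBayes.

```python
# Rows copied from the query output above: (word, nspam, nham).
rows = [
    ("header:From:1", 21835, 26555),
    ("header:Subject:1", 21830, 26666),
    ("saved state", 21827, 26807),
    ("header:To:1", 21803, 26422),
    ("header:Return-Path:1", 21794, 19516),
]

# Assumption: this row stores the number of trained spam/ham messages.
STATE_KEY = "saved state"


def find_inconsistent_tokens(rows, state_key=STATE_KEY):
    """Return the tokens whose counts exceed the trained message totals."""
    total_spam = total_ham = None
    for word, nspam, nham in rows:
        if word == state_key:
            total_spam, total_ham = nspam, nham
    if total_spam is None:
        raise ValueError("no %r row found" % state_key)
    # A token cannot have been seen in more messages than were trained,
    # which is exactly what the failing assertion checks for spam.
    return [word for word, nspam, nham in rows
            if word != state_key
            and (nspam > total_spam or nham > total_ham)]


print(find_inconsistent_tokens(rows))
# prints ['header:From:1', 'header:Subject:1']
```

This only locates the inconsistent rows; it says nothing about how the counts drifted apart in the first place.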
=== Error while building Debian package from SVN sources ===
I tried to build the package with dpkg-buildpackage; this is the command
that fails:
marko@mais:~/src/spambayes-1.1b1$ python setup.py install --prefix=debian/spambayes/usr --no-compile
running install
Checking .pth file support in debian/spambayes/usr/lib/python2.5/site-packages/
error: can't create or remove files in install directory

The following error occurred while trying to add or remove files in the
installation directory:

    [Errno 2] No such file or directory: 'debian/spambayes/usr/lib/python2.5/site-packages/test-easy-install-12399.pth'

The installation directory you specified (via --install-dir, --prefix, or
the distutils default setting) was:

    debian/spambayes/usr/lib/python2.5/site-packages/

This directory does not currently exist. Please create it and try again, or
choose a different installation directory (using the -d or --install-dir
option).
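The message itself says what to try first: create the missing directory and run the install again. A minimal sketch of that step in Python follows. The helper name ensure_site_packages is invented for this example, and the path layout (prefix/lib/pythonX.Y/site-packages) is simply taken from the error text above. It only addresses the "does not currently exist" complaint; the log does not show whether the build then runs through.

```python
import os
import shutil
import tempfile


def ensure_site_packages(prefix, python_version="2.5"):
    """Create <prefix>/lib/python<version>/site-packages if it is missing.

    Returns the path of the directory.
    """
    path = os.path.join(prefix, "lib", "python" + python_version,
                        "site-packages")
    if not os.path.isdir(path):
        os.makedirs(path)
    return path


# Demonstration in a throwaway directory, so nothing real is touched.
staging = tempfile.mkdtemp()
try:
    target = ensure_site_packages(
        os.path.join(staging, "debian", "spambayes", "usr"))
    print(os.path.isdir(target))
finally:
    shutil.rmtree(staging)
```

In the source tree one would call ensure_site_packages("debian/spambayes/usr") before running the setup.py command quoted above.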
Eric>   File "/usr/lib/python2.6/asyncore.py", line 391, in __getattr__
Eric>     return getattr(self.socket, attr)
Eric> AttributeError: '_socketobject' object has no attribute 'ac_out_buffer'
Eric> This is Spambayes 1.1a4 installed on a fresh install of Ubuntu 9.04.
Eric> Could someone explain to me what I broke?
You didn't break anything. The ground under your feet shifted. In Python
2.6 someone reworked the asyncore and asynchat modules in
backwards-incompatible ways. One of the things that was done was to remove
that ac_out_buffer attribute. Unfortunately, it looks like Dibbler.py uses
it. This whole episode caused quite a stir on python-dev when it came to
light. (Not enough, apparently, to restore the code in 2.7 though.)
I'm not certain of the correct fix, not being very familiar with the changes
in Python 2.6 or with the Dibbler module. Here are a couple things you
might try:
* Install Python 2.5 on your system and use that instead of Python 2.6.
* Modify your copy of Dibbler.py to guard the (sole) reference to
  self.ac_out_buffer. Change:

      while (self.producer_fifo or self.ac_out_buffer) and not self._closed:

  to

      while ((self.producer_fifo or getattr(self, "ac_out_buffer", ""))
             and not self._closed):
* Staring at the thread on python-dev suggests that this might work as
  well:

      while self.writable() and not self._closed:
I'm pretty sure switching to Python 2.5 will work, though that may cause
your system some consternation if you have other packages which require 2.6
and you can't install both of them at the same time (I suspect that will be
ok). I'm a bit less sure about the second and third changes, but I think
they should work. The third is clearer and likely more correct as
well.
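To see how a guarded loop behaves with and without the old attribute, here is a minimal sketch that can be run outside of Dibbler. It is not Dibbler's code: FakeChannel, its flush method and its toy initiate_send are hypothetical stand-ins, and the only names taken from the discussion above are producer_fifo, ac_out_buffer and _closed. The guard uses getattr with an empty default, so a missing ac_out_buffer counts as "nothing left to send".

```python
class FakeChannel(object):
    """Hypothetical stand-in for an asynchat-style channel."""

    def __init__(self, chunks, legacy_buffer=None):
        self.producer_fifo = list(chunks)
        self._closed = False
        self.sent = []
        if legacy_buffer is not None:
            # Only pre-2.6 style channels carry this attribute.
            self.ac_out_buffer = legacy_buffer

    def initiate_send(self):
        # Toy behaviour: drain the legacy buffer first, then the fifo.
        if getattr(self, "ac_out_buffer", ""):
            self.sent.append(self.ac_out_buffer)
            self.ac_out_buffer = ""
        elif self.producer_fifo:
            self.sent.append(self.producer_fifo.pop(0))

    def flush(self):
        # Re-evaluated on every pass; a missing attribute is treated
        # as an empty buffer instead of raising AttributeError.
        while ((self.producer_fifo or getattr(self, "ac_out_buffer", ""))
               and not self._closed):
            self.initiate_send()


modern = FakeChannel(["a", "b"])                     # no ac_out_buffer
modern.flush()
legacy = FakeChannel(["b"], legacy_buffer="a")       # has ac_out_buffer
legacy.flush()
print(modern.sent == legacy.sent)
```

Both loops end once everything pending has been handed to initiate_send, which is the property the real change needs to keep.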
Full details on the change from python-dev:
http://news.gmane.org/find-root.php?message_id=%3c48EE2C77.1050809%40pallad…
Richie, care to chime in?
--
Skip Montanaro - skip(a)pobox.com - http://www.smontanaro.net/
when i wake up with a heart rate below 40, i head right for the espresso
machine. -- chaos @ forums.usms.org