[Spambayes] My adventures with Spambayes...
Don Chance
chance at stsci.edu
Tue Aug 12 16:04:27 EDT 2003
Hi,
Yesterday, after reading about spambayes on slashdot, I downloaded it
and started playing around with it. I have an IMAP account, so I tried
out imapfilter.py. The following is a record of the hacks I had to
apply to get it to work. I am posting this in the hope that it will be
of some use to those noble folks who are developing this code.
The first problem I ran into was:
> python2.2 imapfilter.py -b
SpamBayes IMAP Filter Alpha1, version 0.01 (May 2003),
using SpamBayes IMAP Filter Web Interface Alpha1, version 0.01
and engine SpamBayes Beta2, version 0.2 (July 2003).
Traceback (most recent call last):
  File "imapfilter.py", line 789, in ?
    run()
  File "imapfilter.py", line 740, in run
    pwd = options["imap", "password"][0]
IndexError: tuple index out of range
After adding the "-p" option, I was able to set things up from the web
page.
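The traceback above comes from indexing an empty tuple of stored passwords. A minimal sketch of a fallback, written for a current Python rather than 2.2: the helper name first_password and its prompt parameter are made up for illustration and are not part of imapfilter.py, and it assumes "-p" makes the script prompt for the password instead of reading the stored one.

```python
import getpass


def first_password(stored, prompt=getpass.getpass):
    """Return the first stored password, or ask for one if none is stored.

    `stored` stands in for the tuple returned by options["imap", "password"];
    `prompt` is a hypothetical hook so the function can be tested without a tty.
    """
    if stored:
        return stored[0]
    # No password configured yet: ask interactively instead of raising IndexError.
    return prompt("IMAP password: ")
```

Calling first_password(()) with nothing stored asks on the terminal rather than failing.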
Next, I tried training the filter, but kept getting:
imaplib.error: APPEND command error: BAD ['Invalid date-time in Append command']
I added some print statements to see what was going on. After staring
at the time string that caused the problem for a long time, I finally
figured out what was causing the error. The last part of the time
was "+000" instead of "+0000". I added the following code to work
around the problem:
385,392d380
< msg_time_list = str(msg_time).replace('"', '').strip().split()
< if len(msg_time_list) > 2 and len(msg_time_list[2]) == 4:
<     msg_time = '"' + string.join(msg_time_list) + '0"'
< if options["globals", "verbose"]:
<     print "folder name:", self.folder.name
<     print "flags: ", flags
<     print "msg_time: ", msg_time
<
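The same fix-up as a standalone function, for anyone who wants to try it outside imapfilter.py. This is only a sketch for a current Python; the name pad_timezone is made up for illustration. Like the hack above, it simply appends a "0" to a three-digit zone offset, which is right for "+000" but is a guess for any other short offset.

```python
import re

# A sign followed by exactly three digits at the very end of the string.
_SHORT_ZONE_RE = re.compile(r'([+-])(\d{3})$')


def pad_timezone(msg_time):
    """Return msg_time in double quotes, padding a zone like +000 to +0000."""
    bare = str(msg_time).replace('"', '').strip()
    # Append a trailing zero only when the offset is one digit short.
    fixed = _SHORT_ZONE_RE.sub(r'\g<1>\g<2>0', bare)
    return '"' + fixed + '"'


print(pad_timezone('"12-Aug-2003 16:04:27 +000"'))
```

A time string that already ends in a four-digit offset is returned unchanged apart from being re-quoted.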
Next, I tried to classify my Inbox, but kept getting assertion errors
on "assert hamcount <= nham". Added some print statements to see what
was going on:
> python2.2 imapfilter.py -c -p -v
.
. (verbose output deleted)
.
hamcount: 25
nham: 24.0
Traceback (most recent call last):
  File "imapfilter.py", line 786, in ?
    run()
  File "imapfilter.py", line 776, in run
    imap_filter.Filter()
  File "imapfilter.py", line 643, in Filter
    self.unsure_folder)
  File "imapfilter.py", line 565, in Filter
    evidence=True)
  File "/data/copland1/chance/python/site-packages/spambayes/classifier.py", line 223, in chi2_spamprob
    clues = self._getclues(wordstream)
  File "/data/copland1/chance/python/site-packages/spambayes/classifier.py", line 454, in _getclues
    prob = self.probability(record)
  File "/data/copland1/chance/python/site-packages/spambayes/classifier.py", line 310, in probability
    assert hamcount <= nham
AssertionError
I made the following modification to classifier.py to work around the problem:
307,313c307
< if options["globals", "verbose"]:
<     print "hamcount:", hamcount
<     print "nham:", nham
< try:
<     assert hamcount <= nham
< except:
<     hamcount = nham
---
> assert hamcount <= nham
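The try/assert/except above amounts to clamping the per-word count to the total. A sketch of the same idea without the bare except, for a current Python; clamp_count is a made-up helper name, not something in classifier.py. Note that this only masks the mismatch between the counts and does not explain where it comes from.

```python
import sys


def clamp_count(name, count, total, verbose=False):
    """Return count, limited to total so that count <= total always holds."""
    if count > total:
        if verbose:
            # Report the inconsistency instead of silently hiding it.
            print("%s count %s exceeds total %s; clamping"
                  % (name, count, total), file=sys.stderr)
        return total
    return count


hamcount = clamp_count("ham", 25, 24.0, verbose=True)
```

With the values from the run above (hamcount 25, nham 24.0), hamcount is reduced to nham, the same result as the hack.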
Tried to classify my Inbox again:
> python2.2 imapfilter.py -c -p -v
.
. (verbose output deleted)
.
Traceback (most recent call last):
  File "imapfilter.py", line 786, in ?
    run()
  File "imapfilter.py", line 776, in run
    imap_filter.Filter()
  File "imapfilter.py", line 643, in Filter
    self.unsure_folder)
  File "imapfilter.py", line 579, in Filter
    msg.Save()
  File "imapfilter.py", line 369, in Save
    data = _extract_fetch_data(response[1][0])
  File "imapfilter.py", line 157, in _extract_fetch_data
    mo = FETCH_RESPONSE_RE.match(response)
TypeError: expected string or buffer
Added code:
157,159d155
< if options["globals", "verbose"]:
<     print "type(response):", type(response)
<     print "response:", response
and repeated the command, but the error did not recur.
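Since the error did not recur, the type of the offending response is unknown. One possible cause, not confirmed by the runs above, is that imaplib handed back a (header, literal) tuple or None instead of a plain string. A defensive sketch for a current Python follows; FETCH_RE is a simplified stand-in (the real FETCH_RESPONSE_RE is not shown here) and safe_match is a made-up name.

```python
import re

# Hypothetical stand-in pattern: message number followed by "(".
FETCH_RE = re.compile(r'(\d+) \(')


def safe_match(response):
    """Match a FETCH response line, tolerating tuples, bytes and None."""
    if isinstance(response, tuple):
        # imaplib returns (header, literal) tuples for responses with literals.
        response = response[0]
    if isinstance(response, bytes):
        response = response.decode('ascii', 'replace')
    if not isinstance(response, str):
        # Anything else (for example None) cannot be matched; report no match.
        return None
    return FETCH_RE.match(response)
```

A caller can then test for None and skip or log the message rather than dying with a TypeError.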
_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/
Don Chance
Computer Sciences Corp.
Space Telescope Science Institute
3700 San Martin Dr.
Baltimore, MD 21218
410-338-4941
_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/