[Spambayes] mboxtrain croaks on spam mbox file
Andrew A. Raines
aaraines at pobox.com
Thu Sep 18 12:50:31 EDT 2003
Using spambayes-1.0a5, I get this error:
-
aar@packer:mboxes(510)$ ~/src/spambayes/mboxtrain.py -d ~/.hammiedb -s spam-archive-1
Training spam (spam-archive-1):
Reading as Unix mbox
Traceback (most recent call last):
  File "/home/aar/src/spambayes/mboxtrain.py", line 304, in ?
    main()
  File "/home/aar/src/spambayes/mboxtrain.py", line 296, in main
    train(h, s, True, force, trainnew, removetrained)
  File "/home/aar/src/spambayes/mboxtrain.py", line 221, in train
    mbox_train(h, path, is_spam, force)
  File "/home/aar/src/spambayes/mboxtrain.py", line 155, in mbox_train
    if msg_train(h, msg, is_spam, force):
  File "/home/aar/src/spambayes/mboxtrain.py", line 83, in msg_train
    h.train(msg, is_spam)
  File "/export/home/aar/src/spambayes-1.0a5/spambayes/hammie.py", line 150, in train
    self.bayes.learn(tokenize(msg), is_spam)
  File "/export/home/aar/src/spambayes-1.0a5/spambayes/classifier.py", line 276, in learn
    self._add_msg(wordstream, is_spam)
  File "/export/home/aar/src/spambayes-1.0a5/spambayes/classifier.py", line 401, in _add_msg
    for word in Set(wordstream):
  File "/usr/lib/python2.3/sets.py", line 399, in __init__
    self._update(iterable)
  File "/usr/lib/python2.3/sets.py", line 353, in _update
    for element in iterable:
  File "/export/home/aar/src/spambayes-1.0a5/spambayes/tokenizer.py", line 1082, in tokenize
    for tok in self.tokenize_headers(msg):
  File "/export/home/aar/src/spambayes-1.0a5/spambayes/tokenizer.py", line 1093, in tokenize_headers
    for w in crack_content_xyz(x):
  File "/export/home/aar/src/spambayes-1.0a5/spambayes/tokenizer.py", line 806, in crack_content_xyz
    fname = msg.get_filename()
  File "/usr/lib/python2.3/email/Message.py", line 711, in get_filename
    return unicode(newvalue[2], newvalue[0])
TypeError: unicode() argument 2 must be string, not None
-
Any idea what the email module is actually choking on? There
are 2,110 messages in spam-archive-1 and this error pops up
around spam number 1,800, judging from the running tally.
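One way to narrow this down would be to walk the mbox outside of
spambayes and look at each message's filename parameter directly.
The last frame of the traceback shows that newvalue[0], which is
passed to unicode() as the encoding, is None. That would fit a
filename parameter written in RFC 2231 form (filename*=...) with no
charset given, though that is only a guess at the cause. The sketch
below is written for a current Python 3, not the Python 2.3 in the
traceback, and the function names find_suspect_messages and scan_mbox
are made up for illustration. Because newer versions of the email
package may no longer raise here, it flags the suspicious parameter
as well as any exception from get_filename().

```python
import mailbox


def find_suspect_messages(messages):
    """Return (index, subject, reason) for messages with a doubtful filename.

    `messages` is any iterable of email.message.Message objects.
    Only the Content-Disposition filename parameter is inspected.
    """
    suspects = []
    for index, msg in enumerate(messages):
        for part in msg.walk():
            # Same call that fails at the bottom of the traceback.
            try:
                part.get_filename()
            except (TypeError, LookupError, UnicodeError) as exc:
                suspects.append((index, msg.get("Subject"), repr(exc)))
                break
            # RFC 2231 parameters come back as (charset, language, value).
            param = part.get_param("filename", header="content-disposition")
            if isinstance(param, tuple) and not param[0]:
                reason = "RFC 2231 filename without charset"
                suspects.append((index, msg.get("Subject"), reason))
                break
    return suspects


def scan_mbox(path):
    """Hypothetical helper: run the check over a Unix mbox file."""
    return find_suspect_messages(mailbox.mbox(path))
```

Calling scan_mbox("spam-archive-1") would return a list of zero-based
message indexes with their subjects, which could then be compared
against the running tally to see whether one of them sits near
message 1,800.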
Thanks.
-Drew