[Spambayes] mboxtrain croaks on spam mbox file

Andrew A. Raines aaraines at pobox.com
Thu Sep 18 12:50:31 EDT 2003

Using spambayes-1.0a5, I get this error:

aar@packer:mboxes(510)$ ~/src/spambayes/mboxtrain.py -d ~/.hammiedb -s spam-archive-1
Training spam (spam-archive-1):
  Reading as Unix mbox
Traceback (most recent call last):
  File "/home/aar/src/spambayes/mboxtrain.py", line 304, in ?
  File "/home/aar/src/spambayes/mboxtrain.py", line 296, in main
    train(h, s, True, force, trainnew, removetrained)
  File "/home/aar/src/spambayes/mboxtrain.py", line 221, in train
    mbox_train(h, path, is_spam, force)
  File "/home/aar/src/spambayes/mboxtrain.py", line 155, in mbox_train
    if msg_train(h, msg, is_spam, force):
  File "/home/aar/src/spambayes/mboxtrain.py", line 83, in msg_train
    h.train(msg, is_spam)
  File "/export/home/aar/src/spambayes-1.0a5/spambayes/hammie.py", line 150, in train
    self.bayes.learn(tokenize(msg), is_spam)
  File "/export/home/aar/src/spambayes-1.0a5/spambayes/classifier.py", line 276, in learn
    self._add_msg(wordstream, is_spam)
  File "/export/home/aar/src/spambayes-1.0a5/spambayes/classifier.py", line 401, in _add_msg
    for word in Set(wordstream):
  File "/usr/lib/python2.3/sets.py", line 399, in __init__
  File "/usr/lib/python2.3/sets.py", line 353, in _update
    for element in iterable:
  File "/export/home/aar/src/spambayes-1.0a5/spambayes/tokenizer.py", line 1082, in tokenize
    for tok in self.tokenize_headers(msg):
  File "/export/home/aar/src/spambayes-1.0a5/spambayes/tokenizer.py", line 1093, in tokenize_headers
    for w in crack_content_xyz(x):
  File "/export/home/aar/src/spambayes-1.0a5/spambayes/tokenizer.py", line 806, in crack_content_xyz
    fname = msg.get_filename()
  File "/usr/lib/python2.3/email/Message.py", line 711, in get_filename
    return unicode(newvalue[2], newvalue[0])
TypeError: unicode() argument 2 must be string, not None

Any idea what the email module is actually choking on?  spam-archive-1
holds 2,110 messages, and judging from the running tally the error
appears around message 1,800.
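
The last frame shows get_filename() holding a (charset, language, value)
tuple whose charset is None. The email package returns that tuple form for
RFC 2231 extended parameters (filename*=...), so one plausible culprit is a
message whose filename*= or name*= parameter lacks the charset'language'
prefix. The sketch below scans an mbox for such headers and reports the
message's position. It is an untested guess at the cause, not a confirmed
diagnosis. It uses the modern standard-library mailbox.mbox class rather
than the Python 2.3 API in the traceback, and find_suspect_messages and
EXTENDED_PARAM are names made up for this illustration.

```python
import mailbox
import re

# Matches an RFC 2231 extended parameter that should carry a charset:
# name*= / filename*=, or the first continuation segment (filename*0*=).
# Group 1 captures the parameter value up to the next quote or semicolon.
EXTENDED_PARAM = re.compile(
    r'\b(?:file)?name\*(?:0\*)?=\s*"?([^";]*)', re.IGNORECASE
)


def find_suspect_messages(mbox_path):
    """Return (position, header_value) pairs for message parts whose
    extended filename/name parameter lacks the charset'language'value
    form. Positions are 1-based to match a running tally."""
    suspects = []
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for position, msg in enumerate(box, 1):
            for part in msg.walk():
                for header in ("Content-Disposition", "Content-Type"):
                    value = str(part.get(header, ""))
                    for match in EXTENDED_PARAM.finditer(value):
                        # A well-formed value holds two single quotes:
                        # charset'language'percent-encoded-text
                        if match.group(1).count("'") < 2:
                            suspects.append((position, value))
                            break
    finally:
        box.close()
    return suspects
```

Calling find_suspect_messages("spam-archive-1") and looking for an entry
near position 1,800 would show whether such a header lines up with the
failure. If nothing turns up there, the assumption above is wrong and the
offending message needs to be isolated some other way.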


