[Spambayes] Msg class broken?

T. Alexander Popiel popiel at wolfskeep.com
Thu Feb 6 06:34:06 EST 2003

I'm trying to do a bit more testing (*gasp*), but I've run into
trouble: it seems that the tokenizer no longer accepts a simple
string, which is what the Msg class in msgs.py passes it.
If I'm reading things right, this breaks all of the automated testing
tools.  Here's the traceback:

Traceback (most recent call last):
  File "testtools/Continuous.py", line 293, in ?
  File "testtools/Continuous.py", line 254, in main
    tests[j].predict([msg], isspam)
  File "testtools/Continuous.py", line 94, in predict
    prob = guess(example)
  File "/home/cashew/popiel/spambayes/testing/spambayes/spambayes/classifier.py", line 217, in chi2_spamprob
    clues = self._getclues(wordstream)
  File "/home/cashew/popiel/spambayes/testing/spambayes/spambayes/classifier.py", line 436, in _getclues
    for word in Set(wordstream):
  File "/home/cashew/popiel/spambayes/testing/spambayes/spambayes/compatsets.py", line 374, in __init__
  File "/home/cashew/popiel/spambayes/testing/spambayes/spambayes/compatsets.py", line 333, in _update
    for element in it:
  File "/home/cashew/popiel/spambayes/testing/spambayes/spambayes/tokenizer.py", line 1052, in tokenize
    for tok in self.tokenize_headers(msg):
  File "/home/cashew/popiel/spambayes/testing/spambayes/spambayes/tokenizer.py", line 1063, in tokenize_headers
    for w in crack_content_xyz(x):
  File "/home/cashew/popiel/spambayes/testing/spambayes/spambayes/tokenizer.py", line 791, in crack_content_xyz
    yield 'content-type:' + msg.get_content_type()
AttributeError: Message instance has no attribute 'get_content_type'

Please ignore the top three lines of the trace; I'm building my own
driver for testing with incremental training after each message.
(What I'm trying to do in the big picture is get graphs of how the
error rates drop off over time with various training modes.)

Anyway, it looks like either msgs.py needs to be updated to pass in
email.Message.Message objects, or tokenizer.py needs to relearn how
to accept raw strings.  Am I reading this right?  This seems odd,
since the tokenizer does appear to try to convert the string to a
Message by way of mboxutils... help?
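
For illustration, here is a minimal sketch of the second option (a
tokenizer entry point that accepts either a raw string or a Message).
It is written against the modern Python 3 standard-library email
package, not the spambayes code of the time; as_message and
crack_content_type are hypothetical names invented for this example
and are not the project's actual functions.  Note also that the error
above names a "Message instance", so the object may already have been
a Message that simply lacked get_content_type; this sketch only shows
the string-coercion idea.

```python
from email import message_from_string
from email.message import Message


def as_message(obj):
    """Return obj as an email Message, parsing it first if it is raw text.

    Hypothetical helper for illustration; not part of spambayes.
    """
    if isinstance(obj, Message):
        # Already parsed: hand it back unchanged.
        return obj
    if isinstance(obj, bytes):
        # latin-1 maps every byte to a character, so this never raises.
        obj = obj.decode("latin-1")
    return message_from_string(obj)


def crack_content_type(obj):
    """Yield a content-type token, mirroring the call that fails above."""
    msg = as_message(obj)
    yield 'content-type:' + msg.get_content_type()


raw = "Subject: hi\nContent-Type: text/plain\n\nbody text\n"
tokens = list(crack_content_type(raw))
```

With a coercion step like this at the top of the tokenizer, callers
such as a Msg class could keep passing plain strings, and callers that
already hold a Message object would be unaffected.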

- Alex
