[Spambayes] Missing HTML payload

Mark Hammond mhammond at skippinet.com.au
Tue Mar 4 12:12:35 EST 2003


I wrote:

> I instrumented the "show clues" feature to show *all* message tokens found
> in the body.  As you can see at the very end, the entire body was
> stripped.

I finally worked out where my missing "url:" tokens went.  However, with
that corrected, the same problem remains: apart from the URL tokens, no
tokens at all are extracted from the HTML body.

> I am guessing that we barf on:
>             <td><!--#rotato>
> a comment which is never closed.  Outlook actually shows this entire tag

Digging deeper, this seems to be true.

>>> from spambayes import tokenizer
>>> tokenizer.crack_html_comment("hi <!-- wow --> there")
('hi  there', [])
>>> tokenizer.crack_html_comment("hi <!-- wow> there")
('hi ', [])
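
To illustrate why an unclosed comment would eat the rest of the body, here is
a rough sketch of a stripper that insists on a closing "-->".  This is NOT the
actual spambayes code: strip_comments is a made-up name, it returns only the
stripped text rather than a (text, tokens) pair, and I am assuming the real
function behaves similarly based on the output above.

```python
def strip_comments(text):
    """Hypothetical sketch, not the spambayes implementation.

    Removes every "<!-- ... -->" span.  If an opener has no matching
    "-->", everything from the opener to the end of the text is dropped.
    """
    kept = []
    pos = 0
    while True:
        start = text.find("<!--", pos)
        if start == -1:
            # No more comment openers: keep the remainder as-is.
            kept.append(text[pos:])
            break
        kept.append(text[pos:start])
        end = text.find("-->", start + 4)
        if end == -1:
            # Unclosed comment: the rest of the text is discarded.
            break
        pos = end + 3
    return "".join(kept)


print(repr(strip_comments("hi <!-- wow --> there")))
print(repr(strip_comments("hi <!-- wow> there")))
```

With a properly closed comment only the comment goes; with "<!-- wow>" the
search for "-->" fails and " there" is lost too, which matches what the real
function returns above.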

Mark.



