[Spambayes] Missing HTML payload
tim.one at comcast.net
Mon Mar 3 20:54:43 EST 2003
> I finally worked out where my missing "url:" tokens got to. However,
> once that is corrected, the same problem remains - no tokens extracted
> from the HTML body, *except* URL tokens, appear.
Sorry this was so painful! URL extraction occurs after the body has been
lower-cased, and after uuencoded-section removal, but before anything else
is done with the body. In particular, URL extraction is done before style
sheet and comment removal. That's why you saw url: tokens even though the
comment construct was unclosed and comment removal nuked the body. The
one-liner change in my last email should repair the problem.
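
To make the ordering concrete, here is a minimal sketch of the pipeline described above. It is not the actual Spambayes tokenizer: the regexes, the tokenize_body name, and the example.com URL are all made up for illustration, uuencoded-section removal is omitted, and the comment pattern is deliberately written so that an unclosed "<!--" swallows everything to the end of the text, which is the failure mode at issue here.

```python
import re

# Hypothetical, simplified patterns; NOT the real Spambayes regexes.
URL_RE = re.compile(r"https?://[^\s<>\"']+")
STYLE_RE = re.compile(r"<style\b.*?</style\s*>", re.DOTALL)
# An unclosed "<!--" is treated as running to the end of the text.
COMMENT_RE = re.compile(r"<!--.*?(?:-->|\Z)", re.DOTALL)
TAG_RE = re.compile(r"<[^>]*>")


def tokenize_body(text):
    """Illustrative ordering: lower-case, extract URLs, then strip markup."""
    text = text.lower()
    # (uuencoded-section removal would happen here; omitted in this sketch)

    # URL extraction runs BEFORE style sheet and comment removal.
    tokens = ["url:" + url for url in URL_RE.findall(text)]
    text = URL_RE.sub(" ", text)

    # Only now are style sheets, comments, and remaining tags removed.
    text = STYLE_RE.sub(" ", text)
    text = COMMENT_RE.sub(" ", text)
    text = TAG_RE.sub(" ", text)

    tokens.extend(text.split())
    return tokens


# Unclosed comment: the URL survives, the rest of the body does not.
broken = "<html><!-- oops <body>Buy now at http://example.com/offer today</body></html>"
broken_tokens = tokenize_body(broken)

# Properly closed comment: URL and body words both come through.
closed = "<html><!-- ok --><body>Buy now at http://example.com/offer today</body></html>"
closed_tokens = tokenize_body(closed)
```

With the unclosed comment, broken_tokens holds only the url: token, because the URL was pulled out before comment removal discarded everything after "<!--". With the closed comment, closed_tokens also contains the ordinary body words.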