Tim Peters tim.one at comcast.net
Mon Mar 3 20:54:43 EST 2003

[Mark Hammond]
> I finally worked out where my missing "url:" tokens got to.  However,
> once that is corrected, the same problem remains - no tokens extracted
> from the HTML body, *except* URL tokens, appear.

Sorry this was so painful!  URL extraction occurs after the body has been
lower-cased, and after uuencoded-section removal, but before anything else
is done with the body.  In particular, URL extraction is done before style
sheet and comment removal.  That's why you saw url: tokens even though the
comment construct was unclosed and comment removal nuked the body.  The
one-liner change in my last email should repair the problem.
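
To make the ordering concrete, here's a rough sketch of the steps described
above.  It is not the actual tokenizer code: the function name tokenize_body
and all of the regexps are simplified stand-ins invented for illustration,
and the way an unclosed comment swallows the rest of the text is an assumption
about how the body got nuked.

```python
import re

# All patterns below are hypothetical simplifications, not the real ones.
URL_RE = re.compile(r"(https?|ftp)://([^\s<>\"']+)")
UUENCODE_RE = re.compile(r"^begin \d{3} \S+\n.*?\nend$",
                         re.DOTALL | re.MULTILINE)
STYLE_RE = re.compile(r"<style\b.*?</style\s*>", re.DOTALL)
# If "-->" never shows up, this matches from "<!--" to the end of the text,
# so an unclosed comment removes everything after it.
COMMENT_RE = re.compile(r"<!--.*?(?:-->|\Z)", re.DOTALL)
TAG_RE = re.compile(r"<[^>]*>")

def tokenize_body(text):
    """Return tokens for a message body, in the order described above."""
    # 1. Lower-case the body.
    text = text.lower()
    # 2. Remove uuencoded sections.
    text = UUENCODE_RE.sub(" ", text)
    # 3. Extract URL tokens *before* any other cleanup.
    tokens = []
    for m in URL_RE.finditer(text):
        host = m.group(2).split("/")[0]
        tokens.append("url:" + host)
    text = URL_RE.sub(" ", text)
    # 4. Only now remove style sheets and comments, then remaining tags.
    text = STYLE_RE.sub(" ", text)
    text = COMMENT_RE.sub(" ", text)
    text = TAG_RE.sub(" ", text)
    # 5. Whatever survives is split into plain word tokens.
    tokens.extend(text.split())
    return tokens

# An unclosed comment ahead of the body: the words vanish, the URL survives.
broken = '<html><!-- oops <body>Buy now at <a href="http://Example.com/x">here</a></body></html>'
# The same kind of message with the comment properly closed.
closed = '<html><!-- c --><body>Buy now at http://Example.com/x</body></html>'
```

With these stand-in patterns, tokenize_body(broken) yields only the url:
token, because step 3 ran before step 4 wiped out everything after the
unclosed comment, while tokenize_body(closed) yields the url: token plus the
ordinary words.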

