[spambayes-dev] musings on latest enhancement

Tim Peters tim.one at comcast.net
Tue Jun 17 12:33:57 EDT 2003


[bill parducci]
> i was browsing through the notes on the latest updates in CVS and came
> across this, which gave me pause:
>
> 'Nonsense' HTML tags are stripped rather than replaced with a space
> (e.g. Wr<!$FS|i|R3$s80sA >inkle Reduc<!$FS|i|R3$s80sA >tion becomes
> "Wrinkle" and "Reduction" rather than "Wr", "inkle", "Reduc" and
> "tion").
>
> does this mean that <stuff> <like> <this> will be ignored?

Yes, by spambayes.
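
To make the quoted change concrete, here is a minimal sketch of stripping
versus replacing with a space.  The TAG pattern and both function names are
assumptions for illustration (anything in angle brackets counts as a tag);
this is not the actual spambayes tokenizer code.

```python
import re

# Hypothetical, simplified pattern: any run enclosed in angle brackets.
TAG = re.compile(r"<[^<>]*>")

def tokens_stripped(text):
    """Remove tags outright, so the word fragments rejoin."""
    return TAG.sub("", text).split()

def tokens_spaced(text):
    """Replace each tag with a space, so the word stays split."""
    return TAG.sub(" ", text).split()

sample = "Wr<!$FS|i|R3$s80sA >inkle Reduc<!$FS|i|R3$s80sA >tion"
print(tokens_stripped(sample))  # ['Wrinkle', 'Reduction']
print(tokens_spaced(sample))    # ['Wr', 'inkle', 'Reduc', 'tion']
```

Under this pattern, <stuff> <like> <this> would vanish the same way, leaving
no tokens behind.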

> i wonder if it wouldn't be of value to treat the 'nonsense tags' as
> tokens (e.g. append the list of tokens to the end of the text being
> scored) in addition to 'removing' them?

You can try it.  I doubt it will help; it will certainly bloat database size
by creating more hapaxes (tokens seen only once) in the presence of this junk.
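
A rough sketch of that experiment, for anyone who wants to try it.  The TAG
pattern, the "tag:" token prefix, the function names, and the second and third
sample messages are all made up for illustration; none of this is spambayes
code.  It only shows why tags that vary from message to message would each
turn into a token seen in a single message.

```python
import re
from collections import Counter

# Hypothetical, simplified pattern: any run enclosed in angle brackets.
TAG = re.compile(r"<[^<>]*>")

def tokenize(text, keep_tags=False):
    """Strip tags; optionally append each tag as an extra token."""
    tokens = TAG.sub("", text).split()
    if keep_tags:
        tokens += ["tag:" + t for t in TAG.findall(text)]
    return tokens

def hapax_count(messages, keep_tags):
    """Return (tokens seen in exactly one message, distinct tokens)."""
    seen = Counter()
    for msg in messages:
        seen.update(set(tokenize(msg, keep_tags)))
    return sum(1 for n in seen.values() if n == 1), len(seen)

# First message is the example from the CVS notes; the rest are invented.
messages = [
    "Wr<!$FS|i|R3$s80sA >inkle Reduc<!$FS|i|R3$s80sA >tion",
    "Wr<!$Qz|k|T7$p11xB >inkle Reduc<!$Qz|k|T7$p11xB >tion",
    "Wr<!$Lm|a|W2$d45yC >inkle Reduc<!$Lm|a|W2$d45yC >tion",
]
print(hapax_count(messages, keep_tags=False))  # (0, 2)
print(hapax_count(messages, keep_tags=True))   # (3, 5)
```

Whether the extra tokens help the classifier is a separate question that only
testing on real mail can answer; the sketch speaks only to database growth.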



