[spambayes-dev] musings on latest enhancement
bill parducci
bill at parducci.net
Tue Jun 17 09:11:05 EDT 2003
i was browsing through the notes on the latest updates in CVS and came
across this, which gave me pause:
'Nonsense' HTML tags are stripped rather than replaced with a space
(e.g. Wr<!$FS|i|R3$s80sA >inkle Reduc<!$FS|i|R3$s80sA >tion becomes
"Wrinkle" and "Reduction" rather than "Wr", "inkle", "Reduc" and
"tion").
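To make the difference concrete, here is a minimal sketch in Python (the language spambayes itself is written in) contrasting the two behaviours. It assumes, purely for illustration, that a "nonsense" tag is any tag whose name is missing or not in a small hypothetical whitelist (KNOWN_TAGS); the real tokenizer's rule may well differ, and strip_nonsense is a made-up helper name.

```python
import re

# Hypothetical whitelist of "real" tags; anything else counts as nonsense here.
KNOWN_TAGS = {"a", "b", "br", "div", "font", "i", "p", "span", "table", "td", "tr"}

# Matches <...>, optionally starting with "/" or "!", and captures a tag name
# if one follows (e.g. "<!$FS|i|R3$s80sA >" has no name, so group(1) is None).
TAG_RE = re.compile(r"<[/!]?([A-Za-z][A-Za-z0-9]*)?[^>]*>")

def strip_nonsense(text, replacement):
    """Replace nonsense tags with `replacement`, leaving whitelisted tags alone."""
    def repl(m):
        name = m.group(1)
        if name is None or name.lower() not in KNOWN_TAGS:
            return replacement
        return m.group(0)
    return TAG_RE.sub(repl, text)

sample = "Wr<!$FS|i|R3$s80sA >inkle Reduc<!$FS|i|R3$s80sA >tion"
print(strip_nonsense(sample, "").split())   # stripping: whole words survive
print(strip_nonsense(sample, " ").split())  # space replacement: words are split
```

Stripping yields "Wrinkle" and "Reduction", while replacing with a space yields the four fragments quoted above.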
does this mean that <stuff> <like> <this> will be ignored? i wonder if
it wouldn't be of value to treat the 'nonsense tags' as tokens (e.g.
append the stripped tags, as tokens, to the end of the text being
scored) in addition to 'removing' them?
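A rough sketch of what that suggestion could look like, again in Python and under the same hypothetical definition of a nonsense tag (anything not in the made-up KNOWN_TAGS whitelist). The tag is removed from the text so the surrounding word is rejoined, but it is also kept as an extra token; the "nonsense-tag:" prefix and the tokenize name are invented for illustration, not taken from the spambayes code.

```python
import re

# Hypothetical whitelist; tags outside it are treated as nonsense.
KNOWN_TAGS = {"a", "b", "br", "div", "font", "i", "p", "span", "table", "td", "tr"}
TAG_RE = re.compile(r"<[/!]?([A-Za-z][A-Za-z0-9]*)?[^>]*>")

def tokenize(text):
    """Strip nonsense tags from the text but keep each one as an extra token."""
    extra = []

    def repl(m):
        name = m.group(1)
        if name is None or name.lower() not in KNOWN_TAGS:
            # Remember the tag itself (hypothetical token prefix) ...
            extra.append("nonsense-tag:" + m.group(0))
            # ... and remove it so "Wr<...>inkle" rejoins as "Wrinkle".
            return ""
        return m.group(0)

    cleaned = TAG_RE.sub(repl, text)
    # Word tokens first, then the stripped tags appended at the end.
    return cleaned.split() + extra

sample = "Wr<!$FS|i|R3$s80sA >inkle Reduc<!$FS|i|R3$s80sA >tion"
print(tokenize(sample))
print(tokenize("<stuff> <like> <this>"))
```

With this approach <stuff> <like> <this> is not silently ignored: each tag still reaches the classifier as its own token, so a spammer's habitual junk tags could themselves become evidence.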
b