[spambayes-dev] musings on latest enhancement

bill parducci bill at parducci.net
Tue Jun 17 09:11:05 EDT 2003


i was browsing through the notes on the latest updates in CVS and came 
across this, which gave me pause:

'Nonsense' HTML tags are stripped rather than replaced with a space
(e.g. Wr<!$FS|i|R3$s80sA >inkle Reduc<!$FS|i|R3$s80sA >tion becomes
"Wrinkle" and "Reduction" rather than "Wr", "inkle", "Reduc" and
"tion").

does this mean that <stuff> <like> <this> will be ignored? i wonder if 
it wouldn't be of value to treat the 'nonsense tags' as tokens (e.g. 
append the list of tokens to the end of the text being scored) in 
addition to removing them?
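
to make the idea concrete, here is a rough python sketch. it is not the
spambayes tokenizer: the names tokenize, KNOWN_TAGS and TAG_RE, the
"nonsense-tag:" prefix, and the rule that a tag is 'nonsense' when its
name is not in a short list of known tags are all assumptions made up
for illustration.

```python
import re

# Assumption for illustration only: a tag counts as "nonsense" when its
# name is not in this short list. The real SpamBayes rule may differ.
KNOWN_TAGS = {"a", "b", "br", "i", "p", "font", "html", "body", "img"}

# Anything shaped like <...> with no nested angle brackets.
TAG_RE = re.compile(r"<([^<>]*)>")


def tokenize(text):
    """Return word tokens followed by one token per nonsense tag."""
    nonsense = []

    def handle_tag(match):
        inner = match.group(1)
        name = re.match(r"/?\s*([A-Za-z][A-Za-z0-9]*)", inner)
        if name and name.group(1).lower() in KNOWN_TAGS:
            # Known tag: replace with a space so neighbouring words stay apart.
            return " "
        # Nonsense tag: remember it as a token, then remove it outright so
        # the word it was splitting is joined back together.
        nonsense.append("nonsense-tag:" + inner.strip())
        return ""

    cleaned = TAG_RE.sub(handle_tag, text)
    # Append the tag tokens to the end of the ordinary word tokens.
    return cleaned.split() + nonsense


if __name__ == "__main__":
    sample = "Wr<!$FS|i|R3$s80sA >inkle Reduc<!$FS|i|R3$s80sA >tion"
    for token in tokenize(sample):
        print(token)
```

with something like this the classifier would still see "Wrinkle" and
"Reduction" as whole words, but the junk tags themselves would also be
available as evidence instead of being thrown away.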

b



