[spambayes-dev] spamBayes ideas

Mon Apr 5 20:17:31 EDT 2004

Hello,
First let me say thank you for all your hard work on this project – it is
fantastic! I have recommended it to many people who have found it to be
everything I claimed ;-)

I am a product planner for a very large software project, so hopefully my
ideas aren’t too lame.

1.	I have noticed lately that many spammers are adding fake HTML tags in
the middle of words to screw the parsers up, much in the same way that
people obfuscate their email addresses on web pages to beat the spambots
(e.g. from a spam received today:
www.lif<kdhpzam>eisimpo</gortcxld>rtant.biz<br><br>). I was thinking you
could keep a database of valid HTML tags (perhaps learned and pre-populated?)
so that new, unknown tags would count toward the spam probability. This would
primarily mean inverting the way < > tags are handled compared to other
words, that is, assuming spam and learning ham. In the above example <br>
would be ham and the others spam. You could even add a setting so that more
than x fake tags scores the whole email as spam. In the above example spam
there were 11 fake tags.
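
To make the idea concrete, here is a minimal sketch in Python. It is not
SpamBayes code: the whitelist KNOWN_TAGS, the function names and the
max_fake_tags setting are all hypothetical, and a real whitelist would need
to be learned or pre-populated as suggested above.

```python
import re

# Hypothetical, deliberately tiny whitelist of valid tags (assumption).
KNOWN_TAGS = {"a", "b", "body", "br", "div", "font", "html", "i", "img",
              "p", "span", "table", "td", "tr"}

# Matches an opening or closing tag and captures the tag name.
TAG_RE = re.compile(r"</?\s*([A-Za-z][A-Za-z0-9]*)[^<>]*>")


def unknown_tags(text):
    """Return the names of all tags in text that are not whitelisted."""
    names = (m.group(1).lower() for m in TAG_RE.finditer(text))
    return [name for name in names if name not in KNOWN_TAGS]


def strip_unknown_tags(text):
    """Remove unknown tags so the hidden word can be tokenized normally."""
    def keep_if_known(match):
        known = match.group(1).lower() in KNOWN_TAGS
        return match.group(0) if known else ""
    return TAG_RE.sub(keep_if_known, text)


def too_many_fake_tags(text, max_fake_tags=3):
    """True when the message has at least max_fake_tags unknown tags."""
    return len(unknown_tags(text)) >= max_fake_tags


sample = "www.lif<kdhpzam>eisimpo</gortcxld>rtant.biz<br><br>"
print(unknown_tags(sample))
print(strip_unknown_tags(sample))
```

On the sample above this reports the two made-up tags and recovers the
underlying address, which could then be scored like any other token.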

2.	The last one is a bit fancy, but here goes. For messages flagged as
possible spam, compare the ones that get recovered with the ones that really
were bad, and look at their scores. With the right algorithm you should be
able to auto-adjust the thresholds. Just a thought.
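
One possible reading of this, sketched in Python under stated assumptions:
each hand-sorted message is recorded as a (score, was_spam) pair, scores run
from 0 to 1, and the two cutoffs are moved toward the observed scores while
keeping a safety margin. The function name, the margin parameter and the
update rule are hypothetical, not anything SpamBayes does.

```python
def suggest_cutoffs(resolved, ham_cutoff, spam_cutoff, margin=0.05):
    """Suggest new (ham_cutoff, spam_cutoff) from hand-sorted messages.

    resolved is an iterable of (score, was_spam) pairs for messages the
    user either recovered (was_spam False) or confirmed as bad (True).
    """
    resolved = list(resolved)
    ham_scores = [score for score, was_spam in resolved if not was_spam]
    spam_scores = [score for score, was_spam in resolved if was_spam]
    new_ham, new_spam = ham_cutoff, spam_cutoff

    if spam_scores:
        # Lower the spam cutoff to the lowest confirmed spam score, but
        # stay at least `margin` above the highest recovered message.
        floor = max(ham_scores) + margin if ham_scores else 0.0
        new_spam = min(spam_cutoff, max(min(spam_scores), floor))

    if ham_scores:
        # Raise the ham cutoff to the highest recovered score, but stay
        # at least `margin` below the lowest confirmed spam score.
        ceiling = min(spam_scores) - margin if spam_scores else 1.0
        new_ham = max(ham_cutoff, min(max(ham_scores), ceiling))

    # If the suggestion would cross over, keep the current settings.
    if new_ham > new_spam:
        return ham_cutoff, spam_cutoff
    return new_ham, new_spam


history = [(0.30, False), (0.40, False), (0.70, True), (0.85, True)]
print(suggest_cutoffs(history, ham_cutoff=0.20, spam_cutoff=0.90))
```

The rule only ever narrows the unsure range, so a few mislabeled messages
cannot push clearly good mail over the spam cutoff.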

HTH,
Dave