[Spambayes] Latest spammer trick stymied - QUESTION
bill parducci
bill at parducci.net
Mon Mar 31 17:15:02 EST 2003
T. Alexander Popiel wrote:
>>take the example: http://check.myspam.com/ad/junk?random=fsldkjflksj
> That example would yield the tokens:
>
> proto:http
> url:check
> url:myspam
> url:com
> url:ad
> url:junk
> url:random
> url:fsldkjflksj
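
A minimal sketch of a splitter that reproduces the quoted token list, for reference. It is not the actual Spambayes tokenizer; the function name `fine_grained_url_tokens` and the rule "split on every non-alphanumeric run" are assumptions made only to match the tokens shown above.

```python
import re


def fine_grained_url_tokens(url):
    """Hypothetical helper: break a URL into small word-level tokens."""
    # Separate the scheme ("http") from everything after "://".
    proto, _, rest = url.partition("://")
    tokens = ["proto:" + proto]
    # Split the remainder on every run of non-alphanumeric characters,
    # so dots, slashes, "?" and "=" all act as separators.
    for piece in re.split(r"[^A-Za-z0-9]+", rest):
        if piece:
            tokens.append("url:" + piece)
    return tokens


if __name__ == "__main__":
    for tok in fine_grained_url_tokens(
        "http://check.myspam.com/ad/junk?random=fsldkjflksj"
    ):
        print(tok)
```

Run on the example URL, this prints the eight tokens in the quoted list, one per line.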
<bayesian ignorance shields up>
doesn't this degree of granularity dilute the information? in other
words, 'com' and 'junk' are extremely common, 'myspam.com' is less so,
and 'check.myspam.com' is probably unique to this spammer. since neutral
tokens are ignored, fragments like 'com' and 'junk' may never be
considered, while the following most likely would be:
> url:myspam.com
> url:check.myspam.com
> url:check.myspam.com/ad
> url:check.myspam.com/ad/junk
therefore, in the case of url parsing, it would seem that less
[granularity] is more [accuracy].
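
A rough sketch of the coarser tokenization proposed here, to make the idea concrete. The function name `coarse_url_tokens` is hypothetical, and two details are assumptions not spelled out in the post: host suffixes shorter than two labels (a bare 'com') are skipped, and the query string is dropped, since the random part would never repeat. It is not how Spambayes itself tokenizes URLs.

```python
from urllib.parse import urlsplit


def coarse_url_tokens(url):
    """Hypothetical helper: emit host suffixes and cumulative path prefixes."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    labels = host.split(".")
    tokens = []
    # Host suffixes of two or more labels, shortest first:
    # myspam.com, then check.myspam.com.
    for i in range(len(labels) - 2, -1, -1):
        tokens.append("url:" + ".".join(labels[i:]))
    # Extend the full host one path segment at a time.
    # The query string (parts.query) is deliberately ignored.
    prefix = host
    for segment in parts.path.split("/"):
        if segment:
            prefix += "/" + segment
            tokens.append("url:" + prefix)
    return tokens


if __name__ == "__main__":
    for tok in coarse_url_tokens(
        "http://check.myspam.com/ad/junk?random=fsldkjflksj"
    ):
        print(tok)
```

On the example URL this yields the four tokens listed above, in the same order. Each token carries more context than a bare 'com' or 'junk', at the cost of being seen less often during training.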
</shields>
b