[Spambayes] An interesting example of bad correlation
Tim Peters
tim.one@comcast.net
Mon Oct 28 06:09:53 2002
I just got two copies of this spam from python.org:
"""
Olá me chamo Marquinho. Acabei de lançar um site na WEB que fala sobre o
povo brasileiro e meu projeto... Lá você vai ver minhas fotos. Você pode
divulgar o potencial de sua cidade. Além disso você pode concorrer a uma web
cam. dia 27 de dezembro.
Visite! e vote no meu site! Preciso de apoio...
http://www.nossobrasil.kit.net
Se não quiser mais receber nossa informação favor somente responda.
NossoBrasil.kit.net
NossoBrasil.kit.net
"""
One of them showed up in my "I'm sure it's spam" folder, with a score of
0.96. The other showed up in my "I'm confused" folder, with a score of
0.75. What's the difference? The former was addressed to
webmaster@python.org, and the latter to help@python.org, and the latter is a
(privately archived) mailing list, so Mailman put its fingers on it. I
*thought* I was ignoring all Mailman headers, and I was <wink>. But it
turns out Mailman does other stuff that's reflected in the headers, adding
these clues that didn't exist in the copy I got via webmaster:
    'header:Errors-to:1'   0.045086
    'subject:Python'       0.0644291
    'subject:] '           0.0772537
    'subject:['            0.147731
    'subject:Help'         0.270936
    'subject:-'            0.286281
The original didn't have an Errors-to header. The last 5(!) are due to the
"[Python-Help]" tag inserted into the Subject line.
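To make it concrete how one short list tag can turn into five separate clues, here is a minimal sketch. It assumes the Subject line is split into word tokens plus runs of punctuation, each prefixed with "subject:". The two regexes and the placeholder subject text are assumptions made for illustration, not a copy of the real tokenizer.

```python
import re

# Assumed patterns: "words" are runs of word characters (plus a few symbols),
# and every run of non-word characters becomes a token of its own.
WORD_RE = re.compile(r"[\w$.%]+")
PUNCT_RUN_RE = re.compile(r"\W+")


def subject_tokens(subject):
    """Return the set of 'subject:' clues generated for a Subject line."""
    tokens = {"subject:" + w for w in WORD_RE.findall(subject)}
    tokens |= {"subject:" + p for p in PUNCT_RUN_RE.findall(subject)}
    return tokens


# Placeholder subject text; the real Subject of the spam is not shown above.
original = "hello"
via_list = "[Python-Help] " + original

# Clues that exist only because the list tag was prepended.
tag_only = subject_tokens(via_list) - subject_tokens(original)
for tok in sorted(tag_only):
    print(repr(tok))
```

Under these assumptions the tag alone accounts for 'subject:Python', 'subject:Help', 'subject:[', 'subject:-' and 'subject:] ', which lines up with the five subject clues listed above.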
I believe spam that isn't caught by python.org, and comes thru on a mailing
list, is my biggest source of Unsure msgs.
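A rough sketch of why a handful of hammy clues can drag a sure-spam score into Unsure territory. It assumes chi-squared combining of the clue probabilities, which may not be exactly what the classifier was doing for these messages. The six low probabilities are the ones listed above; the ten spam-leaning clues are invented for illustration, since the spam's other clues aren't shown here.

```python
import math


def chi2Q(x2, v):
    """Probability that a chi-squared variable with v (even) degrees of
    freedom is at least x2."""
    assert v % 2 == 0
    m = x2 / 2.0
    total = term = math.exp(-m)
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def combined_score(probs):
    """Combine per-clue spam probabilities into one score in [0, 1]."""
    n = len(probs)
    # S measures the evidence for spam, H the evidence for ham.
    S = 1.0 - chi2Q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    H = 1.0 - chi2Q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return (S - H + 1.0) / 2.0


# HYPOTHETICAL spam-leaning clues standing in for the body of the spam.
hypothetical_spam_clues = [0.95] * 10

# The six clues that appeared only in the copy that went through Mailman.
mailman_clues = [0.045086, 0.0644291, 0.0772537,
                 0.147731, 0.270936, 0.286281]

direct = combined_score(hypothetical_spam_clues)
via_list = combined_score(hypothetical_spam_clues + mailman_clues)
print("direct copy: %.2f" % direct)
print("list copy:   %.2f" % via_list)
```

The exact numbers depend entirely on the made-up spam clues, so they won't reproduce 0.96 and 0.75; the point is only that the list copy comes out lower once the Mailman-induced clues are mixed in.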