[Spambayes] how spambayes handles image-only spams
Ryan Malayter
rmalayter at bai.org
Tue Sep 2 11:52:39 EDT 2003
From: Tim Peters [mailto:tim.one at comcast.net]
> "trouble" means what? That they're classified
> as ham, or that they're classified as unsure?
Mostly they are classified as unsure, but I recall a couple classified
as ham. I used to keep careful track of all this on a spreadsheet, but I
seem to have gotten lazy with that, so I'm not sure I have all the data.
> The other side to this is that *any* evidence of HTML
> is a strong spam indicator in most corpora... virtually
> nothing using HTML could avoid being classified as spam...
This doesn't seem right to me, at least on an intuitive level. We're an
Outlook 2003 shop, and we've used Windows Group Policies to force all
internal users to create HTML messages instead of Microsoft RTF.
So a great big heaping pile of my non-spam corpus consists of messages
that contain <P>, <BR>, and other "innocent" HTML tags. Shouldn't the
statistical nature of SpamBayes give these tokens a score near 0.5,
since they appear frequently in both corpora?
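To make that intuition concrete, here is a minimal sketch of a per-token
score, assuming a Robinson-style smoothed estimate. The function name,
the counts, and the default strength and prior values are all
illustrative assumptions, not taken from this thread or from the
SpamBayes source.

```python
def token_spam_score(spam_count, ham_count, n_spam, n_ham,
                     strength=0.45, prior=0.5):
    """Estimate how spammy one token is, on a 0.0 to 1.0 scale.

    spam_count / ham_count: number of spam / ham messages containing the token.
    n_spam / n_ham: total number of spam / ham messages trained on.
    strength, prior: assumed smoothing parameters (hypothetical defaults).
    """
    # Fraction of each corpus that contains the token.
    spam_ratio = spam_count / n_spam
    ham_ratio = ham_count / n_ham

    # A token never seen in training gets the neutral prior.
    if spam_ratio + ham_ratio == 0:
        return prior

    # Raw estimate, normalised so corpus size does not bias the result.
    raw = spam_ratio / (spam_ratio + ham_ratio)

    # Pull the raw estimate toward the prior when evidence is thin.
    n = spam_count + ham_count
    return (strength * prior + n * raw) / (strength + n)


# Hypothetical counts: a tag seen in 900 of 1000 spams and 850 of 1000 hams.
common_tag = token_spam_score(900, 850, 1000, 1000)

# Hypothetical counts: a token seen in 900 of 1000 spams but only 10 hams.
spammy_token = token_spam_score(900, 10, 1000, 1000)
```

With these made-up counts, the tag that shows up in both corpora lands
close to 0.5, while the token that is rare in ham scores close to 1.0.
So under this kind of scheme, an HTML tag is only a strong spam clue
when the ham corpus contains little HTML.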
> Work up a patch and see what happens!
I just started playing with Python, so it might be a while before I can
do that. I really like it at first glance, though.
Are there any good IDEs that support Python, with code highlighting?
I've been a C++/VB guy as of late, so is there a .NET-ified version of
Python that I can plug into the Visual Studio 2003 IDE? I've seen the
stuff on ActiveState's website, but it seems woefully out of date, since
everything is dated 2001.