[Spambayes] how spambayes handles image-only spams

Tim Peters tim.one at comcast.net
Sun Sep 7 21:08:04 EDT 2003


[Sean True]
> HTML mail works just fine. I get TONS of html ham, and I know our
> customers do. Since they pay us to make sure that InBoxer/SpamAtBay

I bet Laura would tell you to stick to one name, for marketing clarity.

> work for them, and yell when it doesn't, I'm pretty sure the tokenizer
> work here was _well_ worth while. That said, it might be good
> to have a button for: "I hate html mail, tokenize the html mark up
> and let the chips fall where they may"

We used to have an option for that, very early on; I expect the code
implementing it went away before you saw this project; it would be easy to
add back, although split-on-whitespace seems a poor tokenization strategy
for encoded stuff (e.g.,

    bgcolor="#33cccc"><img

is a peculiar token, mixing the semantics of two very different conceptual
tags; and then because it's "too long" (> 12 characters), spambayes
compresses it into a synthesized "skip" token).  IOW, a crap job would be easy
to slam back in, but a good job harder -- I lack interest in either and
time for the latter.
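
To make the point concrete, here is a rough sketch of what naive
split-on-whitespace does to raw HTML markup.  It is not the actual spambayes
tokenizer: the names naive_tokens and MAX_WORD_SIZE are made up for
illustration, and the "skip:<first char> <length rounded down to a multiple
of 10>" format is an assumption about what a synthesized skip token might
look like.

```python
# Illustrative only: a simplified stand-in, not the real spambayes code.
MAX_WORD_SIZE = 12  # assumed cutoff, from the "> 12 characters" remark


def naive_tokens(text):
    """Split raw text on whitespace; collapse overlong words to skip tokens."""
    for word in text.split():
        n = len(word)
        if n <= MAX_WORD_SIZE:
            yield word
        else:
            # Hypothetical skip-token format: keep only the first character
            # and the length rounded down to a multiple of 10.
            yield "skip:%s %d" % (word[0], n // 10 * 10)


if __name__ == "__main__":
    raw = '<body bgcolor="#33cccc"><img width="1">'
    for token in naive_tokens(raw):
        print(repr(token))
```

The middle word, bgcolor="#33cccc"><img, is 22 characters long, so under
these assumptions it collapses to 'skip:b 20', and both the color value and
the fact that an <img> tag started are lost.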



