[Spambayes] RE: Trapping Spam messages that contain images...
tameyer at ihug.co.nz
Tue Oct 19 06:06:43 CEST 2004
> However these days I am receiving a new kind of Spam that sneaks
> through my defences and Spambayes cannot trap.
Try enabling some of the experimental options. In particular, try:
To try these with the Outlook plug-in, open (or create) the file
default_bayes_customize.ini in your data directory, and add the option(s),
> Anyhow, I had an idea to trap this type of message that I
> thought I might put out for discussion. [...] Basically
> it would involve some additional functionality to allow OCR
> processing of images that are referenced in emails.
It has been a long time since I've done any OCR - is it really fast and
accurate enough to be useful in situations like this? We'd also (probably)
need to use an open-source OCR library (rather than write our own), which
adds packaging complications.
It's possible it would help, but I suspect that it would be very expensive
for little gain. Feel free to add it to the wiki http://entrian.com/sbwiki,
where there are other ideas to try out.
I get hardly any false negatives/unsures that are mostly images. If I ever
do (and so have a testing corpus), then I think it would be interesting to
try a simpler scheme, where tokens are generated based on simple features
(perhaps Haar-like features) of the image, and the classifier uses those as
it likes. The theory would be that it could pick up some features
common to good/bad images without looking for things like text.
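The feature-token idea above could be sketched roughly as follows. This is a minimal illustration, not anything that exists in SpamBayes. It assumes the image has already been decoded into a grayscale pixel grid (a list of rows of 0-255 ints) and uses only two simple Haar-like features per grid cell: left half minus right half, and top half minus bottom half, both computed from an integral image. The `haar_tokens` function, the `image:haar:...` token format, and the `grid` and `threshold` parameters are all hypothetical.

```python
from typing import Iterator, List

Image = List[List[int]]  # grayscale pixels, 0-255, row-major (assumed input format)


def integral_image(img: Image) -> List[List[int]]:
    """Summed-area table with an extra zero row/column to simplify lookups."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: List[List[int]], x: int, y: int, w: int, h: int) -> int:
    """Sum of the pixels in the w-by-h rectangle whose top-left corner is (x, y)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def bucket(diff: float, threshold: float) -> str:
    """Quantise a mean-intensity difference into a coarse token value."""
    if diff > threshold:
        return "pos"
    if diff < -threshold:
        return "neg"
    return "flat"


def haar_tokens(img: Image, grid: int = 2, threshold: float = 16.0) -> Iterator[str]:
    """Yield hypothetical 'image:haar:...' tokens, two per grid cell."""
    h, w = len(img), len(img[0])
    cell_h, cell_w = h // grid, w // grid
    if cell_h < 2 or cell_w < 2:
        raise ValueError("image too small for this grid")
    ii = integral_image(img)
    half_w, half_h = cell_w // 2, cell_h // 2
    for gy in range(grid):
        for gx in range(grid):
            x0, y0 = gx * cell_w, gy * cell_h
            # Left half minus right half: responds to vertical edges.
            left = rect_sum(ii, x0, y0, half_w, cell_h)
            right = rect_sum(ii, x0 + half_w, y0, half_w, cell_h)
            lr = bucket((left - right) / (half_w * cell_h), threshold)
            yield f"image:haar:lr:{gy},{gx}:{lr}"
            # Top half minus bottom half: responds to horizontal edges.
            top = rect_sum(ii, x0, y0, cell_w, half_h)
            bottom = rect_sum(ii, x0, y0 + half_h, cell_w, half_h)
            tb = bucket((top - bottom) / (cell_w * half_h), threshold)
            yield f"image:haar:tb:{gy},{gx}:{tb}"
```

For a 4x4 image whose left half is white and right half black, `list(haar_tokens(img, grid=1))` gives `["image:haar:lr:0,0:pos", "image:haar:tb:0,0:flat"]`. A tokenizer could yield strings like these alongside its usual text tokens, leaving it to the classifier's training to decide which (if any) are good spam/ham clues; quantising into a few buckets keeps the token vocabulary small enough for counts to accumulate.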
Please always include the list (spambayes at python.org) in your replies
(reply-all), and please don't send me personal mail about SpamBayes. This
way, you get everyone's help, and avoid a lack of replies when I'm busy.