[spambayes-dev] [ spambayes-Feature Requests-762783 ] TMDA capabilities

SourceForge.net noreply at sourceforge.net
Wed Jul 2 10:26:20 EDT 2003


Feature Requests item #762783, was opened at 2003-06-29 13:33
Message generated for change (Comment added) made by tim_one
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498106&aid=762783&group_id=61702

Category: None
Group: None
Status: Open
Priority: 5
Submitted By: Becker-Freyseng, Christoph (beyond-thoughts)
Assigned to: Nobody/Anonymous (nobody)
Summary: TMDA capabilities

Initial Comment:
I just switched from TMDA (http://tmda.net/) to Spambayes.
Watching over emails that were on neither the blacklist nor
the whitelist was too annoying. Of course those senders get
a reply (please see the TMDA homepage for how it works),
but about 50% don't understand the reply mail. (I don't
know why; the text is very clear!)
Spambayes doesn't queue emails that can't be classified;
you receive them as "unsure".

I think combining both techniques would cut down the
numbers of false positives, false negatives and "unsure"
messages. Additionally, having a definite blacklist and
whitelist might be useful for training.

So what are good points (files, classes, methods) at which
to add such a feature?
Does it depend on whether pop3proxy, hammie, ... is used?

Thank You,
   Christoph Becker-Freyseng


----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2003-07-02 12:26

Message:
Logged In: YES 
user_id=31435

I expect it's clear by now that none of the people who have 
developed this project so far are keen to graft a TMDA 
scheme into it.  More debate about that should probably be 
directed to the spambayes-dev mailing list.

That shouldn't stop you from pursuing it, though!  spambayes-
dev would also be the right place to ask about the best 
places to hook into this code base.

About "it assumes spammers won't use Spambayes and a valid 
email address", the rub is that spammers often use a valid 
email address -- but not their own!  It's hell to be one of the 
unlucky people whose email address is forged by a spammer.  
Whether spam recipients flood such a person with rants or 
TMDA requests doesn't much matter, that email address 
becomes unusable due to sheer volume.

----------------------------------------------------------------------

Comment By: Becker-Freyseng, Christoph (beyond-thoughts)
Date: 2003-07-02 08:20

Message:
Logged In: YES 
user_id=186848

(No filter will have a false-positive probability of 0.00,
but how good is "your filter" when you look through the emails
in the "Spam Folder"? There's still human error, and if the
software's error rate is no higher than that, it should be fine.
Checking twice of course lowers that probability even further.)

People not answering nag mails are a big problem for TMDA
and similar systems. That's why it's useless on its own.

Your examples are good points, and yes, I don't know a
complete solution for them.
For (1) I'd ask: How often do you expect a first email from
an online merchant? If it's not too often, you can check new
emails in the "Spam Folder" just for the period of time
when you are expecting such an email.
For (2) there's no good solution. (As long as people write
emails with a motivation not directly given by you.)

Utopia: If enough people were using Spambayes, some
"reply-confirm protocol" could run in the background so
users wouldn't hear that nag-email noise. (It assumes spammers
won't use Spambayes and a valid email address.)

Yours sincerely,
  Christoph Becker-Freyseng

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2003-07-01 16:18

Message:
Logged In: YES 
user_id=31435

The problem is that it's not zero -- as you'll eventually 
discover if you try this.  Beyond general probability concerns, 
two systematic effects act in this direction:  (1) desired email 
from online merchants very often scores spammy the first 
time or two you get it from a given vendor, and the reply-to 
address often isn't monitored (i.e., there's nobody on the 
other end to *do* a TMDA dance, even if they would want 
to), and before the first time you get a msg from them you 
have no idea what to put in your whitelist; and, (2) lots of 
people simply will not respond to a TMDA nag msg, as Skip 
said.  TMDA users know what those msgs are about, but, for 
example, one of my sisters forwarded one of them to me 
asking whether it was a virus(!).  Skip said he stopped 
responding to them too.  I also did, as 4 times out of 5, the 
attempt to respond to one simply generated another 
braindead bounce msg for *me* to deal with.  Trying to make 
other people deal with your spam doesn't work in practice.

----------------------------------------------------------------------

Comment By: Becker-Freyseng, Christoph (beyond-thoughts)
Date: 2003-07-01 16:04

Message:
Logged In: YES 
user_id=186848

Why should I have to check emails that were statistically
classified as spam AND whose sender doesn't reply to a
confirmation request? The probability of ignoring an important
email that way is just zero.
This approach doesn't need the filter to be perfect. The
only thing it can't deal with is people who write emails
that seem to be spam and then, when asked for confirmation,
don't reply.


----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2003-07-01 13:43

Message:
Logged In: YES 
user_id=44345

Christoph,  I think you're missing the point about checking spams.
You will never be able to completely avoid that task.  It can be
sped up dramatically by gathering all messages which look like
spam together so you can scan them quickly (just examining 
subjects for example), but if you simply delete such messages you
will eventually lose valid email.  No spam filter is (or will ever be)
perfect.


----------------------------------------------------------------------

Comment By: Becker-Freyseng, Christoph (beyond-thoughts)
Date: 2003-07-01 12:59

Message:
Logged In: YES 
user_id=186848

O.K., we could discuss the right way of handling spam forever.
IMO as long as you have to check emails in the "Spam-Folder"
it's not the right way, because in the end it does not save work.

If there is no interest in adding the above capabilities
to Spambayes, I'll use TMDA's scripting abilities to add
hammie.py to it, solving the problem the other way round.

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2003-07-01 12:18

Message:
Logged In: YES 
user_id=44345

I tried TMDA some time ago and didn't like it because it was
too cumbersome to manage and many of my correspondents
didn't understand the emails they received.  Similarly, when
I get one of those "please <click or reply> so I know you're
not a spammer" messages, I simply delete it.

I really don't think that's the right way to do things.  If TMDA
is cool for you, stick it in your procmail pipeline and use it
in addition to or instead of Spambayes, but don't merge the
two.


----------------------------------------------------------------------

Comment By: Becker-Freyseng, Christoph (beyond-thoughts)
Date: 2003-07-01 12:05

Message:
Logged In: YES 
user_id=186848

(OT: added my real name)

I understand the point of having Spambayes create a kind
of whitelist.
(I'm surprised that spammers really dare to use
support at microsoft.com etc. It damages a company's
name, and lawyers are always happy to have a cause :-) )

But the important point is NOT the whitelist, but that the
sender of an email classified as spam gets an automatic
reply enabling him to change the classification.
(If an email is classified as spam and the sender doesn't
reply to an email asking for confirmation, it's 99.999% spam,
which is enough for me not to worry about emails staying
in the "Spam-Folder".)

Thanks,
   Christoph Becker-Freyseng


----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2003-07-01 11:36

Message:
Logged In: YES 
user_id=44345

Whitelist functionality really isn't needed.  Spambayes already
tokenizes email addresses, so over time it effectively builds up
a whitelist for you.  Here are some examples from my current
training database (the tuple values are (nspam, nham)):

>>> db["email name:itineraries"]
(71, 10)
>>> db["email name:guido"]
(0, 8)
>>> db["email name:webmaster"]
(58, 45)
>>> db["email name:skip"]
(365, 314)
>>> db["email name:support"]
(136, 13)

Note that itineraries at mojam.com & support at microsoft.com
are frequently forged in mail I receive.  webmaster at mojam.com
and skip at pobox.com are forged a fair amount, but are also
frequently correct.  On the other hand, nobody has so far taken
Guido's name in vain in my incoming email.  (I rarely train on
Python-related email, so there are only a few messages from
Guido in my training database.)
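
For readers wondering how a pair such as (136, 13) becomes evidence
for or against a message, the following is a minimal sketch of one
common way to turn (nspam, nham) counts into a per-token spam
probability. It is an illustration only: the function name, the
totals of 1000 trained spam and 1000 trained ham messages, and the
prior constants s=0.45 and x=0.5 are assumptions, not values taken
from the database quoted above, and the sketch does not claim to
reproduce Spambayes's exact scoring.

```python
# Sketch only: the totals and the constants s and x are assumptions,
# not values from the training database quoted in the thread.

def token_spam_prob(nspam, nham, total_spam, total_ham, s=0.45, x=0.5):
    """Estimate how spammy a token is from its (nspam, nham) counts."""
    n = nspam + nham
    if n == 0:
        # No evidence at all: fall back to the neutral prior.
        return x
    spam_ratio = nspam / total_spam   # share of spam containing the token
    ham_ratio = nham / total_ham      # share of ham containing the token
    raw = spam_ratio / (spam_ratio + ham_ratio)
    # Pull tokens with little evidence toward the neutral prior x.
    return (s * x + n * raw) / (s + n)


# Counts copied from the session above, as (nspam, nham).
counts = {
    "email name:itineraries": (71, 10),
    "email name:guido": (0, 8),
    "email name:webmaster": (58, 45),
    "email name:skip": (365, 314),
    "email name:support": (136, 13),
}

for token, (nspam, nham) in counts.items():
    p = token_spam_prob(nspam, nham, total_spam=1000, total_ham=1000)
    print(f"{token:25s} {p:.3f}")
```

Under these assumed totals, an address seen almost only in spam
scores close to 1, one seen only in ham scores close to 0, and one
seen about equally in both stays near the middle, which is the
"effective whitelist" behaviour described above.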

Even if you implemented such a feature it would probably not
be as sensitive as the current tokenizing scheme.  In addition, you
would still have to scan your spam.  You will eventually get a
valid email message from someone not on your whitelist.

Regarding:

    Submitted By:
    Why do you need this (beyond-thoughts)

it's because (in general) too many people submit incomplete bug
reports anonymously and then can't be contacted to complete
their report.  This was a significant problem with the Python
project and sort of carried over to the Spambayes project.


----------------------------------------------------------------------

Comment By: Becker-Freyseng, Christoph (beyond-thoughts)
Date: 2003-07-01 10:29

Message:
Logged In: YES 
user_id=186848

I know that TMDA and Spambayes have different approaches,
but this is what makes combining them useful.
With Spambayes I have had no misclassified emails so far
(just a few "unsure"), so I'm quite satisfied with it. However,
I still have to check the emails in the "Spam-Folder" because I
don't want to risk losing some misclassified important
email.
If Spambayes had some TMDA capabilities, it could just send
a confirmation email to the assumed spammer. If he doesn't
reply, then it's really his fault. So I wouldn't have to check
emails in the "Spam-Folder" at all.
On the other hand, people I send emails to could be
automatically added to a whitelist so they surely won't
have trouble with the spam filter.
I have thought of some more configurable rules that could be
added, making Spambayes an interactive AI spam filter.
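
The proposed combination can be sketched roughly as follows. This
is a hypothetical illustration of the workflow described in this
comment, not code from Spambayes or TMDA: the class, its method
names, and the cutoff values are all invented for the example, and
the spam score is assumed to come from whatever classifier is in
use. It also trusts the sender address as given, which forged
addresses would defeat.

```python
from dataclasses import dataclass, field


@dataclass
class ChallengeFilter:
    """Hypothetical sketch: statistical score plus confirmation request."""
    ham_cutoff: float = 0.20    # assumed value, for illustration only
    spam_cutoff: float = 0.90   # assumed value, for illustration only
    whitelist: set = field(default_factory=set)
    pending: dict = field(default_factory=dict)  # sender -> held messages

    def note_outgoing(self, recipient):
        # People the user writes to are whitelisted automatically.
        self.whitelist.add(recipient.lower())

    def handle_incoming(self, sender, message, score):
        sender = sender.lower()
        if sender in self.whitelist or score < self.ham_cutoff:
            return "deliver"
        if score < self.spam_cutoff:
            return "unsure"
        # Looks like spam: hold it and ask the sender to confirm.
        self.pending.setdefault(sender, []).append(message)
        return "challenge"

    def confirm(self, sender):
        # A reply to the confirmation request releases the held mail
        # and whitelists the sender for the future.
        sender = sender.lower()
        self.whitelist.add(sender)
        return self.pending.pop(sender, [])


f = ChallengeFilter()
print(f.handle_incoming("shop@example.com", "Your order", 0.97))
print(f.confirm("shop@example.com"))
print(f.handle_incoming("shop@example.com", "Your invoice", 0.97))
```

Messages whose sender never confirms simply stay in `pending`,
which is the case the rest of the thread argues about.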

I'd like to make a demo implementation, but I need some
starting points. In particular, I don't know how the Outlook
stuff works. But maybe I should just try implementing such a
thing for the pop3 and smtp proxies.

Thanks,
   Christoph Becker-Freyseng


----------------------------------------------------------------------

Comment By: Anthony Baxter (anthonybaxter)
Date: 2003-06-30 21:38

Message:
Logged In: YES 
user_id=29957

I can't imagine that this feature would ever be added to
spambayes. It's a completely different approach to
spam-filtering, with almost nothing in common with the
existing approach. Spambayes will gradually improve as you
train it further - the initial flurry of unsures is probably
just due to insufficient training. Note also that you can adjust
the cutoffs to end up with more or fewer unsures.
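
As a rough illustration of the cutoff remark, the sketch below
maps a message score between 0 and 1 onto ham, unsure or spam using
two thresholds, and counts the unsures under two settings. The
function name, the sample scores, and the cutoff values (0.20/0.90
and 0.30/0.80) are assumptions chosen for the example; check your
own configuration for the values actually in effect.

```python
# Sketch only: cutoffs and scores below are made-up example values.

def classify(score, ham_cutoff=0.20, spam_cutoff=0.90):
    """Map a score in [0, 1] to 'ham', 'unsure' or 'spam'."""
    if score < ham_cutoff:
        return "ham"
    if score >= spam_cutoff:
        return "spam"
    return "unsure"


def count_unsure(scores, ham_cutoff, spam_cutoff):
    """Count how many scores fall between the two cutoffs."""
    return sum(
        classify(s, ham_cutoff, spam_cutoff) == "unsure" for s in scores
    )


scores = [0.01, 0.15, 0.25, 0.50, 0.85, 0.95, 0.99]

wide = count_unsure(scores, ham_cutoff=0.20, spam_cutoff=0.90)
narrow = count_unsure(scores, ham_cutoff=0.30, spam_cutoff=0.80)
print("unsure with wide band:  ", wide)
print("unsure with narrow band:", narrow)
```

Moving the cutoffs closer together shrinks the unsure band, at the
price of more messages being filed as ham or spam on weaker
evidence.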


----------------------------------------------------------------------
