[spambayes-dev] [ spambayes-Feature Requests-762783 ] TMDA capabilities

SourceForge.net noreply at sourceforge.net
Tue Jul 1 14:04:58 EDT 2003


Feature Requests item #762783, was opened at 2003-06-29 17:33
Message generated for change (Comment added) made by beyond-thoughts
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498106&aid=762783&group_id=61702

Category: None
Group: None
Status: Open
Priority: 5
Submitted By: Becker-Freyseng, Christoph (beyond-thoughts)
Assigned to: Nobody/Anonymous (nobody)
Summary: TMDA capabilities

Initial Comment:
I just switched from TMDA (http://tmda.net/) to Spambayes.
Watching emails that were on neither the blacklist nor the
whitelist was too annoying. Of course those senders get a
reply (see the TMDA homepage for how it works), but about
50% of them don't understand the reply mail. (I don't know
why; the text is very clear!)
Spambayes doesn't hold back emails it can't classify; you
receive them as "unsure".

I think combining both techniques would cut down the number
of false positives, false negatives and "unsure" messages.
Additionally, having a definite blacklist and whitelist might
be useful for training.

So what are good places (files, classes, methods) to
add such a feature?
Does it depend on whether pop3proxy, hammie, ... is used?

Thank You,
   Christoph Becker-Freyseng


----------------------------------------------------------------------

>Comment By: Becker-Freyseng, Christoph (beyond-thoughts)
Date: 2003-07-01 20:04

Message:
Logged In: YES 
user_id=186848

Why should I have to check emails that were statistically
classified as spam AND whose senders didn't reply to the
confirmation request? The probability of ignoring an important
email that way is simply zero.
This approach doesn't need the filter to be perfect. The
only thing it can't deal with is people who write emails
that look like spam and then don't reply when asked for
confirmation.


----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2003-07-01 17:43

Message:
Logged In: YES 
user_id=44345

Christoph,  I think you're missing the point about checking spams.
You will never be able to completely avoid that task.  It can be
sped up dramatically by gathering all messages which look like
spam together so you can scan them quickly (just examining 
subjects for example), but if you simply delete such messages you
will eventually lose valid email.  No spam filter is (or will ever be)
perfect.


----------------------------------------------------------------------

Comment By: Becker-Freyseng, Christoph (beyond-thoughts)
Date: 2003-07-01 16:59

Message:
Logged In: YES 
user_id=186848

O.K., we could discuss the right way of handling spam forever.
IMO, as long as you have to check emails in the "Spam-Folder",
it's not the right way, because in the end it doesn't save work.

If there is no interest in adding the above capabilities
to Spambayes, I'll use TMDA's scripting abilities to add
hammie.py to it, solving the problem the other way round.

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2003-07-01 16:18

Message:
Logged In: YES 
user_id=44345

I tried TMDA some time ago and didn't like it because it was
too cumbersome to manage and many of my correspondents
didn't understand the emails they received.  Similarly, when
I get one of those "please <click or reply> so I know you're
not a spammer" messages, I simply delete it.

I really don't think that's the right way to do things.  If TMDA
is cool for you, stick it in your procmail pipeline and use it
in addition to or instead of Spambayes, but don't merge the
two.


----------------------------------------------------------------------

Comment By: Becker-Freyseng, Christoph (beyond-thoughts)
Date: 2003-07-01 16:05

Message:
Logged In: YES 
user_id=186848

(OT: added my real name)

I understand the point that Spambayes effectively builds a kind
of whitelist.
(I'm surprised that spammers really dare to use
support at microsoft.com etc. It damages a company's name,
and lawyers are always happy to have a cause :-) )

But the important point is NOT the whitelist, but that the
sender of an email classified as spam gets an automatic
reply enabling him to change the classification.
(If an email is classified as spam and the sender doesn't
reply to an email asking for confirmation, it's 99.999% spam,
which is enough for me not to worry about emails staying
in the "Spam-Folder".)
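A minimal sketch of the routing Christoph describes, in Python to match the interactive session elsewhere in this thread. Everything in it is hypothetical and not part of Spambayes: the `ChallengeFilter` class, the 0.90 cutoff, and the assumption that the caller sends the actual confirmation mail. Mail from whitelisted senders, or scoring below the cutoff, is delivered; anything else is held until its sender confirms.

```python
from dataclasses import dataclass, field

# Hypothetical cutoff: scores run from 0.0 (ham) to 1.0 (spam).
SPAM_CUTOFF = 0.90


@dataclass
class ChallengeFilter:
    """Hypothetical sketch: statistical score plus TMDA-style confirmation."""
    whitelist: set = field(default_factory=set)
    # sender address -> messages held until that sender confirms
    pending: dict = field(default_factory=dict)

    def route(self, sender: str, score: float, message: str) -> str:
        sender = sender.lower()
        if sender in self.whitelist or score < SPAM_CUTOFF:
            return "deliver"
        # Looks like spam: hold the message. Sending the actual
        # confirmation request is left to the caller (assumption).
        self.pending.setdefault(sender, []).append(message)
        return "challenge"

    def confirm(self, sender: str) -> list:
        """Sender answered the challenge: whitelist and release held mail."""
        sender = sender.lower()
        self.whitelist.add(sender)
        return self.pending.pop(sender, [])


f = ChallengeFilter()
print(f.route("alice@example.com", 0.97, "first message"))   # challenge
print(f.confirm("alice@example.com"))                         # ['first message']
print(f.route("alice@example.com", 0.99, "second message"))  # deliver
```

Messages whose senders never confirm simply stay in `pending`, which plays the role of the "Spam-Folder" that, under this scheme, would no longer need reading.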

Thanks,
   Christoph Becker-Freyseng


----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2003-07-01 15:36

Message:
Logged In: YES 
user_id=44345

Whitelist functionality really isn't needed.  Spambayes already
tokenizes email addresses, so over time it effectively builds up
a whitelist for you.  Here are some examples from my current
training database (the tuple values are (nspam, nham)):

>>> db["email name:itineraries"]
(71, 10)
>>> db["email name:guido"]
(0, 8)
>>> db["email name:webmaster"]
(58, 45)
>>> db["email name:skip"]
(365, 314)
>>> db["email name:support"]
(136, 13)

Note that itineraries at mojam.com & support at microsoft.com
are frequently forged in mail I receive.  webmaster at mojam.com
and skip at pobox.com are forged a fair amount, but are also
frequently correct.  On the other hand, nobody has so far taken
Guido's name in vain in my incoming email.  (I rarely train on
Python-related email, so there are only a few messages from
Guido in my training database.)
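To make the "effective whitelist" argument concrete, here is a sketch of the Robinson-style per-token probability that the Spambayes classifier builds on, applied to the counts above. It is a simplification, not the project's code: the corpus totals are hypothetical (set equal so the ratios cancel), and the `strength=0.45` / `unknown_prob=0.5` defaults are assumed to correspond to the `unknown_word_strength` and `unknown_word_prob` options.

```python
def token_spamprob(spam_count, ham_count, nspam, nham,
                   strength=0.45, unknown_prob=0.5):
    """Robinson-style spam probability for one token (simplified sketch).

    spam_count/ham_count: how many spam/ham training messages contained
    the token; nspam/nham: total spam/ham messages trained on.
    """
    # Normalise by corpus size so unequal training sets don't skew the ratio.
    spam_ratio = spam_count / max(nspam, 1)
    ham_ratio = ham_count / max(nham, 1)
    if spam_ratio + ham_ratio == 0:
        return unknown_prob  # never seen: no evidence either way
    raw = spam_ratio / (spam_ratio + ham_ratio)
    # Pull rarely seen tokens towards unknown_prob.
    n = spam_count + ham_count
    return (strength * unknown_prob + n * raw) / (strength + n)


# (nspam, nham) counts from the session above; corpus totals are hypothetical.
NSPAM_TOTAL = NHAM_TOTAL = 1000
counts = {"itineraries": (71, 10), "guido": (0, 8), "webmaster": (58, 45),
          "skip": (365, 314), "support": (136, 13)}
for name, (s, h) in counts.items():
    p = token_spamprob(s, h, NSPAM_TOTAL, NHAM_TOTAL)
    print(f"email name:{name:<12} {p:.3f}")
```

A token like `email name:guido`, seen only in ham, scores near 0.0 and pulls a message towards ham much as a whitelist entry would, while `email name:skip` and `email name:webmaster` hover around 0.5 and contribute little either way.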

Even if you implemented such a feature it would probably not
be as sensitive as the current tokenizing scheme.  In addition, you
would still have to scan your spam.  You will eventually get a
valid email message from someone not on your whitelist.

Regarding:

    Submitted By:
    Why do you need this (beyond-thoughts)

it's because (in general) too many people submit incomplete bug
reports anonymously and then can't be contacted to complete
their report.  This was a significant problem with the Python
project and sort of carried over to the Spambayes project.


----------------------------------------------------------------------

Comment By: Becker-Freyseng, Christoph (beyond-thoughts)
Date: 2003-07-01 14:29

Message:
Logged In: YES 
user_id=186848

I know that TMDA and Spambayes take different approaches,
but that is exactly what makes combining them useful.
So far Spambayes hasn't misclassified any of my emails
(just a few "unsure"), so I'm quite satisfied with it. However,
I still have to check the emails in the "Spam-Folder" because I
don't want to risk losing an important email that was
misclassified.
If Spambayes had some TMDA capabilities, it could simply send
a confirmation email to the presumed spammer. If he doesn't
reply, then it's really his fault, and I won't have to check
emails in the "Spam-Folder" at all.
On the other hand, people I send emails to could be
added to a whitelist automatically, so they certainly won't
have trouble with the spam filter.
I have thought of some more configurable rules that could be
added, turning Spambayes into an interactive AI spam filter.
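Automatic whitelisting of correspondents could look roughly like the sketch below, which collects every To/Cc/Bcc address from an outgoing message using the standard library's `email` package. The function name and the plain `set` used as the whitelist are hypothetical; where such a hook would live in the SMTP proxy is left open.

```python
from email import message_from_string
from email.utils import getaddresses


def whitelist_recipients(raw_message: str, whitelist: set) -> set:
    """Hypothetical helper: add every To/Cc/Bcc address of an outgoing
    message to `whitelist` (lower-cased) and return the set."""
    msg = message_from_string(raw_message)
    # get_all returns every occurrence of a header, or the default if absent.
    headers = msg.get_all("To", []) + msg.get_all("Cc", []) + msg.get_all("Bcc", [])
    for _realname, address in getaddresses(headers):
        if address:
            whitelist.add(address.lower())
    return whitelist


outgoing = (
    "From: christoph@example.org\n"
    "To: Alice <alice@example.com>, bob@example.com\n"
    "Cc: Carol <CAROL@example.net>\n"
    "Subject: Hello\n"
    "\n"
    "Hi all!\n"
)
print(sorted(whitelist_recipients(outgoing, set())))
```

Lower-casing the addresses keeps lookups consistent with how incoming senders would be compared.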

I'd like to write a demo implementation, but I need some
starting points. In particular, I don't know how the Outlook
code works. But maybe I should just try implementing such a
thing for the POP3 and SMTP proxies.

Thanks,
   Christoph Becker-Freyseng


----------------------------------------------------------------------

Comment By: Anthony Baxter (anthonybaxter)
Date: 2003-07-01 01:38

Message:
Logged In: YES 
user_id=29957

I can't imagine that this feature would ever be added to
spambayes. It's a completely different approach to
spam-filtering, with almost nothing in common with the
existing approach. Spambayes will gradually improve as you
train it further - the initial flurry of unsures is probably
just insufficient training. Note also that you can adjust
the cutoffs to end up with more or fewer unsures.
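Anthony's point about cutoffs can be illustrated with a three-way mapping from a score between 0.0 and 1.0 to ham, unsure or spam. The 0.20/0.90 values and the boundary handling are illustrative assumptions; Spambayes's own `ham_cutoff` and `spam_cutoff` options may use different defaults and comparisons.

```python
def classify(score: float, ham_cutoff: float = 0.20,
             spam_cutoff: float = 0.90) -> str:
    """Map a 0.0-1.0 spam score onto 'ham', 'unsure' or 'spam' (sketch)."""
    if not 0.0 <= ham_cutoff <= spam_cutoff <= 1.0:
        raise ValueError("need 0 <= ham_cutoff <= spam_cutoff <= 1")
    if score < ham_cutoff:
        return "ham"
    if score >= spam_cutoff:
        return "spam"
    return "unsure"


for s in (0.05, 0.35, 0.60, 0.95):
    # Same scores, default band vs. a narrower unsure band.
    print(s, classify(s), classify(s, ham_cutoff=0.40, spam_cutoff=0.55))
```

Moving the two cutoffs closer together shrinks the unsure band, at the price of more messages being filed as ham or spam on weaker evidence.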


----------------------------------------------------------------------
