[Spambayes] progress on POP+VM+ZODB deployment

Derek Simkowiak dereks@itsite.com
Mon Oct 28 17:16:03 2002


> > is in the spam collections like Bruce's.  Somebody just needs to figure
> > out how to mine it, methinks.
>
> I think you're missing a basic point, but not due to lack of repetition
> <wink>.

	I'm not missing the basic point, I'm disagreeing with it.  (You
can stop with the lengthy examples of one guy who wants commercial mails
from some particular company or subject domain -- I get it, really, I do.)

	I may personally consider messages from you to be "spam" (not as
Unsolicited Bulk Email, but simply as unwanted messages).  But I don't
think it would be the job of a general-purpose installation-wide spam
identifier to know that about me, as you seem to suggest.

	I would want a tool like SpamBayes to flag emails as being like
the ones in Bruce's collection.  If I like to get mails similar to those,
then nowhere am I obligated to filter those flagged messages into my
"Trash" folder.  If I like to get messages similar to those, but only if
they come from Company X, then I can set up my filters to do that, too.

	But for the vast majority of people, just knowing that a
particular email has Bruce-spam-like content would be enough to want to
filter it into a lower-priority folder, or even directly into Trash.  At
least, I see it as the job of the postmaster to provide a flag that could
be used like that.
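
	As a rough sketch of that split between server-side flagging and
client-side filtering (not SpamBayes code): the header name "X-Spam-Flag",
the function names, and the folder names below are all assumptions made up
for illustration, and the spam/not-spam verdict is simply passed in rather
than computed.

```python
from email.message import EmailMessage

# Hypothetical header name; assumed here for illustration only.
FLAG_HEADER = "X-Spam-Flag"


def server_flag(msg, looks_like_spam):
    """Server side: annotate the message, never delete or redirect it."""
    del msg[FLAG_HEADER]  # drop any existing flag; no error if absent
    msg[FLAG_HEADER] = "YES" if looks_like_spam else "NO"
    return msg


def client_folder(msg, wanted_senders=()):
    """Client side: each user decides what the flag means for them."""
    flagged = msg.get(FLAG_HEADER, "NO") == "YES"
    sender = msg.get("From", "")
    # A user who wants flagged mail from particular senders keeps it.
    if flagged and not any(s in sender for s in wanted_senders):
        return "Trash"
    return "Inbox"


msg = EmailMessage()
msg["From"] = "sales@companyx.example"
server_flag(msg, looks_like_spam=True)

default_choice = client_folder(msg)
companyx_fan_choice = client_folder(msg, wanted_senders=("companyx.example",))
```

	The server only ever adds the header; whether a flagged message
lands in Trash, a low-priority folder, or the inbox is left to each user's
own rules.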

	To summarize: I think it's the job of a spam filter (or "flagger")
to identify those messages universally accepted as being spam -- whether or
not any one person likes that kind of mail.  And although for any given
spam there is _somebody_ on Earth who would want to read it, it would be
up to them to set up their client-app filter rules to work how they want
them to -- even if that includes running a local installation of SpamBayes
to do personalized (high-resolution) filtering.


> This isn't a question of classification technology so much as it's a
> question of personal preference, and so long as you're determined that
> everyone must use the same classifier, personal preference goes out the
> window.

	Yes, and that's exactly what I'm asking for.  I think that for
installation-wide filters (I'll use the term 'flagger' from here on since
no spam filtering should ever take place at a server -- for both legal and
privacy reasons) personal preference is irrelevant.  It's irrelevant
practically by definition.


>  That's a bad use of technology, IMO -- I'm not interested in treating
> everyone like interchangeable cogs.

	I think there are a great many people interested in having all
spam messages treated like interchangeable cogs.  "Spam" meaning a message
that would be universally accepted as being a "spam".

	I've seen many people on this list use Bruce's spam for their
training.  But undoubtedly there is a message in his collection that would
be of interest to at least *someone* on this list.  Does that invalidate
his collection as being a spam training repository?

	I would say no, it does not, because his collection is of the type
"universally accepted as spam".  That is the type of message I would like
to see flagged at Universities, ISPs, and companies.

	And to do that, I don't think ham training can be in the picture,
since somebody's "ham" is another person's "spam", and training on
people's "ham" can only weaken what is considered "universally accepted as
spam".


--Derek