[Spambayes] SpamBayes for Olde Worlde environments

Mon Dec 18 16:45:04 CET 2006

If you get your mail from a POP3 server, you may be able to configure
your mail client to first download only the message headers. You could
then download the bodies of only the messages that don't look like spam.
If you don't want to do that filtering manually, try feeding the headers
to Spambayes. Maybe it'll work well enough to at least cut down on the
spam you have to pay to receive. 
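
A minimal sketch of that headers-first idea, using Python's standard
poplib. It assumes the POP3 server supports the optional TOP command.
The names fetch_headers and pick_messages, and the score callback, are
hypothetical: score stands in for whatever classifier you feed the
headers to and is not a SpamBayes API.

```python
from email.parser import BytesParser


def fetch_headers(conn):
    """Return {message_number: Message} holding headers only.

    conn is a poplib.POP3-style connection that is already logged in.
    """
    headers = {}
    _resp, listings, _octets = conn.list()
    for entry in listings:
        # Each listing looks like b"1 1205": message number, size in octets.
        msg_num = int(entry.split()[0])
        # TOP n 0 asks for the headers plus zero lines of the body.
        _resp, lines, _octets = conn.top(msg_num, 0)
        raw = b"\r\n".join(lines)
        headers[msg_num] = BytesParser().parsebytes(raw, headersonly=True)
    return headers


def pick_messages(headers, score, cutoff=0.9):
    """Return the message numbers whose header-only score is below cutoff.

    score is a hypothetical callable mapping a Message to a spam
    probability between 0.0 and 1.0.
    """
    return [n for n, msg in sorted(headers.items()) if score(msg) < cutoff]
```

With a real connection (poplib.POP3(host), then user() and pass_()),
only the numbers returned by pick_messages would be passed to retr(),
so the bodies of likely spam are never downloaded.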

Bob 

  _____  

	From: spambayes-bounces at python.org
[mailto:spambayes-bounces at python.org] On Behalf Of Stuart Moors
	Sent: Thursday, December 14, 2006 1:01 PM
	To: spambayes at python.org
	Subject: [Spambayes] SpamBayes for Olde Worlde environments

	Much as I hate to admit it, where I live at the moment is,
technologically speaking, several tens of eons behind the accepted
world-wide median.

	There are good reasons for this, but they are irrelevant. 

	Suffice it to say that I still depend on dial-up to get my
email. 

	So, while Bayesian spam techniques are OK/Good/Fine/Brilliant
(or, at least, impressive), they are of little use if I still have to pay
for the download of spam before I decide its classification.

	That is: 
	SPAM can be classified as such and discarded for two reasons: 
	A. to avoid having your Inbox cluttered 
	B. to avoid having to pay for the download of the crap 

	It seems to me that the major thrust of the SpamBayes (and
similar) initiatives is Reason A. Which is a pity, because I am at least
as interested in Reason B.

	If spam can be so classified by my email service provider,
BEFORE I ever get to see it, it would be far more useful..... 

	....unless, of course, the algorithm has been too aggressive and
discarded an email that was bona fide - a heinous crime in my book.

	So, how about - for the email service providers, a facility
that: 

	A. filters according to Bayesian principles, 
	B. retains spam at the server, either indefinitely until the
space runs out or until a per-message expiry date occurs 
	C. alerts the user with a SINGLE message (say once per week)
containing stats and summaries of what was filtered out 
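
The three points above could be sketched roughly as follows. This is an
illustration under stated assumptions, not an existing provider
facility: the Quarantine class, its method names, the 30-day retention
and the 7-day digest period are all hypothetical, and the spam score is
assumed to come from a Bayesian filter that runs elsewhere.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Held:
    sender: str
    subject: str
    score: float        # spam probability assigned by the filter
    received: datetime


class Quarantine:
    """Holds filtered mail at the server instead of delivering it."""

    def __init__(self, retention=timedelta(days=30)):
        self.retention = retention   # hypothetical per-message expiry
        self.held = []

    def hold(self, sender, subject, score, received):
        self.held.append(Held(sender, subject, score, received))

    def expire(self, now):
        # Drop messages that have been held for the retention period.
        self.held = [m for m in self.held
                     if now - m.received < self.retention]

    def digest(self, now, period=timedelta(days=7)):
        # One summary message: a total, then a count per sender.
        recent = [m for m in self.held if now - m.received <= period]
        by_sender = Counter(m.sender for m in recent)
        lines = [f"{len(recent)} messages held since "
                 f"{(now - period):%Y-%m-%d}"]
        for sender, count in by_sender.most_common():
            lines.append(f"  {count:3d}  {sender}")
        return "\n".join(lines)
```

The per-sender counts in the digest are what would let a user spot a
repeated source that deserves a white-list entry.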

	The stats could also be analysed. A series of messages from the
same source may very well be ham, trying desperately to be accepted
(despite the fact that the sender's first name is Nigeria!). This way, I
can tweak a white-list to allow certain sources immunity from filtering,
even if standard analysis would be definite about spamminess.

	The retention at the server would mean that I could log on to
web-mail access to retrieve messages that I discovered to be ham.

	On signing off, I'd like to say that I am very appreciative of
the efforts being brought to bear on this scourge of the digital age.
Yet, I sometimes wonder if there's not some better way (some
authentication protocol mechanism, perhaps) to identify spam, other than
analysing even personally-trained word frequencies in arriving mail. In
99% of the cases, mail arriving at my inbox is either from a sender I
have in my address book, or is in response to an email I have sent
myself. This must cover the majority of cases for others too. The
remaining 1% must be unwanted or unsolicited and could be handled in
more human-oriented ways (so that they cannot easily be automated) in
order that I can decide if they are bona fide or not. For example, a
message could be automatically returned to the sender saying that
unsolicited mail has been detected and will only be delivered if a
question is answered correctly (the answer being the character string
shown in an obfuscated image). This method requires no
knowledge of my interests (which may include stock investments, sexing
of chickens, penis dimensions, drug therapies or whatever) or of the
style or language of my bona fide correspondents, solicited or
otherwise.
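
A rough sketch of that triage, assuming the address book is a set of
lower-case addresses and that the Message-IDs of sent mail have been
recorded. The triage function and both of its inputs are hypothetical
names for illustration. Note that a From header can be forged, so this
check alone does not authenticate the sender.

```python
from email.utils import parseaddr


def triage(headers, address_book, sent_message_ids):
    """Return "deliver" for known correspondents, else "challenge".

    headers is a mapping of header names to values (a dict or an
    email.message.Message both work).
    """
    # Case 1: the sender is already in the address book.
    sender = parseaddr(headers.get("From", ""))[1].lower()
    if sender in address_book:
        return "deliver"

    # Case 2: the message answers something this user sent earlier.
    refs = (headers.get("In-Reply-To", "") + " "
            + headers.get("References", "")).split()
    if any(ref in sent_message_ids for ref in refs):
        return "deliver"

    # Everything else gets the challenge described above.
    return "challenge"
```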

	I understand that the foregoing is a very "personal email" view
of the world. If I operated an e-commerce site, I might have a different
view, but even then, email could be discarded unless it showed signs of
human input (a product number, a phrase indicating a query or even the
same technique noted above), so that automated spam is filtered out.

	Hoping that the word frequency in this email is such that you
will read it... 

	-- 
	------------------------------------- 
	Stuart Moors 
	Alarm Forest, St.Helena 
	Tel:  (00290) 3255  Email: moors at helanta.sh 
	------------------------------------- 


  _____  

	This e-mail has been scanned for viruses by the Cable & Wireless
St. Helena e-mail security system - powered by McAfee. 

