[Spambayes] training

Tim Stone - Four Stones Expressions tim at fourstonesExpressions.com
Wed Feb 19 17:02:05 EST 2003

2/19/2003 4:50:19 PM, "Meyer, Tony" <T.A.Meyer at massey.ac.nz> wrote:

>(The problem with being on the nicer side of the world is that most of the
>mail arrives when you are asleep.  Apologies for the length of the reply.)
>> The problem here is that some mailers pretty much lose most 
>> of the headers when you do a forward operation...
>Which would be when they would have to place the id in the body - mailers
>might do strange things to the body when forwarding, but surely none of them
>actually remove content.
>It would be nice if the smtpproxy could automagically switch to including the
>id in the body if you started forwarding it messages without ids.  (Well, not
>nice for those of us who like control, but for the 'average user'.)

We really should make every effort to make the non-OL2K side of spambayes
mailer-agnostic.  There are just too many of 'em... a testing matrix might be
interesting, but I don't believe we should *ever* put mailer-specific code in
the core stuff.  Specific mailer plugins would be cool, but most mailers have
no such plugin architecture.
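
A mailer-agnostic fallback of the kind Tony describes could be sketched roughly
as follows: look for the id in a header first, and only if the forwarding
mailer has dropped it, scan the text parts of the body.  This is a sketch and
not spambayes code; the header name X-Training-Id, the [spambayes-id: ...] body
marker and the function find_training_id are all made up for illustration.

```python
import re
from email import message_from_string

# Hypothetical names: neither the header nor the body marker comes from
# the spambayes source; they only illustrate the fallback idea.
ID_HEADER = "X-Training-Id"
BODY_ID_RE = re.compile(r"\[spambayes-id:\s*([0-9A-Za-z._-]+)\]")


def find_training_id(raw_message):
    """Return the training id from the header, else from the body, else None."""
    msg = message_from_string(raw_message)

    # Preferred case: the mailer kept our header when forwarding.
    header_id = msg.get(ID_HEADER)
    if header_id:
        return header_id.strip()

    # Fallback: scan every text part, since multipart messages may carry
    # the marker in only one of their parts.
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:
            # Unknown charset label: latin-1 can decode any byte string.
            text = payload.decode("latin-1")
        match = BODY_ID_RE.search(text)
        if match:
            return match.group(1)
    return None
```

Nothing here depends on which mailer did the forwarding, which is the point;
it does assume the mailer leaves the marker text intact in at least one text
part.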

>> Placing 
>> something like a url in the body of 
>> a message is another possibility that's been raised.  It's 
>> somewhat dangerous, particularly in the case of multipart 
>> messages, and for html messages may not 
>> be visible at all.
>I think the spoofing possibilities of a URL (as back in the November posts)
>remove it as a possibility, as nice as it would be.  A non-clickable message
>id, though, shouldn't be spoofable (a spammer shouldn't be able to generate
>valid ids).
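
One way to get ids that a spammer can't generate, offered only as a sketch and
not as what spambayes does: append a keyed hash (HMAC) of the id, using a
secret that never leaves the user's machine.  SECRET_KEY, make_id and
is_valid_id are hypothetical names, and the serial-number id format is an
assumption.

```python
import hashlib
import hmac

# Assumption: a per-installation secret generated locally and never mailed.
SECRET_KEY = b"replace-with-a-locally-generated-secret"


def _tag(serial_text, key):
    """Keyed hash of the serial number, truncated to 16 hex digits."""
    digest = hmac.new(key, serial_text.encode("ascii"), hashlib.sha256)
    return digest.hexdigest()[:16]


def make_id(serial, key=SECRET_KEY):
    """Build an id of the form '<serial>-<tag>' for an integer serial."""
    serial_text = str(serial)
    return "%s-%s" % (serial_text, _tag(serial_text, key))


def is_valid_id(candidate, key=SECRET_KEY):
    """Accept only ids whose tag matches the one we would have generated."""
    serial_text, sep, tag = candidate.partition("-")
    if not sep or not serial_text.isdigit():
        return False
    expected = _tag(serial_text, key)
    # Compare as bytes, in constant time, so odd input cannot raise.
    return hmac.compare_digest(tag.encode("utf-8"), expected.encode("utf-8"))
```

Someone who can guess serial numbers but doesn't know the key can't produce a
matching tag, so a forged id in a spam body would simply be rejected.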


>> Also, consider the company that gets
>> gigabytes of email every day.  How long do they keep a 
>> message in their pool for future training?
>This is a problem with the existing pop3proxy as well, of course.  Isn't
>spambayes still aimed at individuals, not organisations?
>> And anyway, forwarding a message to a special address is 
>> still too much work.
>If this is the agreed conclusion, then I don't really see any options other
>than:
>(a) Don't get the user to train (they would have to start with some sort of
>pretrained database).  This does really kill all the power of spambayes, even
>if they could update their pretrained databases (that someone else trains for
>(b) Integration into lots of clients, a la the Outlook plugin.

I'm a bit stumped here, too... still thinkin hard, maybe some kind of fuzzy 
matching?  <wink>  Let's get creative, think outside the box, yadda yadda... - 

>> Absolutely.  As things are right now, it's not useable by anyone but 
>> people like us, which as dismaying as that may be, is not the norm.
>Well, I would say that the Outlook plugin *is* usable by anyone, except that
>you have to install Python first, and removing the plugin is not simple.
>Well, I guess some sort of automated training would be good too (Mark is
>working on this, I believe).
>Anyway, since I've got the time, I'll go ahead and make the patches to get
>the smtpproxy to work, and then we can evaluate it.  If it gets thrown away,
>oh well never mind :)  It could at least make things easier for those that are
>currently using it, while we all build integrations into everyone else's
>favourite mail client.

Tony, don't spend a whole lot of time making the smtpproxy work in a
production manner.  It'll be a good research tool, but it can't share a
database with the pop3proxy, and so training will be moot.  It will need to be
integrated with the pop3proxy, a non-trivial task as pop3proxy uses the
asyncore module, Dibbler, and a bunch of other stuff that Richie might be the
only person on the planet that understands right now... I'm workin on getting
my head around it, but I'm not there yet. - TimS
>=Tony Meyer

c'est moi - TimS
