[spambayes-dev] SpamBayes server compliant w/ spamassassin

Jkx at Pythonfr jkx at pythonfr.org
Sun Apr 25 08:50:21 EDT 2004


On Sunday 25 April 2004 06:32, Skip Montanaro wrote:
>     jkx> Where is the significant effort?
>

>
> No, I admit I didn't read your code.  I read your mail message and must
> have not fully understood what you were after.  My apologies.
>
>     jkx> Do you really want to open one Unix domain socket per user?
>
> Sure, why not?  Unix domain sockets are pretty cheap.

Simply because this is not realistic: it will eat a bunch of sockets for
nothing. Have you ever heard that the OS has a maximum open file descriptor
limit?
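To put the descriptor argument in concrete terms, here is a minimal Python sketch (Unix only, since it uses the standard `resource` module) that checks whether one socket per user fits under the process's soft RLIMIT_NOFILE. It assumes a single server process holds all the per-user sockets; if each user got a separate process, the system-wide limit and the per-process memory would matter instead. The `fds_per_user` and `reserved` values are hypothetical knobs, not SpamBayes settings.

```python
import resource


def per_user_sockets_fit(num_users: int, fds_per_user: int = 1,
                         reserved: int = 64) -> bool:
    """True if num_users * fds_per_user descriptors, plus `reserved` for
    stdio, logs, DB files and client connections (a hypothetical
    allowance), fit under this process's soft RLIMIT_NOFILE."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return True
    return num_users * fds_per_user + reserved <= soft


if __name__ == "__main__":
    soft, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("soft fd limit:", soft)
    print("2000 users fit:", per_user_sockets_fit(2000))
```

Note that every accepted client connection and every open per-user DB file also costs a descriptor, which is what `fds_per_user` is meant to account for.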


>  How do your
> users train their databases?  I presume you are doing all this on your mail
> server.  Are your users local or remote?

Training will be done through cron on the Maildir folders. The users are
remote and access their folders via IMAP.
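A minimal sketch of the cron side, using only the standard `mailbox` module: it walks one user's ham and spam Maildir++ subfolders and hands each message to a training hook. The folder names `.Ham` and `.Spam` and the `train(msg, is_spam)` callback are hypothetical placeholders; the real job would call whichever SpamBayes training entry point is installed.

```python
import mailbox
from email.message import Message
from pathlib import Path
from typing import Callable, Tuple


def train_user(maildir_root: Path,
               train: Callable[[Message, bool], None],
               ham_folder: str = ".Ham",     # hypothetical folder name
               spam_folder: str = ".Spam",   # hypothetical folder name
               ) -> Tuple[int, int]:
    """Feed every message in a user's ham and spam folders to `train`.

    `train(msg, is_spam)` is a hypothetical hook standing in for the
    actual SpamBayes training call. Returns (ham_count, spam_count).
    """
    counts = []
    for folder, is_spam in ((ham_folder, False), (spam_folder, True)):
        path = maildir_root / folder
        n = 0
        if path.is_dir():  # a user may not have created the folder yet
            for msg in mailbox.Maildir(str(path), factory=None, create=False):
                train(msg, is_spam)
                n += 1
        counts.append(n)
    return counts[0], counts[1]
```

A crontab entry would then loop over the users' Maildir roots and call `train_user` for each one.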



>     jkx> psyco has nothing to do with that. The trouble is 'exec a python' for
>     jkx> each email.
>
> I don't see 'exec a python' as a huge problem.  Presumably on a busy server
> the python interpreter and all the compiled bytecode will just be sitting
> in memory buffers awaiting activation.  Lots of systems do the equivalent
> of 'exec a python' or more on a per message basis.  Have you tried it?  Was
> it too slow?


I think you should look more closely at how mail delivery works!
Have you ever considered that many mails can be delivered at the same time?
Then you don't have just one 'exec python', you have one per user for
simultaneous incoming mail. For example, filtering done through maildrop
can run (by default) 100 simultaneous filters. So do you really think that
100 * exec python is the same as 100 * spamc? (spamc uses ~500 KB
and python ~4.5 MB.)
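The arithmetic behind that comparison, as a quick Python check using the figures quoted above (~500 KB per spamc, ~4.5 MB per Python interpreter, 100 concurrent filters). Treat it as an upper bound: it ignores pages the kernel shares between processes running the same binary, so real resident usage would be lower for both.

```python
# Figures from the message: maildrop's default of 100 concurrent filters,
# ~500 KB per spamc process, ~4.5 MB per Python interpreter.
CONCURRENT_FILTERS = 100
SPAMC_KB = 500
PYTHON_KB = 4.5 * 1000  # 4.5 MB expressed in KB (decimal units)


def total_mb(per_process_kb: float, processes: int) -> float:
    """Approximate memory in MB for `processes` copies of one filter,
    ignoring shared pages."""
    return per_process_kb * processes / 1000


spamc_total = total_mb(SPAMC_KB, CONCURRENT_FILTERS)
python_total = total_mb(PYTHON_KB, CONCURRENT_FILTERS)
print(f"spamc: {spamc_total:.0f} MB, python: {python_total:.0f} MB, "
      f"ratio: {python_total / spamc_total:.0f}x")
# prints "spamc: 50 MB, python: 450 MB, ratio: 9x"
```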




>     jkx> so even if the server goes down for some strange reason, mails aren't
>     jkx> lost (spamc does that perfectly).
>
>
> I just ran a quick test of sb_bnfilter.py on my laptop.  In a directory
> containing 501 spams (between 24 and 3080 lines each, average 142 lines) I
> executed:
[snip]

This test doesn't give any useful information, since it uses
1) only one user
2) only one access at a time, so only one spawn per mail, etc.

Please test the same thing with ~10 users, and measure the
number of mails passing through the whole system (MTA + procmail + filter).
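A minimal harness for that kind of measurement, assuming a hypothetical `deliver(user, message)` callable that pushes one message through the real chain (for example by piping it into the MTA or procmail with `subprocess.run`). It runs the jobs with a configurable number of concurrent workers and reports throughput in messages per minute. Threads are enough here because each worker would mostly be waiting on a child process.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence, Tuple

Job = Tuple[str, bytes]  # (user, raw message)


def messages_per_minute(deliver: Callable[[str, bytes], None],
                        jobs: Sequence[Job],
                        concurrency: int = 10) -> float:
    """Push every (user, message) job through the hypothetical `deliver`
    hook with `concurrency` workers; return observed messages/minute."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # list() waits for every job and re-raises any worker exception
        list(pool.map(lambda job: deliver(*job), jobs))
    elapsed = time.perf_counter() - start
    return len(jobs) * 60.0 / elapsed
```

For example, `jobs = [(f"user{i % 10}", raw) for i in range(500)]` spreads 500 copies of one message over 10 users; running it once with the filter in the chain and once without gives the two throughput figures to compare.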



> Presumably performance would also improve on a more serious mail server.
> What's your target processing time per message?

As low as possible, simply. I just added a cache system to my code
(maintaining a hash of already-open hammie DBs), and I now get something
like 300 mails/min, while a test without any filtering gives me something
like 600 mails/min. I think doing better would be hard, but it can be done
(using fork, threads, or async I/O on the server socket delivery).
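In per-message terms, 300 mails/min is about 0.2 s of wall-clock time per mail, against about 0.1 s at 600 mails/min without filtering. Below is a minimal sketch combining the cache and the threaded delivery idea, assuming Python 3's `socketserver`: a single threaded Unix domain socket server (not one socket per user) that keeps already-open per-user DBs in a dictionary. The wire protocol (username on the first line, then the raw message until the client half-closes) and the `open_db`/`score` callables are hypothetical stand-ins, not the hammie API.

```python
import os
import socketserver
import threading
from typing import Callable, Dict


class DBCache:
    """One open classifier DB per user instead of reopening it per message.

    `open_db` is a hypothetical loader (e.g. a wrapper that opens a user's
    hammie DB); it is called at most once per user.
    """

    def __init__(self, open_db: Callable[[str], object]):
        self._open_db = open_db
        self._dbs: Dict[str, object] = {}
        self._lock = threading.Lock()

    def get(self, user: str) -> object:
        with self._lock:  # handler threads share this dictionary
            db = self._dbs.get(user)
            if db is None:
                db = self._dbs[user] = self._open_db(user)
            return db


class FilterHandler(socketserver.StreamRequestHandler):
    def handle(self) -> None:
        # Hypothetical protocol: "<user>\n" followed by the raw message.
        user = self.rfile.readline().strip().decode("utf-8", "replace")
        message = self.rfile.read()  # until the client shuts down writing
        db = self.server.cache.get(user)
        self.wfile.write(self.server.score(db, message))


class FilterServer(socketserver.ThreadingUnixStreamServer):
    daemon_threads = True  # don't block shutdown on in-flight handlers

    def __init__(self, path: str, cache: DBCache,
                 score: Callable[[object, bytes], bytes]):
        self.cache = cache
        self.score = score  # hypothetical: returns the message with headers added
        if os.path.exists(path):  # stale socket left by a crashed server
            os.unlink(path)
        super().__init__(path, FilterHandler)
```

Only the cache dictionary is locked; whether one user's DB can be scored from two threads at once depends on the DB backend, so a per-user lock may also be needed. A forking server avoids that question, but each child only sees the cache entries the parent had opened before the fork, and anything a child opens is lost when it exits.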





Bye bye.


