[Spambayes] C++ Compiled version of sb_client, with benchmarks

Jeff Epler jepler at unpythonic.net
Tue Dec 30 14:51:57 EST 2003


sbcc (sb_cclient in the benchmarks below) is a C++ client for
sb_xmlrpcserver, replacing sb_client.py.  It uses the C++ library from
http://xmlrpc-c.sf.net/ and is less than 40 lines long.  (It took a little
hacking to get xmlrpc-c to compile in the first place, so it may not be a
good choice.  It was the first C++ binding I found for XML-RPC.)

I wrote this because the SpamAssassin people say they get a substantial
speed increase by using spamd/spamc, and that part of the speed advantage
comes from spamc being written in C.

With sb_cclient written, my own benchmarks did show a speed difference:
up to a 43% decrease in wall time when processing messages in parallel.
However, the startup time of the C++ program was still significant on small
messages.

Setup
-----

1GHz Duron
Fedora Core 1
python 2.2.3
spambayes 1.0a7
sb_xmlrpcserver.py running on a pickle database, also handling incoming mail

Testing method
--------------

I created a Unix MBOX file with a selection of ham and spam messages, 289
messages in 1.5 megabytes.  I tested as follows:
        time formail [-n X] -s CLIENT < sample > /dev/null

The "-S sb_client.py" line uses python -S, which skips importing site and
so trims a small amount from Python's startup time.

The "S" and "L" lines represent the time to process a single small (524
byte) or large (43920 byte) message 101 times.  The "sb_cclient101"
lines represent doing this in a single invocation of sb_cclient with a
loop on the XML-RPC request.

Results:
        Client                Wall (s)   Mails/sec   Relative time
        sb_client.py          44.230      6.5         100%
           -n 4               44.716      6.4         101%
        -n 4 -S sb_client.py  41.387      7.0          94%
        sb_cclient            31.876      9.1          72%
           -n 4               25.164     11.5          57%
        sb_client.pyS         12.106      8.3          78%
        sb_client.pyL         26.688      3.8         171%
        sb_cclientS            7.276     13.9          47%
        sb_cclientL           24.018      4.2         155%
        sb_cclient101          4.118     24.5          27%
        sb_cclient101L        24.377      4.1         159%

Conclusions
-----------

On small and moderately sized messages, a compiled-language version of
sb_client can give a clear speedup (sb_client.py vs sb_cclient -n 4),
but the startup time is still relatively large when messages are small
(sb_cclientS vs sb_cclient101).  If messages are large, startup
time is irrelevant (sb_client.pyL vs sb_cclient101L).

-------------- next part --------------
#include <iostream>
#include <iterator>
#include <sstream>
#include <string>

#include <XmlRpcCpp.h>

#define NAME "sb_cclient"
#define VERSION "1.0"

int main(int argc, char **argv) {
	// Read the whole message from standard input.
	std::string s = std::string(std::istreambuf_iterator<char>(std::cin),
			std::istreambuf_iterator<char>());
	try {
		XmlRpcClient::Initialize(NAME, VERSION);
		// Wrap the message as base64, the single argument to "filter".
		XmlRpcValue v = XmlRpcValue::makeBase64(
			reinterpret_cast<const unsigned char*>(s.c_str()),
			s.size());
		XmlRpcValue va = XmlRpcValue::makeArray();
		va.arrayAppendItem(v);
		XmlRpcClient sb("http://localhost:65000/RPC2");
		XmlRpcValue res = sb.call("filter", va);
		// Write the filtered message returned by the server.
		const unsigned char *od;
		size_t ol;
		res.getBase64(od, ol);
		std::string u(reinterpret_cast<const char *>(od), ol);
		std::cout << u;
	} catch (XmlRpcFault& fault) {
		// On an XML-RPC fault, report it and pass the message through.
		std::cerr << argv[0] << ": XML-RPC fault #" << fault.getFaultCode()
		     << ": " << fault.getFaultString() << std::endl;
		std::cout << s;
	} catch (...) {
		// On any other error, also pass the message through unchanged.
		std::cerr << "buh!?" << std::endl;
		std::cout << s;
	}
}

