[Spambayes] C++ Compiled version of sb_client, with benchmarks
Jeff Epler
jepler at unpythonic.net
Tue Dec 30 14:51:57 EST 2003
sb_cclient is a C++ client for sb_xmlrpcserver that replaces sb_client.py.
It uses the C++ library from http://xmlrpc-c.sf.net/ and is fewer than 40
lines long. (It took a little hacking to get xmlrpc-c to compile in the
first place, so it may not be a good choice. It was the first C++ binding
I found for XML-RPC.)

I wrote this because the SpamAssassin developers say they get a substantial
speed increase by using spamd/spamc, and that part of the speed advantage
comes from spamc being written in C.

With sb_cclient written, my own benchmarks did show some speed difference:
up to a 43% decrease in wall time when processing messages in parallel.
However, the startup time of the C++ program was still significant on small
messages.
Setup
-----
1GHz Duron
Fedora Core 1
python 2.2.3
spambayes 1.0a7
sb_xmlrpcserver.py running on a pickle database, also handling incoming mail
Testing method
--------------
I created a Unix MBOX file with a selection of ham and spam messages, 289
messages in 1.5 megabytes. I tested as follows:
    time formail [-n X] -s CLIENT < sample > /dev/null
The "-S sb_client.py" line runs the client with python -S, which skips
importing the site module and trims a small amount from Python's startup time.
The "S" and "L" lines represent the time to process a single small (524
byte) or large (43920 byte) message 101 times. The "sb_cclient101"
lines represent doing this in a single invocation of sb_cclient with a
loop on the xmlrpc request.
Results:

Client                   Wall Time (s)  Mails/sec  Relative time
sb_client.py                    44.230        6.5           100%
  -n 4                          44.716        6.4           101%
  -n 4 -S sb_client.py          41.387        7.0            94%
sb_cclient                      31.876        9.1            72%
  -n 4                          25.164       11.5            57%
sb_client.pyS                   12.106        8.3            78%
sb_client.pyL                   26.688        3.8           171%
sb_cclientS                      7.276       13.9            47%
sb_cclientL                     24.018        4.2           155%
sb_cclient101                    4.118       24.5            27%
sb_cclient101L                  24.377        4.1           159%
Conclusions
-----------
On small and moderately sized messages, a compiled-language version of
sb_client can give a clear speedup (sb_client.py vs sb_cclient -n 4).
Startup time is still a relatively large share of the cost when messages
are small (sb_cclientS vs sb_cclient101), and when messages are large,
startup time is irrelevant (sb_client.pyL vs sb_cclient101L).
-------------- next part --------------
#include <iostream>
#include <iterator>
#include <string>
#include <XmlRpcCpp.h>

#define NAME "sb_cclient"
#define VERSION "1.0"

int main(int argc, char **argv) {
    // Read the whole message from stdin.
    std::string s = std::string(std::istreambuf_iterator<char>(std::cin),
                                std::istreambuf_iterator<char>());
    try {
        XmlRpcClient::Initialize(NAME, VERSION);

        // Wrap the message as a base64 value; it is the only call parameter.
        XmlRpcValue v = XmlRpcValue::makeBase64(
            reinterpret_cast<const unsigned char*>(s.c_str()),
            s.size());
        XmlRpcValue va = XmlRpcValue::makeArray();
        va.arrayAppendItem(v);

        // Ask sb_xmlrpcserver to filter the message.
        XmlRpcClient sb("http://localhost:65000/RPC2");
        XmlRpcValue res = sb.call("filter", va);

        // Write the filtered message to stdout.
        const unsigned char *od;
        size_t ol;
        res.getBase64(od, ol);
        std::string u(reinterpret_cast<const char *>(od), ol);
        std::cout << u;
    } catch (XmlRpcFault& fault) {
        // On an XML-RPC fault, report it and pass the message through unchanged.
        std::cerr << argv[0] << ": XML-RPC fault #" << fault.getFaultCode()
                  << ": " << fault.getFaultString() << std::endl;
        std::cout << s;
    } catch (...) {
        // Any other failure: still pass the message through unchanged.
        std::cerr << "buh!?" << std::endl;
        std::cout << s;
    }
}