
Aahz wrote:
<plug>
The processing package at http://cheeseshop.python.org/pypi/processing
is multi-platform and mostly follows the API of threading. It also
allows use of 'shared objects' which live in a manager process. For
example, the following code is almost identical to the equivalent
written with threads:

    from processing import Process, Manager

    def f(q):
        for i in range(10):
            q.put(i*i)
        q.put('STOP')

    if __name__ == '__main__':
        manager = Manager()
        queue = manager.Queue(maxsize=3)

        p = Process(target=f, args=[queue])
        p.start()

        result = None
        while result != 'STOP':
            result = queue.get()
            print result

        p.join()

Josiah wrote:
The IPC uses sockets or (on Windows) named pipes. Linux and Windows
are roughly equal in speed. On a P4 2.5GHz laptop one can retrieve an
element from a shared dict about 20,000 times/sec. Not sure if that
qualifies as fast enough.
</plug>

Richard
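
As a rough illustration of what that measurement involves, here is a
sketch written against the multiprocessing package that processing
later became in the standard library (spellings in the original
processing package may differ); the managed dict lives in a separate
server process, so each lookup is a full IPC round trip:

    from multiprocessing import Manager
    import time

    if __name__ == '__main__':
        manager = Manager()
        d = manager.dict()
        d['key'] = 12345

        n = 10000
        t = time.time()
        for i in xrange(n):
            value = d['key']      # one IPC round trip per lookup
        print '%.0f fetches/sec' % (n / (time.time() - t))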

"Richard Oudkerk" <r.m.oudkerk@googlemail.com> wrote:
Depends on what the element is, but I suspect it isn't fast enough.
Fairly large native dictionaries seem to run on the order of 1.3
million fetches/second on my 2.8GHz machine:

    >>> import time
    >>> d = dict.fromkeys(xrange(65536))
    >>> if 1:
    ...     t = time.time()
    ...     for j in xrange(1000000):
    ...         _ = d[j&65535]
    ...     print 1000000/(time.time()-t)
    ...
    1305482.97346
    >>>

But really, transferring little bits of data back and forth isn't what
concerns me in terms of speed. My real concern is transferring
nontrivial blocks of data; I usually benchmark blocks of sizes 1k, 4k,
16k, 64k, 256k, 1M, 4M, 16M, and 64M. Those are usually pretty good
for discovering the "sweet spot" of a particular implementation, and
they also let a person discover whether or not their system can be
used for nontrivial processor loads.

 - Josiah
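
For illustration, here is a sketch of the kind of sweep described
above, using a Unix socketpair as the transport and a reader thread to
drain the bytes; the sizes, names, and structure are illustrative
rather than taken from the original benchmark:

    import socket
    import threading
    import time

    SIZES = [1024 << (2 * i) for i in range(9)]    # 1k, 4k, ..., 64M

    def reader(conn, total):
        # drain exactly `total` bytes, then send a one-byte ack
        remaining = total
        while remaining:
            remaining -= len(conn.recv(min(remaining, 65536)))
        conn.send('A')

    for size in SIZES:
        a, b = socket.socketpair()                 # Unix-only
        block = 'x' * size
        reps = max(1, (64 * 1024 * 1024) // size)  # ~64 MB per size
        t = threading.Thread(target=reader, args=(b, size * reps))
        t.start()
        start = time.time()
        for i in xrange(reps):
            a.sendall(block)
        a.recv(1)                                  # wait for the ack
        elapsed = time.time() - start
        t.join()
        a.close(); b.close()
        print '%9d bytes: %7.1f MB/s' % (size,
                                         size * reps / elapsed / 1e6)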

On 3/25/07, Josiah Carlson <jcarlson@uci.edu> wrote:
Not directly relevant to the discussion, but I recently attended a
talk by the main developer of STXXL (http://stxxl.sourceforge.net/),
an STL-compatible library for handling huge volumes of data. The keys
to efficient processing are support for parallel disks, explicit
overlapping of I/O and computation, and I/O pipelining. More details
are available at http://i10www.ira.uka.de/dementiev/stxxl/report/.

George

On 26/03/07, Josiah Carlson <jcarlson@uci.edu> wrote:
The "20,000 fetches/sec" was just for retreving a "small" object (an integer), so it only really reflects the server overhead. (Sending integer objects directly between processes is maybe 6 times faster.) Fetching string objects of particular sizes from a shared dict gives the following results on the same computer: string size fetches/sec throughput ----------- ----------- ---------- 1 kb 15,000 15 Mb/s 4 kb 13,000 52 Mb/s 16 kb 8,500 130 Mb/s 64 kb 1,800 110 Mb/s 256 kb 196 49 Mb/s 1 Mb 50 50 Mb/s 4 Mb 13 52 Mb/s 16 Mb 3.2 51 Mb/s 64 Mb 0.84 54 Mb/s

"Richard Oudkerk" <r.m.oudkerk@googlemail.com> wrote:
> (Sending integer objects directly between processes is maybe 6 times
> faster.)

That's a positive sign.

> Fetching string objects of particular sizes from a shared dict gives
> the following results on the same computer:

Those numbers look pretty good. Would I be correct in assuming that
there is a speedup when sending blocks directly between processes,
though perhaps not the 6x that integer sending gains? I will
definitely have to dig deeper; this could be the library that we've
been looking for.

 - Josiah

On 27/03/07, Josiah Carlson <jcarlson@uci.edu> wrote:
Yes, sending blocks directly between processes is over 3 times faster
for 1k blocks, and twice as fast for 4k blocks, but after that it
makes little difference. (This is using the 'processing.connection'
sub-package, which is partly written in C.)

Of course, since these blocks are string data you can avoid the pickle
translation, which makes things faster still: the peak bandwidth I get
is 40,000 x 16k blocks/sec = 630 Mb/s.

PS. It would be nice if the standard library had support for sending
message-oriented data over a connection, so that you could just do
'recv()' and 'send()' without worrying about whether the whole message
was successfully read/written. You can use 'socket.makefile()' for
line-oriented text messages, but not for binary data.
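
For concreteness, here is a sketch of the pickle-free path, written
against the multiprocessing API that processing later became in the
standard library, where Connection.send_bytes() and recv_bytes() ship
raw buffers (method names in the original processing package may
differ):

    from multiprocessing import Process, Pipe
    import time

    def producer(conn, block, reps):
        # send_bytes() transmits the raw buffer, skipping pickle
        for i in xrange(reps):
            conn.send_bytes(block)
        conn.close()

    if __name__ == '__main__':
        parent, child = Pipe()
        block, reps = 'x' * 16384, 10000           # 16k blocks
        p = Process(target=producer, args=(child, block, reps))
        p.start()
        t = time.time()
        for i in xrange(reps):
            parent.recv_bytes()
        elapsed = time.time() - t
        p.join()
        print '%.0f blocks/sec, %.0f MB/s' % (
            reps / elapsed, reps * len(block) / elapsed / 1e6)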

"Richard Oudkerk" <r.m.oudkerk@googlemail.com> wrote:
> Yes, sending blocks directly between processes is over 3 times faster
> for 1k blocks, and twice as fast for 4k blocks, but after that it
> makes little difference.

I'm surprised that larger objects see so little gain from the removal
of an encoding/decoding step and transfer.

> the peak bandwidth I get is 40,000 x 16k blocks/sec = 630 Mb/s

Very nice.

> It would be nice if the standard library had support for sending
> message-oriented data over a connection ...

Well, there's also the problem that sockets, files, and pipes behave
differently on Windows. If one is only concerned with sockets, there
are various lightly-defined protocols that can be implemented simply
on top of asyncore/asynchat; among them is sending a 32-bit length
field in network-endian order, followed immediately by the data.
Taking some methods and tossing them into a synchronous sockets
package wouldn't be terribly difficult (I've done a variant of this
for a commercial project). Doing this generally may not find support,
though; my idea of sharing encoding/decoding/internal state
transitions/etc. between sync and async servers was shot down at least
a year ago.

 - Josiah
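
A minimal sketch of the framing Josiah describes, with hypothetical
helper names send_msg/recv_msg: a 32-bit network-endian length prefix
followed by the payload gives message-oriented semantics on top of a
stream socket:

    import struct

    def send_msg(sock, data):
        # 32-bit network-endian length prefix, then the payload
        sock.sendall(struct.pack('!I', len(data)) + data)

    def _recv_exactly(sock, n):
        # loop until exactly n bytes have been read
        chunks = []
        while n:
            chunk = sock.recv(n)
            if not chunk:
                raise EOFError('connection closed mid-message')
            chunks.append(chunk)
            n -= len(chunk)
        return ''.join(chunks)

    def recv_msg(sock):
        (length,) = struct.unpack('!I', _recv_exactly(sock, 4))
        return _recv_exactly(sock, length)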

"Richard Oudkerk" <r.m.oudkerk@googlemail.com> wrote:
Depends on what the element is, but I suspect it isn't fast enough. Fairly large native dictionaries seem to run on the order of 1.3 million fetches/second on my 2.8 ghz machine. >>> import time >>> d = dict.fromkeys(xrange(65536)) >>> if 1: ... t = time.time() ... for j in xrange(1000000): ... _ = d[j&65535] ... print 1000000/(time.time()-t) ... 1305482.97346 >>> But really, transferring little bits of data back and forth isn't what is of my concern in terms of speed. My real concern is transferring nontrivial blocks of data; I usually benchmark blocks of sizes: 1k, 4k, 16k, 64k, 256k, 1M, 4M, 16M, and 64M. Those are usually pretty good to discover the "sweet spot" for a particular implementation, and also allow a person to discover whether or not their system can be used for nontrivial processor loads. - Josiah

On 3/25/07, Josiah Carlson <jcarlson@uci.edu> wrote:
Not directly relevant to the discussion, but I attended recently a talk from the main developer of STXXL (http://stxxl.sourceforge.net/), an STL-compatible library for handling huge volumes of data. The keys to efficient processing are support for parallel disks, explicit overlapping between I/O and computation, and I/O pipelining. More details are available at http://i10www.ira.uka.de/dementiev/stxxl/report/. George

On 26/03/07, Josiah Carlson <jcarlson@uci.edu> wrote:
The "20,000 fetches/sec" was just for retreving a "small" object (an integer), so it only really reflects the server overhead. (Sending integer objects directly between processes is maybe 6 times faster.) Fetching string objects of particular sizes from a shared dict gives the following results on the same computer: string size fetches/sec throughput ----------- ----------- ---------- 1 kb 15,000 15 Mb/s 4 kb 13,000 52 Mb/s 16 kb 8,500 130 Mb/s 64 kb 1,800 110 Mb/s 256 kb 196 49 Mb/s 1 Mb 50 50 Mb/s 4 Mb 13 52 Mb/s 16 Mb 3.2 51 Mb/s 64 Mb 0.84 54 Mb/s

"Richard Oudkerk" <r.m.oudkerk@googlemail.com> wrote:
That's a positive sign.
Fetching string objects of particular sizes from a shared dict gives the following results on the same computer:
Those numbers look pretty good. Would I be correct in assuming that there is a speedup sending blocks directly between processes? (though perhaps not the 6x that integer sending gains) I will definitely have to dig deeper, this could be the library that we've been looking for. - Josiah

On 27/03/07, Josiah Carlson <jcarlson@uci.edu> wrote:
Yes, sending blocks directly between processes is over 3 times faster for 1k blocks, and twice as fast for 4k blocks, but after that it makes little difference. (This is using the 'processing.connection' sub-package which is partly written in C.) Of course since these blocks are string data you can avoid the pickle translation which makes things get faster still: the peak bandwidth I get is 40,000 x 16k blocks / sec = 630 Mb/s. PS. It would be nice if the standard library had support for sending message oriented data over a connection so that you could just do 'recv()' and 'send()' without worrying about whether the whole message was successfully read/written. You can use 'socket.makefile()' for line oriented text messages but not for binary data.

"Richard Oudkerk" <r.m.oudkerk@googlemail.com> wrote:
I'm surprised that larger objects see little gain from the removal of an encoding/decoding step and transfer.
Very nice.
Well, there's also the problem that sockets, files, and pipes behave differently on Windows. If one is only concerned about sockets, there are various lightly defined protocols that can be simply implemented on top of asyncore/asynchat, among them is the sending of a 32 bit length field in network-endian order, followed by the data to be sent immediately afterwards. Taking some methods and tossing them into a synchronous sockets package wouldn't be terribly difficult (I've done a variant of this for a commercial project). Doing this generally may not find support, as my idea of sharing encoding/decoding/internal state transition/etc in sync/async servers was shot down at least a year ago. - Josiah