client-server parallelised number crunching

Thomas Rachel nutznetz-0c1b6768-bfa9-48d5-a470-7603bd3aa915 at spamschutz.glglgl.de
Wed Apr 27 05:35:16 EDT 2011


On 26.04.2011 21:55, Hans Georg Schaathun wrote:

> Now, I would like to use remote hosts as well, more precisely, student
> lab boxen which are rather unreliable.  By experience I'd expect to
> lose roughly 4-5 jobs in 100 CPU hours on average.  Thus I need some
> way of detecting lost connections and requeue unfinished tasks,
> avoiding any serious delays in this detection.  What is the best way to
> do this in python?

As far as I understand, you acquire a job, send it to a remote host via 
a socket and then wait for the answer. Is that correct?

In this case, I would put running jobs together with the respective 
socket in a "running queue". If you detect a broken connection, put that 
job into the "todo" queue again.
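A minimal sketch of that bookkeeping, assuming one job per connection. The class name `JobTracker` and its method names are hypothetical, chosen for illustration only; the connection key can be the socket object itself, since sockets are hashable.

```python
import queue
import threading


class JobTracker:
    """Keeps a "todo" queue and a table of jobs currently running per connection."""

    def __init__(self, jobs):
        self.todo = queue.Queue()
        for job in jobs:
            self.todo.put(job)
        self.running = {}             # connection -> job assigned to it
        self.lock = threading.Lock()  # protects self.running across threads

    def acquire(self, conn):
        """Take the next job (blocks if none is queued) and mark it as running on conn."""
        job = self.todo.get()
        with self.lock:
            self.running[conn] = job
        return job

    def finished(self, conn):
        """The job on conn completed; forget it and return it."""
        with self.lock:
            return self.running.pop(conn, None)

    def connection_lost(self, conn):
        """The connection broke; put its job back into the todo queue."""
        with self.lock:
            job = self.running.pop(conn, None)
        if job is not None:
            self.todo.put(job)
        return job
```

Each networking thread would call `acquire()` before sending a job, `finished()` once the answer has arrived, and `connection_lost()` from its error handler.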


> ... if I could detect disconnects and
> requeue the tasks from the networking threads.  Is that possible
> using python sockets?

Of course, why not? It may depend on the socket options you set (keepalive 
etc.), but in general you should get an exception when you try to 
communicate over a connection that has been dropped (over a disconnection? ;-))
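A rough sketch of what that detection could look like with the standard `socket` module. The helper names `enable_keepalive` and `recv_result` and the 30-second default timeout are assumptions made for this example. Note one case that does not raise: if the peer closes the connection cleanly, `recv()` returns an empty bytes object, so that has to be checked as well.

```python
import socket


def enable_keepalive(sock):
    """Ask the OS to probe an idle TCP connection so a dead peer is eventually noticed."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)


def recv_result(sock, timeout=30.0):
    """Return the received bytes, or None if the connection looks lost."""
    sock.settimeout(timeout)
    try:
        data = sock.recv(4096)
    except OSError:
        # Covers timeouts, connection resets and other socket-level errors.
        return None
    if not data:
        # An empty result means the peer closed the connection.
        return None
    return data
```

A networking thread that gets `None` back can treat the job on that socket as lost and requeue it. The timeout value is a trade-off: too short and slow jobs are requeued needlessly, too long and a dead host delays the work.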

When going over the network, avoid pickling. It is better to define your own protocol.
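One reason to avoid pickle here is that unpickling data received from another host can execute arbitrary code. As an illustration of what a simple protocol of your own might look like, here is a sketch that sends each message as a 4-byte length prefix followed by a JSON payload. The framing, the use of JSON and the function names are all assumptions for this example, not something prescribed above.

```python
import json
import socket
import struct

HEADER = struct.Struct("!I")  # 4-byte unsigned length, network byte order


def send_msg(sock, obj):
    """Serialise obj as JSON and send it with a length prefix."""
    payload = json.dumps(obj).encode("utf-8")
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_exact(sock, n):
    """Read exactly n bytes, raising ConnectionError if the peer closes early."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("peer closed the connection")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_msg(sock):
    """Receive one length-prefixed JSON message and decode it."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return json.loads(recv_exact(sock, length).decode("utf-8"))
```

Because `recv_exact` raises `ConnectionError` when the stream ends in the middle of a message, a half-delivered result shows up as a lost connection rather than as corrupt data.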


Thomas


