client-server parallelised number crunching
Thomas Rachel
nutznetz-0c1b6768-bfa9-48d5-a470-7603bd3aa915 at spamschutz.glglgl.de
Wed Apr 27 05:35:16 EDT 2011
On 26.04.2011 21:55, Hans Georg Schaathun wrote:
> Now, I would like to use remote hosts as well, more precisely, student
> lab boxen which are rather unreliable. By experience I'd expect to
> lose roughly 4-5 jobs in 100 CPU hours on average. Thus I need some
> way of detecting lost connections and requeueing unfinished tasks,
> avoiding any serious delays in this detection. What is the best way to
> do this in python?
As far as I understand, you acquire a job, send it to a remote host via
a socket and then wait for the answer. Is that correct?
In this case, I would put running jobs together with the respective
socket in a "running queue". If you detect a broken connection, put that
job into the "todo" queue again.
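A minimal sketch of that bookkeeping, assuming one networking thread per remote connection. The class `JobTracker` and its method names are hypothetical, not from the post; the "running queue" is modelled as a dict keyed by connection, so the job belonging to a broken socket can be looked up directly and put back into "todo".

```python
import queue
import threading


class JobTracker:
    """Keep every job either waiting in "todo" or "running" on a connection.

    Hypothetical helper for illustration; any hashable object (e.g. a
    socket) can serve as the connection key.
    """

    def __init__(self, jobs):
        self.todo = queue.Queue()        # thread-safe FIFO of waiting jobs
        for job in jobs:
            self.todo.put(job)
        self.running = {}                # connection -> job sent over it
        self.lock = threading.Lock()     # guards self.running

    def acquire(self, conn):
        """Take the next waiting job and record it as running on conn.

        Raises queue.Empty when no job is waiting.
        """
        job = self.todo.get_nowait()
        with self.lock:
            self.running[conn] = job
        return job

    def finished(self, conn):
        """The answer arrived: drop the job from the running set."""
        with self.lock:
            return self.running.pop(conn)

    def connection_lost(self, conn):
        """The connection broke: requeue its unfinished job, if any."""
        with self.lock:
            job = self.running.pop(conn, None)
        if job is not None:
            self.todo.put(job)
        return job
```

A networking thread would call `acquire()` before sending, `finished()` after a complete reply, and `connection_lost()` from its exception handler; any other thread can then pick the requeued job up with its next `acquire()`.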
> ... if I could detect disconnects and
> requeue the tasks from the networking threads. Is that possible
> using python sockets?
Of course, why not? It may depend on the options you set (keepalive
etc.), but generally you should get an exception when trying to
communicate over a disconnected connection (over a disconnection? ;-))
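A rough sketch of what such settings and the exception handling could look like, assuming one TCP connection per worker and a worker that answers each request with a reply of known size. The helper names `configure_socket`, `recv_exactly` and `run_remote` are hypothetical, and the keepalive timings are arbitrary example values. Two details matter for quick detection: an orderly close by the peer shows up as `recv()` returning `b""` rather than as an exception, and without tuning, Linux waits about two hours before sending the first keepalive probe.

```python
import socket


def configure_socket(sock, timeout=60.0):
    """Enable TCP keepalive and put an upper bound on blocking calls."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The tuning constants are platform-specific, hence the hasattr() guards.
    # Example values: first probe after 30 s idle, then every 10 s, and the
    # connection is declared dead after 3 unanswered probes.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 30)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 10)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 3)
    # Blocking send/recv now raise socket.timeout after `timeout` seconds.
    sock.settimeout(timeout)


def recv_exactly(sock, n):
    """Read exactly n bytes; treat an orderly close (b"") as a lost peer."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("peer closed the connection")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def run_remote(sock, request, reply_size):
    """Send one request and wait for its reply.

    Returns the reply bytes, or None if the connection is dead, so the
    caller can requeue the job.
    """
    try:
        sock.sendall(request)
        return recv_exactly(sock, reply_size)
    except OSError:
        # ConnectionError (reset, broken pipe, closed peer) and
        # socket.timeout are both subclasses of OSError.
        return None
```

For long-running number crunching the recv timeout has to exceed the longest expected job, otherwise a slow but healthy worker is mistaken for a lost one; in that case keepalive is what actually catches hosts that vanish mid-job.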
When going over the network, avoid pickling. Better to use your own protocol.
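The post does not say what such a protocol should look like; one common choice, shown here only as an assumption, is length-prefixed JSON: a 4-byte big-endian length followed by a UTF-8 JSON document. The names `send_message` and `recv_message` and the 16 MiB size limit are hypothetical.

```python
import json
import socket
import struct

HEADER = struct.Struct("!I")  # 4-byte unsigned length, network byte order


def send_message(sock, obj):
    """Serialise obj as JSON and send it with a length prefix."""
    payload = json.dumps(obj).encode("utf-8")
    sock.sendall(HEADER.pack(len(payload)) + payload)


def _recv_exactly(sock, n):
    """Read exactly n bytes or raise ConnectionError if the peer closes."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return bytes(buf)


def recv_message(sock, max_size=16 * 1024 * 1024):
    """Read one length-prefixed JSON message, refusing oversized ones."""
    (length,) = HEADER.unpack(_recv_exactly(sock, HEADER.size))
    if length > max_size:
        raise ValueError(f"message of {length} bytes exceeds the limit")
    return json.loads(_recv_exactly(sock, length).decode("utf-8"))
```

JSON only ever decodes to plain data (dicts, lists, numbers, strings), whereas unpickling data from a peer can execute arbitrary code; the explicit length prefix also makes a connection that drops mid-message easy to tell apart from a complete one.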
Thomas
More information about the Python-list mailing list