Threading problem / Paramiko problem?
MRAB
python at mrabarnett.plus.com
Mon Dec 28 21:42:12 EST 2009
mk wrote:
> Hello everyone,
>
> I wrote a "concurrent ssh" client using Paramiko, available here:
> http://python.domeny.com/cssh.py
>
> This program has a function for concurrent remote file/dir copying
> (class SSHThread, method 'sendfile'). One thread is started per
> specified host for copying (with a maximum length for the working
> queue, of course).
>
> It does have a problem with threading or Paramiko, though:
>
> - If I specify, say, 3 hosts, the 3 threads start copying to the
> remote hosts quickly (10-15MB/s on a virtual machine), using somewhat
> below 100% of the CPU the whole time (I wish it used less CPU, but I'm
> sending the file portion by portion, it's coded in Python, and there
> are other calculations as well...)
>
> - If I specify, say, 10 hosts, copying is fast and the CPU is under
> load until only 2-3 threads are left; then CPU load drops to about
> 15% and copying slows to about 1MB/s.
>
> It looks as if CPU time gets divided into more or less even portions
> among the threads running at the moment when the maximum number of
> threads is active (10 in this example), *and it stays this way even
> after some threads have finished and been join()ed*.
>
> I do join() the finished threads (someone, please take a look at the
> code). Yet CPU consumption and copying speed still go down.
>
> Now, it's either that, or Paramiko "maxes out" the sending bandwidth
> per thread at "total divided by number of senders". I have no idea
> which, and, what's worse, no idea how to test this. I've done
> profiling, which indicated nothing: basically all function calls
> except time.sleep take negligible time.
>
From what I can see, your script basically does a "busy wait" in
mainprog(), repeatedly checking whether any threads have finished.
It might use less CPU time if you used the Queue module, with the
threads putting messages in the queue to inform the main loop of their
progress and of when they are about to finish. The main loop would get
the messages from the queue, updating the progress display or starting
a new thread as appropriate, instead of constantly polling the threads.
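The queue-based design described above can be sketched roughly as follows. This is a minimal, hypothetical illustration in Python 3, where the module is named `queue` (it was `Queue` in Python 2): `copy_to_hosts`, `send_to_host`, the `PROGRESS`/`DONE` message tuples and the `max_threads` limit are names made up for the example, and `send_to_host` only sleeps instead of calling Paramiko, since the code of cssh.py is not reproduced here. Each worker reports progress and completion through one shared queue; the main loop blocks on the queue instead of polling, joins a thread once it reports that it is done, and then starts the next pending host.

```python
import queue
import threading
import time

# Hypothetical message kinds; not taken from the original script.
PROGRESS = "progress"
DONE = "done"


def send_to_host(host, msgs, chunks=5):
    """Hypothetical stand-in for the real per-host copy (e.g. sendfile).

    It only sleeps to simulate sending a file portion by portion,
    reporting progress after each portion and a final DONE message.
    """
    try:
        for i in range(1, chunks + 1):
            time.sleep(0.01)  # pretend to send one portion
            msgs.put((PROGRESS, host, i, chunks))
    finally:
        # Always report completion so the main loop never waits forever.
        msgs.put((DONE, host, None, None))


def copy_to_hosts(hosts, max_threads=3):
    """Copy to every host with at most max_threads workers at once.

    Assumes host names are unique, since they are used as dict keys.
    """
    msgs = queue.Queue()
    pending = list(hosts)
    running = {}    # host -> Thread
    progress = {}   # host -> (portions sent, total portions)
    finished = []

    def start_next():
        host = pending.pop(0)
        t = threading.Thread(target=send_to_host, args=(host, msgs))
        running[host] = t
        t.start()

    # Start the first batch of workers.
    while pending and len(running) < max_threads:
        start_next()

    while running:
        # get() blocks until a message arrives: no busy-wait polling.
        kind, host, done, total = msgs.get()
        if kind == PROGRESS:
            progress[host] = (done, total)  # update a progress display here
        elif kind == DONE:
            # The worker is about to exit, so join() returns quickly.
            running.pop(host).join()
            finished.append(host)
            if pending:
                start_next()  # refill the freed slot
    return finished, progress
```

For example, `copy_to_hosts(["host%d" % i for i in range(10)], max_threads=3)` handles ten hypothetical hosts with at most three workers running at a time. Because `Queue.get()` blocks by default, the main thread uses essentially no CPU while it waits; this removes the busy wait but does not by itself show whether Paramiko caps per-connection throughput.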