object references/memory access

Karthik Gurusamy kar1107 at gmail.com
Tue Jul 3 04:13:15 CEST 2007

On Jul 2, 6:32 pm, Steve Holden <s... at holdenweb.com> wrote:
> Karthik Gurusamy wrote:
> > On Jul 2, 3:01 pm, Steve Holden <s... at holdenweb.com> wrote:
> >> Karthik Gurusamy wrote:
> >>> On Jul 1, 12:38 pm, dlomsak <dlom... at gmail.com> wrote:
> >> [...]
> >>> I have found the stop-and-go between two processes on the same machine
> >>> leads to very poor throughput. By stop-and-go, I mean the producer and
> >>> consumer are constantly getting on and off of the CPU since the pipe
> >>> gets full (or empty for consumer). Note that a producer can't run at
> >>> its top speed as the scheduler will pull it out since its output pipe
> >>> got filled up.
> >> But when both processes are in the memory of the same machine and they
> >> communicate through an in-memory buffer, what's to stop them from
> >> keeping the CPU fully-loaded (assuming they are themselves compute-bound)?
> > If you are a producer and if your output goes thru' a pipe, when the
> > pipe gets full, you can no longer run. Someone must start draining the
> > pipe.
> > On a single core CPU when only one process can be running, the
> > producer must get off the CPU so that the consumer may start the
> > draining process.
> Wrong. The process doesn't "get off" the CPU, it remains loaded, and
> will become runnable again once the buffer has been depleted by the
> other process (which is also already loaded into memory and will become
> runnable as soon as a filled buffer becomes available).

Huh? "Get off", when talking about scheduling and the CPU, means the
process is not running. It is a common term for that; it doesn't mean
the process goes away from main memory. Sorry, where did you learn your CS?
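
To make the "pipe gets full" point concrete, here is a minimal sketch, assuming a Unix-like system and Python 3.5 or later (for os.set_blocking). The helper name pipe_capacity is made up for illustration. It fills an empty pipe with non-blocking writes until the kernel refuses more data; with an ordinary blocking write, that is the point where the producer would be put to sleep until a consumer drains the pipe.

```python
import os


def pipe_capacity(chunk_size=4096):
    """Count how many bytes fit into an empty pipe before a write would block."""
    read_fd, write_fd = os.pipe()
    # Non-blocking mode: a full pipe raises BlockingIOError instead of
    # suspending the writer, so we can observe the limit from one process.
    os.set_blocking(write_fd, False)
    chunk = b"x" * chunk_size
    total = 0
    try:
        while True:
            try:
                total += os.write(write_fd, chunk)
            except BlockingIOError:
                # Nobody is reading, so the kernel buffer is now full.
                break
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return total


if __name__ == "__main__":
    print("pipe held", pipe_capacity(), "bytes before the writer would block")
```

The number printed depends on the operating system and its pipe buffer size, so no particular value should be expected.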

> >>> When you increased the underlying buffer, you mitigated this
> >>> shuffling a bit, and hence saw a slight increase in performance.
> >>> My guess is that you can transfer across machines at really high
> >>> speed because there is no process swapping, as producer and consumer
> >>> run on different CPUs (machines, actually).
> >> As a concept that's attractive, but it's easy to demonstrate that (for
> >> example) two machines will get much better throughput using the
> >> TCP-based FTP to transfer a large file than they do with the UDP-based
> >> TFTP. This is because the latter protocol requires the sending unit to
> >> stop and wait for an acknowledgment for each block transferred. With
> >> FTP, if you use a large enough TCP sliding window and have enough
> >> content, you can saturate a link as long as its bandwidth isn't greater
> >> than your output rate.
> >> This isn't a guess ...
> > What you say about a stop-n-wait protocol versus TCP's sliding window
> > is correct.
> > But I think it's totally orthogonal to the discussion here. The issue
> > I'm talking about is how to keep the end nodes chugging along, if they
> > are able to run simultaneously. They can't if they aren't on a
> > multi-core CPU or on different machines.
> If you only have one CPU then sure, you can only run one process at a
> time. But your understanding of how multiple processes on the same CPU
> interact is lacking.


> >>> Since the two processes are on the same machine, try using a temporary
> >>> file for IPC. This is not as efficient as real shared memory -- but it
> >>> does avoid the IPC stop-n-go. The producer can generate the
> >>> multi-megabyte file in one go and inform the consumer. File systems
> >>> have gone thru' decades of performance tuning, so this job is done
> >>> really efficiently.
> >> I'm afraid this comes across a bit like superstition. Do you have any
> >> evidence this would give superior performance?
> > I did some testing before when I worked on boosting a shell pipeline
> > performance and found using file-based IPC was very good.
> > (some details at http://kar1107.blogspot.com/2006/09/unix-shell-pipeline-part-2-using....
> > )
> > Thanks,
> > Karthik
> >>>> Thanks for the replies so far, I really appreciate you guys
> >>>> considering my situation and helping out.
> If you get better performance by writing files and reading them instead
> of using pipes to communicate then something is wrong.

Why don't you provide a better explanation for the observed behavior
rather than just claiming that a given explanation is wrong? I did mention
that using real shared memory is better. I do know the cost of using a file
("physical disk movements"), but with the amount of buffering that
goes on in today's file-system implementations, for this problem we will
see a big improvement.
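
Anyone who wants to check this on their own machine could start from something like the sketch below. It is only a rough harness under stated assumptions, not the test from the blog post: the payload size, the 64 KiB block size, and the names PRODUCER, via_pipe, via_temp_file and timed are all made up for illustration. A child Python process produces the data, and the parent consumes it either through a pipe or through a temporary file written in one go.

```python
import os
import subprocess
import sys
import tempfile
import time

PAYLOAD_BYTES = 8 * 1024 * 1024  # illustrative size, not from the thread

# Producer script run in a child process: writes N bytes either to the file
# named in argv[2] or, if no file is given, to its stdout (a pipe).
PRODUCER = (
    "import sys\n"
    "n = int(sys.argv[1])\n"
    "out = open(sys.argv[2], 'wb') if len(sys.argv) > 2 else sys.stdout.buffer\n"
    "block = b'x' * 65536\n"
    "while n > 0:\n"
    "    k = min(n, len(block))\n"
    "    out.write(block[:k])\n"
    "    n -= k\n"
    "out.flush()\n"
)


def via_pipe(nbytes):
    """Consume the producer's output through a pipe while it is running."""
    proc = subprocess.Popen(
        [sys.executable, "-c", PRODUCER, str(nbytes)], stdout=subprocess.PIPE
    )
    received = 0
    while True:
        data = proc.stdout.read(65536)
        if not data:
            break
        received += len(data)
    proc.stdout.close()
    proc.wait()
    return received


def via_temp_file(nbytes):
    """Let the producer finish writing a temp file, then read it back."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        subprocess.run(
            [sys.executable, "-c", PRODUCER, str(nbytes), path], check=True
        )
        received = 0
        with open(path, "rb") as f:
            while True:
                data = f.read(65536)
                if not data:
                    break
                received += len(data)
    finally:
        os.remove(path)
    return received


def timed(fn, nbytes):
    """Return (bytes received, elapsed seconds) for one transfer."""
    start = time.perf_counter()
    received = fn(nbytes)
    return received, time.perf_counter() - start


if __name__ == "__main__":
    for fn in (via_pipe, via_temp_file):
        received, seconds = timed(fn, PAYLOAD_BYTES)
        print(fn.__name__, received, "bytes in", round(seconds, 3), "s")
```

Both timings include the start-up cost of the child interpreter, and the outcome will vary with the OS, the number of cores, and the file-system cache, so treat any numbers as specific to the machine they were measured on.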


> regards
>   Steve
> --
> Steve Holden        +1 571 484 6266   +1 800 494 3119
> Holden Web LLC/Ltd          http://www.holdenweb.com
> Skype: holdenweb      http://del.icio.us/steve.holden
> --------------- Asciimercial ------------------
> Get on the web: Blog, lens and tag the Internet
> Many services currently offer free registration
> ----------- Thank You for Reading -------------
