Line-by-line processing when stdin is not a tty

Tim Harig usernet at ilthio.net
Wed Aug 11 08:35:08 EDT 2010


On 2010-08-11, Cameron Simpson <cs at zip.com.au> wrote:
> On 11Aug2010 10:32, Tim Harig <usernet at ilthio.net> wrote:
>| On 2010-08-11, Wolfgang Rohdewald <wolfgang at rohdewald.de> wrote:
>| > On Mittwoch 11 August 2010, Cameron Simpson wrote:
>| >> Usually you either
>| >> need an option on the upstream program to tell it to line
>| >> buffer explicitly
>| >
>| > once cat had an option -u doing exactly that but nowadays
>| > -u seems to be ignored
>| >
>| > http://www.opengroup.org/onlinepubs/009695399/utilities/cat.html
>| 
>| I have to wonder why cat knows or cares.  Since we are referring to
>| a single directional pipe, there is no fear of creating any kind of
>| race condition.  In general, I would expect that the shell opens the
>| pipe (pipe()), fork()s, closes its own 0 or 1 descriptor as appropriate
>| for each child, copies (dup()) one of the pipe's file descriptors to the
>| appropriate file descriptor for the child process, and exec()s to call
>| the new process.  Neither of the processes, in general, needs to know
>| anything other than to write to and read from their given descriptors.
>
> The buffering is a performance choice. Every write requires a context
> switch from userspace to kernel space, and availability of data in the
> pipe will wake up a downstream process blocked trying to read.
>
> It is far more efficient to do as few such copies as possible, so where
> interaction (as you point out) is one way it's usually better to write
> data in larger chunks. But when writing to a terminal, ostensibly for a
> human to read, line buffering is generally better (for exactly the issue
> the OP tripped over - humans expect stuff to happen as it occurs).
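
For reference, that choice is normally made inside the writing program
(usually by its stdio layer) by asking whether the output descriptor is a
terminal.  Below is a minimal Python sketch of the same decision.  The
names pick_buffering() and open_writer() are mine, purely for
illustration; this is not how cat or any particular libc implements it.

```python
import os


def pick_buffering(fd):
    """Return 'line' if fd refers to a terminal, otherwise 'block'."""
    return "line" if os.isatty(fd) else "block"


def open_writer(fd):
    """Wrap fd in a text stream buffered according to pick_buffering().

    Hypothetical helper: it mimics the usual stdio policy of line
    buffering on a terminal and block buffering everywhere else.
    """
    line_buffered = pick_buffering(fd) == "line"
    # In text mode, buffering=1 requests line buffering and -1 requests
    # the default policy (block buffering for non-interactive files).
    # closefd=False leaves the descriptor open when the wrapper goes away.
    return os.fdopen(fd, "w", buffering=1 if line_buffered else -1,
                     closefd=False)
```

Calling open_writer(sys.stdout.fileno()) in a script gives a line-buffered
stream when run at a prompt and a block-buffered one when piped into
another command, which is the behaviour the OP ran into.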

Right, I don't question the optimization.  I question whether the
intelligence that performs that optimization should be placed within cat
or whether it should be placed within the shell.  It seems to me that the
shell has a better idea of how the command is being used and can therefore
make a better decision about whether or not buffering is appropriate.
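
To make concrete what the shell sees when it launches each command, here
is a rough Python sketch of the pipe()/fork()/dup()/exec() sequence I
described earlier.  It is only an illustration under my own assumptions:
run_pipeline() is a made-up name, it uses dup2() rather than close()
followed by dup(), and a real shell does a good deal more (job control,
signal handling, redirections).

```python
import os


def run_pipeline(producer, consumer):
    """Run `producer | consumer` and return both exit statuses.

    Each argument is an argv list, e.g. ["echo", "hello"].
    """
    read_end, write_end = os.pipe()

    pid1 = os.fork()
    if pid1 == 0:
        # First child: make the pipe's write end its stdout, then exec.
        os.dup2(write_end, 1)
        os.close(read_end)
        os.close(write_end)
        try:
            os.execvp(producer[0], producer)
        except OSError:
            pass
        os._exit(127)  # only reached if exec failed

    pid2 = os.fork()
    if pid2 == 0:
        # Second child: make the pipe's read end its stdin, then exec.
        os.dup2(read_end, 0)
        os.close(read_end)
        os.close(write_end)
        try:
            os.execvp(consumer[0], consumer)
        except OSError:
            pass
        os._exit(127)  # only reached if exec failed

    # Parent (the "shell"): close both ends so the consumer sees EOF
    # once the producer exits, then wait for both children.
    os.close(read_end)
    os.close(write_end)
    statuses = []
    for pid in (pid1, pid2):
        _, status = os.waitpid(pid, 0)
        statuses.append(os.WEXITSTATUS(status))
    return statuses
```

For example, run_pipeline(["echo", "hello"], ["grep", "-q", "hello"])
runs the equivalent of `echo hello | grep -q hello`.  Note that the parent
is the only process that knows what sits at both ends of the pipe; each
child just inherits a descriptor and has to inspect it to guess how it is
being used.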
