[Tutor] subprocess.Popen / proc.communicate issue

bruce badouglas at gmail.com
Fri Mar 31 10:49:06 EDT 2017


Cameron!!!

You are 'da man!!

Read your explanation... good stuff to recheck/test and investigate
over time.

In the short term, I'll implement some tests!!

thanks!


On Thu, Mar 30, 2017 at 6:51 PM, Cameron Simpson <cs at zip.com.au> wrote:
> I wrote a long description of how .communicate can deadlock.
>
> Then I read the doco more carefully and saw this:
>
>  Warning: Use communicate() rather than .stdin.write, .stdout.read
>  or .stderr.read to avoid deadlocks due to any of the other OS
>  pipe buffers filling up and blocking the child process.
>
> This suggests that .communicate services all the pipes concurrently (in
> CPython: select/poll on POSIX, threads on Windows), and that therefore the
> deadlock situation should not arise with .communicate itself.
>
> See what lsof and strace tell you; all my other advice stands regardless,
> and the deadlock description may or may not be relevant. It is still worth
> reading and understanding it when looking at this kind of problem.
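A minimal sketch of the point above, assuming Python 3. The child is a hypothetical Python one-liner standing in for curl: it writes about 1 MB to stdout and then about 1 MB to stderr, well past a typical pipe buffer. communicate() still collects both streams, because it drains them concurrently rather than one after the other.

```python
import subprocess
import sys

# Hypothetical stand-in for curl: writes ~1 MB to stdout, then ~1 MB to stderr.
NOISY_CHILD = (
    "import sys\n"
    "sys.stdout.write('o' * 1000000)\n"
    "sys.stderr.write('e' * 1000000)\n"
)

def run_noisy_child():
    proc = subprocess.Popen(
        [sys.executable, "-c", NOISY_CHILD],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # communicate() reads both pipes concurrently, then waits for the child
    # to exit; the timeout only guards against a genuine hang.
    out, err = proc.communicate(timeout=60)
    return proc.returncode, len(out), len(err)

print(run_noisy_child())
```

Reading the same two pipes strictly one after the other (stderr first, say) is exactly the pattern that stalls once the child fills the stdout pipe.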
>
> Cheers,
> Cameron Simpson <cs at zip.com.au>
>
>
> On 31Mar2017 09:43, Cameron Simpson <cs at zip.com.au> wrote:
>>
>> On 30Mar2017 13:51, bruce <badouglas at gmail.com> wrote:
>>>
>>> Trying to understand the "correct" way to run a sys command ("curl")
>>> and to get the potential stderr. Checking Stack Overflow (SO) implies
>>> that I should be able to use a raw/text cmd, with "shell=True".
>>
>>
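>> I strongly recommend avoiding shell=True if you can. It has many problems:
>> every argument must be quoted correctly by hand, and any untrusted text in
>> the command string can inject arbitrary shell commands. All Stack Overflow
>> advice needs to be considered with caution. However, that is not the
>> source of your deadlock.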
>>
>>> If I leave the stderr out, and just use
>>>    s=proc.communicate()
>>> the test works...
>>>
>>> Any pointers on what I might inspect to figure out why this hangs on
>>> the proc.communicate process/line??
>>
>>
>> When it is hung, run "lsof" on the processes from another terminal, i.e.
>> lsof the Python process and also lsof the curl process. That will make
>> clear the connections between them, particularly which file descriptors
>> ("fd"s) are associated with what.
>>
>> Then run "strace" on the processes. That should show you which system
>> calls are in progress in each process.
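To know which PIDs to hand to lsof and strace, the parent can report its own PID and the child's. A small sketch, assuming Python 3; debug_commands is a hypothetical helper, the sleeping child is a stand-in for curl, and on Linux attaching strace to a process that is not its own child may require root.

```python
import os
import subprocess
import sys

def debug_commands(proc):
    """Hypothetical helper: commands to paste into a second terminal.

    lsof -p PID lists the file descriptors a process has open;
    strace -p PID attaches to a running process and traces its system calls.
    """
    cmds = []
    for pid in (os.getpid(), proc.pid):  # the Python parent, then the child
        cmds.append(f"lsof -p {pid}")
        cmds.append(f"strace -p {pid}")
    return cmds

# Stand-in child that just sleeps, so there is something to inspect.
proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
for line in debug_commands(proc):
    print(line)
proc.kill()  # tidy up the demo child
proc.wait()
```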
>>
>> My expectation is that you will see Python reading from one file
>> descriptor and curl writing to a different one, and neither progressing.
>>
>> Personally I avoid .communicate and do more work myself, largely to know
>> precisely what is going on with my subprocesses.
>>
>> The difficulty with .communicate is that Python must read both stderr and
>> stdout separately, but it will be doing that sequentially: read one, then
>> read the other. That is just great if the command is "short" and writes a
>> small enough amount of data to each. The command runs, writes, and exits.
>> Python reads one and sees EOF after the data, because the command has
>> exited. Then Python reads the other and collects the data and sees EOF
>> because the command has exited.
>>
>> However, if the output of the command is large on whatever stream Python
>> reads _second_, the command will stall writing to that stream. This is
>> because Python is not reading the data, and therefore the buffers fill
>> (stdio in curl plus the buffer in the pipe). So the command ("curl") stalls
>> waiting for data to be consumed from the buffers. And because it has
>> stalled, the command does not exit, and therefore Python does not see EOF on
>> the _first_ stream. So it sits waiting for more data, never reading from the
>> second stream.
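A sketch of the "do more work myself" approach, assuming Python 3: one reader thread per pipe, so neither stream can fill up while the parent is blocked on the other. run_with_reader_threads is a hypothetical name, and the child is a hypothetical noisy stand-in for curl.

```python
import subprocess
import sys
import threading

def run_with_reader_threads(cmd_args):
    """Run cmd_args and collect stdout and stderr without the sequential-read stall."""
    proc = subprocess.Popen(cmd_args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    results = {}

    def drain(name, pipe):
        # read() blocks until EOF, i.e. until the child closes its end of the pipe.
        results[name] = pipe.read()
        pipe.close()

    readers = [
        threading.Thread(target=drain, args=("stdout", proc.stdout)),
        threading.Thread(target=drain, args=("stderr", proc.stderr)),
    ]
    for t in readers:
        t.start()
    for t in readers:
        t.join()
    returncode = proc.wait()  # both pipes are drained, so this cannot stall on output
    return returncode, results["stdout"], results["stderr"]

# Hypothetical child that writes ~1 MB to each stream.
NOISY_CHILD = (
    "import sys\n"
    "sys.stdout.write('o' * 1000000)\n"
    "sys.stderr.write('e' * 1000000)\n"
)
rc, out, err = run_with_reader_threads([sys.executable, "-c", NOISY_CHILD])
print(rc, len(out), len(err))
```

Because each stream has its own reader, it no longer matters which one the child writes to first or how much it writes.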
>>
>> [...snip...]
>>>
>>> cmd='[r" curl -sS '
>>> #cmd=cmd+'-A  "Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Firefox/38.0"'
>>> cmd=cmd+"-A  '"+user_agent+"'"
>>> ##cmd=cmd+'   --cookie-jar '+cname+' --cookie '+cname+'    '
>>> cmd=cmd+'   --cookie-jar '+ff+' --cookie '+ff+'    '
>>> #cmd=cmd+'-e "'+referer+'"   -d "'+tt+'"  '
>>> #cmd=cmd+'-e "'+referer+'"    '
>>> cmd=cmd+"-L '"+url1+"'"+'"]'
>>> #cmd=cmd+'-L "'+xx+'" '
>>
>>
>> Might I recommend something like this:
>>
>> cmd_args = [ 'curl', '-sS' ]
>> cmd_args.extend( [ '-A', user_agent ] )
>> cmd_args.extend( [ '--cookie-jar', ff, '--cookie', ff ] )
>> cmd_args.extend( [ '-L', url1 ] )
>>
>> and using shell=False. This totally avoids any need to "quote" strings in
>> the command, because the shell is not parsing the string - you're invoking
>> "curl" directly instead of asking the shell to read a string and invoke
>> "curl" for you.
>>
>> Constructing shell commands is tedious and fiddly; avoid it when you don't
>> need to.
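A sketch of the list-based command, assuming Python 3.8+ (for shlex.join). build_curl_args, the user-agent string, the cookie file name and the URL are hypothetical placeholders for the original's user_agent, ff and url1. A value containing a single quote, which would break the hand-built "-A '...'" string, needs no special handling as a list element; if a shell is truly unavoidable, shlex.join does the quoting.

```python
import shlex

def build_curl_args(user_agent, cookie_file, url):
    """Hypothetical helper: curl's argv as a list, for Popen(..., shell=False)."""
    cmd_args = ["curl", "-sS"]
    cmd_args.extend(["-A", user_agent])
    cmd_args.extend(["--cookie-jar", cookie_file, "--cookie", cookie_file])
    cmd_args.extend(["-L", url])
    return cmd_args

args = build_curl_args("Bob's Browser/1.0", "cookies.txt", "https://example.com/?a=1&b=2")
print(args)  # each element reaches curl exactly as written

# Only if a shell really is required: quote every element (Python 3.8+).
shell_cmd = shlex.join(args)
print(shell_cmd)
```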
>>
>>> try_=1
>>
>>
>> It is preferable to say:
>>
>> try_ = True
>>
>>> while(try_):
>>
>>
>> You don't need the brackets here:
>>
>> while try_:
>>
>> More readable, because less punctuation.
>>
>>>   proc=subprocess.Popen(cmd,
>>> shell=True,stdout=subprocess.PIPE,stderr=subprocess.PIPE)
>>
>>
>> proc = subprocess.Popen(cmd_args,
>>          stdout=subprocess.PIPE,
>>          stderr=subprocess.PIPE)
>>
>>>   s,err=proc.communicate()
>>>   s=s.strip()
>>>   err=err.strip()
>>>   if(err==0):
>>>     try_=''
>>
>>
>> It is preferable to say:
>>
>> try_ = False
>>
>> Also, you should be looking at proc.returncode, _not_ err. Many programs
>> write informative messages to stderr, and a nonempty stderr does not imply
>> failure.
>>
>> Instead, programs conventionally set their exit status to 0 for success
>> and to various nonzero values for failure. So check:
>>
>> if proc.returncode == 0:
>>   try_ = False
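A sketch of why stderr is the wrong success test, assuming Python 3. The hypothetical child writes a progress note to stderr and still exits 0. Note also that communicate() returns bytes (str in Python 2), so the original err==0 comparison is never true, whatever the command does.

```python
import subprocess
import sys

# Hypothetical child: chatter on stderr, data on stdout, exit status 0.
CHATTY_CHILD = (
    "import sys\n"
    "sys.stderr.write('note: fetching...\\n')\n"
    "sys.stdout.write('payload\\n')\n"
    "sys.exit(0)\n"
)

proc = subprocess.Popen([sys.executable, "-c", CHATTY_CHILD],
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = proc.communicate()
out, err = out.strip(), err.strip()

print(proc.returncode)  # the exit status is what signals success
print(err)              # non-empty, yet the command succeeded
print(err == 0)         # bytes never compare equal to an int
```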
>>
>> Or you could bypass try_ altogether and go:
>>
>> while True:
>>   ... subprocess ...
>>   if proc.returncode == 0:
>>     break
>>
>> That may not fit your larger scheme.
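A sketch of that loop with a bound on retries, assuming Python 3. run_until_success, max_tries and delay are hypothetical additions (the original loop retries forever), and success is judged by proc.returncode alone.

```python
import subprocess
import sys
import time

def run_until_success(cmd_args, max_tries=3, delay=1.0):
    """Run cmd_args until it exits with status 0, giving up after max_tries.

    Returns (returncode, stdout, stderr) from the last attempt.
    """
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")
    for attempt in range(1, max_tries + 1):
        proc = subprocess.Popen(cmd_args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        out, err = proc.communicate()
        if proc.returncode == 0:
            break  # success: stop retrying
        if attempt < max_tries:
            time.sleep(delay)  # brief pause before the next attempt
    return proc.returncode, out.strip(), err.strip()

rc, out, err = run_until_success([sys.executable, "-c", "print('ok')"])
print(rc, out)
```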
>>
>> Cheers,
>> Cameron Simpson <cs at zip.com.au>
>> _______________________________________________
>> Tutor maillist  -  Tutor at python.org
>> To unsubscribe or change subscription options:
>> https://mail.python.org/mailman/listinfo/tutor

