[CentralOH] Popen file descriptor difference between Python 2.6 and 2.7+ on Linux

Eric Floehr eric at intellovations.com
Tue Mar 6 12:01:47 EST 2018


That's a great idea, Joe, thanks!
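
For the archives, here is a minimal sketch of how that one-liner could be driven from Python. It assumes bash, diff, and gunzip are all on the PATH, and the function name diff_gzipped is made up for illustration. Process substitution is a bash feature, so the sketch calls bash explicitly instead of relying on shell=True, which uses /bin/sh.

```python
import subprocess


def diff_gzipped(path_a, path_b):
    """Return the unified diff of two gzip-compressed text files.

    Hypothetical helper: assumes bash, diff, and gunzip are installed.
    """
    # The file names are passed as positional parameters ($1 and $2)
    # so they are never pasted into the shell command string.
    script = 'diff -u <(gunzip -c "$1") <(gunzip -c "$2")'
    proc = subprocess.Popen(
        ["bash", "-c", script, "bash", path_a, path_b],
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    out, _ = proc.communicate()
    # diff exits with 0 (identical), 1 (differences found), or >1 (trouble).
    if proc.returncode > 1:
        raise RuntimeError("diff failed with status %d" % proc.returncode)
    return out
```

One caveat with this approach: a failure inside either gunzip is not reflected in the exit status, so a missing or corrupt input would just look like an empty file to diff.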

On Mon, Feb 19, 2018 at 5:54 PM, Joe Shaw <joe at joeshaw.org> wrote:

> Hi,
>
> If you have access to the shell, bash can do a nice job of this
> concurrently:
>
>     diff -u <(gunzip -c file1.gz) <(gunzip -c file2.gz)
>
> Under the covers it’s doing the /dev/fd trick, and it saves you from
> having to use a temporary file or reimplement it yourself in Python.
>
> Joe
>
> On Mon, Feb 19, 2018 at 3:55 PM Eric Floehr <eric at intellovations.com>
> wrote:
>
>> Good ideas... the actual application is gunzipping two ASCII files
>> (gzipped to save disk space) and diffing them.
>>
>> These files are added to, but not just appended to, so we need to find the
>> changes between two files. We do that rather than processing the whole
>> file each time because these are large files with lots of rows, and parsing
>> each one to determine whether a row's data exists or has changed would be
>> hugely expensive. The system diff program is the best tool for that.
>>
>> On Sat, Feb 17, 2018 at 6:42 PM, Thomas Winningham <winningham at gmail.com>
>> wrote:
>>
>>> Well, it sucks that you had to go through this rewrite. I think it shows
>>> that all code has a certain level of implicit functionality whose
>>> interpretation varies between stakeholders over time. So while it took a
>>> little digging, I think it also shows that everything is eventually
>>> knowable if needed.
>>>
>>> "Explicit is better than implicit": although the original code seemed
>>> rather explicit in its own right, this new code is certainly more explicit.
>>> If IO performance becomes an issue with the tempfile, maybe named pipes
>>> could be used? I learned a lot about per-process file descriptors, though,
>>> and it reinforces something I learned only recently: *nix tracks an open
>>> file by descriptor and not by name, whereas, like you say, Windows won't
>>> let a named temporary file be opened a second time while it is still open.
>>>
>>> There's probably always a way to over-engineer this further, too; I'm
>>> thinking sockets or even a message queue :P
>>>
>>>
>>> On Sat, Feb 17, 2018 at 11:20 AM, Eric Floehr <eric at intellovations.com>
>>> wrote:
>>>
>>>> Thank you Neil and Thomas, here is a summary of what you discovered:
>>>>
>>>> Python 2.7 changed behavior here: the pipe file descriptors that the
>>>> subprocess module creates are no longer passed to child processes,
>>>> whereas in Python 2.6 they were.
>>>>
>>>> To address this, a "pass_fds" parameter was added to Popen in
>>>> Python 3.2. However, Python 2.7 does not have this parameter.
>>>>
>>>> So, in order for this to work on Python 2.7, I resorted to using named
>>>> temporary files. This code won't work on Windows NT and higher (per the
>>>> Python documentation, a named temporary file can't be opened a second
>>>> time while it is still open on Windows NT and higher, but it can be on
>>>> UNIX-like systems).
>>>>
>>>> Here is the summary of changes:
>>>>
>>>> 1. Wrote the contents of the first Popen's stdout to a NamedTemporaryFile
>>>> 2. Passed the name of the NamedTemporaryFile to the second Popen
>>>> 3. Added with blocks to ensure both Popens' stdout pipes get closed
>>>> properly
>>>> 4. Closed the NamedTemporaryFile after the second Popen finished, to
>>>> delete the temp file from disk.
>>>>
>>>> Here is the code:
>>>>
>>>> --------------------------------------------------
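>>>> from subprocess import Popen, PIPE
>>>> import tempfile
>>>>
>>>> # Step 1: run the first command and spool its raw output to a named
>>>> # temp file. No universal_newlines here: the temp file is opened in
>>>> # binary mode, so on Python 3 it must be given bytes, not str.
>>>> tmpfile = tempfile.NamedTemporaryFile()
>>>> with open('/tmp/test', 'rb') as src:
>>>>     proc = Popen(["cat"], stdin=src, stdout=PIPE)
>>>>     with proc.stdout as data:
>>>>         tmpfile.write(data.read())
>>>>     proc.wait()
>>>> tmpfile.flush()
>>>>
>>>> # Step 2: hand the temp file's name to the second command.
>>>> proc = Popen(["cat", tmpfile.name], stdout=PIPE, universal_newlines=True)
>>>> with proc.stdout as fddata:
>>>>     print(fddata.read())
>>>> proc.wait()
>>>>
>>>> # Closing the NamedTemporaryFile deletes it from disk.
>>>> tmpfile.close()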
>>>> --------------------------------------------------
>>>>
>>>>
>>>> This works on 2.6, 2.7, and 3.5. Thanks again for all of your help!
>>>> Eric
>>>>
>>>>
>>>>
>>>> On Sat, Feb 17, 2018 at 12:39 AM, Thomas Winningham <
>>>> winningham at gmail.com> wrote:
>>>>
>>>>> Oh, and the shell=False and stderr=PIPE are not needed; that was me
>>>>> just trying to fudge things somehow, and I left those in that code.
>>>>> Really, the pass_fds flag is the thing.
>>>>>
>>>>> On Sat, Feb 17, 2018 at 12:38 AM, Thomas Winningham <
>>>>> winningham at gmail.com> wrote:
>>>>>
>>>>>> I was replying on Twitter, but I suspect the behavior slowly became
>>>>>> undefined somewhere between 2.7 and 3. On 3.2 and later, though, I can
>>>>>> do this and it works:
>>>>>> ------------------------
>>>>>> from subprocess import Popen, PIPE
>>>>>>
>>>>>> data = Popen(
>>>>>>     ["cat"],
>>>>>>     stdin=open('/tmp/test', 'rb'),
>>>>>>     stdout=PIPE,
>>>>>>     stderr=PIPE,
>>>>>>     universal_newlines=True,
>>>>>>     shell=False).stdout
>>>>>>
>>>>>> fd_name = '/dev/fd/%d' % data.fileno()
>>>>>>
>>>>>> fddata = Popen(
>>>>>>     ["cat",fd_name],
>>>>>>     stdout=PIPE,
>>>>>>     universal_newlines=True,
>>>>>>     pass_fds=(data.fileno(),)).stdout
>>>>>>
>>>>>> print(fddata.read())
>>>>>> ------------------------
>>>>>> the "pass_fds" option was added in 3.2 per the subprocess
>>>>>> documentation
>>>>>>
>>>>>>
>>>>>> On Fri, Feb 16, 2018 at 11:32 PM, Neil Ludban <
>>>>>> nludban at columbus.rr.com> wrote:
>>>>>>
>>>>>>> On Fri, 16 Feb 2018 22:34:35 -0500
>>>>>>> Eric Floehr <eric at intellovations.com> wrote:
>>>>>>> > I have a really odd problem. I have some code that requires a Linux file
>>>>>>> > descriptor (/dev/fd/N) where N is some number. The following example code
>>>>>>> > demonstrates effectively what is being done.
>>>>>>> >
>>>>>>> > In order to run the code create a file called "/tmp/test" with some text in
>>>>>>> > it.
>>>>>>> >
>>>>>>> > The code works in Python 2.6, but in Python 2.7 or later, I get:
>>>>>>> >
>>>>>>> > cat: /dev/fd/4: No such file or directory
>>>>>>>
>>>>>>> Smells like a back-porting of adding the O_CLOEXEC flag...
>>>>>>>
>>>>>>>      Unless O_CLOEXEC flag was specified, the new descriptor is set to remain
>>>>>>>      open across execve(2) system calls; see close(2), fcntl(2) and O_CLOEXEC
>>>>>>>      description.
>>>>>>>
>>>>>>> https://www.python.org/dev/peps/pep-0446/
>>>>>>> _______________________________________________
>>>>>>> CentralOH mailing list
>>>>>>> CentralOH at python.org
>>>>>>> https://mail.python.org/mailman/listinfo/centraloh
>
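
P.S. For anyone finding this thread later: if the close-on-exec diagnosis above is right, another option that avoids both the temp file and pass_fds would be to clear the flag by hand before starting the second process. What follows is only a rough sketch under that assumption. It needs Linux (or another system with /dev/fd), the name cat_via_dev_fd is purely illustrative, and it has not been tried on 2.6.

```python
import fcntl
from subprocess import Popen, PIPE


def cat_via_dev_fd(path):
    """Pipe `path` through cat, then re-read the pipe via /dev/fd/N."""
    src = open(path, "rb")
    producer = Popen(["cat"], stdin=src, stdout=PIPE)
    src.close()  # the child has its own copy of the descriptor

    fd = producer.stdout.fileno()
    # Clear FD_CLOEXEC so the descriptor stays open across exec.
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags & ~fcntl.FD_CLOEXEC)

    # close_fds=False is needed on Python 3.2+, where the default is
    # to close every descriptor above 2 in the child.
    consumer = Popen(
        ["cat", "/dev/fd/%d" % fd],
        stdout=PIPE,
        close_fds=False,
        universal_newlines=True,
    )
    producer.stdout.close()  # the consumer now holds the read end
    out, _ = consumer.communicate()
    producer.wait()
    return out
```

The trade-off is that close_fds=False lets the child inherit any other inheritable descriptors the parent has open, which is the situation pass_fds was designed to avoid.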