[CentralOH] Popen file descriptor difference between Python 2.6 and 2.7+ on Linux

Eric Floehr eric at intellovations.com
Mon Feb 19 15:55:00 EST 2018


Good ideas... the actual application is gunzipping two ASCII files
(gzipped to save disk space) and diffing them.

Rows in these files can be added or changed anywhere, not just appended at
the end, so we need to find the changes between two versions. We diff
rather than reprocess the whole file each time because these are large
files with lots of rows, and parsing every row to determine whether its
data already exists or has changed would be hugely expensive. The system
diff program is the best tool for that.
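
For anyone following along, here is a minimal sketch of that workflow using
the named-temporary-file approach. It is not the production code: the
function name diff_gzipped and its arguments are made up for illustration,
and it assumes a UNIX-like system with diff on the PATH and Python 2.7 or
3.x.

```python
import gzip
import shutil
import subprocess
import tempfile


def diff_gzipped(old_path, new_path):
    """Return the system diff output for two gzip-compressed text files.

    Hypothetical helper for illustration only.
    """
    with tempfile.NamedTemporaryFile() as old_tmp, \
            tempfile.NamedTemporaryFile() as new_tmp:
        for src, dst in ((old_path, old_tmp), (new_path, new_tmp)):
            with gzip.open(src, "rb") as unzipped:
                # Stream the decompressed bytes instead of loading them all.
                shutil.copyfileobj(unzipped, dst)
            dst.flush()  # make the bytes visible to the diff process
        proc = subprocess.Popen(
            ["diff", old_tmp.name, new_tmp.name],
            stdout=subprocess.PIPE,
            universal_newlines=True)
        output, _ = proc.communicate()
        # diff exits with 0 (identical), 1 (differences) or 2 (trouble).
        if proc.returncode > 1:
            raise RuntimeError("diff failed with status %d" % proc.returncode)
        return output
```

Calling diff_gzipped("old.gz", "new.gz") (placeholder file names) returns
the diff text, or an empty string when the two files are identical.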





On Sat, Feb 17, 2018 at 6:42 PM, Thomas Winningham <winningham at gmail.com>
wrote:

> Well, it sucks that you had to go through this rewrite. I think it shows
> that all code has a certain level of implicit functionality whose
> interpretation varies between stakeholders over time. So while it took a
> little digging, I think it also shows that everything is eventually
> knowable if needed.
>
> "Explicit is better than implicit": although the original code seemed
> rather explicit in its own right, this new code is certainly more
> explicit. If IO performance becomes an issue with the tempfile, maybe
> named pipes could be used? I learned a lot about per-process file
> descriptors, though, and it reinforces something I learned only recently:
> *nix works with an open file through its descriptor rather than its name,
> whereas, like you say, on Windows a named temporary file can't be opened a
> second time while it is still open, because the file name is the
> descriptor in a sense.
>
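
On the named pipe idea, a rough sketch of how it might look, assuming a
POSIX system (os.mkfifo is not available on Windows) and cat standing in
for the real consumer. The function name cat_via_fifo and the FIFO file
name are made up for illustration; nothing like this is in the actual code.

```python
import os
import shutil
import tempfile
import threading
from subprocess import Popen, PIPE


def cat_via_fifo(payload):
    """Send bytes to `cat` through a named pipe and return what it prints."""
    workdir = tempfile.mkdtemp()
    fifo_path = os.path.join(workdir, "data.fifo")  # hypothetical name
    os.mkfifo(fifo_path)  # POSIX only

    def feed():
        # Opening a FIFO for writing blocks until a reader opens it too.
        with open(fifo_path, "wb") as fifo:
            fifo.write(payload)

    writer = threading.Thread(target=feed)
    writer.daemon = True  # don't hang the interpreter if cat never starts
    writer.start()
    try:
        reader = Popen(["cat", fifo_path], stdout=PIPE)
        output, _ = reader.communicate()
        writer.join()
    finally:
        shutil.rmtree(workdir)
    return output
```

The writer runs in its own thread because both ends of a FIFO block until
the other side is open, so writing and reading from a single thread would
deadlock. The data never touches the disk, only the FIFO's directory entry
does.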
> There's probably always a way to over-engineer this further, too. I'm
> thinking sockets or even a message queue :P
>
>
> On Sat, Feb 17, 2018 at 11:20 AM, Eric Floehr <eric at intellovations.com>
> wrote:
>
>> Thank you Neil and Thomas, here is a summary of what you discovered:
>>
>> There is a regression in Python 2.7 where file descriptors are not passed
>> to child processes in the subprocess module, whereas in Python 2.6 they
>> were.
>>
>> In order to address this, they added a "pass_fds" parameter to Popen in
>> Python 3.2. However Python 2.7 does not have this parameter.
>>
>> So, in order for this to work on Python 2.7, I resorted to using named
>> temporary files. This code won't work on Windows NT or later: per the
>> Python documentation, a named temporary file can't be opened a second time
>> by name there while it is still open, although it can on UNIX-like systems.
>>
>> Here is the summary of changes:
>>
>> 1. Wrote contents of first Popen's stdout to a NamedTemporaryFile
>> 2. Passed the name of the NamedTemporaryFile to the second Popen
>> 3. Added with blocks to ensure both Popens' stdout pipes get closed properly
>> 4. Closed the NamedTemporaryFile after the second Popen finished, to
>> delete the temp file off disk.
>>
>> Here is the code:
>>
>> --------------------------------------------------
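>> from subprocess import Popen, PIPE
>> import tempfile
>>
>> # No universal_newlines on the first Popen: on Python 3 it would make
>> # data.read() return str, which the binary-mode temp file rejects.
>> with Popen(
>>         ["cat"],
>>         stdin=open('/tmp/test', 'rb'),
>>         stdout=PIPE).stdout as data:
>>     # The temp file is deleted from disk when it is closed.
>>     tmpfile = tempfile.NamedTemporaryFile()
>>     tmpfile.write(data.read())
>>     tmpfile.flush()  # make the contents visible to the next process
>>
>> # Pass the temp file by name instead of by /dev/fd/N.
>> with Popen(
>>         ["cat", tmpfile.name],
>>         stdout=PIPE,
>>         universal_newlines=True).stdout as fddata:
>>     print(fddata.read())
>>
>> tmpfile.close()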
>> --------------------------------------------------
>>
>>
>> This works on 2.6, 2.7, and 3.5. Thanks again for all of your help!
>> Eric
>>
>>
>>
>> On Sat, Feb 17, 2018 at 12:39 AM, Thomas Winningham
>> <winningham at gmail.com> wrote:
>>
>>> Oh, and the shell=False and stderr=PIPE are not needed. That was me just
>>> trying to fudge things somehow, and I left them in that code...
>>> really, the pass_fds flag is the thing.
>>>
>>> On Sat, Feb 17, 2018 at 12:38 AM, Thomas Winningham
>>> <winningham at gmail.com> wrote:
>>>
>>>> I was replying on Twitter, but I suspect the behavior slowly became
>>>> undefined somewhere between 2.7 and 3... but from 3.2 on I can do this,
>>>> and it works:
>>>> ------------------------
>>>> from subprocess import Popen, PIPE
>>>>
>>>> data = Popen(
>>>>     ["cat"],
>>>>     stdin=open('/tmp/test', 'rb'),
>>>>     stdout=PIPE,
>>>>     stderr=PIPE,
>>>>     universal_newlines=True,
>>>>     shell=False).stdout
>>>>
>>>> fd_name = '/dev/fd/%d' % data.fileno()
>>>>
>>>> fddata = Popen(
>>>>     ["cat",fd_name],
>>>>     stdout=PIPE,
>>>>     universal_newlines=True,
>>>>     pass_fds=(data.fileno(),)).stdout
>>>>
>>>> print(fddata.read())
>>>> ------------------------
>>>> The "pass_fds" option was added in 3.2, per the subprocess documentation.
>>>>
>>>>
>>>> On Fri, Feb 16, 2018 at 11:32 PM, Neil Ludban <nludban at columbus.rr.com>
>>>> wrote:
>>>>
>>>>> On Fri, 16 Feb 2018 22:34:35 -0500
>>>>> Eric Floehr <eric at intellovations.com> wrote:
>>>>> > I have a really odd problem. I have some code that requires a Linux
>>>>> > file descriptor (/dev/fd/N) where N is some number. The following
>>>>> > example code demonstrates effectively what is being done.
>>>>> >
>>>>> > In order to run the code create a file called "/tmp/test" with some
>>>>> > text in it.
>>>>> >
>>>>> > The code works in Python 2.6, but in Python 2.7 or later, I get:
>>>>> >
>>>>> > cat: /dev/fd/4: No such file or directory
>>>>>
>>>>> Smells like a back-porting of adding the O_CLOEXEC flag...
>>>>>
>>>>>      Unless O_CLOEXEC flag was specified, the new descriptor is set to
>>>>>      remain open across execve(2) system calls; see close(2), fcntl(2)
>>>>>      and O_CLOEXEC description.
>>>>>
>>>>> https://www.python.org/dev/peps/pep-0446/
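
To make the close-on-exec point concrete, here is a small sketch of how
descriptor inheritance can be checked from Python. It assumes Python 3.4 or
later on a POSIX system (os.get_inheritable arrived with PEP 446 and
pass_fds is POSIX-only), so it illustrates the mechanism rather than the
exact 2.7 behavior. The CHILD script and the child_sees_fd helper are made
up for illustration.

```python
import os
import subprocess
import sys

# Child program: exit 0 if the given descriptor number is open, 1 otherwise.
CHILD = (
    "import os, sys\n"
    "try:\n"
    "    os.fstat(int(sys.argv[1]))\n"
    "except OSError:\n"
    "    sys.exit(1)\n"
)


def child_sees_fd(fd, **popen_kwargs):
    """Return True if a freshly spawned child finds `fd` open."""
    cmd = [sys.executable, "-c", CHILD, str(fd)]
    return subprocess.call(cmd, **popen_kwargs) == 0


read_end, write_end = os.pipe()
try:
    # PEP 446: descriptors created by Python are non-inheritable by default.
    inheritable_by_default = os.get_inheritable(read_end)
    # close_fds defaults to True on POSIX, so the child does not get the fd.
    seen_by_default = child_sees_fd(read_end)
    # pass_fds keeps the listed descriptors open in the child.
    seen_with_pass_fds = child_sees_fd(read_end, pass_fds=(read_end,))
finally:
    os.close(read_end)
    os.close(write_end)

print(inheritable_by_default, seen_by_default, seen_with_pass_fds)
```

Under those assumptions this should print "False False True": the pipe is
not inheritable, the child cannot see it by default, and it can once the
descriptor is listed in pass_fds. That matches the /dev/fd/N failure and
the pass_fds fix discussed in this thread.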
>>>>> _______________________________________________
>>>>> CentralOH mailing list
>>>>> CentralOH at python.org
>>>>> https://mail.python.org/mailman/listinfo/centraloh