[CentralOH] Popen file descriptor difference between Python 2.6 and 2.7+ on Linux

Joe Shaw joe at joeshaw.org
Mon Feb 19 17:54:37 EST 2018


Hi,

If you have access to the shell, bash can do a nice job of this
concurrently:

    diff -u <(gunzip -c file1.gz) <(gunzip -c file2.gz)

Under the covers it’s doing the /dev/fd trick (on Linux, /dev/fd is a symlink
to /proc/self/fd), and it saves you from having to use a temporary file or
reimplement it yourself in Python.

Joe
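Joe's one-liner can also be driven from Python without writing any fd plumbing yourself. Here is a minimal Python 3.5+ sketch (it uses subprocess.run), assuming bash, gunzip and diff are on PATH. The helper name diff_gz_with_bash is hypothetical:

```python
import subprocess


def diff_gz_with_bash(path1, path2):
    """Diff two gzipped files with bash process substitution (hypothetical helper).

    Assumes bash, gunzip and diff are on PATH. The file names are passed as
    positional parameters ($1, $2) instead of being pasted into the script,
    so unusual characters in a name cannot change the command.
    """
    script = 'diff -u <(gunzip -c "$1") <(gunzip -c "$2")'
    # In "bash -c SCRIPT NAME ARG1 ARG2", NAME becomes $0 and the args $1, $2.
    result = subprocess.run(
        ["bash", "-c", script, "diff_gz", path1, path2],
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    # diff exits 0 when the inputs match, 1 when they differ, 2 on trouble.
    if result.returncode > 1:
        raise RuntimeError("diff failed with exit code %d" % result.returncode)
    return result.stdout
```

Passing the paths as arguments rather than formatting them into the script string is the main design choice here. It keeps the shell from interpreting file names.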

On Mon, Feb 19, 2018 at 3:55 PM Eric Floehr <eric at intellovations.com> wrote:

> Good ideas... the actual application is gunzipping two ASCII files
> (gzipped to save disk space) and diffing them.
>
> These files are added to, not just appended to, so we need to find the
> changes between two files. We do that, rather than processing the whole
> file each time, because these are large files with lots of rows, and parsing
> them and determining whether a row's data exists or has changed would be
> hugely expensive. And the system diff program is the best tool for that.
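For this application, the pass_fds approach from later in the thread can be combined with two concurrent gunzip processes. This is roughly what bash process substitution does, with no temporary files. The following is a sketch for Python 3.2+ on Linux (it relies on /dev/fd), assuming gunzip and diff are on PATH. The function name diff_gz_pass_fds is hypothetical:

```python
import subprocess


def diff_gz_pass_fds(path1, path2):
    """Diff two gzipped files through /dev/fd/N paths (hypothetical helper).

    Python 3.2+ and Linux only. Both gunzip processes run concurrently and
    diff reads their output directly from the pipes.
    """
    g1 = subprocess.Popen(["gunzip", "-c", path1], stdout=subprocess.PIPE)
    g2 = subprocess.Popen(["gunzip", "-c", path2], stdout=subprocess.PIPE)
    fd1 = g1.stdout.fileno()
    fd2 = g2.stdout.fileno()
    # pass_fds keeps exactly these two descriptors open in the diff child,
    # under the same numbers, so the /dev/fd paths below refer to them.
    diff = subprocess.Popen(
        ["diff", "-u", "/dev/fd/%d" % fd1, "/dev/fd/%d" % fd2],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        pass_fds=(fd1, fd2),
    )
    # diff now holds its own copies; close ours so only diff has the read ends.
    g1.stdout.close()
    g2.stdout.close()
    out, _ = diff.communicate()
    g1.wait()
    g2.wait()
    # diff exits 0 when the inputs match, 1 when they differ, 2 on trouble.
    if diff.returncode > 1:
        raise RuntimeError("diff failed with exit code %d" % diff.returncode)
    return out
```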
>
> On Sat, Feb 17, 2018 at 6:42 PM, Thomas Winningham <winningham at gmail.com>
> wrote:
>
>> Well, sucks you had to go through this rewrite. I think it shows that all
>> code has a certain level of implicit functionality whose interpretation
>> varies between stakeholders over time. So while it took a little digging, I
>> think it also shows that everything is eventually knowable if needed.
>>
>> "Explicit is better than implicit": although the original code seemed
>> rather explicit in its own right, this new code is certainly more explicit.
>> If IO performance becomes an issue with the tempfile, maybe named pipes
>> could be used? I learned a lot about per-process file descriptors, though,
>> and it reinforces something I only recently learned: *nix processes work
>> with an open file through its descriptor, not its name, whereas, like you
>> say, on Windows a named temporary file can't be opened a second time while
>> it's still open, because in a sense the file name is the descriptor.
>>
>> There's probably always a way to over-engineer this further, too; I'm
>> thinking sockets or even a message queue :P
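Named pipes (FIFOs) avoid descriptor inheritance entirely, because diff opens them by name just as it opens the temporary files. This approach was not tried in the thread. The following is a rough sketch, assuming a POSIX system with os.mkfifo and gunzip and diff on PATH. The helper name diff_gz_fifos is hypothetical:

```python
import os
import shutil
import subprocess
import tempfile
import threading


def diff_gz_fifos(path1, path2):
    """Diff two gzipped files through named pipes (hypothetical helper)."""
    tmpdir = tempfile.mkdtemp()
    fifos = [os.path.join(tmpdir, "left"), os.path.join(tmpdir, "right")]
    try:
        for fifo in fifos:
            os.mkfifo(fifo)
        # Start diff first: it blocks opening each FIFO until a writer appears.
        diff = subprocess.Popen(["diff", "-u"] + fifos,
                                stdout=subprocess.PIPE,
                                universal_newlines=True, close_fds=True)

        def feed(src, fifo):
            # Opening a FIFO for writing blocks until diff opens it for
            # reading, so each writer gets its own thread; that way it does
            # not matter in which order diff opens its two inputs.
            with open(fifo, "wb") as out:
                subprocess.call(["gunzip", "-c", src], stdout=out,
                                close_fds=True)

        writers = [threading.Thread(target=feed, args=(src, fifo))
                   for src, fifo in zip((path1, path2), fifos)]
        for w in writers:
            w.start()
        out, _ = diff.communicate()
        for w in writers:
            w.join()
    finally:
        shutil.rmtree(tmpdir)
    # diff exits 0 when the inputs match, 1 when they differ, 2 on trouble.
    if diff.returncode > 1:
        raise RuntimeError("diff failed with exit code %d" % diff.returncode)
    return out
```

A caveat of this design is that if diff died before opening both FIFOs, a writer thread would stay blocked in open(). A real version would need a timeout or cleanup path for that case.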
>>
>>
>> On Sat, Feb 17, 2018 at 11:20 AM, Eric Floehr <eric at intellovations.com>
>> wrote:
>>
>>> Thank you Neil and Thomas, here is a summary of what you discovered:
>>>
>>> There is a behavior change in Python 2.7: pipe file descriptors created
>>> by the subprocess module are no longer passed to other child processes,
>>> whereas in Python 2.6 they were.
>>>
>>> To address this, a "pass_fds" parameter was added to Popen in Python
>>> 3.2. However, Python 2.7 does not have this parameter.
>>>
>>> So, in order for this to work on Python 2.7, I resorted to using named
>>> temporary files. This code won't work on Windows NT and higher (per the
>>> Python documentation, a named temporary file can't be opened a second
>>> time while it is still open on Windows NT and higher, but it can on
>>> UNIX-like systems).
>>>
>>> Here is the summary of changes:
>>>
>>> 1. Wrote the contents of the first Popen's stdout to a NamedTemporaryFile
>>> 2. Passed the name of the NamedTemporaryFile to the second Popen
>>> 3. Added with blocks to ensure both Popens' stdout pipes get closed properly
>>> 4. Closed the NamedTemporaryFile after the second Popen finished, to
>>> delete the temp file off disk.
>>>
>>> Here is the code:
>>>
>>> --------------------------------------------------
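>>> from subprocess import Popen, PIPE
>>> import tempfile
>>>
>>> # Read the first process's output as bytes: NamedTemporaryFile opens in
>>> # binary mode ('w+b') by default, so writing text to it fails on Python 3.
>>> with Popen(
>>>         ["cat"],
>>>         stdin=open('/tmp/test', 'rb'),
>>>         stdout=PIPE).stdout as data:
>>>     tmpfile = tempfile.NamedTemporaryFile()
>>>     tmpfile.write(data.read())
>>>     tmpfile.flush()  # make sure the bytes are on disk before cat reads them
>>>
>>> # The second process opens the temp file by name, so no descriptor has to
>>> # be inherited from this process.
>>> with Popen(
>>>         ["cat",
>>>         tmpfile.name],
>>>         stdout=PIPE,
>>>         universal_newlines=True).stdout as fddata:
>>>     print(fddata.read())
>>>
>>> # Closing a NamedTemporaryFile also deletes it (delete=True by default).
>>> tmpfile.close()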
>>> --------------------------------------------------
>>>
>>>
>>> This works on 2.6, 2.7, and 3.5. Thanks again for all of your help!
>>> Eric
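Because these are large files, one possible refinement of the version above is to let gunzip write straight into each temporary file instead of pulling the whole output into Python with data.read(). The following is a sketch under that assumption. The helper names are hypothetical, and like the original it is Unix-only for the NamedTemporaryFile reason given above:

```python
import subprocess
import tempfile


def gunzip_to_named_tempfile(path):
    """Decompress `path` straight into a NamedTemporaryFile and return it.

    gunzip writes through the temp file's own descriptor, so the data never
    passes through Python's memory. Closing the returned object deletes it.
    """
    tmp = tempfile.NamedTemporaryFile()
    try:
        subprocess.check_call(["gunzip", "-c", path], stdout=tmp)
    except Exception:
        tmp.close()  # don't leave a half-written temp file behind
        raise
    return tmp


def diff_gz_tempfiles(path1, path2):
    """Diff two gzipped files via two named temporary files."""
    with gunzip_to_named_tempfile(path1) as t1, \
            gunzip_to_named_tempfile(path2) as t2:
        diff = subprocess.Popen(["diff", "-u", t1.name, t2.name],
                                stdout=subprocess.PIPE,
                                universal_newlines=True)
        out, _ = diff.communicate()
    # diff exits 0 when the files match, 1 when they differ, 2 on trouble.
    if diff.returncode > 1:
        raise RuntimeError("diff failed with exit code %d" % diff.returncode)
    return out
```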
>>>
>>>
>>>
>>> On Sat, Feb 17, 2018 at 12:39 AM, Thomas Winningham <
>>> winningham at gmail.com> wrote:
>>>
>>>> Oh, and the shell=False and stderr=PIPE are not needed; that was me
>>>> just trying to fudge things somehow, and I left those in that code...
>>>> really, the pass_fds flag is the thing.
>>>>
>>>> On Sat, Feb 17, 2018 at 12:38 AM, Thomas Winningham <
>>>> winningham at gmail.com> wrote:
>>>>
>>>>> I was replying on Twitter, but I suspect the behavior slowly became
>>>>> undefined somewhere between 2.7 and 3... but in 3.2 and later I can do
>>>>> this and it works:
>>>>> ------------------------
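>>>>> from subprocess import Popen, PIPE
>>>>>
>>>>> # The first child's stdout pipe is the descriptor we want to hand on.
>>>>> data = Popen(
>>>>>     ["cat"],
>>>>>     stdin=open('/tmp/test', 'rb'),
>>>>>     stdout=PIPE,
>>>>>     stderr=PIPE,
>>>>>     universal_newlines=True,
>>>>>     shell=False).stdout
>>>>>
>>>>> # /dev/fd/N refers to descriptor N of whichever process opens it, so the
>>>>> # second cat must inherit the descriptor under the same number.
>>>>> fd_name = '/dev/fd/%d' % data.fileno()
>>>>>
>>>>> # pass_fds (Python 3.2+) keeps that descriptor open in the second child.
>>>>> fddata = Popen(
>>>>>     ["cat", fd_name],
>>>>>     stdout=PIPE,
>>>>>     universal_newlines=True,
>>>>>     pass_fds=(data.fileno(),)).stdout
>>>>>
>>>>> print(fddata.read())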
>>>>> ------------------------
>>>>> The "pass_fds" option was added in 3.2, per the subprocess
>>>>> documentation.
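Python 2.7 has no pass_fds. One possible workaround there is to clear FD_CLOEXEC on the pipe with fcntl and leave close_fds=False. It was not tried in the thread, and it assumes Neil's close-on-exec diagnosis below is correct. The following is a sketch that should also run on Python 3. The function name cat_through_dev_fd is hypothetical:

```python
import fcntl
from subprocess import Popen, PIPE


def cat_through_dev_fd(path):
    """Hand a pipe to a second process as /dev/fd/N without pass_fds.

    Hypothetical helper; assumes Linux (/dev/fd) and the fcntl module.
    """
    with open(path, "rb") as src:
        first = Popen(["cat"], stdin=src, stdout=PIPE)
    fd = first.stdout.fileno()
    # Clear FD_CLOEXEC so the descriptor survives exec() of the next child.
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags & ~fcntl.FD_CLOEXEC)
    # close_fds defaults to True on Python 3.2+, which would close it anyway.
    second = Popen(["cat", "/dev/fd/%d" % fd], stdout=PIPE,
                   universal_newlines=True, close_fds=False)
    first.stdout.close()  # the second cat holds its own copy now
    out = second.communicate()[0]
    first.wait()
    return out
```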
>>>>>
>>>>>
>>>>> On Fri, Feb 16, 2018 at 11:32 PM, Neil Ludban <nludban at columbus.rr.com>
>>>>> wrote:
>>>>>
>>>>>> On Fri, 16 Feb 2018 22:34:35 -0500
>>>>>> Eric Floehr <eric at intellovations.com> wrote:
>>>>>> > I have a really odd problem. I have some code that requires a Linux
>>>>>> > file descriptor path (/dev/fd/N), where N is some number. The
>>>>>> > following example code demonstrates effectively what is being done.
>>>>>> >
>>>>>> > In order to run the code, create a file called "/tmp/test" with some
>>>>>> > text in it.
>>>>>> >
>>>>>> > The code works in Python 2.6, but in Python 2.7 or later, I get:
>>>>>> >
>>>>>> > cat: /dev/fd/4: No such file or directory
>>>>>>
>>>>>> Smells like a backport of the change that adds the O_CLOEXEC flag...
>>>>>>
>>>>>>      Unless O_CLOEXEC flag was specified, the new descriptor is set to
>>>>>>      remain open across execve(2) system calls; see close(2), fcntl(2)
>>>>>>      and O_CLOEXEC description.
>>>>>>
>>>>>> https://www.python.org/dev/peps/pep-0446/
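Neil's close-on-exec hypothesis can be checked directly by reading the descriptor flags on a subprocess pipe. The following is a small sketch for Unix-like systems that uses the fcntl module; the function name is hypothetical. On Python 3 it should report True. If Neil is right, it would also report True on the 2.7 interpreter that fails and False on 2.6:

```python
import fcntl
import sys
from subprocess import Popen, PIPE


def subprocess_pipe_is_close_on_exec():
    """Return True if the parent's end of a Popen stdout pipe has FD_CLOEXEC set."""
    child = Popen([sys.executable, "-c", "pass"], stdout=PIPE)
    try:
        flags = fcntl.fcntl(child.stdout.fileno(), fcntl.F_GETFD)
        return bool(flags & fcntl.FD_CLOEXEC)
    finally:
        child.stdout.close()
        child.wait()


if __name__ == "__main__":
    # Run this under each interpreter (2.6, 2.7, 3.x) to compare.
    print(sys.version.split()[0], subprocess_pipe_is_close_on_exec())
```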
>>>>>> _______________________________________________
>>>>>> CentralOH mailing list
>>>>>> CentralOH at python.org
>>>>>> https://mail.python.org/mailman/listinfo/centraloh
>>>>>>