[ python-Bugs-1546442 ] subprocess.Popen can't read file object as stdin after seek

SourceForge.net noreply at sourceforge.net
Mon Jan 22 02:23:34 CET 2007


Bugs item #1546442, was opened at 2006-08-25 15:52
Message generated for change (Comment added) made by ldeller
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1546442&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: GaryD (gazzadee)
Assigned to: Peter Åstrand (astrand)
Summary: subprocess.Popen can't read file object as stdin after seek

Initial Comment:
When I use an existing file object as stdin for a call
to subprocess.Popen, then Popen cannot read the file if
I have called seek on it more than once.

eg. in the following python code:

>>> import subprocess
>>> rawfile = file('hello.txt', 'rb')
>>> rawfile.readline()
'line 1\n'
>>> rawfile.seek(0)
>>> rawfile.readline()
'line 1\n'
>>> rawfile.seek(0)
>>> process_object = subprocess.Popen(["cat"],
...     stdin=rawfile, stdout=subprocess.PIPE,
...     stderr=subprocess.PIPE)

process_object.stdout now contains nothing, implying
that nothing was on process_object.stdin.

Note that this only applies for a non-trivial seek (ie.
where the file-pointer actually changes). Calling
seek(0) multiple times in a row does not change
anything (obviously).

I have not investigated whether this reveals a problem
with seek not changing the underlying file descriptor,
or a problem with Popen not handling the file
descriptor properly.

I have attached some complete python scripts that
demonstrate this problem. One shows cat working after
calling seek once, the other shows cat failing after
calling seek twice.

Python version being used:
Python 2.4.2 (#1, Nov  3 2005, 12:41:57)
[GCC 3.4.3-20050110 (Gentoo Linux 3.4.3.20050110,
ssp-3.4.3.20050110-0, pie-8.7 on linux2


----------------------------------------------------------------------

Comment By: lplatypus (ldeller)
Date: 2007-01-22 12:23

Message:
Logged In: YES 
user_id=1534394
Originator: NO

Fair enough, that's probably cleaner and more efficient than playing games
with fflush and lseek anyway.  If file objects are not supported properly
then maybe they shouldn't be accepted at all, forcing the application to
call fileno() if that's what is wanted.  That might break a lot of
existing code though.  Then again it may be beneficial to get everyone to
review code which passes file objects to Popen in light of this behaviour.

----------------------------------------------------------------------

Comment By: Peter Åstrand (astrand)
Date: 2007-01-22 06:43

Message:
Logged In: YES 
user_id=344921
Originator: NO

It's not obvious that the subprocess module is doing anything wrong here.
Mixing streams and file descriptors is always problematic and is best
avoided
(http://ftp.gnu.org/gnu/Manuals/glibc-2.2.3/html_node/libc_232.html).
However, the subprocess module *does* accept a file object (based on a
libc stream), for convenience. For things to work correctly, the
application and the subprocess module need to cooperate. I admit that the
documentation needs improvement on this topic, though.

It's quite easy to demonstrate the problem; you don't need to use seek at
all. Here's a simple test case:

import subprocess
rawfile = file('hello.txt', 'rb')
rawfile.readline()
p = subprocess.Popen(["cat"], stdin=rawfile, stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE)
print "File contents from Popen() call to cat:"
print p.stdout.read()
p.wait()

The descriptor offset is at the end of the file, since the stream buffers
its input.
http://ftp.gnu.org/gnu/Manuals/glibc-2.2.3/html_node/libc_233.html
describes the need for "cleaning up" a stream when you switch from stream
functions to descriptor functions. The cleaning itself is described at
http://ftp.gnu.org/gnu/Manuals/glibc-2.2.3/html_node/libc_235.html#SEC244.
The documentation recommends the fclean() function, but it's only available
on GNU systems and not in Python. As I understand it, fflush() works well
for cleaning an output stream.
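
For the output case, a minimal sketch of "flush before handing the
descriptor over" might look as follows. It assumes Python 3 (the thread
itself is about Python 2.4), and the names write_then_spawn and child are
hypothetical, chosen only for illustration. The child is a small Python
one-liner rather than a real program.

```python
import os
import subprocess
import sys
import tempfile


def write_then_spawn(path):
    """Write one line from the parent, flush, then let a child write to the same file."""
    # Hypothetical child command: writes a single line to its stdout.
    child = [sys.executable, "-c",
             "import sys; sys.stdout.buffer.write(b'child\\n')"]
    with open(path, "wb") as out:
        out.write(b"parent\n")
        # Push the buffered bytes down to the descriptor before the child
        # starts writing through that same descriptor.
        out.flush()
        subprocess.Popen(child, stdout=out).wait()
    with open(path, "rb") as f:
        return f.read()


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        print(write_then_spawn(path))
    finally:
        os.remove(path)
```

Without the flush() call, the parent's line would still be sitting in the
file object's buffer when the child writes, so the two lines could end up
in the file in the wrong order.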

For input streams, however, things are difficult. fflush() might work
sometimes, but to be sure, you must set the file pointer as well. And
this does not work for files that are not random access, since there's no
way to move the buffered data back to the operating system.

So, since subprocess cannot reliably deal with this situation, I believe
it shouldn't try. I think it makes more sense that the application
prepares the file object for low-level operations. There are many other
Python modules that use the .fileno() method, for example the select
module, and as far as I understand, this module doesn't try to clean
streams or anything like that.
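
As an illustration of what "the application prepares the file object"
could mean for an input file, here is a rough sketch, assuming Python 3
and a seekable file. popen_from_current_position, CAT and
rest_after_first_line are hypothetical names, not part of the subprocess
module, and CAT is a Python stand-in for "cat".

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical stand-in for "cat": copy stdin to stdout unchanged.
CAT = [sys.executable, "-c",
       "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"]


def popen_from_current_position(cmd, fileobj, **kwargs):
    """Start cmd with its stdin positioned where fileobj logically is."""
    # Move the descriptor offset to the position the file object reports.
    # This only works for seekable files; for pipes and similar objects
    # tell()/lseek() raise an error, which is deliberately not hidden here.
    os.lseek(fileobj.fileno(), fileobj.tell(), os.SEEK_SET)
    return subprocess.Popen(cmd, stdin=fileobj, **kwargs)


def rest_after_first_line(path):
    """Read one line in the parent, then let the child read the remainder."""
    with open(path, "rb") as rawfile:
        rawfile.readline()
        p = popen_from_current_position(CAT, rawfile, stdout=subprocess.PIPE)
        out, _ = p.communicate()
        return out


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"line 1\nline 2\n")
    try:
        print(rest_after_first_line(path))
    finally:
        os.remove(path)
```

After the child has consumed the descriptor, the parent's file object and
the descriptor no longer agree, so the parent should not keep reading from
the file object without seeking first.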

To summarize: I'm leaning towards a documentation solution. 

----------------------------------------------------------------------

Comment By: lplatypus (ldeller)
Date: 2006-08-25 17:13

Message:
Logged In: YES 
user_id=1534394

I found the cause of this bug:

A libc FILE* (used by python file objects) may hold a
different file offset than the underlying OS file
descriptor.  The posix version of Popen._get_handles does
not take this into account, resulting in this bug.
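
The mismatch can be observed directly. The following is a small sketch,
assuming Python 3 on CPython with default buffering (there the buffering
is done by the io module rather than by a libc FILE*, but the effect is of
the same kind). offsets_after_readline is a hypothetical helper name.

```python
import os
import tempfile


def offsets_after_readline(data):
    """Return (file object offset, descriptor offset) after reading one line."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        with open(path, "rb") as rawfile:
            rawfile.readline()
            # Position as seen by the buffered file object.
            logical = rawfile.tell()
            # Position of the underlying descriptor; lseek with offset 0
            # relative to the current position just reports it.
            actual = os.lseek(rawfile.fileno(), 0, os.SEEK_CUR)
        return logical, actual
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(offsets_after_readline(b"line 1\nline 2\n"))
```

For a small file the whole content is read into the buffer on the first
readline(), so the descriptor offset ends up at the end of the file while
the file object still reports the position just after the first line.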

The following patch against svn trunk fixes the problem.  I
don't have permission to attach files to this item, so I'll
have to paste the patch here:

Index: subprocess.py
===================================================================
--- subprocess.py       (revision 51581)
+++ subprocess.py       (working copy)
@@ -907,6 +907,12 @@
             else:
                 # Assuming file-like object
                 p2cread = stdin.fileno()
+                # OS file descriptor's file offset does not necessarily match
+                # the file offset in the file-like object, so do an lseek:
+                try:
+                    os.lseek(p2cread, stdin.tell(), 0)
+                except OSError:
+                    pass # file descriptor does not support seek/tell

             if stdout is None:
                 pass
@@ -917,6 +923,12 @@
             else:
                 # Assuming file-like object
                 c2pwrite = stdout.fileno()
+                # OS file descriptor's file offset does not necessarily match
+                # the file offset in the file-like object, so do an lseek:
+                try:
+                    os.lseek(c2pwrite, stdout.tell(), 0)
+                except OSError:
+                    pass # file descriptor does not support seek/tell

             if stderr is None:
                 pass
@@ -929,6 +941,12 @@
             else:
                 # Assuming file-like object
                 errwrite = stderr.fileno()
+                # OS file descriptor's file offset does not necessarily match
+                # the file offset in the file-like object, so do an lseek:
+                try:
+                    os.lseek(errwrite, stderr.tell(), 0)
+                except OSError:
+                    pass # file descriptor does not support seek/tell

             return (p2cread, p2cwrite,
                     c2pread, c2pwrite,


----------------------------------------------------------------------
