Draft PEP to make file objects support non-blocking mode.
G'day,
the recent thread about thread semantics for file objects reminded me I
had a draft pep for extending file objects to support non-blocking
mode.
This is handy for handling files in async applications (the non-threaded
way of doing things concurrently).
It's pretty rough, but if I fuss over it any more I'll never get it
out...
--
Donovan Baarda
On 18 March 2005, Donovan Baarda said:
Rationale
=========
Many Python library methods and classes like select.select(), os.popen2(), and subprocess.Popen() return and/or operate on builtin file objects. However even simple applications of these methods and classes require the files to be in non-blocking mode.
Currently the builtin file type does not support non-blocking mode very well. Setting a file into non-blocking mode and reading or writing to it can only be done reliably by operating on the file.fileno() file descriptor. This requires using the fcntl and os module file descriptor manipulation methods.
Is having to use fcntl and os really so awful? At least it requires the programmer to prove he knows what he's doing putting this file into non-blocking mode, and that he really wants to do it. ;-)
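For reference, the fcntl step under discussion looks roughly like the sketch below. It is written for a current Python 3 on a Unix-like system rather than the 2005 interpreter, the helper names set_nonblocking() and is_nonblocking() are made up for illustration, and a plain os.pipe() stands in for the descriptors behind popen2()'s file objects.

```python
import fcntl
import os


def set_nonblocking(fd):
    """Add O_NONBLOCK to the status flags of an open file descriptor."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)


def is_nonblocking(fd):
    """Report whether O_NONBLOCK is currently set on the descriptor."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFL) & os.O_NONBLOCK)


# A pipe stands in for the descriptors behind popen2()'s file objects.
read_fd, write_fd = os.pipe()
before = is_nonblocking(read_fd)
set_nonblocking(read_fd)
after = is_nonblocking(read_fd)
os.close(read_fd)
os.close(write_fd)
```

Reading the flags first and OR-ing in O_NONBLOCK keeps whatever other status flags were already set on the descriptor.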
Details
=======
The documentation of file.read() warns: "Also note that when in non-blocking mode, less data than what was requested may be returned, even if no size parameter was given". An empty string is returned to indicate an EOF condition. It is possible that file.read() in non-blocking mode will not produce any data before EOF is reached. Currently there is no documented way to identify the difference between reaching EOF and an empty non-blocking read.
The documented behaviour of file.write() in non-blocking mode is undefined. When writing to a file in non-blocking mode, it is possible that not all of the data gets written. Currently there is no documented way of handling or indicating a partial write.
That's more interesting and a better motivation for this PEP.
file.read([size]) Changes
--------------------------
The read method's current behaviour needs to be documented, so its actual behaviour can be used to differentiate between an empty non-blocking read, and EOF. This means recording that IOError(EAGAIN) is raised for an empty non-blocking read.
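The distinction this section asks for can be seen at the descriptor level. The following is only a sketch using Python 3's os.read(), which reports an empty non-blocking read as BlockingIOError (an OSError carrying EAGAIN) and EOF as an empty result; it is not the 2005 file object. The helper name read_status() is hypothetical, and os.set_blocking() is a later addition to the os module that did not exist when this thread was written.

```python
import os


def read_status(fd, size=2048):
    """Classify one non-blocking os.read() as 'data', 'empty' or 'eof'."""
    try:
        buf = os.read(fd, size)
    except BlockingIOError:
        # EAGAIN/EWOULDBLOCK: nothing available yet, more may still arrive.
        return "empty", b""
    if buf:
        return "data", buf
    # A zero-length result means every writer has closed its end.
    return "eof", b""


read_fd, write_fd = os.pipe()
os.set_blocking(read_fd, False)

results = [read_status(read_fd)]      # nothing written yet
os.write(write_fd, b"abc")
results.append(read_status(read_fd))  # three bytes are waiting
os.close(write_fd)
results.append(read_status(read_fd))  # writer closed, pipe drained
os.close(read_fd)
```

The three calls exercise the three cases in order: an empty read, a read that returns data, and EOF.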
file.write(str) Changes
-----------------------
The write method needs to have a useful behaviour for partial non-blocking writes defined, implemented, and documented. This includes returning how many bytes of "str" are successfully written, and raising IOError(EAGAIN) for an unsuccessful write (one that failed to write anything).
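As an illustration of the semantics proposed here (a byte count for a partial write, an EAGAIN-style error when nothing can be written), this sketch shows how os.write() already behaves on a non-blocking pipe in Python 3 on a Unix-like system. It is an analogy for the proposal, not the file.write() implementation; the 4 MiB payload size is an arbitrary choice that assumes the pipe buffer is smaller than that.

```python
import os

read_fd, write_fd = os.pipe()
os.set_blocking(write_fd, False)

payload = b"x" * (1 << 22)    # assumed to be far more than the pipe can hold
counts = []                   # byte counts returned by successful writes
would_block = False
try:
    while True:
        # Each successful call reports how many bytes were accepted.
        counts.append(os.write(write_fd, payload))
except BlockingIOError:
    # Pipe is full: not a single byte could be written this time.
    would_block = True

os.close(read_fd)
os.close(write_fd)
```

Because nothing ever reads from the pipe, the loop is guaranteed to end with the "would block" error once the buffer fills.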
Proposing semantic changes to file.read() and write() is bound to
raise hackles. One idea for soothing such objections: only make these
changes active when setblocking(False) is in effect. I.e., a
setblocking(True) file (the default, right?) behaves as you described
above, warts and all. (So old code that uses fcntl() continues to
"work" as before.) But files that have had setblocking(False) called
could gain these new semantics that you propose.
Greg
--
Greg Ward
On Mar 18, 2005, at 8:19 PM, Greg Ward wrote:
Is having to use fcntl and os really so awful? At least it requires the programmer to prove he knows what he's doing putting this file into non-blocking mode, and that he really wants to do it. ;-)
I'd tend to agree. :) Moreover, I don't think fread/fwrite are guaranteed to work as you would expect with non-blocking file descriptors. So, providing a setblocking() call to files would require calling read/write instead of fread/fwrite in all the file methods, at least when in non-blocking mode. I don't think that's a good idea.

James
On Fri, 2005-03-18 at 20:41 -0500, James Y Knight wrote:
On Mar 18, 2005, at 8:19 PM, Greg Ward wrote:
Is having to use fcntl and os really so awful? At least it requires the programmer to prove he knows what he's doing putting this file into non-blocking mode, and that he really wants to do it. ;-)
Consider the following. This is pretty much the only way you can use popen2 reliably without knowing specific behaviours of the executed command:

    import os, fcntl, select

    def process_data(cmd, data):
        child_in, child_out = os.popen2(cmd)
        child_in = child_in.fileno()                                   # /
        flags = fcntl.fcntl(child_in, fcntl.F_GETFL)                   # |1)
        fcntl.fcntl(child_in, fcntl.F_SETFL, flags | os.O_NONBLOCK)    # \
        child_out = child_out.fileno()                                 # /
        flags = fcntl.fcntl(child_out, fcntl.F_GETFL)                  # |2)
        fcntl.fcntl(child_out, fcntl.F_SETFL, flags | os.O_NONBLOCK)   # \
        ans = ""
        li = [child_out]
        lo = [child_in]
        while li or lo:
            i, o, e = select.select(li, lo, [])                        # 3
            if i:
                buf = os.read(child_out, 2048)                         # 4
                if buf:
                    ans += buf
                else:
                    li = []
            if o:
                if data:
                    count = os.write(child_in, data[:2048])            # 4
                    data = data[count:]
                else:
                    lo = []
        return ans

For 1) and 2), note that popen2 returns file objects, but as they cannot be reliably used as file objects, we ignore them and grab their fileno(). Why does popen2 return file objects if they cannot reliably be used? The flags get/set using fcntl is arcane stuff for what are pretty much essential operations after a popen2. These could be replaced by:

    child_in.setblocking(False)
    child_out.setblocking(False)

For 3), select() can operate on file objects directly. However, since you cannot reliably read/write file objects in non-blocking mode, we use the fileno's. Why can select operate with file objects if file objects cannot be reliably read/written?

For 4), we are using the os.read/write methods to operate on the fileno's. Under the proposed PEP we could use the file objects' read/write methods instead.

I guess the thing that annoys me the most is the asymmetry of popen2 and select using file objects, but needing to use the os.read/write and fileno()'s for reading and writing.
I'd tend to agree. :) Moreover, I don't think fread/fwrite are guaranteed to work as you would expect with non-blocking file descriptors. So, providing a setblocking() call to files would require calling read/write instead of fread/fwrite in all the file methods, at least when in non-blocking mode. I don't think that's a good idea.
Hmm.. I assumed file.read() and file.write() were implemented using
read/write from their observed behaviour. The documentation of
fread/fwrite doesn't mention the behaviour in non-blocking mode at all.
The observed behaviour suggests that fread/fwrite are implemented using
read/write and hence get the same behaviour. The documentation implies
that the behaviour in non-blocking mode will reflect the behaviour of
read/write, with EAGAIN errors reported via ferror() indicating empty
non-blocking reads/writes.
If the behaviour of fread/fwrite is indeed indeterminate under
non-blocking mode, then yes, file objects in non-blocking mode would
have to use read/write instead of fread/fwrite. However, I don't think
this is required.
I know this PEP is kinda insignificant and minor. It doesn't save much,
but it doesn't change much, and makes things a bit cleaner.
--
Donovan Baarda
Donovan Baarda wrote:
Consider the following. This is pretty much the only way you can use popen2 reliably without knowing specific behaviours of the executed command;
    ...
    fcntl.fcntl(child_in, fcntl.F_SETFL, flags | os.O_NONBLOCK)    # \
    ...                                                            # /
    fcntl.fcntl(child_out, fcntl.F_SETFL, flags | os.O_NONBLOCK)   # \
I still don't believe you need to make these non-blocking. When select() returns a fd for reading/writing, it's telling you that the next os.read/os.write call on it will not block. Making the fd non-blocking as well is unnecessary and perhaps even undesirable.
For 1) and 2), note that popen2 returns file objects, but as they cannot be reliably used as file objects, we ignore them and grab their fileno(). Why does popen2 return file objects if they cannot reliably be used?
I would go along with giving file objects alternative read/write methods which behave more like os.read/os.write, maybe called something like readsome() and writesome(). That would eliminate the need to extract and manipulate the fds, and might make it possible to do some of this stuff in a more platform-independent way.

--
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg.ewing@canterbury.ac.nz        +--------------------------------------+
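A rough idea of what the suggested readsome()/writesome() methods might look like, written as a wrapper class in Python 3 rather than as changes to the file type. Everything here is hypothetical: the class name SomeIO, the method signatures, and the choice to wrap unbuffered file objects so that no data is left stranded in a userspace buffer.

```python
import os


class SomeIO:
    """Hypothetical wrapper giving a file object os.read/os.write-style calls."""

    def __init__(self, fileobj):
        self._file = fileobj

    def fileno(self):
        # Lets select.select() accept the wrapper directly.
        return self._file.fileno()

    def readsome(self, size):
        """Return up to size bytes that are available; b'' means EOF."""
        return os.read(self._file.fileno(), size)

    def writesome(self, data):
        """Write what fits and return how many bytes were accepted."""
        return os.write(self._file.fileno(), data)

    def close(self):
        self._file.close()


# Unbuffered file objects around a pipe, so the wrapper and the
# underlying descriptor never disagree about what has been transferred.
r, w = os.pipe()
reader = SomeIO(os.fdopen(r, "rb", buffering=0))
writer = SomeIO(os.fdopen(w, "wb", buffering=0))

sent = writer.writesome(b"hello")
got = reader.readsome(2048)     # returns the 5 available bytes, not 2048

writer.close()
reader.close()
```

Keeping a fileno() method on the wrapper means the same object can be handed to select() and then read or written, which is the symmetry the thread is asking for.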
On Tue, 2005-03-22 at 12:49 +1200, Greg Ewing wrote:
Donovan Baarda wrote:
Consider the following. This is pretty much the only way you can use popen2 reliably without knowing specific behaviours of the executed command;
    ...
    fcntl.fcntl(child_in, fcntl.F_SETFL, flags | os.O_NONBLOCK)    # \
    ...                                                            # /
    fcntl.fcntl(child_out, fcntl.F_SETFL, flags | os.O_NONBLOCK)   # \
I still don't believe you need to make these non-blocking. When select() returns a fd for reading/writing, it's telling you that the next os.read/os.write call on it will not block. Making the fd non-blocking as well is unnecessary and perhaps even undesirable.
Yeah... For some reason I had it in my head that os.read/os.write would not do partial/incomplete reads/writes unless the file was in non-blocking mode.
For 1) and 2), note that popen2 returns file objects, but as they cannot be reliably used as file objects, we ignore them and grab their fileno(). Why does popen2 return file objects if they cannot reliably be used?
I would go along with giving file objects alternative read/write methods which behave more like os.read/os.write, maybe called something like readsome() and writesome(). That would eliminate the need to extract and manipulate the fds, and might make it possible to do some of this stuff in a more platform-independent way.
The fact that partial reads/writes are possible without non-blocking
mode changes things a fair bit. Also, the lack of fcntl support in
Windows needs to be taken into account too.
I still think the support for partial reads in non-blocking mode on
file.read() is inconsistent with the absence of partial write support in
file.write(). I think this PEP still has some merit for cleaning up this
inconsistency, but otherwise doesn't gain much... just adding a return
count to file.write() and clearing up the documented behaviour is enough
to do this.
The lack of support on win32 for non-blocking mode, combined with the
reduced need for it, makes adding a "setblocking" method undesirable.
I don't know what the best thing to do now is... I guess the
readsome/writesome is probably best, but given that os.read/os.write is
not that bad, perhaps it's best to just forget I even suggested this
PEP :-)
--
Donovan Baarda
Donovan Baarda wrote:
The fact that partial reads/writes are possible without non-blocking mode changes things a fair bit. Also, the lack of fcntl support in Windows needs to be taken into account too.
... [ snip ] ...
The lack of support on win32 for non-blocking mode, combined with the reduced need for it, makes adding a "setblocking" method undesirable.
I don't know what the best thing to do now is... I guess the readsome/writesome is probably best, but given that os.read/os.write is not that bad, perhaps it's best to just forget I even suggested this PEP :-)
My EUR 0.01 is that we should aim at a higher level of abstraction. I really don't care whether Windows, Unix, Linux, or WhateverOS provides me with a specific low-level service. I care about the conceptual thing I'm trying to establish. Any abstraction that provides a means to express a solution closer to that conceptual level is a winner.

--eric
On 18 March 2005, Donovan Baarda said:
Many Python library methods and classes like select.select(), os.popen2(), and subprocess.Popen() return and/or operate on builtin file objects. However even simple applications of these methods and classes require the files to be in non-blocking mode.
I don't agree with that. There's no need to use non-blocking I/O when using select(), and in fact things are less confusing if you don't.
The read method's current behaviour needs to be documented, so its actual behaviour can be used to differentiate between an empty non-blocking read, and EOF. This means recording that IOError(EAGAIN) is raised for an empty non-blocking read.
Isn't that unix-specific? The file object is supposed to provide a more or less platform-independent interface, I thought.
--
Greg Ewing
On Mon, 2005-03-21 at 17:32 +1200, Greg Ewing wrote:
On 18 March 2005, Donovan Baarda said:
Many Python library methods and classes like select.select(), os.popen2(), and subprocess.Popen() return and/or operate on builtin file objects. However even simple applications of these methods and classes require the files to be in non-blocking mode.
I don't agree with that. There's no need to use non-blocking I/O when using select(), and in fact things are less confusing if you don't.
You would think that... and the fact that select, popen2 etc. all use file objects encourages you to think that. However, this is a trap that can catch you out badly. Check the attached python scripts that demonstrate the problem.

Because staller.py outputs and flushes a fragment of data smaller than selector.py uses for its reads, the select statement is triggered, but the corresponding read blocks.

A similar thing can happen with writes... if the child process consumes a fragment smaller than the write buffer of the selector process, then the select can trigger and the corresponding write can block because there is not enough space in the file buffer.

The only ways to ensure that a select process does not block like this, without using non-blocking mode, are:

1) use a buffer size of 1 in the select process.

2) understand the child process's read/write behaviour and adjust the selector process accordingly... i.e. by making the buffer sizes just right for the child process.

Note that it all interacts with the file objects' buffer sizes too... making for some extremely hard to debug intermittent behaviour.
The read method's current behaviour needs to be documented, so its actual behaviour can be used to differentiate between an empty non-blocking read, and EOF. This means recording that IOError(EAGAIN) is raised for an empty non-blocking read.
Isn't that unix-specific? The file object is supposed to provide a more or less platform-independent interface, I thought.
I think the fread/fwrite and read/write behaviour is posix standard and
possibly C standard stuff... so it _should_ be the same on other
platforms.
--
Donovan Baarda
On Mon, 21 Mar 2005, Donovan Baarda wrote:
I don't agree with that. There's no need to use non-blocking I/O when using select(), and in fact things are less confusing if you don't.
You would think that... and the fact that select, popen2 etc all use file objects encourage you to think that. However, this is a trap that can catch you out badly. Check the attached python scripts that demonstrate the problem.
This is no "trap". When select() indicates that you can write or read, it means that you can write or read at least one byte. The .read() and .write() file methods, however, always read and write *everything*. These work, basically, just like fread()/fwrite().
The only ways to ensure that a select process does not block like this, without using non-blocking mode, are;
1) use a buffer size of 1 in the select process.
2) understand the child process's read/write behaviour and adjust the selector process accordingly... ie by making the buffer sizes just right for the child process,
3) Use os.read / os.write.
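Option 3 can be checked with a small sketch (Python 3, Unix-like system, a pipe standing in for the child process; the 4096-byte request and the 1 second timeout are arbitrary). Both ends stay in blocking mode, yet os.read() returns the short fragment as soon as select() reports the descriptor readable.

```python
import os
import select

read_fd, write_fd = os.pipe()      # both ends left in blocking mode
os.write(write_fd, b"tiny")        # a fragment far smaller than the read size

readable, _, _ = select.select([read_fd], [], [], 1.0)
# os.read() returns what is available instead of waiting for 4096 bytes.
chunk = os.read(read_fd, 4096) if readable else None

os.close(write_fd)
os.close(read_fd)
```

This is the difference from file.read(size), which keeps going until it has the full size or reaches EOF.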
The read method's current behaviour needs to be documented, so its actual behaviour can be used to differentiate between an empty non-blocking read, and EOF. This means recording that IOError(EAGAIN) is raised for an empty non-blocking read.
Isn't that unix-specific? The file object is supposed to provide a more or less platform-independent interface, I thought.
I think the fread/fwrite and read/write behaviour is posix standard and possibly C standard stuff... so it _should_ be the same on other platforms.
Sorry if I've misunderstood your point, but fread()/fwrite() does not
return EAGAIN.
/Peter Åstrand
G'day,
From: "Peter Astrand"
On Mon, 21 Mar 2005, Donovan Baarda wrote:
[...]
This is no "trap". When select() indicates that you can write or read, it means that you can write or read at least one byte. The .read() and .write() file methods, however, always read and write *everything*. These work, basically, just like fread()/fwrite().
yep, which is why you can only use them reliably in a select loop if you read/write one byte at a time.
The only ways to ensure that a select process does not block like this, without using non-blocking mode, are;
1) use a buffer size of 1 in the select process.
2) understand the child process's read/write behaviour and adjust the selector process accordingly... ie by making the buffer sizes just right for the child process,
3) Use os.read / os.write. [...]
but os.read / os.write will block too. Try it... replace the file read/writes in selector.py. They will only do partial reads if the file is put into non-blocking mode.
I think the fread/fwrite and read/write behaviour is posix standard and possibly C standard stuff... so it _should_ be the same on other platforms.
Sorry if I've misunderstood your point, but fread()/fwrite() does not return EAGAIN.
no, fread()/fwrite() will return 0 if nothing was read/written, and ferror() will return EAGAIN to indicate that it was a "would block" condition.... at least I think it does... the man page simply says ferror() returns a non-zero value.

Looking at the python implementation of file.read(), for an empty fread() where ferror() is non-zero, it only raises IOError if errno is not EAGAIN or EWOULDBLOCK. It blindly clearerr()'s for any other partial read.

The implementation of file.write() raises IOError whenever there is an incomplete write.

So it looks, as I pointed out in the draft PEP, that the current file.read() supports non-blocking mode, but file.write() doesn't... a bit asymmetric :-)

----------------------------------------------------------------
Donovan Baarda                http://minkirri.apana.org.au/~abo/
----------------------------------------------------------------
On Mon, 21 Mar 2005, Donovan Baarda wrote:
The only ways to ensure that a select process does not block like this, without using non-blocking mode, are;
3) Use os.read / os.write. [...]
but os.read / os.write will block too.
No.
Try it... replace the file read/writes in selector.py. They will only do partial reads if the file is put into non-blocking mode.
I've just tried it; I replaced:

    data = o.read(BUFF_SIZE)

with:

    data = os.read(o.fileno(), BUFF_SIZE)

Works for me without any hangs. Another example is the subprocess module, which does not use non-blocking mode in any way. (If you are using pipes, however, you shouldn't write more than PIPE_BUF bytes in each write.)
I think the fread/fwrite and read/write behaviour is posix standard and possibly C standard stuff... so it _should_ be the same on other platforms.
Sorry if I've misunderstood your point, but fread()/fwrite() does not return EAGAIN.
no, fread()/fwrite() will return 0 if nothing was read/written, and ferror() will return EAGAIN to indicate that it was a "would block" condition.... at least I think it does... the man page simply says ferror() returns a non-zero value.
fread() should loop internally on EAGAIN, in blocking mode.
/Peter Åstrand
On Mon, 2005-03-21 at 11:42 +0100, Peter Astrand wrote:
On Mon, 21 Mar 2005, Donovan Baarda wrote:
The only ways to ensure that a select process does not block like this, without using non-blocking mode, are;
3) Use os.read / os.write. [...]
but os.read / os.write will block too.
No. [...]
Hmmm... you are right... that changes things. Blocking vs non-blocking becomes kinda moot if read/write will do partial writes in blocking mode.
fread() should loop internally on EAGAIN, in blocking mode.
Yeah, I was talking about non-blocking mode...
--
Donovan Baarda
On Mon, 2005-03-21 at 23:31 +1100, Donovan Baarda wrote:
On Mon, 2005-03-21 at 11:42 +0100, Peter Astrand wrote:
On Mon, 21 Mar 2005, Donovan Baarda wrote:
The only ways to ensure that a select process does not block like this, without using non-blocking mode, are;
3) Use os.read / os.write. [...]
but os.read / os.write will block too.
No. [...]
Hmmm... you are right... that changes things. Blocking vs non-blocking becomes kinda moot if read/write will do partial writes in blocking mode.
fread() should loop internally on EAGAIN, in blocking mode.
Yeah, I was talking about non-blocking mode...
Actually, in blocking mode you never get EAGAIN.... read() only gets
EAGAIN on an empty non-blocking read().
In non-blocking mode, EAGAIN is considered an error by fread(), so it
will return a partial read. The python implementation of file.read()
will return this partial read, and clear the EAGAIN error, or raise
IOError if it was an empty read (to differentiate between an empty read
and EOF).
--
Donovan Baarda
Donovan Baarda wrote:
On Mon, 2005-03-21 at 17:32 +1200, Greg Ewing wrote:
I don't agree with that. There's no need to use non-blocking I/O when using select(), and in fact things are less confusing if you don't.
Because staller.py outputs and flushes a fragment of data smaller than selector.py uses for its reads, the select statement is triggered, but the corresponding read blocks.
Your selector.py is using file object read/write methods, not os.read and os.write. I fully agree that you can't reliably use stdio-style I/O in conjunction with select(). But as long as you use os-level I/O, there shouldn't be any need to make anything non-blocking.
--
Greg Ewing
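Putting that together, here is a sketch of the earlier process_data() loop with the descriptors left in blocking mode. It targets Python 3 on a Unix-like system and swaps in subprocess.Popen for os.popen2, which is an editorial substitution; the child command used in the usage lines is made up for the example. Writes are capped at select.PIPE_BUF, following the advice earlier in the thread about not writing more than PIPE_BUF bytes to a pipe at a time.

```python
import os
import select
import subprocess
import sys


def process_data(cmd, data):
    """Feed data to cmd's stdin and collect its stdout without deadlocking."""
    child = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    in_fd = child.stdin.fileno()
    out_fd = child.stdout.fileno()
    chunks = []
    readers, writers = [out_fd], [in_fd]
    while readers or writers:
        r, w, _ = select.select(readers, writers, [])
        if r:
            buf = os.read(out_fd, 2048)       # returns what is available
            if buf:
                chunks.append(buf)
            else:                             # EOF from the child
                readers = []
        if w:
            if data:
                # Stay within PIPE_BUF so the blocking write cannot stall.
                count = os.write(in_fd, data[:select.PIPE_BUF])
                data = data[count:]
            else:
                child.stdin.close()           # lets the child see EOF
                writers = []
    child.stdout.close()
    child.wait()
    return b"".join(chunks)


# Usage: a throwaway child that upper-cases whatever it is sent.
upper = [sys.executable, "-c",
         "import sys; sys.stdout.write(sys.stdin.read().upper())"]
result = process_data(upper, b"spam " * 20000)
```

The sketch does not handle a child that exits before consuming its input; in that case os.write() would raise BrokenPipeError.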
On Mon, 21 Mar 2005 17:32:36 +1200, Greg Ewing wrote:
On 18 March 2005, Donovan Baarda said:
The read method's current behaviour needs to be documented, so its actual behaviour can be used to differentiate between an empty non-blocking read, and EOF. This means recording that IOError(EAGAIN) is raised for an empty non-blocking read.
Isn't that unix-specific? The file object is supposed to provide a more or less platform-independent interface, I thought.
The whole thing is, I believe, highly Unix-specific. I say this because I am essentially a Windows programmer, and the proposal means almost nothing to me :-)

More seriously, non-blocking IO and select-type readability checks are VERY different on Windows, and so I would expect the implementation of this change to be completely different as well. The C standard says nothing about non-blocking IO. While POSIX might, that doesn't apply to Windows.

Oh, and in case it's not obvious, I'm -1 on something "Unix-only" here. Python file objects are supposed to be cross-platform, in general.

Paul.

PS Donovan's sample code seems to be process-related - if so, isn't that what the new subprocess module was supposed to resolve (process-related communication in a platform-independent way)? If the only use case is with subprocesses, then is this change needed at all?
G'day,
From: "Greg Ward"
On 18 March 2005, Donovan Baarda said: [...]
Currently the builtin file type does not support non-blocking mode very well. Setting a file into non-blocking mode and reading or writing to it can only be done reliably by operating on the file.fileno() file descriptor. This requires using the fcntl and os module file descriptor manipulation methods.
Is having to use fcntl and os really so awful? At least it requires the programmer to prove he knows what he's doing putting this file into non-blocking mode, and that he really wants to do it. ;-)
It's not that bad I guess... but then I'm proposing a very minor change to fix it. The bit that annoys me is popen2() and select() give this false sense of "File Object compatibility", when in reality you can't use them reliably with file objects. It is also kind of disturbing that file.read() actually does work in non-blocking mode, but file.write() doesn't. The source for file.read() shows a fair bit of effort towards making it work for non-blocking mode... why not do the same for file.write()?
Details
=======
The documentation of file.read() warns: "Also note that when in non-blocking mode, less data than what was requested may be returned, even if no size parameter was given". An empty string is returned to indicate an EOF condition. It is possible that file.read() in non-blocking mode will not produce any data before EOF is reached. Currently there is no documented way to identify the difference between reaching EOF and an empty non-blocking read.
The documented behaviour of file.write() in non-blocking mode is undefined. When writing to a file in non-blocking mode, it is possible that not all of the data gets written. Currently there is no documented way of handling or indicating a partial write.
That's more interesting and a better motivation for this PEP.
The other solution to this of course is to simply say "file.read() and file.write() don't work in non-blocking mode", but that would be a step backwards for the current file.read().
file.read([size]) Changes
--------------------------
The read method's current behaviour needs to be documented, so its actual behaviour can be used to differentiate between an empty non-blocking read, and EOF. This means recording that IOError(EAGAIN) is raised for an empty non-blocking read.
file.write(str) Changes
-----------------------
The write method needs to have a useful behaviour for partial non-blocking writes defined, implemented, and documented. This includes returning how many bytes of "str" are successfully written, and raising IOError(EAGAIN) for an unsuccessful write (one that failed to write anything).
Proposing semantic changes to file.read() and write() is bound to raise hackles. One idea for soothing such objections: only make these changes active when setblocking(False) is in effect. I.e., a setblocking(True) file (the default, right?) behaves as you described above, warts and all. (So old code that uses fcntl() continues to "work" as before.) But files that have had setblocking(False) called could gain these new semantics that you propose.
There is nothing in this proposal that would break or change the behaviour of any existing code, unless it was relying on file.write() returning None, or checking that file objects don't have a "setblocking" method.

Note that the change for file.read() is simply to document the current behaviour... not to actually change it.

The change for file.write() is a little more dramatic, but I really can't imagine anyone relying on file.write() returning None. A compromise would be to have file.write() return None in blocking mode, and a count in non-blocking mode... but I still can't believe people will rely on it returning None :-) It would be more useful to always return a count, so that methods using them could handle both modes easily.

Note that I did consider some more dramatic changes that would have made them even easier to use. Things like raising an exception for EOF instead of EAGAIN would actually make a lot of things easier to code... but it would be too big a change.

--
Donovan Baarda
participants (7)
- Donovan Baarda
- Eric Nieuwland
- Greg Ewing
- Greg Ward
- James Y Knight
- Paul Moore
- Peter Astrand