[Python-Dev] Waiting method for file objects

Eric S. Raymond esr@thyrsus.com
Thu, 25 Jan 2001 11:19:36 -0500


I have been researching the question of how to ask a file descriptor how much
data it has waiting for the next sequential read, with a view to discovering
what cross-platform behavior we could count on for a hypothetical `waiting'
method in Python's built-in file class.

1:  Why bother?

I have these main applications in mind:

1. Detecting EOF on a static plain file.
2. Non-blocking poll of a socket opened in non-blocking mode.
3. Non-blocking poll of a FIFO opened in non-blocking mode.
4. Non-blocking poll of a terminal device opened in non-blocking mode.

These are all frequently requested capabilities on C newsgroups -- how
often have *you* seen the "how do I detect an individual keypress"
question from beginning programmers?  I believe having these
capabilities would substantially enhance Python's appeal.

2: What would be under the hood?

Summary: We can do this portably, and we can do it with only one (1)
new #ifdef.  Our tools for this purpose will be the fstat(2) st_size
field and the FIONREAD ioctl(2) call.  They are complementary.

In all supposedly POSIX-conformant environments I know of, the st_size
field has a documented meaning for plain files (S_IFREG) and may or
may not give a meaningful number for FIFOs, sockets, and tty devices.
The Single Unix Specification is silent on the meaning of st_size for
file types other than regular files (S_IFREG).  I have filed a defect
report about this with The Open Group and am discussing appropriate language
with them.

(The last sentence of the Inferno operating system's language on
stat(2) is interesting: "If the file resides on permanent storage and
is not a directory, the length returned by stat is the number of bytes
in the file. For directories, the length returned is zero. Some
devices report a length that is the number of bytes that may be read
from the device without blocking.")

The FIONREAD ioctl(2) call, on the other hand, returns the number of
bytes waiting on stream devices such as FIFOs, sockets, and ttys -- but
does not return a useful value for plain files, directories, or block
devices. The
FIONREAD ioctl was supported in both SVr4 and 4.2BSD.  It's present in
all the open-source Unixes, SunOS, Solaris, and AIX.  Via Google
search I have discovered that it's also supported in the Windows
Sockets API and the GUSI POSIX libraries for the Macintosh.  Thus, it
can be considered portable for Python's purposes even though it's
rather sparsely documented.
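For illustration, here is a minimal sketch of the FIONREAD query as it can be made today from Python, assuming a Unix Python 3 whose fcntl and termios modules expose ioctl() and FIONREAD (neither module exists on Windows). The helper name fionread() is hypothetical and not part of the proposal.

```python
import fcntl
import os
import struct
import termios


def fionread(fd):
    """Return the byte count FIONREAD reports for descriptor fd.

    Hypothetical helper; raises OSError if fd does not support FIONREAD.
    """
    # FIONREAD stores a C int; passing immutable bytes makes fcntl.ioctl()
    # return a copy of that buffer as filled in by the kernel.
    result = fcntl.ioctl(fd, termios.FIONREAD, struct.pack("i", 0))
    return struct.unpack("i", result)[0]


# Usage: bytes written into a pipe show up as waiting on its read end.
r, w = os.pipe()
os.write(w, b"hello")
print(fionread(r))  # 5
os.close(r)
os.close(w)
```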

I was able to obtain confirming information on Linux from Linus
Torvalds himself. My information on Windows and the Mac is from
Gavriel State, formerly a lead developer on Corel's WINE team and a
programmer with extensive cross-platform experience.  Gavriel reported
on the MSCRT POSIX environment, on the Metrowerks Standard Library
POSIX implementation for the Mac, and on the GUSI POSIX implementation
for the Mac.

2.1: Plain files

Torvalds and State confirm that for plain files (S_IFREG) the st_size
field is reliable on all three platforms.  On the Mac it gives the
file's data fork size.

One apparent difficulty with the plain-file case is that POSIX does
not guarantee anything about off_t quantities, such as the values
lseek(2) returns and the st_size field, except that they can be
compared for equality.  Thus, under the strict letter of POSIX law,
`waiting' can be used to detect EOF but not to get a reliable read
size at any other file position.
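A minimal sketch of the plain-file rule in Python 3, assuming the file is opened in binary mode so that tell() is a byte offset (the character-granularity case). bytes_left() is a hypothetical name. Much as the C patch relies on ftell(), this relies on the buffered object's tell(), which already accounts for data read ahead into its buffer.

```python
import os
import tempfile


def bytes_left(f):
    """st_size minus the current position; 0 means f is at EOF."""
    return os.fstat(f.fileno()).st_size - f.tell()


# Usage: write 10 bytes, then watch the count fall as we read.
with tempfile.TemporaryFile() as f:
    f.write(b"0123456789")
    f.flush()                   # make sure st_size reflects the write
    f.seek(0)
    f.read(3)
    print(bytes_left(f))        # 7
    f.read()
    print(bytes_left(f) == 0)   # True: EOF detected
```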

Fortunately, this is less of an issue than it appears.  The weakness of
the POSIX language was a 1980s-era concession to a generation of
mainframe operating systems with record-oriented file structures --
all of which are now either thoroughly obsolete or (in the case of IBM
VM/CMS) have become Linux emulators :-).  On modern operating systems
under which files have character granularity, stat(2) emulations can
be and are written to give the right result.

2.2: Directories and block devices

The directory case (S_IFDIR) is a complete loss.  Under Unixes,
including Linux, the fstat(2) size field gives the allocated size of
the directory as if it were a plain file.  Under MSCRT POSIX the
meaning is undocumented and unclear.  Metrowerks returns garbage.
GUSI POSIX returns the number of files in the directory!  FIONREAD
cannot be used on directories.

Block devices (S_IFBLK) are also a mess.  Linus points out that a
system with removable or unmountable volumes *cannot* return a useful
st_size field -- what happens when the device is dismounted?

2.3: Pipes, sockets, and character devices

Pipes and FIFOs (S_IFIFO) look better.  On MSCRT the fstat(2) size
field returns the number of bytes waiting to be read.  This is also
true under current Linuxes, though Torvalds says it is "an
implementation detail" and recommends polling with the FIONREAD ioctl
instead.  Fortunately, FIONREAD is available under Unix, Windows, and
the Mac.

Sockets (S_IFSOCK) look better too.  Under Linux, the fstat(2) size
field gives the number of bytes waiting.  Torvalds again says this is "an
implementation detail" and recommends polling with the FIONREAD ioctl.
Neither MSCRT POSIX nor Metrowerks has direct support for sockets.
GUSI POSIX returns 1 (!) in the st_size field. But FIONREAD is
available under Unix, Windows, and the GUSI POSIX libraries on the Mac.
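To see FIONREAD on a socket without a network connection, this sketch uses socket.socketpair() (a connected pair of Unix-domain sockets) as a stand-in for a real connection. It assumes a Unix Python 3; socket_bytes_waiting() is a hypothetical name, and the greeting bytes are made up. The receiving end is put in non-blocking mode, matching use case 2 in section 1.

```python
import fcntl
import socket
import struct
import termios


def socket_bytes_waiting(sock):
    """Bytes queued in sock's kernel receive buffer, via FIONREAD."""
    result = fcntl.ioctl(sock.fileno(), termios.FIONREAD, struct.pack("i", 0))
    return struct.unpack("i", result)[0]


# Usage: the sender's bytes are counted before the receiver calls recv().
sender, receiver = socket.socketpair()
receiver.setblocking(False)
sender.sendall(b"* OK IMAP4 ready\r\n")
n = socket_bytes_waiting(receiver)
print(n, receiver.recv(n))
sender.close()
receiver.close()
```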

Character devices (S_IFCHR) can be polled with FIONREAD.  This technique
has a long history of use with tty devices under Unix.  I don't know whether
it will work with the equivalents of terminal devices for Windows and the Mac.
Fortunately this is not a very important question, as those are GUI
environments in which terminal devices are rarely if ever used.
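The classic use is the single-keypress question from section 1. A minimal sketch, assuming a Unix Python 3: poll_key() (a hypothetical name) checks FIONREAD before reading, so it does not block and never has to set O_NONBLOCK. In its default line-buffered mode a terminal holds input until Enter is pressed, so the hypothetical demo_keypress() first switches to cbreak mode with tty.setcbreak() and restores the saved settings afterwards.

```python
import fcntl
import os
import struct
import sys
import termios
import time
import tty


def poll_key(fd):
    """Return one waiting byte from fd, or None if nothing is waiting."""
    result = fcntl.ioctl(fd, termios.FIONREAD, struct.pack("i", 0))
    if struct.unpack("i", result)[0] > 0:
        return os.read(fd, 1)   # at least one byte is queued, so no wait
    return None


def demo_keypress():
    """Wait for a single keypress on an interactive terminal."""
    fd = sys.stdin.fileno()
    saved = termios.tcgetattr(fd)
    tty.setcbreak(fd)           # deliver keys without waiting for Enter
    try:
        print("Press any key...")
        while True:
            key = poll_key(fd)
            if key is not None:
                print("Got", key)
                return key
            time.sleep(0.05)    # a real program would do other work here
    finally:
        termios.tcsetattr(fd, termios.TCSADRAIN, saved)
```

Call demo_keypress() from an interactive terminal; poll_key() itself works on any descriptor that supports FIONREAD.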

3: How does this turn into Python?

The upshot of our portability analysis is that by using FIONREAD and
fstat(2), we can get useful results for plain files, pipes, and
sockets on all three platforms.  Directories and block devices are a
complete loss.  Character devices (in particular, ttys) we can poll
reliably under Unix.  What we'll get polling the equivalents of tty or
character devices under Windows and the Mac is presently unknown, but
also unimportant.

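My proposed semantics for a Python `waiting' method is that it reports
the amount of data that would be returned by a read() call at the time
of the waiting-method invocation.  The method raises an exception if
such a report is impossible or forbidden: OSError when the platform
lacks fstat(2) or FIONREAD or the file type is unrecognized, and
IOError for directories, block devices, or a failed system call.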
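These semantics can be sketched in Python 3 as a free function following the same dispatch as the patch: refuse directories and block devices, use st_size minus the current position for plain files, and use FIONREAD for FIFOs, sockets, and character devices. This is a Unix-only sketch under stated assumptions, not the proposed method itself: the function name waiting() and its acceptance of either a raw descriptor or a file object are illustrative choices, and because Python 3 folds IOError into OSError it raises OSError throughout. FIONREAD counts only bytes still queued in the kernel, so for pipes and sockets pass a raw descriptor or an unbuffered object; bytes already pulled into a userspace buffer are not counted.

```python
import fcntl
import os
import stat
import struct
import termios


def waiting(f):
    """Bytes a read() on f would return right now (sketch of the proposal).

    f may be a raw file descriptor (int) or an object with fileno(); for
    plain files, a file object's tell() supplies the position.
    """
    fd = f if isinstance(f, int) else f.fileno()
    st = os.fstat(fd)
    mode = st.st_mode
    if stat.S_ISDIR(mode) or stat.S_ISBLK(mode):
        raise OSError("can't poll a block device or directory")
    if stat.S_ISREG(mode):
        # Plain file: st_size minus current position; 0 means EOF.
        if isinstance(f, int):
            pos = os.lseek(fd, 0, os.SEEK_CUR)
        else:
            pos = f.tell()
        return st.st_size - pos
    if stat.S_ISFIFO(mode) or stat.S_ISSOCK(mode) or stat.S_ISCHR(mode):
        # Stream device: ask the kernel how many bytes are queued.
        result = fcntl.ioctl(fd, termios.FIONREAD, struct.pack("i", 0))
        return struct.unpack("i", result)[0]
    raise OSError("unknown file type")
```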

I have enclosed a patch against the current CVS sources, including
documentation.  This patch is tested and working against plain files,
sockets, and FIFOs under Linux.  I have also attached the
Python test program I used under Linux.

I would appreciate it if those of you on Windows and Macintosh
machines would test the waiting method. The test program will take
some porting, because it needs to write to a FIFO in the background.
Under Linux I do it this way:

	(echo -n '%s' >testfifo; echo 'Data written to FIFO.') &

I don't know how to do the equivalent under Windows or Mac.
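One way to drop the shell pipeline is to do the background write from a thread inside the test program itself. A minimal sketch, assuming Python 3 on a platform that has os.mkfifo (so this is still Unix-only; Windows has no os.mkfifo); fifo_roundtrip() is a hypothetical name. Opening a FIFO blocks until the other end is opened too, so the reader's open() and the writer thread's open() rendezvous. Keep the payload well below the pipe capacity, or the writer blocks and join() never returns.

```python
import fcntl
import os
import struct
import tempfile
import termios
import threading


def fifo_roundtrip(payload):
    """Send payload through a new FIFO from a background thread.

    Returns (bytes FIONREAD reported as waiting, bytes actually read).
    """
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "testfifo")
    os.mkfifo(path)

    def writer():
        # Opening a FIFO for writing blocks until a reader opens it.
        with open(path, "wb") as w:
            w.write(payload)

    t = threading.Thread(target=writer)
    t.start()
    try:
        with open(path, "rb", buffering=0) as r:  # rendezvous with writer
            t.join()  # writer has closed: all of payload is queued
            result = fcntl.ioctl(r.fileno(), termios.FIONREAD,
                                 struct.pack("i", 0))
            ready = struct.unpack("i", result)[0]
            data = r.read(ready) if ready else b""
    finally:
        os.remove(path)
        os.rmdir(workdir)
    return ready, data


print(fifo_roundtrip(b"abcdefg"))  # (7, b'abcdefg')
```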

When you run this program, it will try to mail me your test results.
		Eric S. Raymond <http://www.tuxedo.org/~esr/>

Sometimes it is said that man cannot be trusted with the government
of himself.  Can he, then, be trusted with the government of others?
	-- Thomas Jefferson, in his 1801 inaugural address

[Attachment: waiting.patch -- Patch implementing the waiting method]

Index: fileobject.c
RCS file: /cvsroot/python/python/dist/src/Objects/fileobject.c,v
retrieving revision 2.108
diff -c -r2.108 fileobject.c
*** fileobject.c	2001/01/18 03:03:16	2.108
--- fileobject.c	2001/01/25 16:16:10
*** 35,40 ****
--- 35,44 ----
  #include <errno.h>
+ #include <sys/ioctl.h>
+ #endif
  typedef struct {
*** 423,428 ****
--- 427,513 ----
  static PyObject *
+ file_waiting(PyFileObject *f, PyObject *args)
+ {
+ #ifdef HAVE_FSTAT
+ 	struct stat stbuf;
+ 	int ret;
+ #endif
+ 
+ 	if (f->f_fp == NULL)
+ 		return err_closed();
+ 	if (!PyArg_NoArgs(args))
+ 		return NULL;
+ #ifndef HAVE_FSTAT
+ 	PyErr_SetString(PyExc_OSError, "fstat(2) is not available.");
+ 	clearerr(f->f_fp);
+ 	return NULL;
+ #else
+ 	errno = 0;
+ 	ret = fstat(fileno(f->f_fp), &stbuf);
+ 	if (ret == -1) {			/* the fstat failed */
+ 		PyErr_SetFromErrno(PyExc_IOError);
+ 		clearerr(f->f_fp);
+ 		return NULL;
+ 	} else if (S_ISDIR(stbuf.st_mode) || S_ISBLK(stbuf.st_mode)) {
+ 		PyErr_SetString(PyExc_IOError,
+ 				"Can't poll a block device or directory.");
+ 		clearerr(f->f_fp);
+ 		return NULL;
+ 	} else if (S_ISREG(stbuf.st_mode)) {	/* plain file */
+ #if defined(HAVE_LARGEFILE_SUPPORT) && SIZEOF_FPOS_T >= 8
+ 		fpos_t pos;
+ #else
+ 		off_t pos;
+ #endif
+ 		errno = 0;
+ 		pos = _portable_ftell(f->f_fp);
+ 		if (pos == -1) {
+ 			PyErr_SetFromErrno(PyExc_IOError);
+ 			clearerr(f->f_fp);
+ 			return NULL;
+ 		}
+ 		/* bytes between the stdio position and end of file */
+ #if !defined(HAVE_LARGEFILE_SUPPORT)
+ 		return PyInt_FromLong(stbuf.st_size - pos);
+ #else
+ 		return PyLong_FromLongLong(stbuf.st_size - pos);
+ #endif
+ 	} else if (S_ISFIFO(stbuf.st_mode)
+ 		    || S_ISSOCK(stbuf.st_mode)
+ 		    || S_ISCHR(stbuf.st_mode)) {	/* stream device */
+ #ifndef FIONREAD
+ 		PyErr_SetString(PyExc_OSError,
+ 				"FIONREAD is not available.");
+ 		clearerr(f->f_fp);
+ 		return NULL;
+ #else
+ 		int waiting;
+ 		errno = 0;
+ 		ret = ioctl(fileno(f->f_fp), FIONREAD, &waiting);
+ 		if (ret == -1) {
+ 			PyErr_SetFromErrno(PyExc_IOError);
+ 			clearerr(f->f_fp);
+ 			return NULL;
+ 		}
+ 		return Py_BuildValue("i", waiting);
+ #endif /* FIONREAD */
+ 	} else {				/* should never happen! */
+ 		PyErr_SetString(PyExc_OSError, "Unknown file type.");
+ 		clearerr(f->f_fp);
+ 		return NULL;
+ 	}
+ #endif /* HAVE_FSTAT */
+ }
+ 
+ static PyObject *
  file_fileno(PyFileObject *f, PyObject *args)
  {
  	if (f->f_fp == NULL)
*** 1263,1268 ****
--- 1348,1354 ----
  	{"truncate",	(PyCFunction)file_truncate, 1},
  	{"tell",	(PyCFunction)file_tell, 0},
+ 	{"waiting",	(PyCFunction)file_waiting, 0},
  	{"readinto",	(PyCFunction)file_readinto, 0},
  	{"readlines",	(PyCFunction)file_readlines, 1},
  	{"xreadlines",	(PyCFunction)file_xreadlines, 1},

[Attachment: waiting_test.py -- Test program for the waiting method]

#!/usr/bin/env python
import sys, os, random, string, time, socket, smtplib
try:
    import readline     # nicer line editing for raw_input(), where available
except ImportError:
    pass

print "This program tests the `waiting' method of file objects."

fp = open("waiting_test.py", "rb")     # binary mode: byte counts match st_size
if hasattr(fp, "waiting"):
    print "Good, you're running a patched Python with `waiting' available."
else:
    print "You haven't installed the `waiting' patch yet.  This won't work."
    sys.exit(1)

successes = ""
failures = ""
nogo = ""

print ""
print "First, plain files:"

filesize = fp.waiting()
print "There are %d bytes waiting to be read in this file." % filesize
if os.name == 'posix':
    os.system("ls -l waiting_test.py")
    print "That should match the number in the ls listing above."
else:
    print "Please check this with your OS's directory tools."

get = random.randrange(fp.waiting())
print "I'll now read a random number (%d) of bytes." % get
fp.read(get)
print "The waiting method sees %d bytes left." % fp.waiting()
if get + fp.waiting() == filesize:
    print "%d + %d = %d.  That's consistent.  Test passed." % \
          (get, fp.waiting(), filesize)
    successes += "Plain file random-read test passed.\n"
else:
    print "That's not consistent. Test failed."
    failures += "Plain file random-read test failed.\n"

print "Now let's see if we can detect EOF reliably."
fp.read()
left = fp.waiting()
print "I'll do a read()...the waiting method now returns %d" % left
if left == 0:
    print "That looks like EOF."
    successes += "Plain file EOF test passed.\n"
else:
    print "%d bytes left. Test failed." % left
    failures += "Plain file EOF test failed.\n"

print ""
print "Now sockets:"
print "Connecting to imap.netaxs.com's IMAP server now..."
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(("imap.netaxs.com", 143))
sockfile = sock.makefile('rb')
print "Waiting a few seconds to avoid a race condition..."
time.sleep(3)
greetsize = sockfile.waiting()
print "There appear to be %d bytes waiting..." % greetsize
greeting = sockfile.readline()
print "I just read the greeting line..."
if len(greeting) == greetsize:
    print "...and the size matches.  Test passed."
    successes += "Socket test passed.\n"
else:
    print "That's not right.  Test failed."
    failures += "Socket test failed.\n"

print ""
if not hasattr(os, "mkfifo"):
    print "Your platform doesn't have FIFOs (mkfifo() is absent), so I can't test them."
    nogo = "FIFO test could not be performed.\n"
else:
    print "Now FIFOs:"
    print "I'm making a FIFO named testfifo."
    os.mkfifo("testfifo")
    data = string.letters[:random.randrange(len(string.letters))]
    print "I'm going to send it the string '%s' (random length %d)." \
          % (data, len(data))
    # Note: Unix dependency here!
    os.system("(echo -n '%s' >testfifo; echo 'Data written to FIFO.') &" % data)
    fp = open("testfifo", "r")
    print "Waiting a few seconds to avoid a race condition..."
    time.sleep(3)
    ready = fp.waiting()
    print "I see %d bytes waiting in the FIFO." % ready
    if ready == len(data):
        print "That's consistent.  Test passed."
        successes += "FIFO test passed.\n"
    else:
        print "That's not consistent. Test failed."
        failures += "FIFO test failed.\n"
    fp.close()
    os.remove("testfifo")

print "\nSummary:"
report = "Platform is: %s, version is %s\n" % (sys.platform, sys.version)
if successes:
    report += "The following tests succeeded:\n" + successes
if failures:
    report += "The following tests failed:\n" + failures
if nogo:
    report += "The following tests could not be performed:\n" + nogo
if not nogo:
    report += "No tests were skipped.\n"
if not failures:
    report += "All tests succeeded.\n"
print report

if os.name == 'posix':
    me = os.environ["USER"] + "@" + socket.getfqdn()
else:
    me = raw_input("Enter your email address, please? ")

try:
    server = smtplib.SMTP('localhost')
    report = ("From: %s\nTo: esr@thyrsus.com\nSubject: waiting_test\n\n" % me) + report
    server.sendmail(me, ["esr@thyrsus.com"], report)
    server.quit()
except (socket.error, smtplib.SMTPException):
    print "The attempt to mail your test result failed.\n"