[Python-Dev] xreadlines : readlines :: xrange : range

Tim Peters tim.one@home.com
Mon, 8 Jan 2001 23:29:02 -0500


[Andrew Kuchling]

I'll chop everything except while_readline (which is most affected by this
stuff):

> Linux w/o USE_MS_GETLINE_HACK:
> while_readline        0.184  0.180
>
> Linux w/ USE_MS_GETLINE_HACK:
> while_readline        0.183  0.190
>
> Solaris w/o USE_MS_GETLINE_HACK:
> while_readline        0.839  0.840
>
> Solaris w/ USE_MS_GETLINE_HACK:
> while_readline        0.769  0.770

So it's probably a wash.  In that case, do we want to maintain two hacks for
this?  I can't use the FLOCKFILE/etc approach on Windows, while "the
Windows" approach probably works everywhere (although its speed relies on
the platform factoring out at least the locking/unlocking in fgets).

Both methods lack a refinement I would like to see, but can't achieve in
"the Windows way":  ensure that consistency is no worse than per-line.
Right now, both methods lock/unlock the file only for the extent of the
current buffer size, so when a line is longer than that, two threads *can*
each get back different, interleaved pieces of that single line.  Like so:

import thread

def read(f):
    x = f.readline()
    print "thread saw " + `len(x)` + " chars"
    m.release()  # tell the main thread this reader is done

f = open("ga", "w") # a file with one long line
f.write("x" * 100000 + "\n")
f.close()

m = thread.allocate_lock()
for i in range(10):
    print i
    f = open("ga", "r")
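    m.acquire()  # held until the reader thread releases it
    thread.start_new_thread(read, (f,))
    x = f.readline()  # races with the readline in the other thread
    print "main saw " + `len(x)` + " chars"
    m.acquire(); m.release()  # wait for the reader thread to finish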
    f.close()

Here's a typical run on Windows (current CVS Python):

0
main saw 95439 chars
thread saw 4562 chars
1
main saw 97941 chars
thread saw 2060 chars
2
thread saw 43801 chars
main saw 56200 chars
3
thread saw 8011 chars
main saw 91990 chars
4
main saw 46546 chars
thread saw 53455 chars
5
thread saw 53125 chars
main saw 46876 chars
6
main saw 98638 chars
thread saw 1363 chars
7
main saw 72121 chars
thread saw 27880 chars
8
thread saw 70031 chars
main saw 29970 chars
9
thread saw 27555 chars
main saw 72446 chars
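
A quick sanity check on those numbers, as a sketch in present-day Python
rather than the 2.0-era syntax of the test program.  The counts are copied
from the run above; the variable names are made up for illustration.  Each
pair should account for the whole line, 100000 x's plus the newline:

```python
# (main, thread) character counts copied from the ten iterations above
runs = [
    (95439, 4562), (97941, 2060), (56200, 43801), (91990, 8011),
    (46546, 53455), (46876, 53125), (98638, 1363), (72121, 27880),
    (29970, 70031), (72446, 27555),
]

expected = 100000 + 1  # the x's plus the trailing newline

# Nothing is lost or duplicated if every pair adds up to the full line.
totals = [main + thread for main, thread in runs]
all_complete = all(total == expected for total in totals)

# The smallest fragment either reader was handed in any iteration.
shortest = min(min(pair) for pair in runs)

print(all_complete, shortest)
```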

So, yes, it's threadsafe now:  between them, the threads always see a grand
total of 100001 characters.  But what friggin' good is that <wink>?  If,
e.g., Guido wants multiple threads to chew over his giant logfile, there's
no guarantee that .readline() ever returns an actual line from the file.

Not that Python 2.0 was any better in this respect ...
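
For illustration only, here is a minimal sketch of what per-line consistency
could look like when done at the application level, in present-day Python
rather than the 2.0-era dialect above.  LineLockedFile, its bufsize
parameter and demo() are hypothetical names, not anything in CVS, and the
8192 default merely stands in for whatever buffer size the C code uses.  The
point is only that the lock is held until a complete line has been
assembled, rather than once per buffer:

```python
import io
import threading


class LineLockedFile:
    """Hypothetical wrapper: one lock is held across a whole line."""

    def __init__(self, f, bufsize=8192):
        self._f = f
        self._bufsize = bufsize  # assumed stand-in for the C-level buffer
        self._lock = threading.Lock()

    def readline(self):
        pieces = []
        with self._lock:  # not released until the line is complete
            while True:
                # Read at most one buffer's worth, stopping at a newline.
                piece = self._f.readline(self._bufsize)
                pieces.append(piece)
                if not piece or piece.endswith("\n"):
                    break  # end of file or end of line
        return "".join(pieces)


def demo():
    # One long line followed by one short line, shared by two readers.
    data = "x" * 100000 + "\n" + "y" * 10 + "\n"
    shared = LineLockedFile(io.StringIO(data))
    seen = []

    def read():
        seen.append(len(shared.readline()))

    threads = [threading.Thread(target=read) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(seen)


if __name__ == "__main__":
    print(demo())
```

Whichever thread gets the lock first comes back with the entire long line
and the other with the short one, so neither ever sees a fragment.  The cost
is that a reader blocks for as long as another thread's line takes to
arrive, which is presumably why it matters where the locking is done.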