[Python-Dev] xreadlines : readlines :: xrange : range

Tim Peters tim.one@home.com
Tue, 9 Jan 2001 16:12:42 -0500


[Guido]
> I'm much more confident about the getc_unlocked() approach than about
> fgets() -- with the latter we need much more faith in the C library
> implementers.  (E.g. that fgets() never writes beyond the null bytes
> it promises, and that it locks/unlocks only once.)  Also, you're
> relying on blindingly fast memchr() and memset() implementations.

Yet Andrew's timings say it's a wash on Linux and Solaris (perhaps even a
bit quicker on Solaris, even though it pays for an extra layer of function
call per line, to keep it out of get_line proper).  That tells me the
assumptions are indeed mild.  The business about not writing beyond the null
byte is a concern only I would have raised:  the possibility is an
aggressively paranoid reading of the standard (I do *lots* of things with libc
I'm paranoid about <0.9 wink>).  If even *Microsoft* didn't blow these
things, it's hard to imagine any other vendor exploding ...

Still, I'd rather get rid of ms_getline_hack if I could, because the code is
so much more complicated.

>> Both methods lack a refinement I would like to see, but can't
>> achieve in "the Windows way":  ensure that consistency is on no
>> worse than a per-line basis.  [Example omitted]

> The only portable way to ensure this that I can see, is to have a
> separate mutex in the Python file object.  Since this is hardly a
> common thing to do, I think it's better to let the application manage
> that lock if they need it.

Well, it would be easy to fiddle the HAVE_GETC_UNLOCKED method to keep the
file locked until the line was complete, and I wouldn't be opposed to making
life saner on platforms that allow it.  But there's another problem here:
part of the reason we release Python threads around the fgets is in case
some other thread is trying to write the data we're trying to read, yes?
But since FLOCKFILE is in effect, other threads *trying* to write to the
stream we're reading will get blocked anyway.  Seems to give us potential
for deadlocks.
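
A minimal sketch of that stall, in present-day Python rather than the C in
fileobject.c.  Everything here is a stand-in chosen for illustration:
stream_lock plays the role of the C stream lock taken by FLOCKFILE,
data_ready plays the role of "the writer has produced a line", and the
timeouts exist only so the demonstration terminates instead of hanging.

```python
import threading


def stalled_writer_demo():
    """Show a writer starved by a reader that waits while holding the lock."""
    stream_lock = threading.Lock()       # stand-in for the C stream lock
    data_ready = threading.Event()       # set once the writer has written
    reader_has_lock = threading.Event()  # lets the writer start at the right moment
    result = {}

    def reader():
        with stream_lock:
            reader_has_lock.set()
            # Wait for data while still holding the stream lock.
            result["reader_saw_data"] = data_ready.wait(timeout=0.5)

    def writer():
        reader_has_lock.wait()
        # The writer needs the same lock before it can produce any data.
        got = stream_lock.acquire(timeout=0.1)
        result["writer_got_lock"] = got
        if got:
            data_ready.set()
            stream_lock.release()

    threads = [threading.Thread(target=reader), threading.Thread(target=writer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result


if __name__ == "__main__":
    print(stalled_writer_demo())
```

Without the timeouts, the reader would wait forever for data that the writer
can never deliver, which is the deadlock shape described above.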

> (Then why are we bothering with flockfile(), you may ask?

I wouldn't ask that, no <wink>.

> Because otherwise, accidental multithreaded reading from the same
> file could cause core dumps.)

Ugh ... turns out that on my box I can provoke core dumps anyway, with this
program.  Blows up under released 2.0 and CVS Pythons (so it's not due to
anything new):

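# Python 2.0-era code: a reader thread and the main (writer) thread
# share one file object with no locking of their own.
import thread

def read(f):
    import time
    time.sleep(.01)              # give the writer a head start
    n = 0
    while n < 1000000:
        x = f.readline()
        n += len(x)
        print "r",
    print "read " + `n`          # backticks are repr() in Python 2
    m.release()                  # tell the main thread we're finished

m = thread.allocate_lock()
f = open("ga", "w+")             # read/write, truncating any old file
print "opened"
m.acquire()                      # held until the reader releases it
thread.start_new_thread(read, (f,))
n = 0
x = "x" * 113 + "\n"             # 114 bytes per line
while n < 1000000:
    f.write(x)
    print "w",
    n += len(x)
m.acquire()                      # block until the reader is done
print "done"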

Typical run:

C:\Python20>\code\python\dist\src\pcbuild\python temp.py
opened
w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w
w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w
w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w
w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w
w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w
w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w
w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w r w r
w r w r w r w r w r w r w r w r w r w r w r w r w r w r w r w r w
r r w r w r w r w r w r

and then it dies in msvcrt.dll with a bad pointer.  Also dies under the
debugger (yay!) ... always dies like so:

+ We (Python) call the MS fwrite, from fileobject.c file_write.
+ MS fwrite succeeds with its _lock_str(stream) call.
+ MS fwrite then calls MS _fwrite_lk.
+ MS _fwrite_lk calls memcpy, which blows up for a non-obvious reason.

Looks like the stream's _cnt member has gone mildly negative, which
_fwrite_lk casts to unsigned and so treats like a giant positive count, and
so memcpy eventually runs off the end of the process address space.
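
The arithmetic behind that can be checked with a short sketch.  The value -3
is a hypothetical "mildly negative" count, not one observed in the debugger,
and the 32-bit width is an assumption about the unsigned type used on that
platform.

```python
import ctypes


def as_unsigned32(n):
    """Reinterpret a signed count as a 32-bit unsigned value, as a C cast would."""
    return ctypes.c_uint32(n).value


if __name__ == "__main__":
    cnt = -3  # hypothetical value for the stream's _cnt member
    print(as_unsigned32(cnt))  # 4294967293, i.e. 2**32 - 3
```

A copy length that large is far beyond any real buffer, so a memcpy driven by
it keeps going until it touches memory the process does not own.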

Only thing I can conclude from this is that MS's internal stream-locking
implementation is buggy.  At least on W98SE.  Other flavors of Windows?
Other platforms?

Note that I don't claim the program above is *sensible*, just that it
shouldn't blow up.  Alas, short of indeed adding a separate mutex in Python
file objects, or writing our own stdio, I don't believe I can fix this.
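
For illustration, here is roughly what a separate per-file mutex could look
like if done at the application level, in present-day Python.  The LockedFile
class and its private read position are hypothetical, not anything in
fileobject.c; the sketch only shows that holding one lock around each whole
write and each whole readline gives per-line consistency.

```python
import io
import threading
import time


class LockedFile:
    """Hypothetical wrapper: one mutex guards every read and write."""

    def __init__(self, f):
        self._f = f
        self._lock = threading.Lock()
        self._read_pos = 0  # the reader's own position in the file

    def write(self, data):
        with self._lock:
            self._f.seek(0, io.SEEK_END)  # always append
            return self._f.write(data)

    def readline(self):
        with self._lock:
            self._f.seek(self._read_pos)
            line = self._f.readline()
            self._read_pos = self._f.tell()
            return line


def demo(n_lines=200):
    """Write and read concurrently; return the lines the reader saw."""
    shared = LockedFile(io.StringIO())
    line = "x" * 113 + "\n"
    got = []

    def writer():
        for _ in range(n_lines):
            shared.write(line)

    def reader():
        while len(got) < n_lines:
            s = shared.readline()
            if s:
                got.append(s)
            else:
                time.sleep(0.001)  # nothing written yet; try again

    threads = [threading.Thread(target=reader), threading.Thread(target=writer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return got


if __name__ == "__main__":
    lines = demo()
    print(len(lines), all(s == "x" * 113 + "\n" for s in lines))
```

Because each write puts down a complete line while holding the lock, the
reader sees either a whole line or nothing, never a fragment.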

the-best-thing-to-do-with-threads-is-don't-ly y'rs  - tim