
[Dan Sugalski, on Wednesday, August 01, 2001 2:20 AM]
... Ouch, I'd bet that hurts. Has anyone timed the difference between making lots of getc calls and making a few larger reads and managing the buffers internally? I can see it going either way, and another data point would be useful to have.
There's been lots of this in Python-Dev, like in this thread:

    http://aspn.activestate.com/ASPN/Mail/Message/600485

I'll quote the high-order bit:

My line-at-a-time test case used (rounding to nearest whole integers) 30 seconds in Python and 6 in Perl. The result of testing many changes to Python's implementation was that the excess 24 seconds broke down like so:

    17  spent inside internal MS threadsafe getc() lock/unlock routines
     5  uncertain, but evidence suggests much of it due to MS malloc/realloc (Perl does its own memory mgmt)
     2  for not copying directly out of the platform FILE* implementation struct in a highly optimized loop (like Perl does)

My last checkin to fileobject.c reclaimed 17 seconds on Win98SE while remaining threadsafe, via a combination of locking per line instead of per character, and invoking realloc much less often (only for lines exceeding 200 chars).

Note that thread overhead is overwhelmingly the biggest hangup. Python has two threadsafe input tricks now:

1. On platforms that have flockfile(), funlockfile(), and getc_unlocked(), the last is used in a loop bracketed by the first two.

2. At least on Windows, which doesn't have those, we use the platform fgets() in an excruciating way, tricking it into letting us read lines with embedded null bytes.

Oddly enough, in the timing reports I saw, approach #1 was never faster than approach #2, and on at least one platform (Tru64, IIRC) was slower. Of course fgets() is a primitive in std C because they *wanted* to make it possible for vendors to optimize it (in the ways Perl does), but it appears very few vendors do optimize it. On Windows it's the same old getc()-in-a-loop, but they lock/unlock the stream only once per fgets call (using internal stream functions that aren't exposed).

The "2 seconds for not copying directly ... like Perl does" I reported above came from hacking together a thread-unsafe line input routine that used the same FILE* tricks Perl uses.
That is, thread-unsafe getc-in-a-loop was 2 seconds slower than using thread-unsafe FILE* tricks. That's significant in absolute terms, but was lost in the noise compared to the other stuff we were fighting.
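
For anyone who hasn't seen approach #1 spelled out, here is a rough sketch of the idea: take the stream lock once per line, pull characters with getc_unlocked(), and start with a 200-byte buffer so realloc only kicks in for longer lines. This is not the code in fileobject.c; the name read_line_locked, the INITIAL_LINE_CAP constant, and the buffer-doubling policy are all made up for illustration, and it assumes a POSIX platform that provides flockfile() and friends.

```c
#define _POSIX_C_SOURCE 200809L  /* for flockfile, funlockfile, getc_unlocked */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumed starting size: lines up to this length never trigger realloc. */
#define INITIAL_LINE_CAP 200

/*
 * Hypothetical helper (illustration only, not Python's implementation).
 * Reads one line, including the trailing '\n' if there is one, into a
 * malloc'ed buffer. The length goes in *out_len, so embedded null bytes
 * survive; the buffer is NOT NUL-terminated. Returns NULL at EOF with no
 * data, or on allocation failure. The caller frees the result.
 */
char *read_line_locked(FILE *fp, size_t *out_len)
{
    assert(fp != NULL && out_len != NULL);

    size_t cap = INITIAL_LINE_CAP;
    size_t len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    flockfile(fp);                      /* one lock per line ... */
    int c;
    while ((c = getc_unlocked(fp)) != EOF) {
        if (len == cap) {               /* only reached for long lines */
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) {
                funlockfile(fp);
                free(buf);
                return NULL;
            }
            buf = bigger;
            cap *= 2;
        }
        buf[len++] = (char)c;
        if (c == '\n')
            break;
    }
    funlockfile(fp);                    /* ... and one unlock per line */

    if (len == 0) {                     /* EOF (or error) before any data */
        free(buf);
        return NULL;
    }
    *out_len = len;
    return buf;
}
```

Calling it in a loop until it returns NULL walks the file a line at a time. Whether this beats the fgets() approach on a given platform is exactly the kind of thing that has to be timed, per the reports above.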