[Python-Dev] Why do we flush before truncating?

Sat Sep 6 23:40:16 EDT 2003

[Neil Schemenauer]
> The fflush call has been there forever. The truncate method was added
> in 2.36 by Guido.  I think the code was actually from Jim Roskind:
>
> http://groups.google.com/groups?selm=199412070213.SAA06932%40infoseek.com
>
> He says:
>
>   Note that since the underlying ftruncate operates on a file
>   descriptor (believe it or not), it was necessary to fflush() the
>   stream before performing the truncate.  I thought about doing a
>   seek() as well, but could not find compelling reason to move the
>   stream pointer.
>
> That still gives me no clue as to why the fflush() was deemed
> necessary.

Ack, I glossed over the fileno() call in our file_truncate().  It's usually
a Very Bad Idea to mix stream I/O and lower-level I/O operations without
flushing your pants off, but I'm having a hard time thinking of a specific
reason for doing so in the truncate case.  Better safe than trying to
out-think all possible implementations, though!
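Here is a minimal sketch of the general hazard of mixing stream and descriptor I/O. It assumes a POSIX system for `fileno()` and `fstat()`, and the helper name `fd_size()` is hypothetical. Bytes written through a `FILE *` can sit in the stdio buffer, where anything that looks at the underlying descriptor cannot see them, until `fflush()` pushes them out:

```c
#define _POSIX_C_SOURCE 200809L /* expose fileno() */
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: the file size as the OS reports it through the
 * descriptor underlying `fp`. Anything still sitting in the stdio
 * buffer is not counted. Returns -1 on error. */
static long fd_size(FILE *fp)
{
    struct stat st;

    assert(fp != NULL);
    if (fstat(fileno(fp), &st) != 0)
        return -1;
    return (long)st.st_size;
}
```

On a fully buffered stream, such as one from `tmpfile()`, `fputs("hello", fp)` typically leaves `fd_size(fp)` at 0. It reports 5 only after `fflush(fp)`.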

> I found this posting:
>
> http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&selm=35E0DB62.1BDD2D30%40taraz.kz&rnum=1
>
> but, AFAICT, the reason the program is not working the way the poster
> expects is the missing fflush() call before the lseek() call (not
> the fflush() before the ftruncate()).

I think that's right.  In the Python case, I verified in a debugger that the
file position is 5 immediately before the fflush() call, and 10 immediately
after it.  It's surprising, but apparently OK by the C std.

[Guido]
> ftruncate() is not a standard C function;

I suppose that clarifies my immediately preceding

>> ftruncate() isn't a standard C function,

<wink>?

> it's a standard Unix system call.

Yes, and I gave a link to the current POSIX/SUS ftruncate() specification.

> It works on a file descriptor (i.e. a small int), not on a
> stream (i.e. a FILE *).

Right, and I missed that, primarily because Windows doesn't have ftruncate()
so I wasn't looking at that part of the code.

> The fflush() call is necessary if the last call was a write, because in
> that case the stream's buffer may contain data that the OS file
> descriptor doesn't have yet.

I'm not really clear on why that should matter in the specific case of
truncating a file, but will just live with it.
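The following sketch shows one concrete way the truncate case can go wrong. It assumes a POSIX system with an ordinary buffered stdio, and the function `truncated_size()` and its `flush_first` flag are hypothetical, for illustration only. If ten bytes are still buffered when `ftruncate()` shrinks the file to five, the later flush writes all ten bytes and silently undoes the truncation:

```c
#define _POSIX_C_SOURCE 200809L /* expose fileno() and ftruncate() */
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical demo: write 10 bytes through a FILE *, truncate the
 * underlying descriptor to 5 bytes, then report the final file size.
 * With flush_first == 0 the buffered bytes reach the descriptor only
 * after ftruncate(), so they re-extend the file. Returns -1 on error. */
static long truncated_size(int flush_first)
{
    struct stat st;
    long size = -1;
    FILE *fp = tmpfile();

    if (fp == NULL)
        return -1;
    if (fwrite("0123456789", 1, 10, fp) != 10)
        goto done;
    if (flush_first && fflush(fp) != 0)   /* the call under discussion */
        goto done;
    if (ftruncate(fileno(fp), 5) != 0)
        goto done;
    if (fflush(fp) != 0)                  /* any still-buffered bytes land now */
        goto done;
    if (fstat(fileno(fp), &st) == 0)
        size = (long)st.st_size;
done:
    fclose(fp);
    return size;
}
```

`truncated_size(1)` returns 5. With glibc, `truncated_size(0)` returns 10, because the deferred write lands at descriptor offset 0 after the truncation.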

> But ftruncate() is irrelevant, because on Windows, it is never called;
> there's a huge #ifdef MS_WINDOWS block containing Windows specific
> code ...

Right, I wrote that code.  Windows has no way to say "here's a file, change
the size to such-and-such"; the only way is to set the file pointer to the
desired size, and then call Win32 SetEndOfFile(), which takes no size argument.  Python
*used* to use the MS C _chsize() function, but that did insane things when
passed a "large" size; the SetEndOfFile() code was introduced as part of
fixing Python's Windows largefile support.

> ...
> It also looks like the MS_WINDOWS specific code block *does* attempt
> to record the current file position and seek back to it

Yes, because the file position must be changed on Windows in order to change
the file size, but *Python's* docs promise that file.truncate() doesn't
change the current position (which is natural behavior under POSIX
ftruncate() but strained on Windows).

> -- however it does this after fflush() has already messed with it.

Note that in the Windows test case, it's not simply that the current
position wasn't preserved across the file.truncate() call, it's also that
the file didn't change size.  It's very easy to fix the former while leaving
the latter broken.

> So perhaps moving the fflush() call into the #else part and doing
> something Windows-specific instead of calling fflush() to ensure the
> buffer is flushed inside the MS_WINDOWS part would be the right solution.
>
> I just realized that I have always worked under the assumption that
> fflush() after a read is a no-op; I just checked the 89 std and it
> says it is undefined.  (I must have picked up that misunderstanding
> from some platform-specific man page.)  This can be fixed by doing an
> ftell() followed by an fseek() call; this is required to flush the
> buffer if there was unwritten output data in the buffer, and is always
> allowed.
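Here is a sketch of the ftell()/fseek() approach Guido describes. It assumes POSIX `ftruncate()`, the name `truncate_stream()` is hypothetical, and it is not presented as the code that went into fileobject.c. Seeking to the current offset is legal whether the last operation was a read or a write, and it writes out any pending output, so the code never calls `fflush()` after input:

```c
#define _POSIX_C_SOURCE 200809L /* expose fileno() and ftruncate() */
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: truncate the file behind `fp` to `size` bytes
 * while leaving the stream's current position where it was.
 * Returns 0 on success, -1 on error. */
static int truncate_stream(FILE *fp, off_t size)
{
    long pos;

    assert(fp != NULL);
    pos = ftell(fp);                 /* where the caller currently is */
    if (pos < 0)
        return -1;
    /* A positioning call is allowed after either a read or a write, and
     * it writes out any buffered output, so the descriptor has every
     * byte before ftruncate() runs. */
    if (fseek(fp, pos, SEEK_SET) != 0)
        return -1;
    /* ftruncate() changes the size, not the descriptor's offset. */
    if (ftruncate(fileno(fp), size) != 0)
        return -1;
    return 0;
}
```

After writing ten bytes, `truncate_stream(fp, 5)` leaves `ftell(fp)` at 10 and the file at 5 bytes. After reading three bytes from the start, it leaves `ftell(fp)` at 3. This trades one extra seek for not having to reason about `fflush()` on input streams.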

That's what I was hoping to avoid, but I don't care anymore:  after staring
at it some more, I'm convinced that the current file_truncate() endures a
ridiculous amount of complexity trying to gain a tiny bit of speed in what
has to be a rare operation.