[Python-ideas] [Python-Dev] Ext4 data loss

Steven D'Aprano steve at pearwood.info
Wed Mar 11 23:48:27 CET 2009


On Thu, 12 Mar 2009 01:21:25 am Antoine Pitrou wrote:
> Christian Heimes <lists <at> cheimes.de> writes:
> > In my initial proposal one and a half hour earlier I suggested
> > 'sync()' as the name of the method and 'synced' as the name of the
> > flag that forces a fsync() call during the close operation.
>
> I think your "synced" flag is too vague. Some applications may need
> the file to be synced on close(), but some others may need it to be
> synced at regular intervals, or after each write(), etc.
>
> Calling the flag "sync_on_close" would be much more explicit. Also,
> given the current API I think it should be an argument to open()
> rather than a writable attribute.

Perhaps we should have a module containing rich file tools, e.g. classes 
FileSyncOnWrite, FileSyncOnClose, functions for common file-related 
operations, etc. This will make it easy for conscientious programmers 
to do the right thing for their app without needing to re-invent the 
wheel all the time, but without handcuffing them into a single "one 
size fits all" solution.

File operations are *hard*, because many error conditions are uncommon, 
and consequently many (possibly even the majority) of programmers never 
learn that something like this:

f = open('myfile', 'w')
f.write(data)
f.close()

(or the equivalent in whatever language they use) may cause data loss. 
Worse, we train users to accept that data loss as normal instead of 
reporting it as a bug -- possibly because it is unclear whether it is a 
bug in the application, the OS, the file system, or all three. (It's 
impossible to avoid *all* risk of data loss, of course -- what if the 
computer loses power in the middle of a write? But we can minimize that 
risk significantly.)

Even when programmers try to do the right thing, it is hard to know what 
the right thing is: there are trade-offs to be made, and having made a 
trade-off, the programmer then has to re-invent what usually turns out 
to be a quite complicated wheel. To do the right thing in Python often 
means delving into the world of os.O_* constants and file descriptors, 
which is intimidating and unpythonic. They're great for those who 
want/need them, but perhaps we should expose a Python interface to the 
more common operations? To my mind, that means classes instead of magic 
constants.

Would there be interest in a filetools module? Replies and discussion to 
python-ideas please.


-- 
Steven D'Aprano



More information about the Python-ideas mailing list