[Python-ideas] [Python-Dev] Ext4 data loss
Steven D'Aprano
steve at pearwood.info
Wed Mar 11 23:48:27 CET 2009
On Thu, 12 Mar 2009 01:21:25 am Antoine Pitrou wrote:
> Christian Heimes <lists <at> cheimes.de> writes:
> > In my initial proposal one and a half hour earlier I suggested
> > 'sync()' as the name of the method and 'synced' as the name of the
> > flag that forces a fsync() call during the close operation.
>
> I think your "synced" flag is too vague. Some applications may need
> the file to be synced on close(), but some others may need it to be
> synced at regular intervals, or after each write(), etc.
>
> Calling the flag "sync_on_close" would be much more explicit. Also,
> given the current API I think it should be an argument to open()
> rather than a writable attribute.
Perhaps we should have a module containing rich file tools, e.g. classes
FileSyncOnWrite, FileSyncOnClose, functions for common file-related
operations, etc. This will make it easy for conscientious programmers
to do the right thing for their app without needing to re-invent the
wheel all the time, but without handcuffing them into a single "one
size fits all" solution.
File operations are *hard*, because many error conditions are uncommon,
and consequently many (possibly even the majority) of programmers never
learn that something like this:
f = open('myfile', 'w')
f.write(data)
f.close()
(or the equivalent in whatever language they use) may cause data loss.
Worse, we train users to accept that data loss as normal instead of
reporting it as a bug -- possibly because it is unclear whether it is a
bug in the application, the OS, the file system, or all three. (It's
impossible to avoid *all* risk of data loss, of course -- what if the
computer loses power in the middle of a write? But we can minimize that
risk significantly.)
Even when programmers try to do the right thing, it is hard to know what
the right thing is: there are trade-offs to be made, and having made a
trade-off, the programmer then has to re-invent what usually turns out
to be a quite complicated wheel. To do the right thing in Python often
means delving into the world of os.O_* constants and file descriptors,
which is intimidating and unpythonic. They're great for those who
want/need them, but perhaps we should expose a Python interface to the
more common operations? To my mind, that means classes instead of magic
constants.
Would there be interest in a filetools module? Replies and discussion to
python-ideas please.
--
Steven D'Aprano
More information about the Python-ideas
mailing list