Re: [Python-ideas] [Python-Dev] Ext4 data loss
On Thu, 12 Mar 2009 01:21:25 am Antoine Pitrou wrote:
Christian Heimes <lists <at> cheimes.de> writes:
In my initial proposal an hour and a half earlier, I suggested 'sync()' as the name of the method and 'synced' as the name of the flag that forces an fsync() call during the close operation.
I think your "synced" flag is too vague. Some applications may need the file to be synced on close(), but some others may need it to be synced at regular intervals, or after each write(), etc.
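Antoine's "after each write()" policy can be sketched in Python 3 with the io module. This is a minimal illustration under stated assumptions, not a proposed API: the class name `SyncOnWriteText` and the helper `open_sync_on_write()` are hypothetical, and calling `os.fsync()` after every write trades a lot of throughput for durability.

```python
import io
import os


class SyncOnWriteText(io.TextIOWrapper):
    """Hypothetical text file that is flushed and fsync'ed after every write()."""

    def write(self, s):
        n = super().write(s)
        self.flush()             # push Python's own buffers down to the OS
        os.fsync(self.fileno())  # ask the OS to flush its buffers to stable storage
        return n

    def writelines(self, lines):
        # Route each line through write() so every line is synced the same way.
        for line in lines:
            self.write(line)


def open_sync_on_write(path, mode="w", encoding="utf-8"):
    """Hypothetical helper: open `path` for text writing ("w" or "a") with sync-on-write."""
    raw = io.FileIO(path, mode)
    return SyncOnWriteText(io.BufferedWriter(raw), encoding=encoding)
```

The subclass only changes when data is pushed to disk; everything else (encoding, newline handling, close()) is inherited unchanged from `io.TextIOWrapper`.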
Calling the flag "sync_on_close" would be much more explicit. Also, given the current API I think it should be an argument to open() rather than a writable attribute.
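A sync_on_close argument can be approximated in Python 3 without changing open() itself, by layering the standard io classes by hand. In this sketch, `SyncOnCloseFileIO` and `open_text()` are hypothetical names, and the helper only handles write modes. It relies on `BufferedWriter.close()` flushing its buffer before closing the raw file, so the fsync in the raw layer sees all the written data.

```python
import io
import os


class SyncOnCloseFileIO(io.FileIO):
    """Hypothetical raw file that calls os.fsync() just before closing."""

    def close(self):
        try:
            # Skip the fsync on a second close (e.g. from __del__) or on read-only files.
            if not self.closed and self.writable():
                os.fsync(self.fileno())
        finally:
            # Always release the descriptor, even if fsync() raised.
            super().close()


def open_text(path, mode="w", encoding="utf-8", sync_on_close=False):
    """Hypothetical open()-like helper for writing, with an explicit sync_on_close flag."""
    raw_class = SyncOnCloseFileIO if sync_on_close else io.FileIO
    raw = raw_class(path, mode)
    return io.TextIOWrapper(io.BufferedWriter(raw), encoding=encoding)
```

Making the behaviour an explicit keyword argument, rather than a writable attribute, matches Antoine's point: the choice is made once, at open time, and is visible at the call site.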
Perhaps we should have a module containing rich file tools, e.g. classes FileSyncOnWrite, FileSyncOnClose, functions for common file-related operations, etc. This will make it easy for conscientious programmers to do the right thing for their app without needing to re-invent the wheel all the time, but without handcuffing them into a single "one size fits all" solution.

File operations are *hard*, because many error conditions are uncommon, and consequently many (possibly even the majority) of programmers never learn that something like this:

    f = open('myfile', 'w')
    f.write(data)
    f.close()

(or the equivalent in whatever language they use) may cause data loss. Worse, we train users to accept that data loss as normal instead of reporting it as a bug -- possibly because it is unclear whether it is a bug in the application, the OS, the file system, or all three. (It's impossible to avoid *all* risk of data loss, of course -- what if the computer loses power in the middle of a write? But we can minimize that risk significantly.)

Even when programmers try to do the right thing, it is hard to know what the right thing is: there are trade-offs to be made, and having made a trade-off, the programmer then has to re-invent what usually turns out to be a quite complicated wheel. To do the right thing in Python often means delving into the world of os.O_* constants and file descriptors, which is intimidating and unpythonic. They're great for those who want/need them, but perhaps we should expose a Python interface to the more common operations? To my mind, that means classes instead of magic constants.

Would there be interest in a filetools module? Replies and discussion to python-ideas please.

-- Steven D'Aprano
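To make the point about os.O_* constants concrete, here is roughly what "doing the right thing" looks like at the file-descriptor level when creating a new file that must not clobber an existing one and should reach the disk. It is an illustrative sketch only; the function name `create_new_file_lowlevel` is hypothetical.

```python
import os


def create_new_file_lowlevel(path, data, mode=0o644):
    """Hypothetical helper: create `path` exclusively and durably write `data` (bytes)."""
    # O_EXCL makes os.open() fail with FileExistsError if `path` already exists.
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL
    # O_BINARY only exists on Windows; elsewhere this adds nothing.
    flags |= getattr(os, "O_BINARY", 0)
    fd = os.open(path, flags, mode)
    try:
        view = memoryview(data)
        # os.write() may write fewer bytes than requested, so loop until done.
        while view:
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)  # ask the OS to flush the data to stable storage before returning
    finally:
        os.close(fd)
```

For comparison, Python 3.3 and later expose the O_EXCL part in a more pythonic form through the "x" mode of open(): `open(path, "xb")`, followed by `f.write(data)`, `f.flush()` and `os.fsync(f.fileno())`.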
On 2009-03-11 17:48, Steven D'Aprano wrote:
Would there be interest in a filetools module? Replies and discussion to python-ideas please.
Yes, please. I am of the opinion that, wherever possible, these kinds of patterns should be codified in reusable libraries. For something as fundamental as writing files, something aimed towards standard library acceptance seems like a very good idea to me.

-- Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
Would there be interest in a filetools module? Replies and discussion to python-ideas please.
I've been using and maintaining a few filesystem hacks for, let's see, almost nine years now:

http://allmydata.org/trac/pyutil/browser/pyutil/pyutil/fileutil.py

(The first version of that was probably written by Greg Smith in about 1999.)

I'm sure there are many other such packages. A couple of quick searches of pypi turned up these two:

http://pypi.python.org/pypi/Pythonutils
http://pypi.python.org/pypi/fs

I wonder if any of them have the sort of functionality you're thinking of.

Regards,

Zooko
On Thu, 12 Mar 2009 12:26:40 pm zooko wrote:
Would there be interest in a filetools module? Replies and discussion to python-ideas please.
I've been using and maintaining a few filesystem hacks for, let's see, almost nine years now:
http://allmydata.org/trac/pyutil/browser/pyutil/pyutil/fileutil.py
(The first version of that was probably written by Greg Smith in about 1999.)
I'm sure there are many other such packages. A couple of quick searches of pypi turned up these two:
http://pypi.python.org/pypi/Pythonutils
http://pypi.python.org/pypi/fs
I wonder if any of them have the sort of functionality you're thinking of.
Close, but not quite. I'm suggesting a module with a collection of subclasses of file that exhibit modified behaviour. For example:

    class FlushOnWrite(file):
        def write(self, data):
            super(FlushOnWrite, self).write(data)
            self.flush()
        # similarly for writelines

    class SyncOnWrite(FlushOnWrite):
        # ...
        pass

    class SyncOnClose(file):
        # ...
        pass

plus functions which implement common idioms for safely writing data, making backups on a save, etc. A common idiom for safely over-writing a file while minimising the window of opportunity for file loss is:

    1. write to a temporary file and close it
    2. move the original to a backup location
    3. move the temporary file to where the original was
    4. if no errors, delete the backup

although when I say "common" what I really mean is that it should be common, but probably isn't :-/

The sort of file handling that is complicated and tedious to get right, and so most developers don't bother, and those that do are re-inventing the wheel. There are a couple of recipes in the Python Cookbook which might be candidates. E.g. the first edition has recipes "Versioning Filenames" by Robin Parmar and "Module: Versioned Backups" by Mitch Chapman.

What I DON'T mean is pathname utilities. Nor do I mean mini-applications that operate on files, like renaming file extensions, deleting files that meet some criterion, etc. I don't think they belong in the standard library, and even if they do, they don't belong in this proposed module.

My intention is to offer a standard set of tools so people can choose the behaviour that suits their application best, rather than trying to make file() a one-size-fits-all solution.

-- Steven D'Aprano
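Steven's four-step idiom for safely over-writing a file can be sketched in Python 3 roughly as follows. This is a minimal sketch under stated assumptions: the function name `safe_overwrite` and the `.bak` backup suffix are hypothetical, the data is text, and `os.replace()` (Python 3.3+) stands in for "move".

```python
import contextlib
import os
import tempfile


def safe_overwrite(path, data, encoding="utf-8"):
    """Hypothetical helper: replace the contents of `path` via temp file + backup."""
    directory = os.path.dirname(os.path.abspath(path))
    backup_path = path + ".bak"  # assumed backup naming scheme

    # Step 1: write to a temporary file in the same directory, then close it.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding=encoding) as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
    except BaseException:
        os.unlink(tmp_path)
        raise

    # Step 2: move the original (if any) to a backup location.
    had_original = os.path.exists(path)
    if had_original:
        os.replace(path, backup_path)

    # Step 3: move the temporary file to where the original was.
    try:
        os.replace(tmp_path, path)
    except BaseException:
        if had_original:
            os.replace(backup_path, path)  # put the original back
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_path)
        raise

    # Step 4: no errors, so delete the backup.
    if had_original:
        os.unlink(backup_path)
```

Keeping the temporary file in the same directory keeps the renames on one filesystem, where os.replace() is atomic on POSIX. With the explicit backup step there is a brief window in which `path` is missing, but if the process dies then, the original survives as the `.bak` file. Note also that mkstemp() creates the file with mode 0600, so the new file does not inherit the original's permissions; a fuller version might copy them with shutil.copymode().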
On Thu, 12 Mar 2009 09:48:27 +1100, Steven D'Aprano <steve@pearwood.info> wrote:
Even when programmers try to do the right thing, it is hard to know what the right thing is: there are trade-offs to be made, and having made a trade-off, the programmer then has to re-invent what usually turns out to be a quite complicated wheel. To do the right thing in Python often means delving into the world of os.O_* constants and file descriptors, which is intimidating and unpythonic. They're great for those who want/need them, but perhaps we should expose a Python interface to the more common operations? To my mind, that means classes instead of magic constants.
Would there be interest in a filetools module? Replies and discussion to python-ideas please.
Sure. +1

Also: a programmer is not (always) a filesystem expert.

denis
------
la vita e estrany
On Thu, 12 Mar 2009 09:48:27 +1100, Steven D'Aprano <steve@pearwood.info> wrote:
Even when programmers try to do the right thing, it is hard to know what the right thing is: there are trade-offs to be made, and having made a trade-off, the programmer then has to re-invent what usually turns out to be a quite complicated wheel. To do the right thing in Python often means delving into the world of os.O_* constants and file descriptors, which is intimidating and unpythonic. They're great for those who want/need them, but perhaps we should expose a Python interface to the more common operations? To my mind, that means classes instead of magic constants.
Would there be interest in a filetools module? Replies and discussion to python-ideas please.
Sure. +1

Also: a programmer is not (always) a filesystem expert.

PS: What I meant is: the point of view from the filesystem is very different. A proper interface will have to take the programmer's point of view while exposing the filesystem issues. I think (as always at the interface of two worlds -- cf. specification talks between developer and client ;-) *terminology* choices will be very important.

denis
------
la vita e estrany
On Thu, 2009-03-12 at 08:24 +0100, spir wrote:
Would there be interest in a filetools module? Replies and discussion to python-ideas please.
Sure. +1 Also: a programmer is not (always) a filesystem expert.
PS: What I meant is: the point of view from the filesystem is very different. A proper interface will have to take the programmer's point of view while exposing the filesystem issues. I think (as always at the interface of two worlds -- cf. specification talks between developer and client ;-) *terminology* choices will be very important.
Dealing with different types of OSes and filesystems in a generic way is difficult. I would urge everyone to err on the side of less generality, because I think it would be better for a programmer to write bad code, and be able to figure out why, than to write code that looks perfectly fine, and have a harder time discovering the problem. -- Cheers, Leif
participants (5)
- Leif Walsh
- Robert Kern
- spir
- Steven D'Aprano
- zooko