Re: [Python-ideas] [Python-Dev] Ext4 data loss

On Thu, 12 Mar 2009 01:21:25 am Antoine Pitrou wrote:
Perhaps we should have a module containing rich file tools, e.g. classes FileSyncOnWrite, FileSyncOnClose, functions for common file-related operations, etc. This will make it easy for conscientious programmers to do the right thing for their app without needing to re-invent the wheel all the time, but without handcuffing them into a single "one size fits all" solution.

File operations are *hard*, because many error conditions are uncommon, and consequently many (possibly even the majority) of programmers never learn that something like this:

    f = open('myfile', 'w')
    f.write(data)
    f.close()

(or the equivalent in whatever language they use) may cause data loss. Worse, we train users to accept that data loss as normal instead of reporting it as a bug -- possibly because it is unclear whether it is a bug in the application, the OS, the file system, or all three.

(It's impossible to avoid *all* risk of data loss, of course -- what if the computer loses power in the middle of a write? But we can minimize that risk significantly.)

Even when programmers try to do the right thing, it is hard to know what the right thing is: there are trade-offs to be made, and having made a trade-off, the programmer then has to re-invent what usually turns out to be a quite complicated wheel. To do the right thing in Python often means delving into the world of os.O_* constants and file descriptors, which is intimidating and unpythonic. They're great for those who want/need them, but perhaps we should expose a Python interface to the more common operations? To my mind, that means classes instead of magic constants.

Would there be interest in a filetools module? Replies and discussion to python-ideas please.

--
Steven D'Aprano
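To make the proposal concrete, here is a minimal sketch of what a "sync on close" file object might look like. It is not code from this thread or from any existing filetools module: the class name SyncOnClose is borrowed from the discussion, but the wrapper design and its constructor signature are assumptions, and it is written for Python 3, where the built-in `file` type no longer exists.

```python
import os


class SyncOnClose:
    """Hypothetical wrapper: flush and fsync a file before closing it."""

    def __init__(self, path, mode="w", **kwargs):
        # Assumption: we wrap the object returned by open() rather than
        # subclassing it.
        self._f = open(path, mode, **kwargs)

    def write(self, data):
        return self._f.write(data)

    def close(self):
        if self._f.closed:
            return
        try:
            self._f.flush()             # push Python's buffer to the OS
            os.fsync(self._f.fileno())  # ask the OS to write its cache to the device
        finally:
            self._f.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not swallow exceptions
```

Used as `with SyncOnClose('myfile') as f: f.write(data)`, this narrows the window in which the data exists only in memory. It cannot remove it entirely, since how much os.fsync() guarantees still depends on the OS, the file system and the hardware.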

On 2009-03-11 17:48, Steven D'Aprano wrote:
Would there be interest in a filetools module? Replies and discussion to python-ideas please.
Yes, please. I am of the opinion that, wherever possible, these kinds of patterns should be codified in reusable libraries. For something as fundamental as writing files, something aimed towards standard library acceptance seems like a very good idea to me.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
  -- Umberto Eco

Would there be interest in a filetools module? Replies and discussion to python-ideas please.
I've been using and maintaining a few filesystem hacks for, let's see, almost nine years now:

http://allmydata.org/trac/pyutil/browser/pyutil/pyutil/fileutil.py

(The first version of that was probably written by Greg Smith in about 1999.)

I'm sure there are many other such packages. A couple of quick searches of pypi turned up these two:

http://pypi.python.org/pypi/Pythonutils
http://pypi.python.org/pypi/fs

I wonder if any of them have the sort of functionality you're thinking of.

Regards,
Zooko

On Thu, 12 Mar 2009 12:26:40 pm zooko wrote:
Close, but not quite. I'm suggesting a module with a collection of subclasses of file that exhibit modified behaviour. For example:

    class FlushOnWrite(file):
        def write(self, data):
            super(FlushOnWrite, self).write(data)
            self.flush()
        # similarly for writelines

    class SyncOnWrite(FlushOnWrite):
        # ...

    class SyncOnClose(file):
        # ...

plus functions which implement common idioms for safely writing data, making backups on a save, etc. A common idiom for safely over-writing a file while minimising the window of opportunity for file loss is:

    * write to a temporary file and close it
    * move the original to a backup location
    * move the temporary file to where the original was
    * if no errors, delete the backup

although when I say "common" what I really mean is that it should be common, but probably isn't :-/

The sort of file handling that is complicated and tedious to get right, and so most developers don't bother, and those that do are re-inventing the wheel.

There's a couple of recipes in the Python Cookbook which might be candidates. E.g. the first edition has recipes "Versioning Filenames" by Robin Parmar and "Module: Versioned Backups" by Mitch Chapman.

What I DON'T mean is pathname utilities. Nor do I mean mini-applications that operate on files, like renaming file extensions, deleting files that meet some criterion, etc. I don't think they belong in the standard library, and even if they do, they don't belong in this proposed module.

My intention is to offer a standard set of tools so people can choose the behaviour that suits their application best, rather than trying to make file() a one-size-fits-all solution.

--
Steven D'Aprano
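The four-step overwrite idiom described above could be sketched roughly as follows. This is an illustration only, not the proposed module: the function name safe_overwrite and the backup_suffix parameter are hypothetical, and it uses os.replace(), which was added in Python 3.3 and so postdates this thread.

```python
import os
import tempfile


def safe_overwrite(path, data, backup_suffix=".bak"):
    """Replace the contents of `path` with the string `data`.

    Hypothetical helper following the four steps of the idiom.
    """
    directory = os.path.dirname(os.path.abspath(path))
    backup = path + backup_suffix

    # Step 1: write to a temporary file and close it. The temporary file
    # lives in the same directory so the later move stays on one file system.
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
    except BaseException:
        os.unlink(tmp)
        raise

    # Step 2: move the original to a backup location.
    had_original = os.path.exists(path)
    if had_original:
        os.replace(path, backup)

    # Step 3: move the temporary file to where the original was.
    try:
        os.replace(tmp, path)
    except BaseException:
        if had_original:
            os.replace(backup, path)  # put the original back
        os.unlink(tmp)
        raise

    # Step 4: no errors, so delete the backup.
    if had_original:
        os.unlink(backup)
```

One trade-off to be aware of: tempfile.mkstemp() creates the file with restrictive permissions, so the new file does not inherit the mode or ownership of the original. A real tool would have to decide whether to copy those across.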

On Thu, 12 Mar 2009 09:48:27 +1100, Steven D'Aprano <steve@pearwood.info> wrote:
Sure. +1

Also: a programmer is not (always) a filesystem expert.

PS: What I meant is: the point of view from the filesystem is very different. A proper interface will have to take the programmer's point of view while exposing the filesystem issues. I think (like always at the interface of two worlds -- cf specification talks between developer and client ;-) *terminology* choices will be very important.

denis
------
la vita e estrany

On Thu, 2009-03-12 at 08:24 +0100, spir wrote:
Dealing with different types of OSes and filesystems in a generic way is difficult. I would urge everyone to err on the side of less generality, because I think it would be better for a programmer to write bad code, and be able to figure out why, than to write code that looks perfectly fine, and have a harder time discovering the problem. -- Cheers, Leif

participants (5): Leif Walsh, Robert Kern, spir, Steven D'Aprano, zooko