transaction-like file operations

holger krekel pyth at devel.trillke.net
Thu Aug 8 12:05:44 EDT 2002


Gerson Kurz wrote:
> I am working on a python program that will run on an embedded
> ppc-linux system. If, during a file write operation (to flash memory),
> the machine is powered off, the file gets corrupted, and is lost on
> the next reboot. (I cannot prevent the user from powering off the
> machine - there is no display, nor any indication if the system is
> right now writing to disk or not. Its an embedded system, after all).
> 
> So, what I need is a transaction-like file operation, that allows me
> to either write the file completely, or keep the old file (so that at
> every point there is at least one set of data available).
> 
> I have made a "homegrown" solution that includes writing to a backup
> file first, then doing two rename operations. I wonder if there exists
> a standard class for transaction-like file operations in python?
> [Note: a database is not an option]

You might like to check out reiserfs, as it has database-like functionality
and some extra guarantees beyond POSIX.

I think you can get the behaviour you want with *standard python*,
without any special modules.  Python exposes some important POSIX
system calls through the 'os' module. Read the man pages of

    rename and fsync 

very carefully. I think you could *roughly* do:

import os

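def update_file(updater, path):
    tmppath = path + '.vfs_transaction'
    # open() cannot open a directory; os.open() returns a raw descriptor
    dirfd = os.open(os.path.dirname(path) or '.', os.O_RDONLY)
    try:
        with open(tmppath, 'w') as newfile:
            updater(newfile)
            newfile.flush()             # python buffer -> OS
            os.fsync(newfile.fileno())  # OS cache -> device
        os.rename(tmppath, path)   # posix guarantees atomicity!
        os.fsync(dirfd)            # persists updates of meta-data?!
    finally:
        os.close(dirfd)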

class Writer:
    count = 0
    def __call__(self, file):
        file.write(str(self.count) * 10000)
        self.count += 1

writer = Writer()
update_file(writer, '/tmp/txtest.test')

   
The critical part is between the os.rename and the final os.fsync
call in 'update_file'.  The rename is atomic but not guaranteed to be
'persistent' at once.  The fsync on the directory (its meta-data)
should help, but I don't know this for sure.  It probably depends on
the filesystem implementation and on hard-disk caching.  You could ask
people on the reiserfs-list (and report back, please :-).
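
One consequence of the scheme above: if power is lost before the
rename, the temporary file is left behind next to the intact old file.
A minimal sketch of a startup step that discards it, assuming the
'.vfs_transaction' suffix used in 'update_file'. The name
'discard_stale_tmp' is made up for illustration.

```python
import os

def discard_stale_tmp(path, suffix='.vfs_transaction'):
    """Remove a leftover temporary file from an interrupted update.

    Returns True if a stale file was found and removed, else False.
    The suffix default is an assumption matching update_file above.
    """
    tmppath = path + suffix
    try:
        os.remove(tmppath)
    except FileNotFoundError:
        return False   # nothing left over, the last update completed
    return True
```

Calling this once at boot, before the first update, keeps a half-written
temporary file from ever being mistaken for real data.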

Anyway, I recommend dedicating a machine to some testing.
Run 10 processes looping over the above 'transactions'
and turn the power off after a few minutes. See if everything is as
consistent as you expect.  Best to do it on the target system :-)
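
For such a test it helps to have a check to run after each reboot.
A rough sketch, assuming the file was written by the Writer class
above (one decimal counter repeated 10000 times). The function
'is_consistent' and its 'repeat' parameter are made up for illustration.

```python
def is_consistent(data, repeat=10000):
    """Return True if data looks like one complete Writer output."""
    if not data or len(data) % repeat != 0:
        return False                     # empty or truncated file
    unit = data[:len(data) // repeat]    # the counter value, e.g. '12'
    # a torn write would mix two counter values or leave garbage
    return unit.isdigit() and unit * repeat == data
```

After a power cycle, read the file back and pass its contents to
is_consistent; a False result means the file was neither the old
nor the new version in full.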

Of course, there are some issues which need further thought
and discussion...

have fun,

    holger



