[Moin-devel] More compact backups?

Magnus Lycka magnus at thinkware.se
Sat Feb 15 05:07:09 EST 2003


At 10:49 2003-02-15 +0100, Juergen Hermann wrote:
>As for reducing the space needed by backups, prune them (using find
>-mtime +100 or so). At some point in the future there'll be a moin-maint
>script that will do that for you.

Sure. I ran into a disk quota "wall", so that was obvious.

I just kept the last four versions of each file. I did that
interactively in Python, though, so I didn't save the code.
If I remember correctly, the modtime reflects the time the
file was *written*, not the later time at which it was moved
into the backup directory. This makes it dangerous to remove
"old" files by modtime. If someone messes up a page that has
been unaltered for a long time, the most recent backup will
have an old modtime, and a "find -mtime +100" prune would
delete exactly the backup you need.
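This is easy to check. Below is a minimal Python sketch. It assumes the backup is made by renaming the page file within the same filesystem, which is an assumption about how the move happens, and the file names are made up for illustration. The sketch backdates a file's mtime, moves the file into a backup directory under a timestamped name, and reads the mtime back.

```python
import os
import tempfile
import time

def mtime_after_rename(age_days=200):
    """Backdate a 'page' file, move it into a backup dir, return (old_mtime, backup_mtime)."""
    root = tempfile.mkdtemp()
    page = os.path.join(root, "FrontPage")          # hypothetical page file
    backup_dir = os.path.join(root, "backup")
    os.mkdir(backup_dir)
    with open(page, "w") as f:
        f.write("old page text\n")
    # Pretend the page was last written age_days ago.
    old = time.time() - age_days * 86400
    os.utime(page, (old, old))
    # Move it into the backup directory *today*, with a timestamp in the name.
    backup = os.path.join(backup_dir, "FrontPage.%d" % int(time.time()))
    os.rename(page, backup)
    return old, os.path.getmtime(backup)

old, backup_mtime = mtime_after_rename()
print("fresh backup looks %.0f days old" % ((time.time() - backup_mtime) / 86400))
```

On typical POSIX filesystems a rename does not update the modification time, so the brand-new backup should still look about 200 days old to `find -mtime`.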

I did something along these lines (UNTESTED):

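import os

def prune_backups(backup_dir, save_versions=4):
    """Keep only the newest save_versions backups of each page.

    Backup files are named "<pagename>.<timestamp>".
    """
    versions = {}
    for fn in os.listdir(backup_dir):
        # Split on the last dot only, so page names with dots still parse.
        page, sep, timestamp = fn.rpartition('.')
        if not sep or not timestamp.isdigit():
            continue  # This was not a backup file...
        versions.setdefault(page, []).append((int(timestamp), fn))
    for files in versions.values():
        files.sort()  # oldest first
        for timestamp, fn in files[:-save_versions]:
            os.remove(os.path.join(backup_dir, fn))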

By the way, I get a lot of "no differences found, saved four
times" messages. Would it be easy to skip the save/backup
routine if the page wasn't changed at all? This problem relates
to the prune issue: if all the versions I keep when I prune
are identical to the current version, I might as well sweep
the entire backup directory.
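Here is a minimal sketch of such a check. It assumes a simplified model in which a page is a single text file, and a save first copies the current text into the backup directory as "<pagename>.<timestamp>". The function name `save_page` and this layout are hypothetical and are not MoinMoin's actual code.

```python
import os
import tempfile
import time

def save_page(page_path, backup_dir, new_text):
    """Write new_text to page_path, backing up the old text first.

    Hypothetical helper: returns False, touching nothing, when the text is unchanged.
    """
    old_text = None
    if os.path.exists(page_path):
        with open(page_path, encoding="utf-8") as f:
            old_text = f.read()
    if old_text == new_text:
        return False  # no change: no backup, no rewrite
    if old_text is not None:
        name = "%s.%d" % (os.path.basename(page_path), int(time.time()))
        with open(os.path.join(backup_dir, name), "w", encoding="utf-8") as f:
            f.write(old_text)
    with open(page_path, "w", encoding="utf-8") as f:
        f.write(new_text)
    return True

# Usage: create, re-save unchanged, then really change the page.
demo = tempfile.mkdtemp()
backups = os.path.join(demo, "backup")
os.mkdir(backups)
page = os.path.join(demo, "FrontPage")
results = [save_page(page, backups, text) for text in ["v1\n", "v1\n", "v2\n"]]
print(results, os.listdir(backups))
```

Only the real change produces a backup. Two saves within the same second would collide on the backup name, so a real implementation would need to handle that.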

Without such a fix, the prune script would have to loop through
the versions and remove adjacent identical versions first.
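A sketch of that extra pass, assuming the same "<pagename>.<timestamp>" naming as the prune code above. The name `drop_adjacent_duplicates` is made up for illustration. Within each page, the pass keeps the oldest copy of each run of identical versions and deletes the rest. The idea is to run it before keeping the last four.

```python
import filecmp
import os
import tempfile

def drop_adjacent_duplicates(backup_dir):
    """Delete each backup that is byte-identical to the previous backup of the same page."""
    versions = {}
    for fn in os.listdir(backup_dir):
        page, sep, stamp = fn.rpartition(".")
        if sep and stamp.isdigit():
            versions.setdefault(page, []).append((int(stamp), fn))
    removed = []
    for files in versions.values():
        files.sort()                 # oldest first
        kept = files[0][1]           # the oldest version is always kept
        for _, fn in files[1:]:
            # shallow=False compares file contents, not just os.stat() data.
            if filecmp.cmp(os.path.join(backup_dir, kept),
                           os.path.join(backup_dir, fn), shallow=False):
                os.remove(os.path.join(backup_dir, fn))
                removed.append(fn)
            else:
                kept = fn
    return removed

# Usage with made-up backups: P.2 repeats P.1, P.4 repeats P.3.
demo = tempfile.mkdtemp()
for name, text in [("P.1", "a"), ("P.2", "a"), ("P.3", "b"), ("P.4", "b"), ("Q.5", "x")]:
    with open(os.path.join(demo, name), "w") as f:
        f.write(text)
removed = drop_adjacent_duplicates(demo)
print(sorted(removed), sorted(os.listdir(demo)))
```

Each file is compared against the last *kept* version. Every removed file equals that version, so this is the same as comparing neighbours.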

I certainly think the CVS solution seems like a good idea,
but the gzip alternative seems good as well. With the gzip option,
it should be possible to make compression transparent to MoinMoin:
most of the MoinMoin code would be completely unaware of whether
a backup file was compressed, and it would be possible both to
configure MoinMoin to gzip on backup and to write a separate
script that gzipped all existing backup files without disturbing
MoinMoin. If a file had a .gz ending, MoinMoin would unzip it
before using it... On the other hand, this seems to require a
centralized file access API, similar to what a CVS store
would require...
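A minimal sketch of the transparent-compression idea follows. It assumes backups are plain files named "<pagename>.<timestamp>" and that compressing a backup simply adds a ".gz" suffix. `read_backup` and `gzip_backups` are hypothetical names. Readers go only through `read_backup`, so they don't care whether a file has been compressed, and `gzip_backups` could run from cron at any time.

```python
import gzip
import os
import shutil
import tempfile

def read_backup(path):
    """Return the bytes of a backup, whether stored plain or as path + '.gz'."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return f.read()
    with gzip.open(path + ".gz", "rb") as f:
        return f.read()

def gzip_backups(backup_dir):
    """Compress every uncompressed backup in backup_dir, one file at a time."""
    for fn in os.listdir(backup_dir):
        src = os.path.join(backup_dir, fn)
        if fn.endswith((".gz", ".tmp")) or not os.path.isfile(src):
            continue
        tmp = src + ".gz.tmp"
        with open(src, "rb") as fin, gzip.open(tmp, "wb") as fout:
            shutil.copyfileobj(fin, fout)
        os.rename(tmp, src + ".gz")  # publish the complete compressed copy...
        os.remove(src)               # ...and only then drop the original

# Usage with a made-up backup file.
demo = tempfile.mkdtemp()
with open(os.path.join(demo, "FrontPage.1000"), "wb") as f:
    f.write(b"old text\n")
gzip_backups(demo)
print(os.listdir(demo), read_backup(os.path.join(demo, "FrontPage.1000")))
```

The ordering matters. The compressed file is written under a temporary name and renamed into place before the original is removed, so at every moment a reader finds at least one complete copy. A prune script would then also have to strip a trailing ".gz" before parsing the timestamp.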

In general, it seems like a good idea to centralize file access
like this, not only for backups but also for the current pages.


-- 
Magnus Lycka, Thinkware AB
Alvans vag 99, SE-907 50 UMEA, SWEDEN
phone: int+46 70 582 80 65, fax: int+46 70 612 80 65
http://www.thinkware.se/  mailto:magnus at thinkware.se




