[Mailman-Developers] Requirements for a new archiver

J C Lawrence claw at kanga.nu
Wed Oct 29 23:43:55 EST 2003


On Thu, 30 Oct 2003 05:15:58 +0100 
Brad Knowles <brad.knowles at skynet.be> wrote:
> At 11:01 PM -0500 2003/10/29, J C Lawrence wrote:

>> With a write-once system you don't actually need to ever move
>> anything.

> Depends on how you manage the storage of those large files.  If you
> have an infinitely large filesystem that is guaranteed 100% reliable
> in all possible circumstances, you're right.  Otherwise, you might
> find that the filesystem is getting full and things need to be moved
> around, or you suffer a disk or storage system crash and you have to
> restore from backups, or you use an HSM solution to move older files
> to slower/higher capacity storage, or you have issues with too many
> large files in a single directory and need to implement your own
> directory hashing scheme, etc....

True, but most of those really end up being a meta-indexing problem.
You have many big files.  You have indexes which point into those many
big files.  Occasionally you move those big files about, so your
meta-indexes need to be changed to point to the new locations of the big
files, but the same offsets within the big files...
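
As a rough sketch of what I mean by a meta-index (every name here is
hypothetical and made up for illustration, not anything in Mailman):
messages are indexed by blob id, offset and length, and a separate small
table maps blob ids to wherever the big file currently lives.  Moving a
big file then touches one table entry and no per-message entries.

```python
# Hypothetical two-level index; class, method, and path names are
# illustrative assumptions only.

class MetaIndex:
    def __init__(self):
        self.blob_paths = {}   # blob id -> current path of the big file
        self.messages = {}     # message key -> (blob id, offset, length)

    def add_blob(self, blob_id, path):
        self.blob_paths[blob_id] = path

    def add_message(self, key, blob_id, offset, length):
        self.messages[key] = (blob_id, offset, length)

    def move_blob(self, blob_id, new_path):
        # Only the blob's location changes; the offsets recorded for
        # each message inside it remain valid.
        self.blob_paths[blob_id] = new_path

    def locate(self, key):
        blob_id, offset, length = self.messages[key]
        return self.blob_paths[blob_id], offset, length


index = MetaIndex()
index.add_blob("2003-10", "/var/archive/2003-10.blob")
index.add_message("msg-1", "2003-10", 4096, 512)

# The big file is moved to slower storage: one entry is updated.
index.move_blob("2003-10", "/mnt/slow/2003-10.blob")
```

After the move, index.locate ("msg-1") hands back the new path with the
same offset and length as before.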

It's really not an expensive or difficult space.

If you really need to move individual messages about between file blobs
at a respectable rate, then you're in another world of pain, but we
don't have any evidence of that requirement, or that such a requirement
can't be handled by simply unrolling the big file and respooling the
individual messages onto the ends of other big files in different
locations.
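
The unroll-and-respool case could look something like the following.
This is a minimal sketch under my own assumptions: the index is a plain
dict mapping key -> (path, offset, length), and respool is a made-up
helper name, not an existing API.

```python
import os

def respool(index, src_path, dst_path):
    """Append every message held in src_path to the end of dst_path.

    index is assumed to be a dict of key -> (path, offset, length) and
    is updated in place to point at the new locations.
    """
    with open(src_path, "rb") as src, open(dst_path, "ab") as dst:
        dst.seek(0, os.SEEK_END)
        for key, (path, offset, length) in list(index.items()):
            if path != src_path:
                continue                    # message lives in another blob
            src.seek(offset)
            data = src.read(length)
            new_offset = dst.tell()         # current end of the destination
            dst.write(data)
            index[key] = (dst_path, new_offset, length)
```

The source file is left untouched; once the index no longer points into
it, it can be removed whole, so there is still no free space management
inside any big file.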

>> Not really.  The percentage of such deleted posts over the lifetime
>> of the store can be generally assumed to be less than 1 in 10^5, and
>> is probably considerably lower, if not in the 1:10^8 range.  Add a
>> simple invalid key semantic and you're done.

> It depends on whether or not the court order allows you to just mark
> things as "deleted" and be done with it.  If they force you to
> actually expunge all copies of that data from your systems, you will
> have to do more work.

Ahem.

  for key in list_of_bad_message_keys:
    big_file, offset, length = get_message_big_file (key)
    handle = open (big_file, 'r+b')   # update in place, no truncation
    handle.seek (offset)
    handle.write (b' ' * length)      # overwrite the message with blanks
    handle.close ()
    key.invalidate ()

Not a whole lot more complexity.  You're just invalidating the
pointed-to data as well as the key.  You're still not doing free space
management.

-- 
J C Lawrence                
---------(*)                Satan, oscillate my metallic sonatas. 
claw at kanga.nu               He lived as a devil, eh?		  
http://www.kanga.nu/~claw/  Evil is a name of a foeman, as I live.


