Re: [Mailman-Developers] Requirements for a new archiver
On Thu, 30 Oct 2003 05:15:58 +0100 Brad Knowles <brad.knowles@skynet.be> wrote:
At 11:01 PM -0500 2003/10/29, J C Lawrence wrote:
With a write-once system you don't actually need to ever move anything.
Depends on how you manage the storage of those large files. If you have an infinitely large filesystem that is guaranteed 100% reliable in all possible circumstances, you're right. Otherwise, you might find that the filesystem is getting full and things need to be moved around, or you suffer a disk or storage system crash and you have to restore from backups, or you use an HSM solution to move older files to slower/higher capacity storage, or you have issues with too many large files in a single directory and need to implement your own directory hashing scheme, etc....
True, but most of those really end up being a meta-indexing problem. You have many big files. You have indexes which point into those many big files. Occasionally you move those big files about, so your meta-indexes need to be changed to point to the new locations of the big files, but the same offsets within the big files...
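Concretely, the meta-index only needs two levels: per-message records of (file, offset, length), plus a separate mapping from file IDs to current paths. A minimal sketch of that shape (all names here are illustrative, not from any real archiver):

```python
# Two-level meta-index: message keys map to (file_id, offset, length);
# file_ids map to current on-disk paths. Moving a big file touches only
# file_locations -- the per-message offsets never change.

message_index = {
    'msg-001': ('bigfile-7', 0, 1024),
    'msg-002': ('bigfile-7', 1024, 2048),
}
file_locations = {
    'bigfile-7': '/archive/2003/bigfile-7',
}

def relocate(file_id, new_path):
    # Only the location record changes; every per-message
    # (offset, length) entry remains valid as-is.
    file_locations[file_id] = new_path

def fetch(key):
    file_id, offset, length = message_index[key]
    with open(file_locations[file_id], 'rb') as f:
        f.seek(offset)
        return f.read(length)

# HSM moves the big file to slower storage; one record updates.
relocate('bigfile-7', '/slow-storage/2003/bigfile-7')
```

This is why the crash/HSM/directory-hashing cases above stay cheap: each is one update to `file_locations`, not a rewrite of the message indexes.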
It's really not an expensive or difficult space.
If you really need to move individual messages about between file blobs at a respectable rate, then you're in another world of pain, but we don't have any evidence of that requirement, or that such a requirement can't be handled by simply unrolling the big file and respooling the individual messages onto the ends of other big files in different locations.
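The "unroll and respool" fallback mentioned above is itself only a few lines: read each live message out of the old big file and append it to the end of some other big file, collecting the new locations for the meta-index. A rough sketch, with all names hypothetical:

```python
def respool(big_file, records, spool_targets):
    # Unroll one big file and append each live message onto the end of
    # another big file. 'records' is [(key, offset, length), ...] for the
    # messages to keep; 'spool_targets' is a list of destination paths.
    # Returns {key: (target_path, new_offset, length)} for re-indexing.
    new_locations = {}
    with open(big_file, 'rb') as src:
        for i, (key, offset, length) in enumerate(records):
            src.seek(offset)
            data = src.read(length)
            target = spool_targets[i % len(spool_targets)]
            with open(target, 'ab') as dst:
                new_offset = dst.tell()   # append mode: positioned at EOF
                dst.write(data)
            new_locations[key] = (target, new_offset, length)
    return new_locations  # feed back into the meta-index
```

Since appends are sequential and the meta-index update is the same cheap pointer rewrite as before, even this worst case stays in bulk-I/O territory.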
Not really. The percentage of such deleted posts over the lifetime of the store can be generally assumed to be less than 1 in 10^5, and is probably considerably lower, if not in the 1:10^8 range. Add a simple invalid key semantic and you're done.
It depends on whether or not the court order allows you to just mark things as "deleted" and be done with it. If they force you to actually expunge all copies of that data from your systems, you will have to do more work.
Ahem.
```python
for key in list_of_bad_message_keys:
    big_file, offset, length = get_message_big_file(key)
    handle = open(big_file, 'r+b')   # update in place, don't truncate
    handle.seek(offset)
    handle.write(b' ' * length)      # blank out the message bytes
    handle.close()
    key.invalidate()
```
Not a whole lot more complexity. You're just invalidating the pointed-to data as well as the key. You're still not doing free space management.
--
J C Lawrence
---------(*) Satan, oscillate my metallic sonatas.
claw@kanga.nu He lived as a devil, eh?
http://www.kanga.nu/~claw/ Evil is a name of a foeman, as I live.
At 11:43 PM -0500 2003/10/29, J C Lawrence wrote:
True, but most of those really end up being a meta-indexing problem.
Fair enough.
Not a whole lot more complexity. You're just invalidating the pointed-to data as well as the key. You're still not doing free space management.
What about your backups? And your off-site backups? And your mirror sites around the world? Any other copies of those files that might have been copied off somewhere else?
-- Brad Knowles, <brad.knowles@skynet.be>
"They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin, Historical Review of Pennsylvania.
GCS/IT d+(-) s:+(++)>: a C++(+++)$ UMBSHI++++$ P+>++ L+ !E-(---) W+++(--) N+ !w--- O- M++ V PS++(+++) PE- Y+(++) PGP>+++ t+(+++) 5++(+++) X++(+++) R+(+++) tv+(+++) b+(++++) DI+(++++) D+(++) G+(++++) e++>++++ h--- r---(+++)* z(+++)
engineering details.
On Oct 29, 2003, at 8:59 PM, Brad Knowles wrote:
What about your backups? And your off-site backups? And your mirror sites around the world? Any other copies of those files that might have been copied off somewhere else?
participants (3)

- Brad Knowles
- Chuq Von Rospach
- J C Lawrence