Re: [Mailman-Developers] Requirements for a new archiver

On Thu, 30 Oct 2003 04:08:45 +0100 Brad Knowles <brad.knowles@skynet.be> wrote:
At 9:22 PM -0500 2003/10/29, J C Lawrence wrote:
cycbufs implement a filesystem-based heap with pool semantics. (There's a fair bit of literature on that space in the OS and application realm.) As such they are specifically tuned for the case where the number of calls to malloc() is of a similar magnitude to the number of calls to free(). This makes sense in a netnews world where news articles expire regularly, and in general as much data is added to the spool as is removed from it.
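To make the cycbuf model concrete, here is a toy in-memory sketch in Python (Mailman's language). The class `CycBuf` and its methods are hypothetical, and this is not INN's actual CNFS on-disk format. Each write lands at the current position and expires whatever older records it overlaps, so every "malloc()" is paired with implicit "free()"s. That is the balanced case described above.

```python
from collections import deque
from typing import Optional


class CycBuf:
    """Toy cyclic buffer (hypothetical): a fixed-size byte region where each
    new record overwrites the oldest ones, so every write is also a free()."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.buf = bytearray(size)
        self.pos = 0            # next write offset
        self.index = {}         # key -> (offset, length) of each live record
        self.order = deque()    # (key, offset, length), oldest first

    def _evict_front(self) -> None:
        key, _, _ = self.order.popleft()
        del self.index[key]     # the article has "expired"

    def put(self, key: str, data: bytes) -> None:
        n = len(data)
        if not 0 < n <= self.size:
            raise ValueError("record size out of range")
        if key in self.index:
            raise ValueError(f"duplicate key {key!r}")
        if self.pos + n > self.size:
            # Wrap to the start; in this sketch, records in the abandoned
            # tail expire immediately.
            while self.order and self.order[0][1] >= self.pos:
                self._evict_front()
            self.pos = 0
        end = self.pos + n
        # Expire the oldest records that overlap [pos, end).
        while self.order:
            _, off, length = self.order[0]
            if off >= end or off + length <= self.pos:
                break           # oldest live record lies clear of the target
            self._evict_front()
        self.buf[self.pos:end] = data
        self.index[key] = (self.pos, n)
        self.order.append((key, self.pos, n))
        self.pos = end

    def get(self, key: str) -> Optional[bytes]:
        loc = self.index.get(key)
        if loc is None:
            return None         # never stored, or already overwritten
        off, n = loc
        return bytes(self.buf[off:off + n])
```

Nothing here ever searches for free space. The write position just marches forward and wraps, and expiry falls out of overwriting.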
So long as the calls to malloc() are kept reasonably small (which is typically true in this case), it shouldn't matter whether or not there are any free() calls.
I've written several heap managers, including several pool-based systems as well as other sorts of custom allocators. There are a great many simplifications that come along with the write-once approach, especially in terms of the trade-offs between allocation expense and free space management.
Yes, your disk utilization slowly grows, but all archive solutions will have the same problem, and this solution will scale as well as, or better than, any other that I know of.
Which is not exactly my point. cycbufs are a useful technique to be sure, much as Chuq has discussed from a management perspective. My point is more that I don't see that they add anything essentially different to the storage space in terms of storage semantics. You get a higher rate of file handle re-use and friendlier filesystem behaviour for older filesystem designs (pleasant optimisations), but exactly the same single key -> byte stream, without adding any more interesting verbs or transforms to the solution space.
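The "single key -> byte stream" point can be written down as an interface. In this sketch (all names hypothetical), a file-per-message spool and an in-memory dict expose the same two verbs. A cycbuf backend would slot in the same way, which is the sense in which it changes performance characteristics but not storage semantics.

```python
from pathlib import Path
from typing import Optional, Protocol


class ByteStore(Protocol):
    """The whole verb set: one key in, one opaque byte stream out."""

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> Optional[bytes]: ...


class FilePerMessageStore:
    """Hypothetical spool layout: one file per message, named from its key."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, key: str) -> Path:
        # Hex-encode so odd Message-ID characters never reach the filesystem
        # (toy only: very long keys could exceed filename length limits).
        return self.root / key.encode().hex()

    def put(self, key: str, data: bytes) -> None:
        self._path(key).write_bytes(data)

    def get(self, key: str) -> Optional[bytes]:
        p = self._path(key)
        return p.read_bytes() if p.exists() else None


class InMemoryStore:
    def __init__(self) -> None:
        self.items: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self.items[key] = data

    def get(self, key: str) -> Optional[bytes]:
        return self.items.get(key)


def archive(store: ByteStore, msgid: str, raw: bytes) -> Optional[bytes]:
    # Callers see identical semantics whatever the backend behind the store.
    store.put(msgid, raw)
    return store.get(msgid)
```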
This is not a Bad Thing, just not something that seems applicable at this stage of the design discussion. First come ontology and semantics, then comes implementation.
Consider the case where you are trying to store all news articles that have ever been posted -- not really much difference.
Actually the two cases are considerably different. In the delete case I have to do pool management, with some eye toward fragmentation control and optimisations of average latency for free heap searches, as well as heap integrity audits. In the write-only case I just build on the end and need pay no mind to prior data once it is allocated. In both cases I have to do predictive work on the distribution of allocation sizes, but that's far cheaper in the write-only case as the multiple-pool search overhead can be entirely skipped. There's a considerable difference in complexity between the two.
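A rough sketch of the two cases being contrasted, with hypothetical class names. The write-only heap is a bump pointer with no free path at all. The delete case needs a free-list search on every allocation and coalescing on every release to keep fragmentation in check. First-fit is just one common policy, chosen here for brevity. Real pool allocators often keep several size-segregated pools, and the heap integrity audits mentioned above are not shown.

```python
import bisect


class AppendOnlyHeap:
    """Write-only case: allocation just bumps the end pointer; nothing is freed."""

    def __init__(self) -> None:
        self.end = 0

    def alloc(self, n: int) -> int:
        off = self.end
        self.end += n
        return off


class FreeListHeap:
    """Delete case: first-fit search over a sorted list of holes, with
    coalescing on release to hold fragmentation down."""

    def __init__(self, size: int) -> None:
        self.holes = [(0, size)]  # sorted (offset, length), never adjacent

    def alloc(self, n: int) -> int:
        for i, (off, length) in enumerate(self.holes):  # the free-heap search
            if length >= n:
                if length == n:
                    del self.holes[i]
                else:
                    self.holes[i] = (off + n, length - n)
                return off
        raise MemoryError("no hole large enough (full or fragmented)")

    def release(self, off: int, n: int) -> None:
        i = bisect.bisect_left(self.holes, (off, n))
        self.holes.insert(i, (off, n))
        # Merge with the following hole if they touch.
        if i + 1 < len(self.holes) and off + n == self.holes[i + 1][0]:
            self.holes[i] = (off, n + self.holes[i + 1][1])
            del self.holes[i + 1]
        # Merge with the preceding hole if they touch.
        if i > 0 and self.holes[i - 1][0] + self.holes[i - 1][1] == off:
            prev_off, prev_len = self.holes[i - 1]
            self.holes[i - 1] = (prev_off, prev_len + self.holes[i][1])
            del self.holes[i]
```

Even in this toy, a heap that still has 40 free bytes in total can refuse a 40-byte request until a release lets two holes coalesce. The append-only heap has no such failure mode.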
--
J C Lawrence
---------(*) Satan, oscillate my metallic sonatas.
claw@kanga.nu He lived as a devil, eh?
http://www.kanga.nu/~claw/ Evil is a name of a foeman, as I live.

At 10:27 PM -0500 2003/10/29, J C Lawrence wrote:
Actually the two cases are considerably different. In the delete case I have to do pool management, with some eye toward fragmentation control and optimisations of average latency for free heap searches, as well as heap integrity audits. In the write-only case I just build on the end and need pay no mind to prior data once it is allocated.
Not really. You still have to maintain all the indexes, make sure that if things get moved around all the links get updated, etc. True, you don't have to worry about fragmentation control or other more complex aspects of heap management, but that's a further cost saving over other techniques and not a "drawback" to using this technique for this purpose.
Now, if you want to consider what would happen to you if the Scientologists ever came after you, or if you had court orders to remove postings that linked to bomb-making instructions, you'd probably want to keep all those other tools related to heap management around anyway. They'd be less likely to be used, but at least you wouldn't have to take the entire site down while you wrote the tools from scratch to handle a situation that you had not foreseen.
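Brad's two points meet in one operation: removing a record from a write-once store means rewriting the store, and every surviving record may move, so every index entry has to be rewritten with it. A minimal sketch, assuming a single in-memory log and a key -> (offset, length) index. `AppendOnlyLog` and `purge()` are hypothetical names, and a real spool would do this on disk with crash safety.

```python
from typing import Dict, Iterable, Optional, Tuple


class AppendOnlyLog:
    """Hypothetical write-once store: records appended to one bytearray,
    located through a key -> (offset, length) index."""

    def __init__(self) -> None:
        self.data = bytearray()
        self.index: Dict[str, Tuple[int, int]] = {}

    def append(self, key: str, payload: bytes) -> None:
        if key in self.index:
            raise ValueError(f"duplicate key {key!r}")
        self.index[key] = (len(self.data), len(payload))
        self.data += payload

    def get(self, key: str) -> Optional[bytes]:
        loc = self.index.get(key)
        if loc is None:
            return None
        off, n = loc
        return bytes(self.data[off:off + n])

    def purge(self, doomed: Iterable[str]) -> None:
        """Remove records by rewriting the log without them."""
        doomed_set = set(doomed)
        new_data = bytearray()
        new_index: Dict[str, Tuple[int, int]] = {}
        for key, (off, n) in sorted(self.index.items(), key=lambda kv: kv[1][0]):
            if key in doomed_set:
                continue
            # The record moves, so its index entry must follow it.
            new_index[key] = (len(new_data), n)
            new_data += self.data[off:off + n]
        self.data, self.index = new_data, new_index
```

The purge is rare but touches everything, which is why it helps to have the tool written before the court order arrives.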
-- Brad Knowles, <brad.knowles@skynet.be>
"They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin, Historical Review of Pennsylvania.
GCS/IT d+(-) s:+(++)>: a C++(+++)$ UMBSHI++++$ P+>++ L+ !E-(---) W+++(--) N+ !w--- O- M++ V PS++(+++) PE- Y+(++) PGP>+++ t+(+++) 5++(+++) X++(+++) R+(+++) tv+(+++) b+(++++) DI+(++++) D+(++) G+(++++) e++>++++ h--- r---(+++)* z(+++)