[Mailman-Developers] Requirements for a new archiver

Wed Oct 29 22:27:50 EST 2003

On Thu, 30 Oct 2003 04:08:45 +0100 
Brad Knowles <brad.knowles at skynet.be> wrote:
> At 9:22 PM -0500 2003/10/29, J C Lawrence wrote:

>> cycbufs implement a filesystem-based heap with pool semantics.
>> (There's a fair bit of literature on that space in the OS and
>> application realm.)  As such they are specifically tuned for the case
>> where the number of calls to malloc() is of a similar magnitude to
>> the number of calls to free().  This makes sense in a netnews world where news
>> articles expire regularly, and in general as much data is added to
>> the spool as is removed from it.

> So long as the calls to malloc() are kept reasonably small (which is
> typically true in this case), it shouldn't matter whether or not there
> are any free() calls.  

I've written several heap managers including several pool based systems
as well as other sorts of custom allocators.  There are a great many
simplifications that come along with the write-once approach, especially
in terms of the trade-offs between allocation expense and free space
management.

> Yes, you slowly build up more disk space in utilization, but all
> archive solutions will have the same problem, and this solution will
> scale as well as, or better than, any other that I know of.

Which is not exactly my point.  cycbufs are a useful technique to be
sure, much as Chuq has discussed from a management perspective.  My
point is more that I don't see that they add anything essentially
different to the storage space in terms of storage semantics.  You get a
higher rate of file handle re-use and friendlier filesystem behaviour
for older filesystem designs (pleasant optimisations), but exactly the
same single key -> byte stream mapping, without adding any more
interesting verbs or transforms to the solution space.
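To illustrate the point, here is a toy cyclic buffer. It is a minimal sketch that assumes an in-memory bytearray standing in for the on-disk cycbuf file; the class name CycBuf and its put/get methods are hypothetical and do not follow the real netnews cycbuf on-disk format. Whatever the wrap-around and eviction mechanics, the only verbs it exposes are still put(key, bytes) and get(key).

```python
from __future__ import annotations


class CycBuf:
    """Hypothetical toy cyclic buffer: a fixed-size byte area plus a
    key -> (offset, length) index. Oldest data is overwritten on wrap."""

    def __init__(self, size: int) -> None:
        self.buf = bytearray(size)   # stands in for the preallocated cycbuf file
        self.size = size
        self.head = 0                # next write offset
        self.index: dict[str, tuple[int, int]] = {}

    def put(self, key: str, data: bytes) -> None:
        n = len(data)
        if n > self.size:
            raise ValueError("record larger than the whole buffer")
        if self.head + n > self.size:
            self.head = 0            # wrap; in this sketch records never straddle the end
        start, end = self.head, self.head + n
        # Evict every indexed record overlapping [start, end): its bytes are about to be overwritten.
        for k, (off, ln) in list(self.index.items()):
            if off < end and start < off + ln:
                del self.index[k]
        self.buf[start:end] = data
        self.index[key] = (start, n)
        self.head = end

    def get(self, key: str) -> bytes | None:
        entry = self.index.get(key)
        if entry is None:
            return None              # never stored, or already overwritten
        off, ln = entry
        return bytes(self.buf[off:off + ln])
```

With a 10-byte buffer, storing 4-byte records "a" and "b" and then a 3-byte record "c" wraps to offset 0 and evicts "a", while "b" survives. The storage semantics remain a plain single key -> byte stream lookup.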

This is not a Bad Thing, just not something that seems applicable at
this stage in the design discussion.  First come ontology and semantics,
then comes implementation.

> Consider the case where you are trying to store all news articles that
> have ever been posted -- not really much difference.

Actually the two cases are considerably different.  In the delete case I
have to do pool management, with some eye toward fragmentation control
and optimisations of average latency for free heap searches, as well as
heap integrity audits.  In the write-only case I just build on the end
and need pay no mind to prior data once it is allocated.  In both cases
I have to do predictive work on the distribution of allocation sizes,
but that's far cheaper in the write-only case as the multiple-pool
search overhead can be entirely skipped.  There's a considerable
difference in complexity between the two.
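A rough sketch of the contrast, assuming an in-memory offset space in place of the on-disk heap. The class names are hypothetical, and the delete case uses a single first-fit free list with coalescing; that is only one common policy, simpler than the multiple-pool schemes described above and not what any particular archiver or news server does. The write-only allocator is a single addition, while the delete case already needs a search, hole splitting and coalescing.

```python
from __future__ import annotations


class BumpAllocator:
    """Write-only case: build on the end; prior allocations are never revisited."""

    def __init__(self) -> None:
        self.top = 0

    def alloc(self, size: int) -> int:
        offset = self.top
        self.top += size
        return offset


class FreeListAllocator:
    """Delete case: first-fit search over holes, splitting on alloc and
    coalescing adjacent holes on free."""

    def __init__(self, capacity: int) -> None:
        # Holes as (offset, size) pairs, kept sorted by offset.
        self.free: list[tuple[int, int]] = [(0, capacity)]

    def alloc(self, size: int) -> int | None:
        for i, (off, ln) in enumerate(self.free):
            if ln >= size:
                if ln == size:
                    del self.free[i]                         # exact fit consumes the hole
                else:
                    self.free[i] = (off + size, ln - size)   # split the hole
                return off
        return None  # no single hole is large enough (fragmentation or a full heap)

    def free_block(self, offset: int, size: int) -> None:
        # No validation of double frees in this sketch.
        self.free.append((offset, size))
        self.free.sort()
        merged: list[tuple[int, int]] = []
        for off, ln in self.free:
            if merged and merged[-1][0] + merged[-1][1] == off:
                prev_off, prev_ln = merged[-1]
                merged[-1] = (prev_off, prev_ln + ln)        # coalesce with the previous hole
            else:
                merged.append((off, ln))
        self.free = merged
```

In a 100-unit heap, allocating two 30-unit blocks and freeing the first leaves 70 units free, yet a 50-unit request fails because no single hole fits it; it succeeds only after the second block is freed and the holes coalesce back into one. The bump allocator never faces that question.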

-- 
J C Lawrence                
---------(*)                Satan, oscillate my metallic sonatas. 
claw at kanga.nu               He lived as a devil, eh?		  
http://www.kanga.nu/~claw/  Evil is a name of a foeman, as I live.