[Mailman-Developers] Requirements for a new archiver
J C Lawrence
claw at kanga.nu
Wed Oct 29 22:27:50 EST 2003
On Thu, 30 Oct 2003 04:08:45 +0100
Brad Knowles <brad.knowles at skynet.be> wrote:
> At 9:22 PM -0500 2003/10/29, J C Lawrence wrote:
>> cycbufs implement a filesystem-based heap with pool semantics.
>> (There's a fair bit of literature on that space in the OS and
>> application realm.) As such they are specifically tuned for the case
>> where the number of calls to malloc() is of a similar magnitude to
>> the number of calls to free(). This makes sense in a netnews world where news
>> articles expire regularly, and in general as much data is added to
>> the spool as is removed from it.
> So long as the calls to malloc() are kept reasonably small (which is
> typically true in this case), it shouldn't matter whether or not there
> are any free() calls.
I've written several heap managers, including several pool-based systems
as well as other sorts of custom allocators. There are a great many
simplifications that come along with the write-once approach, especially
in terms of the trade-offs between allocation expense and free space
management.
> Yes, disk space utilization slowly builds up, but all
> archive solutions will have the same problem, and this solution will
> scale as well as, or better than, any other that I know of.
Which is not exactly my point. cycbufs are a useful technique to be
sure, much as Chuq has discussed from a management perspective. My
point is more that I don't see that they add anything essentially
different to the storage space in terms of storage semantics. You get a
higher rate of file handle re-use and friendlier filesystem behaviour
for older filesystem designs (pleasant optimisations), but exactly the
same single key -> byte stream mapping, without adding any more
interesting verbs or transforms to the solution space.
This is not a Bad Thing, just not something that seems applicable at
this stage in the design discussion. First come ontology and semantics,
then comes implementation.
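
To make the "same single key -> byte stream" point concrete, here is a
minimal Python sketch. It is not code from Mailman, INN, or any real
cycbuf implementation; the names MessageStore, DictStore, BufferStore and
round_trip are hypothetical and exist only for illustration. The two
stores lay their data out differently, yet a caller sees exactly the
same two verbs.

```python
from typing import Dict, Protocol, Tuple


class MessageStore(Protocol):
    """Hypothetical interface: one key in, one byte stream out."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class DictStore:
    """Stand-in for a one-object-per-message spool."""

    def __init__(self) -> None:
        self._items: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._items[key] = data

    def get(self, key: str) -> bytes:
        return self._items[key]


class BufferStore:
    """Stand-in for a cycbuf-like layout: one large buffer plus an
    index of (offset, length) pairs. Wrap-around and expiry are omitted."""

    def __init__(self) -> None:
        self._buf = bytearray()
        self._index: Dict[str, Tuple[int, int]] = {}

    def put(self, key: str, data: bytes) -> None:
        self._index[key] = (len(self._buf), len(data))
        self._buf += data

    def get(self, key: str) -> bytes:
        offset, length = self._index[key]
        return bytes(self._buf[offset:offset + length])


def round_trip(store: MessageStore, key: str, data: bytes) -> bytes:
    """Store a message and read it back through the shared interface."""
    store.put(key, data)
    return store.get(key)
```

Swapping one store for the other changes performance characteristics,
not the semantics available to the archiver built on top.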
> Consider the case where you are trying to store all news articles that
> have ever been posted -- not really much difference.
Actually the two cases are considerably different. In the delete case I
have to do pool management, with some eye toward fragmentation control
and optimisations of average latency for free heap searches, as well as
heap integrity audits. In the write-once case I just build on the end
and need pay no mind to prior data once it is allocated. In both cases
I have to do predictive work on the distribution of allocation sizes,
but that's far cheaper in the write-once case as the multiple-pool
search overhead can be entirely skipped. There's a considerable
difference in complexity between the two.
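
A rough Python sketch of that difference, under simplifying
assumptions: the class names AppendOnlyHeap and FreeListHeap are
hypothetical, the free-space search is plain first-fit over a single
list rather than multiple size-class pools, and integrity audits are
left out entirely. Even so, the delete case already needs a search, a
split and a coalescing pass that the write-once case never touches.

```python
from typing import List, Tuple


class AppendOnlyHeap:
    """Write-once case: every allocation goes on the end."""

    def __init__(self) -> None:
        self.end = 0

    def alloc(self, size: int) -> int:
        offset = self.end
        self.end += size
        return offset


class FreeListHeap:
    """Delete case: freed extents must be tracked, searched and merged."""

    def __init__(self) -> None:
        self.end = 0
        # (offset, size) pairs, kept sorted by offset.
        self.free_extents: List[Tuple[int, int]] = []

    def alloc(self, size: int) -> int:
        # First-fit search over the free list.
        for i, (offset, extent_size) in enumerate(self.free_extents):
            if extent_size >= size:
                if extent_size == size:
                    del self.free_extents[i]
                else:
                    # Split the extent and keep the remainder free.
                    self.free_extents[i] = (offset + size, extent_size - size)
                return offset
        # Nothing fits: fall back to growing the heap.
        offset = self.end
        self.end += size
        return offset

    def free(self, offset: int, size: int) -> None:
        self.free_extents.append((offset, size))
        self.free_extents.sort()
        # Coalesce adjacent extents to limit fragmentation.
        merged: List[Tuple[int, int]] = []
        for start, length in self.free_extents:
            if merged and merged[-1][0] + merged[-1][1] == start:
                merged[-1] = (merged[-1][0], merged[-1][1] + length)
            else:
                merged.append((start, length))
        self.free_extents = merged
```

The append-only allocator is constant time per call and carries no
state beyond the end offset; the free-list allocator's cost grows with
the number of free extents.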
--
J C Lawrence
---------(*)                Satan, oscillate my metallic sonatas.
claw at kanga.nu            He lived as a devil, eh?
http://www.kanga.nu/~claw/  Evil is a name of a foeman, as I live.