Re: [Mailman-Developers] Requirements for a new archiver

On Wed, 29 Oct 2003 16:40:53 -0800 Chuq Von Rospach <chuqui@plaidworks.com> wrote:
> by the way, this statement is in conflict with my previous statement of "use cycbufs". I'm fully aware of that conflict, too. resolving it will be one of the big challenges.
cycbufs implement a filesystem-based heap with pool semantics. (There's a fair bit of literature on that space in the OS and application realm.) As such they are specifically tuned for the case where the number of calls to malloc() is of a similar magnitude to the number of calls to free(). This makes sense in a netnews world where news articles expire regularly, and in general as much data is added to the spool as is removed from it.
Does that model really apply to list archives? It doesn't for me. I may be unusual in this regard, but I generally consider list archives as one-way systems: messages go in and never come out.
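To make the malloc()/free() point concrete, here is a minimal in-memory sketch of a cyclic buffer. It is a toy under stated assumptions, not INN's on-disk format: the class name `CycBuf`, the `(position, length)` token, and the wrap rule are all hypothetical, chosen only for illustration. It shows that once the buffer has wrapped, storing a new article implicitly "frees" the oldest one.

```python
class CycBuf:
    """Toy cyclic buffer: a fixed-size byte region written round-robin.

    Hypothetical illustration only; names and layout are assumptions.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = bytearray(capacity)
        self.head = 0     # physical offset of the next write
        self.written = 0  # logical bytes consumed so far, including skipped tail space

    def store(self, article):
        n = len(article)
        if n > self.capacity:
            raise ValueError("article larger than buffer")
        if self.head + n > self.capacity:
            # Not enough room at the tail: skip it and wrap to the start.
            self.written += self.capacity - self.head
            self.head = 0
        start = self.written
        self.data[self.head:self.head + n] = article
        self.head += n
        self.written += n
        return (start, n)  # token: logical position and length

    def fetch(self, token):
        start, n = token
        # Anything more than one full buffer behind the write position
        # may have been overwritten, so treat it as expired.
        if self.written - start > self.capacity:
            return None
        offset = start % self.capacity
        return bytes(self.data[offset:offset + n])
```

With a 10-byte buffer, storing three 4-byte articles expires the first one. That is fine for a news spool, but for an archive that never deletes anything it is exactly the behaviour you don't want unless buffers are added as fast as they fill.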
--
J C Lawrence
---------(*) Satan, oscillate my metallic sonatas.
claw@kanga.nu He lived as a devil, eh?
http://www.kanga.nu/~claw/ Evil is a name of a foeman, as I live.

On Oct 29, 2003, at 6:22 PM, J C Lawrence wrote:
> cycbufs implement a filesystem-based heap with pool semantics. (There's a fair bit of literature on that space in the OS and application realm.) As such they are specifically tuned for the case where the number of calls to malloc() is of a similar magnitude to the number of calls to free(). This makes sense in a netnews world where news articles expire regularly, and in general as much data is added to the spool as is removed from it.
>
> Does that model really apply to list archives? It doesn't for me. I may be unusual in this regard, but I generally consider list archives as one-way systems: messages go in and never come out.
And in general, you're mostly right. Deletions out of archives are pretty minimal. But I think cycbufs still make a lot of sense as a way to reduce the design complexity needed to avoid using up a potentially unbounded number of inodes, and to sidestep the performance and design complexity inherent in building a storage structure around a typical Unix filesystem.
It's just so much less hassle, on any number of levels, to deal with fifty 100-megabyte files than with a directory structure holding 500 megabytes of messages spread across 100,000 individual files. Whether it's backups and restores, migrating data to a new server, etc., you make life much simpler. And god help you if you're updating that structure when the system crashes and you have to fsck and put it back together again.
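The "few big files instead of many small ones" idea can be sketched as an append-only segment store. This is a rough illustration under stated assumptions, not a proposed Mailman design: `SegmentStore`, the `segment-NNNN.dat` naming, and the in-memory index are all hypothetical, and a real archiver would have to persist the index and handle crashes.

```python
import os
import tempfile


class SegmentStore:
    """Toy store packing many messages into a few large segment files.

    Hypothetical names throughout; the index lives in memory only.
    """

    def __init__(self, directory, segment_size):
        self.directory = directory
        self.segment_size = segment_size
        self.index = {}   # message id -> (segment number, offset, length)
        self.current = 0  # segment currently being appended to

    def _path(self, segment):
        return os.path.join(self.directory, "segment-%04d.dat" % segment)

    def append(self, msgid, raw):
        path = self._path(self.current)
        size = os.path.getsize(path) if os.path.exists(path) else 0
        # Roll over to a new segment when this message would overflow the
        # current one (an oversized message still gets a segment to itself).
        if size > 0 and size + len(raw) > self.segment_size:
            self.current += 1
            path = self._path(self.current)
            size = 0
        with open(path, "ab") as f:
            f.write(raw)
        self.index[msgid] = (self.current, size, len(raw))

    def read(self, msgid):
        segment, offset, length = self.index[msgid]
        with open(self._path(segment), "rb") as f:
            f.seek(offset)
            return f.read(length)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        store = SegmentStore(d, segment_size=16)
        for i in range(4):
            store.append("m%d" % i, b"msg-%d\n" % i)
        print(sorted(os.listdir(d)), store.read("m2"))
```

Four messages end up in two files here. Scaled up, the number of inodes tracks the number of segments rather than the number of messages, and closed segments never change again.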

On Wed, 2003-10-29 at 22:06, Chuq Von Rospach wrote:
> It's just so much less hassle, on any number of levels, to deal with fifty 100-megabyte files than with a directory structure holding 500 megabytes of messages spread across 100,000 individual files. Whether it's backups and restores, migrating data to a new server, etc., you make life much simpler. And god help you if you're updating that structure when the system crashes and you have to fsck and put it back together again.
We should just throw everything into a ZODB FileStorage Data.fs file, and let it grow to gigs in size <1/2 wink>.
-Barry

On Oct 29, 2003, at 8:27 PM, Barry Warsaw wrote:
> We should just throw everything into a ZODB FileStorage Data.fs file, and let it grow to gigs in size <1/2 wink>.
<troll> until you have to split it across two disks because one is full.
and don't forget, a single monolithic storage file gets backed up fully every time you change it. The guy in charge of buying tapes to back up your system just screamed in agony, since there's no possibility of an incremental backup for what is 99.9999999% static data.
</troll>

On Wed, 2003-10-29 at 23:37, Chuq Von Rospach wrote:
> <troll> until you have to split it across two disks because one is full.
> and don't forget, a single monolithic storage file gets backed up fully every time you change it. The guy in charge of buying tapes to back up your system just screamed in agony, since there's no possibility of an incremental backup for what is 99.9999999% static data.
> </troll>
Actually, newer versions of ZODB have a script called repozo.py which makes incremental backups feasible. It knows a lot about FileStorage's formats. Also note that there are alternative storage implementations such as BerkeleyDB-based storage (slow, but presumably more reliable) and the 3rd party DirectoryStorage.
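Why incremental backups of a single append-mostly file are feasible can be shown with a short sketch. This is not repozo's code or command-line interface; it only illustrates the general idea of verifying the part already backed up and copying just the bytes appended since. The function names, the `state` dict, and the `delta-*.bin` file naming are assumptions made for the example, and reading the whole prefix into memory is a simplification.

```python
import hashlib
import os
import tempfile

EMPTY_DIGEST = hashlib.sha256(b"").hexdigest()


def incremental_backup(source_path, backup_dir, state):
    """Copy only the bytes appended to source_path since the previous call.

    `state` is a dict carrying "offset" and "digest" between runs
    (a hypothetical bookkeeping scheme, not repozo's).
    """
    offset = state.get("offset", 0)
    with open(source_path, "rb") as src:
        prefix = src.read(offset)
        unchanged = (len(prefix) == offset and
                     hashlib.sha256(prefix).hexdigest()
                     == state.get("digest", EMPTY_DIGEST))
        if not unchanged:
            # Earlier bytes changed (for example, the file was rewritten):
            # discard old deltas and fall back to a full copy.
            for name in os.listdir(backup_dir):
                if name.startswith("delta-"):
                    os.remove(os.path.join(backup_dir, name))
            offset, prefix = 0, b""
            src.seek(0)
        delta = src.read()
    if delta:
        name = "delta-%012d.bin" % offset
        with open(os.path.join(backup_dir, name), "wb") as out:
            out.write(delta)
    state["offset"] = offset + len(delta)
    state["digest"] = hashlib.sha256(prefix + delta).hexdigest()
    return len(delta)


def restore(backup_dir):
    """Rebuild the file contents by concatenating deltas in offset order."""
    chunks = []
    for name in sorted(os.listdir(backup_dir)):
        with open(os.path.join(backup_dir, name), "rb") as f:
            chunks.append(f.read())
    return b"".join(chunks)
```

Each run writes only what was appended since the last one, so a file that is almost entirely static costs very little tape after the first full copy.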
We'll talk about databases in another thread. I have my own biases, but I'm too tired now to get into it.
-Barry

At 9:22 PM -0500 2003/10/29, J C Lawrence wrote:
> cycbufs implement a filesystem-based heap with pool semantics. (There's a fair bit of literature on that space in the OS and application realm.) As such they are specifically tuned for the case where the number of calls to malloc() is of a similar magnitude to the number of calls to free(). This makes sense in a netnews world where news articles expire regularly, and in general as much data is added to the spool as is removed from it.
So long as the calls to malloc() are kept reasonably small (which is typically true in this case), it shouldn't matter whether or not there are any free() calls. Yes, disk utilization slowly grows, but all archive solutions will have the same problem, and this solution will scale as well as, or better than, any other that I know of.
Consider the case where you are trying to store all news articles that have ever been posted -- not really much difference.
-- Brad Knowles, <brad.knowles@skynet.be>
"They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin, Historical Review of Pennsylvania.

On Wed, 29 Oct 2003 21:22:32 -0500, J C Lawrence ("claw") wrote in "Re: [Mailman-Developers] Requirements for a new archiver":
claw> I may be unusual in this regard, but I generally consider
claw> list archives as one-way systems: messages go in and never
claw> come out.
Out of idle curiosity, why doesn't 'write once read many' indicate a directory more than a database?
jam

Participants (5): Barry Warsaw, Brad Knowles, Chuq Von Rospach, J C Lawrence, John A. Martin