Re: [Mailman-Developers] Requirements for a new archiver

On Wed, 29 Oct 2003 11:38:33 -0800 Chuq Von Rospach <chuqui@plaidworks.com> wrote:
Hint: look at what INN did when they implemented cycbufs.
Aye, it's a cute system.
Effectively, you create 1-N files, or create files as needed. Each file is N bytes long, pre-allocated on file creation. When you store messages, they're written into the file sequentially (or any other way you want. If you want to get into best fit allocations and turn this into a malloc() style heap, be my guest).
Metadata to access the info is then a filename, an lseek() offset into the file, and the number of bytes to read, plus your normal identifying info. It's fast, it's efficient use of file pointers, it avoids the worst aspects of the unix file system, and I'm amazed nobody ever thinks to use it for other purposes (or that it took that long for usenet people to discover it; I suggested a simpler variant of it back in the 80s and was told inodes are our friends...)
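
A minimal sketch of the scheme described above, in Python. It is not INN's implementation: the names BufferFile, Slot, store and fetch are hypothetical, writes are purely sequential, and there is no wrap-around or best-fit allocation. It only shows a pre-allocated file plus the (filename, offset, length) metadata needed to read a message back.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    """Metadata needed to read one message back (hypothetical name)."""
    filename: str
    offset: int   # seek position inside the buffer file
    length: int   # number of bytes to read


class BufferFile:
    """One fixed-size file that messages are written into sequentially."""

    def __init__(self, filename: str, size: int):
        self.filename = filename
        self.size = size
        self.write_pos = 0
        # Give the file its full length at creation time. On many
        # filesystems this produces a sparse file rather than reserved
        # blocks; real pre-allocation is platform specific.
        with open(filename, "wb") as f:
            f.truncate(size)

    def store(self, message: bytes) -> Slot:
        """Write a message at the current position and return its slot."""
        if self.write_pos + len(message) > self.size:
            raise ValueError("buffer file full; create another one")
        with open(self.filename, "r+b") as f:
            f.seek(self.write_pos)
            f.write(message)
        slot = Slot(self.filename, self.write_pos, len(message))
        self.write_pos += len(message)
        return slot


def fetch(slot: Slot) -> bytes:
    """Read a message back using only its slot metadata."""
    with open(slot.filename, "rb") as f:
        f.seek(slot.offset)
        return f.read(slot.length)
```

The Slot values are what would live in the index (or the relational database mentioned below) alongside the usual identifying info; the buffer file itself never needs to be scanned to find a message.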
Small caveat: Some modern filesystems make operating on one-file-per-message stores extremely efficient. Admittedly they aren't in wide cross-platform deployment, but the filesystems and file op behaviour of today and yesteryear are not quite the same.
I've even thought of using it as the backing store for a picture library. With a nice relational database and a series of these "data boxes", I think you can store data in the best and fastest possible way...
Some years back I talked to Mike Belshe (used to be at Remarq) about their storage techniques (I caught him shortly after Critical Path bought Remarq). Keying off other LISA papers they segmented their storage space by object size, customising and configuring each segment to suit (things like RAID strip size, number of spindles, FS tuning parameters, etc). He asserted that the rewards were very significant.
However, these are very large archive problems and are a bit outside of Mailman's home turf.
--
J C Lawrence
---------(*) Satan, oscillate my metallic sonatas.
claw@kanga.nu He lived as a devil, eh?
http://www.kanga.nu/~claw/ Evil is a name of a foeman, as I live.

Howdy,
A few comments from the peanut gallery:
- grep, perl, awk, vim, emacs, cp, mv, tar, etc. don't work too well on SQL databases, but are nice for admins who want quick and dirty searching, need to move Mailman to a new machine, or need to poke around.
- third-party add-ons make it that much harder to install. If I have to set up a MySQL or Postgres database to use Mailman, it's a step that will put off people who don't already have it going.
Cheers,
David.

On Oct 29, 2003, at 1:05 PM, David Birnbaum wrote:
- grep, perl, awk, vim, emacs, cp, mv, tar, etc. don't work too well on SQL databases, but are nice for admins who want quick and dirty searching, need to move Mailman to a new machine, or need to poke around.
and given how many admins never see anything but the web site, that's a nice thing, but far from an important one.
- third-party add-ons make it that much harder to install. If I have to set up a MySQL or Postgres database to use Mailman, it's a step that will put off people who don't already have it going.
actually, if you do it right, it's much easier, because when you build in those tools, you build in standardized interfaces that third-party add-ons can access, instead of the current situation, where add-ons are code hacks that break every time Barry burps at the CVS server...

On Wed, 2003-10-29 at 15:41, J C Lawrence wrote:
Some years back I talked to Mike Belshe (used to be at Remarq) about their storage techniques (I caught him shortly after Critical Path bought Remarq). Keying off other LISA papers they segmented their storage space by object size, customising and configuring each segment to suit (things like RAID strip size, number of spindles, FS tuning parameters, etc). He asserted that the rewards were very significant.
However, these are very large archive problems and are a bit outside of Mailman's home turf.
Mailman's philosophy is, keep it as simple as possible to handle 80% of the installations out there, but provide enough framework for the other 20% to extend for extreme uses. Strategies to accomplish this include defining interfaces to key components, and shipping something that works out of the box and is good enough for most people.
It's not always easy, of course, to architect something that scales this way. I think we have a pretty good idea of the scaling problems with Mailman 2, and I hope we can push the envelope significantly for Mailman 3.
-Barry
participants (4): Barry Warsaw, Chuq Von Rospach, David Birnbaum, J C Lawrence