[Mailman-Developers] message data storage
Charlie Clark
charlie at begeistert.org
Tue May 13 14:12:37 EDT 2003
On 2003-05-12 at 19:20:41 [+0200], you wrote:
> Even with attachments it takes some doing. The articles database
> actually tends to be double the size of the .txt archive for the same
> month. Not only this particular list.
> | Novice question? How easy would it be just to dump to a database rather
> | than using Python storage?
>
> I don't know if anyone has a clear picture of this next generation
> system, it's just the subject of a dormant mailing list
> http://rogue.amk.ca/mailman/listinfo/ng-arch
Hmm, I've just read this and there is some interesting stuff. I'm not much
of a programmer, but I always miss references to "use" in requirements
descriptions. In an internet world we can do away with zipped versions of
the archive, at least in "monthly" or similar time-based versions. It makes
much more sense to be able to download a whole thread or topic, independent
of which time barriers it crosses. The comparison with NNTP is a good one,
and it would be worth learning from the way Newsnet does it: e-mail and news
aren't really that different, as a friend of mine insists on reminding me.
On his Amiga they've always been the same...
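
To make the "whole thread, regardless of month" idea concrete, here is a
rough sketch, not anything Mailman does today. It assumes each archived
message is reduced to a hypothetical (message_id, in_reply_to, month) tuple
and simply follows the reply links; the function name collect_thread is made
up for illustration.

```python
from collections import defaultdict, deque


def collect_thread(messages, root_id):
    """Return the Message-IDs of a whole thread, breadth-first from root_id.

    `messages` is an iterable of (message_id, in_reply_to, month) tuples.
    The month is carried along only to show that it plays no part in
    deciding which messages belong to the thread.
    """
    # Map each parent Message-ID to the IDs of its direct replies.
    children = defaultdict(list)
    for message_id, in_reply_to, _month in messages:
        if in_reply_to is not None:
            children[in_reply_to].append(message_id)

    thread = []
    seen = {root_id}
    queue = deque([root_id])
    while queue:
        current = queue.popleft()
        thread.append(current)
        for child in children[current]:
            if child not in seen:  # guard against malformed reply loops
                seen.add(child)
                queue.append(child)
    return thread


# Example data: the thread starts in April and continues into May.
messages = [
    ("<a>", None, "2003-04"),
    ("<b>", "<a>", "2003-04"),
    ("<c>", "<b>", "2003-05"),
    ("<d>", None, "2003-05"),  # unrelated message in the same month
]
thread_ids = collect_thread(messages, "<a>")
```

A real archiver would also have to look at the References header and cope
with missing parents; this only shows that the month a message falls in need
not matter for the download.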
> An external database might be a part of that picture, for all I know. I
> would expect that if a database really would pay off, it would take some
> rewriting anyway. It would be trivial to use a database for storage, but
> that would only slow things down. Really need to find a way to use the
> database more inside the threading algorithm.
Hmm, I don't like the file system storage suggestions. It would be all right
if more people used BFS, so we could use fsquery: no problems with lots of
files, and ATTRIBUTES :-)! But we're in a minority. The Bethon list is up, by
the way: bethon-request at mail.nexon.de
But the main reasons for a database would be scalability and resilience,
which seem absolutely essential for your current situation. Performance
bottlenecks can be alleviated through continuous archiving as opposed to
cron-based CPU-killers. Put some kind of search system on top of it for
full-text searching. I think it is very important to define the APIs so
that storage-type independence is in from the beginning. When I think of
things, I think XML should be the way of structuring the mails, but then I
think "performance", so maybe not XML. But maybe it could help with
modelling things and defining the interface?
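
As a rough illustration of what I mean by storage-type independence, here is
a minimal sketch. All the names (ArchiveStore, MemoryStore, SqliteStore) are
invented for this example and are not part of Mailman; the SQLite backend
just stands in for "some database". Messages are archived one at a time as
they arrive, and search is a naive substring match rather than a proper
full-text index.

```python
from __future__ import annotations

import sqlite3
from abc import ABC, abstractmethod
from email import message_from_string
from email.message import Message


class ArchiveStore(ABC):
    """Hypothetical storage-independent interface for a list archive."""

    @abstractmethod
    def add(self, msg: Message) -> None:
        """Archive one message as it arrives (no batch cron run)."""

    @abstractmethod
    def get(self, message_id: str) -> Message:
        """Return the message with this Message-ID, or raise KeyError."""

    @abstractmethod
    def search(self, text: str) -> list[str]:
        """Return sorted Message-IDs whose raw text contains `text`."""


class MemoryStore(ArchiveStore):
    """Keeps raw messages in a dict; handy for tests."""

    def __init__(self) -> None:
        self._raw: dict[str, str] = {}

    def add(self, msg: Message) -> None:
        self._raw[msg["Message-ID"]] = msg.as_string()

    def get(self, message_id: str) -> Message:
        return message_from_string(self._raw[message_id])

    def search(self, text: str) -> list[str]:
        needle = text.lower()
        return sorted(m for m, raw in self._raw.items() if needle in raw.lower())


class SqliteStore(ArchiveStore):
    """Same interface, backed by an SQLite table of raw messages."""

    def __init__(self, path: str = ":memory:") -> None:
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS messages "
            "(message_id TEXT PRIMARY KEY, raw TEXT NOT NULL)"
        )

    def add(self, msg: Message) -> None:
        with self._db:  # commit each message immediately
            self._db.execute(
                "INSERT OR REPLACE INTO messages VALUES (?, ?)",
                (msg["Message-ID"], msg.as_string()),
            )

    def get(self, message_id: str) -> Message:
        row = self._db.execute(
            "SELECT raw FROM messages WHERE message_id = ?", (message_id,)
        ).fetchone()
        if row is None:
            raise KeyError(message_id)
        return message_from_string(row[0])

    def search(self, text: str) -> list[str]:
        rows = self._db.execute(
            "SELECT message_id FROM messages "
            "WHERE instr(lower(raw), lower(?)) > 0 ORDER BY message_id",
            (text,),
        )
        return [row[0] for row in rows]
```

The archiver would only ever talk to ArchiveStore, so swapping pickles, flat
files or a database becomes a configuration choice rather than a rewrite.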
Charlie