[Mailman-Developers] message data storage

Tue May 13 14:12:37 EDT 2003

On 2003-05-12 at 19:20:41 [+0200], you wrote:
> Even with attachments it takes some doing.  The articles database 
> actually tends to be double the size of the .txt archive for the same 
> month.  Not only this particular list.

> | Novice question? How easy would it be just to dump to a database rather 
> | than using Python storage?
> 
> I don't know if anyone has a clear picture of this next generation 
> system, it's just the subject of a dormant mailing list 
> http://rogue.amk.ca/mailman/listinfo/ng-arch

mm, I've just read this and there is some interesting stuff. I'm not much 
of a programmer but I always miss references to "use" in requirements 
descriptions. In an internet world we can do away with zipped versions of 
the archive at least in "monthly" or similar time-based versions. It makes 
much more sense to be able to download a whole thread or topic independent 
of which time barriers it crosses. The comparison with NNTP is good, as it 
would be good to learn from the way Usenet does it: e-mail and news aren't 
really that different, as a friend of mine insists on reminding me. On his 
Amiga they've always been the same...
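
To illustrate the "whole thread regardless of month" idea, here is a rough 
sketch in Python. It is not how pipermail does its threading; the function 
names are made up, and the thread key is a simplification: it uses the first 
entry in References, and falls back to In-Reply-To (which only names the 
parent, not necessarily the root) or the message's own Message-ID.

```python
from collections import defaultdict
from email import message_from_string


def thread_root(msg):
    """Guess a thread key for one parsed message (hypothetical helper)."""
    refs = msg.get("References", "").split()
    if refs:
        # By convention the first References entry is the thread's first post.
        return refs[0]
    # Fallbacks: the parent id, or the message itself if it starts a thread.
    return msg.get("In-Reply-To", "").strip() or msg.get("Message-ID", "").strip()


def group_by_thread(raw_messages):
    """Group raw RFC 2822 message strings by thread key, ignoring dates."""
    threads = defaultdict(list)
    for raw in raw_messages:
        msg = message_from_string(raw)
        threads[thread_root(msg)].append(msg)
    return threads
```

A "download this thread" link would then only need to serialise one of 
these groups, whichever months its messages happen to fall in.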

> An external database might be a part of that picture, for all I know. I 
> would expect that if a database really would pay off, it would take some 
> rewriting anyway.  It would be trivial to use a database for storage, but 
> that would only slow things down.  Really need to find a way to use the 
> database more inside the threading algorithm.

mm, I don't like the file system storage suggestions. It would be all right 
if more people used BFS, so we could use fsquery: no problems with lots of 
files, and ATTRIBUTES :-)! But we're in a minority. The Bethon list is up, by 
the way: bethon-request at mail.nexon.de

But the main reason for a database would be scalability and resilience, 
which seem absolutely essential for your current situation. Performance 
bottlenecks can be alleviated through continuous archiving as opposed to 
cron-based CPU-killers. Put some kind of search system on top of it for 
full-text searching. I think it is very important to define the APIs so 
that storage type independence is in from the beginning. When I think of 
things, I think XML should be the way of structuring the mails but then I 
think "performance" so maybe not XML. But maybe it could help with 
modelling things and defining the interface?
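
To make the API point a bit more concrete, here is a minimal sketch in 
Python of what a storage-independent interface could look like. Everything 
in it is an assumption for illustration: the class names, the method names 
and the table layout are invented, and none of it is Mailman's or 
pipermail's actual API. Dates are assumed to be ISO strings so that they 
sort correctly as text.

```python
import sqlite3
from abc import ABC, abstractmethod


class ArchiveStore(ABC):
    """Hypothetical archive interface; callers never see the backend."""

    @abstractmethod
    def add(self, msgid, thread_id, date, text):
        """Store one message as it arrives (continuous archiving)."""

    @abstractmethod
    def thread(self, thread_id):
        """Return the message texts of one thread in date order."""


class MemoryStore(ArchiveStore):
    """Trivial backend, useful mainly for testing the interface."""

    def __init__(self):
        self._rows = []

    def add(self, msgid, thread_id, date, text):
        self._rows.append((date, msgid, thread_id, text))

    def thread(self, thread_id):
        # Tuples sort by date first, then by message id.
        return [text for _date, _id, tid, text in sorted(self._rows)
                if tid == thread_id]


class SqliteStore(ArchiveStore):
    """Database backend behind the same two methods."""

    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS msg "
            "(msgid TEXT PRIMARY KEY, thread_id TEXT, date TEXT, body TEXT)"
        )

    def add(self, msgid, thread_id, date, text):
        with self._db:  # commit each message as it is archived
            self._db.execute(
                "INSERT INTO msg VALUES (?, ?, ?, ?)",
                (msgid, thread_id, date, text),
            )

    def thread(self, thread_id):
        cur = self._db.execute(
            "SELECT body FROM msg WHERE thread_id = ? ORDER BY date",
            (thread_id,),
        )
        return [row[0] for row in cur]
```

The archiver would call add() once per incoming message instead of 
rebuilding from cron, and the web side would only ever call thread(), so 
swapping files for a database (or the reverse) should not touch either.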

Charlie