On 2003-05-12 at 19:20:41 [+0200], you wrote:
Even with attachments it takes some doing. The articles database actually tends to be double the size of the .txt archive for the same month, and not only for this particular list.
| Novice question? How easy would it be just to dump to a database rather than using Python storage?
I don't know if anyone has a clear picture of this next generation system; it's just the subject of a dormant mailing list: http://rogue.amk.ca/mailman/listinfo/ng-arch
mm, I've just read this and there is some interesting stuff. I'm not much of a programmer but I always miss references to "use" in requirements descriptions. In an internet world we can do away with zipped versions of the archive, at least in "monthly" or similar time-based form. It makes much more sense to be able to download a whole thread or topic regardless of which time barriers it crosses. The comparison with NNTP is a good one, as there is a lot to learn from the way Newsnet does it: e-mail and news aren't really that different, as a friend of mine insists on reminding me. On his Amiga they've always been the same...
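To make the thread-instead-of-month idea a bit more concrete, here is a rough Python sketch of how messages might be grouped by thread rather than by date. It is only an illustration: `thread_root` and `group_by_thread` are made-up names, not anything in the current archiver, and it assumes the messages carry the usual Message-ID, References and In-Reply-To headers.

```python
from collections import defaultdict
from email import message_from_string


def thread_root(msg):
    """Guess which message started the thread this message belongs to.

    Assumption: the References header lists ancestors oldest first, so its
    first entry is the thread starter. Without References we fall back to
    In-Reply-To (the direct parent, which is only an approximation of the
    root), and finally to the message's own Message-ID.
    """
    refs = (msg.get("References") or "").split()
    if refs:
        return refs[0]
    parent = (msg.get("In-Reply-To") or "").strip()
    if parent:
        return parent
    return (msg.get("Message-ID") or "").strip()


def group_by_thread(raw_messages):
    """Group raw message texts by guessed thread root, ignoring dates."""
    threads = defaultdict(list)
    for raw in raw_messages:
        msg = message_from_string(raw)
        threads[thread_root(msg)].append(msg)
    return threads
```

A thread that starts in April and continues in May ends up in one group, which could then be offered as a single download.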
An external database might be a part of that picture, for all I know. I would expect that if a database really would pay off, it would take some rewriting anyway. It would be trivial to use a database for storage, but that would only slow things down. We would really need to find a way to use the database more inside the threading algorithm.
mm, I don't like the file system storage suggestions. It would be alright if more people used BFS so we could use fsquery: no problems with lots of files, and ATTRIBUTES :-)! But we're in a minority. The Bethon list is up by the way: bethon-request@mail.nexon.de
But the main reason for a database would be scalability and resilience, which seem absolutely essential for your current situation. Performance bottlenecks can be alleviated through continuous archiving as opposed to cron-based CPU-killers. Put some kind of search system on top of it for full-text searching. I think it is very important to define the APIs so that storage type independence is in from the beginning. When I think of things, I think XML should be the way of structuring the mails, but then I think "performance", so maybe not XML. But maybe it could help with modelling things and defining the interface?
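As a rough idea of what storage type independence could look like, here is a small Python sketch. Everything in it is hypothetical: `ArchiveStore`, `MemoryStore`, `SqliteStore` and `thread` are names I made up for illustration, and SQLite merely stands in for whatever database might be chosen. The point is only that the threading code talks to an interface and never to a particular backend, and that `add` can be called as each mail arrives instead of from a cron job.

```python
import sqlite3
from abc import ABC, abstractmethod


class ArchiveStore(ABC):
    """Hypothetical storage interface; not part of any existing archiver."""

    @abstractmethod
    def add(self, msgid, parent, raw):
        """Store one message as it arrives (parent is None for a new thread)."""

    @abstractmethod
    def get(self, msgid):
        """Return the raw text of a message."""

    @abstractmethod
    def replies(self, msgid):
        """Return the IDs of direct replies, sorted by ID."""


class MemoryStore(ArchiveStore):
    """Plain Python dictionaries, standing in for pickle-style storage."""

    def __init__(self):
        self._raw = {}
        self._parent = {}

    def add(self, msgid, parent, raw):
        self._raw[msgid] = raw
        self._parent[msgid] = parent

    def get(self, msgid):
        return self._raw[msgid]

    def replies(self, msgid):
        return sorted(m for m, p in self._parent.items() if p == msgid)


class SqliteStore(ArchiveStore):
    """The same interface backed by a database (SQLite as a stand-in)."""

    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS msg "
            "(msgid TEXT PRIMARY KEY, parent TEXT, raw TEXT)"
        )

    def add(self, msgid, parent, raw):
        with self._db:  # commits on success, rolls back on error
            self._db.execute(
                "INSERT OR REPLACE INTO msg VALUES (?, ?, ?)",
                (msgid, parent, raw),
            )

    def get(self, msgid):
        row = self._db.execute(
            "SELECT raw FROM msg WHERE msgid = ?", (msgid,)
        ).fetchone()
        if row is None:
            raise KeyError(msgid)
        return row[0]

    def replies(self, msgid):
        rows = self._db.execute(
            "SELECT msgid FROM msg WHERE parent = ? ORDER BY msgid", (msgid,)
        )
        return [r[0] for r in rows]


def thread(store, root):
    """Collect a whole thread depth-first using only the interface."""
    out = [root]
    for child in store.replies(root):
        out.extend(thread(store, child))
    return out
```

Because `thread` only calls `replies`, swapping one store for the other needs no change to the threading code, which is the kind of independence I mean.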
Charlie