Quoth Charlie Clark <charlie@begeistert.org>:
...
| wow, that is a lot of traffic and a sluggish system. But I don't see how
| 6500 mails can generate a 68 MB unless there are a lot of attachments.
Even with attachments it takes some doing. The articles database actually tends to be double the size of the .txt archive for the same month. Not only this particular list.
| As someone else has noted apart from the performance issue there is also
| one of usability: searching 6500 messages is a proverbial needle in a
| haystack. I don't know what the Next Generation archiver is but I guess
| it's a move towards a more sophisticated persistence system which might
| allow things like full-text searches. If I understand you correctly you
| want to archive mails directly as they come in and keep the archiver /
| db-connection open pretty much all the time. This would seem about the best
| solution in the short term. Anything else sounds like: database to pass the
| memory issue to something which is designed to handle it. Our own
| experience with lists with a large member base is that Mailman isn't that
| efficient at dealing with them which is why we're using an RDBMS adapter
| and letting the RDBMS be the muscle while Mailman remains the brains.
|
| Novice question? How easy would it be just to dump to a database rather
| than using Python storage?
I don't know if anyone has a clear picture of this next generation system, it's just the subject of a dormant mailing list http://rogue.amk.ca/mailman/listinfo/ng-arch
An external database might be a part of that picture, for all I know. I would expect that if a database really would pay off, it would take some rewriting anyway. It would be trivial to use a database for storage, but that would only slow things down. Really need to find a way to use the database more inside the threading algorithm.
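As a rough illustration of what "using the database inside the threading algorithm" could look like, here is a minimal sketch. It assumes each article is stored with its Message-ID and the Message-ID of the article it replies to; the table layout and the names open_archive, add_article and thread are hypothetical and are not taken from Mailman or Pipermail. It uses SQLite (via Python's standard sqlite3 module) purely as a stand-in for whatever RDBMS would really be used, and lets a recursive query collect a thread instead of walking it in Python.

```python
import sqlite3


def open_archive(path=":memory:"):
    """Open (or create) a hypothetical article index."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS articles (
            message_id TEXT PRIMARY KEY,
            parent_id  TEXT,   -- Message-ID being replied to, NULL for a thread root
            sent       TEXT,   -- ISO 8601 timestamp, so string order is date order
            subject    TEXT
        )
        """
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_parent ON articles(parent_id)")
    return conn


def add_article(conn, message_id, parent_id, sent, subject):
    """Index one article as it arrives."""
    conn.execute(
        "INSERT OR REPLACE INTO articles VALUES (?, ?, ?, ?)",
        (message_id, parent_id, sent, subject),
    )
    conn.commit()


def thread(conn, root_id):
    """Return (message_id, subject, depth) for a whole thread, oldest first.

    The database follows the reply links itself, so the result does not
    depend on which month each article was posted in.
    """
    rows = conn.execute(
        """
        WITH RECURSIVE t(message_id, sent, subject, depth) AS (
            SELECT message_id, sent, subject, 0
              FROM articles WHERE message_id = ?
            UNION ALL
            SELECT a.message_id, a.sent, a.subject, t.depth + 1
              FROM articles a JOIN t ON a.parent_id = t.message_id
        )
        SELECT message_id, subject, depth FROM t ORDER BY sent
        """,
        (root_id,),
    )
    return rows.fetchall()
```

Whether this would actually be faster than the existing approach is an open question; the sketch only shows the threading step being pushed into the database rather than the database being used as a plain store.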
Donn Cave, donn@u.washington.edu
On 2003-05-12 at 19:20:41 [+0200], you wrote:
> Even with attachments it takes some doing. The articles database actually tends to be double the size of the .txt archive for the same month. Not only this particular list.
>
> | Novice question? How easy would it be just to dump to a database rather
> | than using Python storage?
>
> I don't know if anyone has a clear picture of this next generation system, it's just the subject of a dormant mailing list http://rogue.amk.ca/mailman/listinfo/ng-arch
mm, I've just read this and there is some interesting stuff. I'm not much of a programmer but I always miss references to "use" in requirements descriptions. In an internet world we can do away with zipped versions of the archive, at least in "monthly" or similar time-based versions. It makes much more sense to be able to download a whole thread or topic independent of which time barriers it crosses. The comparison with NNTP is good as it would be good to learn from the way Newsnet does it: e-mail and news aren't really that different, as a friend of mine insists on reminding me. On his Amiga they've always been the same...
> An external database might be a part of that picture, for all I know. I would expect that if a database really would pay off, it would take some rewriting anyway. It would be trivial to use a database for storage, but that would only slow things down. Really need to find a way to use the database more inside the threading algorithm.
mm, I don't like the file system storage suggestions. It would be alright if more people used BFS so we could use fsquery: no problems with lots of files and ATTRIBUTES :-)! but we're in a minority. The Bethon list is up by the way: bethon-request@mail.nexon.de
But the main reason for a database would be scalability and resilience which seem absolutely essential for your current situation. Performance bottlenecks can be alleviated through continuous archiving as opposed to cron-based CPU-killers. Put some kind of search system on top of it for full-text searching. I think it is very important to define the APIs so that storage type independence is in from the beginning. When I think of things, I think XML should be the way of structuring the mails but then I think "performance" so maybe not XML. But maybe it could help with modelling things and defining the interface?
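To make the idea of a storage-independent API a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the names ArchiveStore, MemoryStore and archive_incoming are hypothetical and do not exist in Mailman. The point is only that the archiver talks to an abstract interface, so a file-based, RDBMS-backed or other store could be swapped in later, and that each mail is archived as it comes in rather than in a cron batch.

```python
from abc import ABC, abstractmethod
from email import message_from_string


class ArchiveStore(ABC):
    """Hypothetical interface the archiver would code against."""

    @abstractmethod
    def add(self, message_id, subject, body):
        """Store one article."""

    @abstractmethod
    def search(self, text):
        """Return the Message-IDs of articles containing `text`."""


class MemoryStore(ArchiveStore):
    """Toy backend; a real one might wrap an RDBMS or a search index."""

    def __init__(self):
        self._articles = {}

    def add(self, message_id, subject, body):
        self._articles[message_id] = (subject, body)

    def search(self, text):
        # Naive case-insensitive substring match over subject and body.
        needle = text.lower()
        return sorted(
            mid
            for mid, (subject, body) in self._articles.items()
            if needle in subject.lower() or needle in body.lower()
        )


def archive_incoming(store, raw_message):
    """Archive a single mail on arrival instead of in a periodic batch."""
    msg = message_from_string(raw_message)
    payload = msg.get_payload()
    # Multipart mails return a list of parts; this sketch keeps plain text only.
    body = payload if isinstance(payload, str) else ""
    store.add(msg["Message-ID"], msg.get("Subject", ""), body)
    return msg["Message-ID"]
```

A caller would create one store when the archiver starts, keep it open, and call archive_incoming(store, raw) for each delivered mail. The substring search here is only a placeholder for a proper full-text search layer.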
Charlie