At 10:47 PM -0500 2003/10/29, Barry Warsaw wrote:
I'm not sure if Andrew Koenig is on this list, but he described an algorithm he developed to quickly find messages in an mbox file. If he's here, maybe he can talk about it.
7th edition mbox files are a pain. There are other mailbox file
formats that are much better and easier to parse (UW-IMAP .mbx being one).
I really don't like mbox files, primarily because they require munging From lines in the body of the message. MMDF would be better, but I think ideal from a philosophical point of view would be one-message-per-file if it can be done efficiently cross-platform.
Therein lies the problem. Some filesystems make this more
feasible than others, at least on larger scale systems.
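To make the From-line munging complaint above concrete, here is a minimal Python sketch. It assumes the simplest mbox convention, where any line starting with "From " marks a new message, and the helper names (`munge_from_lines`, `split_mbox`) are made up for illustration. This is not Mailman's code or Andrew Koenig's algorithm.

```python
import re

# Matches "From " at the start of any line in a message body.
_FROM_LINE = re.compile(r"^From ", re.MULTILINE)


def munge_from_lines(body):
    """Prefix body lines that start with "From " with ">".

    This alters the message text, which is the objection raised above.
    """
    return _FROM_LINE.sub(">From ", body)


def split_mbox(text):
    """Naive splitter: each line starting with "From " begins a new message."""
    messages = []
    current = []
    for line in text.splitlines(keepends=True):
        if line.startswith("From ") and current:
            messages.append("".join(current))
            current = []
        current.append(line)
    if current:
        messages.append("".join(current))
    return messages
```

Without munging, a body line such as "From here on it breaks" would be taken as the start of a second message by a splitter like this one. With munging, the mailbox splits correctly but the stored body no longer matches what was sent.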
Maybe file system experts here can provide pointers or advice on exactly which file and operating systems make this approach feasible, even for huge message counts.
SGI's XFS on Irix does a pretty good job, with hashed directory
structures and an extent-based journaling design. Regrettably, I don't think that all of these features are fully supported under the Linux version of XFS, and that work has basically ground to a halt with the lay-offs of all the key SGI people who had been working on XFS. Veritas VxFS also does a good job in this area.
Other than SGI XFS for Irix and Veritas VxFS, I don't know of any
good solutions to this problem at the filesystem level.
Kirk McKusick and Eric Allman agree with you that this is a
proper filesystem problem that should be solved at the filesystem level (at least, that's what they said when I raised this issue with them), and they feel you should not attempt to solve filesystem problems with "tricks" like INN timecaf/timehash cycbufs.
However, while that's nice in theory, that doesn't necessarily
help us here in the real world.
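For readers who have not seen an application-level workaround of the kind being argued about, here is a rough Python sketch of a one-message-per-file store that fans files out across hashed subdirectories, so that no single directory grows huge on filesystems without hashed directories. It is only an illustration under stated assumptions: the function names, the SHA-1 choice, the two-level layout, and the `.eml` suffix are all hypothetical. It is not INN's timehash scheme and not something anyone in this thread has proposed.

```python
import hashlib
from pathlib import Path


def message_path(root, message_id, levels=2, width=2):
    """Map a Message-ID to root/ab/cd/<digest>.eml (hypothetical layout)."""
    digest = hashlib.sha1(message_id.encode("utf-8")).hexdigest()
    # Use the leading hex digits of the digest as nested directory names.
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return Path(root).joinpath(*parts, digest + ".eml")


def store_message(root, message_id, raw_bytes):
    """Write one message to its own file, creating directories as needed."""
    path = message_path(root, message_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(".tmp")
    tmp.write_bytes(raw_bytes)
    tmp.replace(path)  # rename over the final name once the write is done
    return path
```

With two levels of two hex digits each there are 65,536 leaf directories, so a million messages average roughly 15 files per directory. The cost is exactly the one described above: the application is doing work that the filesystem arguably should be doing.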
-- Brad Knowles, <brad.knowles@skynet.be>
"They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin, Historical Review of Pennsylvania.
GCS/IT d+(-) s:+(++)>: a C++(+++)$ UMBSHI++++$ P+>++ L+ !E-(---) W+++(--) N+ !w--- O- M++ V PS++(+++) PE- Y+(++) PGP>+++ t+(+++) 5++(+++) X++(+++) R+(+++) tv+(+++) b+(++++) DI+(++++) D+(++) G+(++++) e++>++++ h--- r---(+++)* z(+++)