On Wed, 2003-10-29 at 14:38, Chuq Von Rospach wrote:
Hint: look at what INN did when they implemented cycbufs.
Effectively, you create 1-N files, or create files as needed. Each file is N bytes long, pre-allocated on file creation. When you store messages, they're written into the file sequentially (or any other way you want; if you want to get into best-fit allocation and turn this into a malloc()-style heap, be my guest).
Metadata to access the info is then a filename, an lseek() pointer into the file, and the number of bytes to read, plus your normal identifying info. It's fast, it's an efficient use of file pointers, it avoids the worst aspects of the unix file system, and I'm amazed nobody ever thinks to use it for other purposes (or that it took usenet people that long to discover it; I suggested a simpler variant of it back in the 80s and was told inodes are our friends...).
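To make the scheme concrete, here's a rough Python sketch of the idea, not INN's actual cycbuf code; the DataBox name and the 64MB box size are just placeholders I made up:

    import os

    BOX_SIZE = 64 * 1024 * 1024   # pre-allocate 64MB per box (arbitrary choice)

    class DataBox:
        def __init__(self, path):
            self.path = path
            if not os.path.exists(path):
                # Pre-allocate the whole file up front, as described above.
                with open(path, 'wb') as f:
                    f.truncate(BOX_SIZE)
            self.write_offset = 0   # next free byte; a real store would persist this

        def store(self, data):
            """Append a message; return (path, offset, length) metadata."""
            if self.write_offset + len(data) > BOX_SIZE:
                raise IOError('box is full; allocate another file')
            with open(self.path, 'r+b') as f:
                f.seek(self.write_offset)
                f.write(data)
            meta = (self.path, self.write_offset, len(data))
            self.write_offset += len(data)
            return meta

        @staticmethod
        def fetch(path, offset, length):
            """Retrieve a message given its metadata."""
            with open(path, 'rb') as f:
                f.seek(offset)           # the lseek() pointer into the file
                return f.read(length)    # the number of bytes to read

store() hands back exactly the metadata described above (a filename, an offset, and a byte count), which you'd keep alongside your normal identifying info.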
I'm not sure if Andrew Koenig is on this list, but he described an algorithm he developed to quickly find messages in an mbox file. If he's here, maybe he can talk about it.
I really don't like mbox files, primarily because they require munging From lines in the body of the message. MMDF would be better, but I think the ideal from a philosophical point of view would be one-message-per-file, if it can be done efficiently cross-platform. Maybe the file system experts here can provide pointers or advice on exactly which file systems and operating systems make this approach feasible, even for huge message counts.
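Just to illustrate the munging: mbox delimits messages with lines that start with "From ", so any body line that happens to start with "From " has to be escaped. A toy sketch using the common ">From " quoting convention (append_to_mbox and envelope_from are made-up names, and a real separator line also carries a date):

    def append_to_mbox(mbox_path, envelope_from, body_lines):
        with open(mbox_path, 'a') as f:
            f.write('From %s\n' % envelope_from)   # message separator line
            for line in body_lines:
                if line.startswith('From '):
                    line = '>' + line              # munge the body line
                f.write(line + '\n')
            f.write('\n')                          # blank line before the next message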
You can even do expiration/purge/etc. if you want, by moving stuff around and changing the pointers.
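Concretely, purging could be as simple as copying the surviving records into a fresh box and handing back their new pointers. A rough sketch (compact() and its arguments are made-up names, and a real version would pre-allocate the new box too):

    def compact(live_records, new_box_path):
        """live_records: iterable of (key, path, offset, length) for messages to keep."""
        new_offset = 0
        updated = {}
        with open(new_box_path, 'wb') as out:
            for key, path, offset, length in live_records:
                with open(path, 'rb') as f:
                    f.seek(offset)
                    data = f.read(length)
                out.write(data)
                updated[key] = (new_box_path, new_offset, length)
                new_offset += length
        return updated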
I've even thought of using it as the backing store for a picture library. With a nice relational database and a series of these "data boxes", I think you can store data in the best and fastest possible way...
It's a very interesting idea.
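One way I picture the combination: the relational database holds only the identifying info plus the (file, offset, length) pointer, and the raw bytes live in the data boxes. A made-up table layout, just to illustrate:

    import sqlite3

    conn = sqlite3.connect('metadata.db')
    conn.execute("""
        CREATE TABLE IF NOT EXISTS blobs (
            key      TEXT PRIMARY KEY,   -- message-id, picture name, etc.
            box_path TEXT NOT NULL,      -- which data box file
            offset   INTEGER NOT NULL,   -- lseek() position within the box
            length   INTEGER NOT NULL    -- number of bytes to read
        )
    """)
    conn.commit()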
-Barry