Q: Fast searching Imap server, IMAP server (in Python)

Mark Crispin mrc at CAC.Washington.EDU
Wed Nov 7 16:00:54 CET 2001


On 7 Nov 2001, Jan Kybic wrote:
> Are you sure Cyrus does it? I did not find anything about it in the
> documentation. My first attempt to compile it failed, but I am willing
> to try again if it solves my problem.

Yes, Cyrus's mailbox format maintains metadata sufficient to answer your
"SEARCH SINCE 1-Oct-2001 FROM jan TO jurgen" without having to look at the
message files.

Once you do one search like this in UW, subsequent searches of a similar
nature are instantaneous since UW discovered all the necessary metadata in
the first search and now has it cached.  Thus, the search would be
completely in memory without looking at the disk at all.
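
As an illustration of the caching behaviour described above, here is a minimal Python sketch. It is not UW's actual implementation: the `HeaderCache` class, its `parses` counter, and the use of the Date: header as a stand-in for IMAP's internal date are all assumptions made for the example. The first search parses every message once; later searches are answered from the in-memory metadata.

```python
import email
from datetime import date
from email.utils import parsedate_to_datetime


class HeaderCache:
    """Hypothetical per-mailbox cache: parse each message once, search in memory after."""

    def __init__(self, raw_messages):
        self._raw = raw_messages  # dict: uid -> raw RFC 822 message bytes
        self._meta = {}           # dict: uid -> (sent date, from, to)
        self.parses = 0           # how many times a message had to be parsed

    def _load(self, uid):
        # Parse the message only if its metadata is not cached yet.
        if uid not in self._meta:
            msg = email.message_from_bytes(self._raw[uid])
            self.parses += 1
            self._meta[uid] = (
                parsedate_to_datetime(msg["Date"]).date(),
                (msg["From"] or "").lower(),
                (msg["To"] or "").lower(),
            )
        return self._meta[uid]

    def search(self, since: date, from_sub: str, to_sub: str):
        """Rough analogue of SEARCH SINCE <date> FROM <str> TO <str>."""
        hits = []
        for uid in sorted(self._raw):
            sent, frm, to = self._load(uid)
            if sent >= since and from_sub.lower() in frm and to_sub.lower() in to:
                hits.append(uid)
        return hits
```

Calling `cache.search(date(2001, 10, 1), "jan", "jurgen")` twice on the same `HeaderCache` parses the messages only during the first call; the second call touches nothing but the cached tuples.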

> I do not think it matters, both are going to get uncomfortably slow
> when the number of messages grows. Currently, the search of about 30MB
> in about 3000 mail messages takes sometimes several minutes on a Pentium II
> computer, which is just too slow for me.

That is too slow for most people!  I've never had a 30MB/3000 message
search take several minutes, not even on a 25MHz 68040 which is much
slower than a Pentium II.

> I am convinced the only solution is to store the important headers,
> possibly with the messages themselves, in some database.

Yup.  Cyrus doesn't store messages in the database, but it does store the
metadata there.  Exchange uses a database for both (of course, that
database is MS proprietary).
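
The split described above (messages stay in files, metadata goes in a database) can be sketched with Python's standard `sqlite3` module. This is only an illustration under stated assumptions: the `msg_meta` table, its columns, and the `open_index`, `add_message`, and `search` helpers are hypothetical and do not reflect Cyrus's or Exchange's actual formats.

```python
import sqlite3


def open_index(path=":memory:"):
    """Open (or create) a hypothetical metadata index; message bodies stay on disk."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS msg_meta (
               uid       INTEGER PRIMARY KEY,
               sent_date TEXT NOT NULL,
               from_addr TEXT NOT NULL,
               to_addr   TEXT NOT NULL,
               path      TEXT NOT NULL
           )"""
    )
    # Dates are stored as ISO 8601 strings (YYYY-MM-DD), which sort correctly as text.
    db.execute("CREATE INDEX IF NOT EXISTS idx_sent_date ON msg_meta(sent_date)")
    return db


def add_message(db, uid, sent_date, from_addr, to_addr, path):
    """Record one message's headers; `path` points at the message file itself."""
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT OR REPLACE INTO msg_meta VALUES (?, ?, ?, ?, ?)",
            (uid, sent_date, from_addr.lower(), to_addr.lower(), path),
        )


def search(db, since, from_sub, to_sub):
    """Rough analogue of SEARCH SINCE/FROM/TO, answered without opening any message file."""
    rows = db.execute(
        "SELECT uid FROM msg_meta "
        "WHERE sent_date >= ? AND instr(from_addr, ?) > 0 AND instr(to_addr, ?) > 0 "
        "ORDER BY uid",
        (since, from_sub.lower(), to_sub.lower()),
    )
    return [uid for (uid,) in rows]
```

With such an index, a query like `search(db, "2001-10-01", "jan", "jurgen")` reads only the metadata table; the files named in `path` are opened only when a message body is actually fetched.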

> I agree that caching headers of the mails read in the same session is
> a good thing.

A server is pretty much forced to cache metadata if it's going to have any
reasonable performance, especially when multiple sessions are done per
search.

> However, I do not think the mail client should cache all
> the headers of all the mail you ever received, which is what I would
> need when searching for some old mail - there are just way too
> many. As far as I understood the IMAP philosophy, this should be the
> job of the server.

Exactly right!

The idea of "make the client cache" is based upon the POP3 model and an
implicit presumption that users only use a single PC and a PC only has a
single user.  The soaring popularity of web-based mail should have
thoroughly debunked that notion.

There will always be a market for web-based mail; for a significant
portion of the user community it is the right thing.  However, there's
also a significant portion of the user community which uses web-based mail
as a substitute for a decent GUI that doesn't require the client to cache
(Pine has grabbed the user base that doesn't insist upon a GUI).

-- Mark --

http://staff.washington.edu/mrc
Science does not emerge from voting, party politics, or public debate.




