Q: Fast searching IMAP server, IMAP server (in Python)

Jan Kybic kybic at ieee.org
Wed Nov 7 13:44:44 CET 2001


> >>         I put all my mail (now about 30MB) into a local IMAP server,
> >> currently Courier-IMAP. I frequently search for old mails with
> >> commands such as 'SEARCH SINCE 1-Oct-2001 FROM jan TO jurgen' but they
> >> seem to take eternity, I think the server just goes through all the
> >> messages one by one.
> > 
> > That's the price you pay for the maildir format used by Courier.
> > 
> >> Would you know about an IMAP server which stores important information
> >> about messages (date, from, to, subject) in some sort of a database so
> >> that the searches are quicker?
> > 
> > Cyrus.

Are you sure Cyrus does it? I did not find anything about it in the
documentation. My first attempt to compile it failed, but I am willing
to try again if it solves my problem.


> > Or use UW, which uses flat files that are much faster to search.
> Courier beats UW when searching large mailboxes, on mid-sized hardware:

I do not think it matters; both are going to get uncomfortably slow
as the number of messages grows. Currently, searching about 30MB spread
over roughly 3000 messages sometimes takes several minutes on a Pentium II
computer, which is just too slow for me.

I am convinced the only solution is to store the important headers,
possibly together with the messages themselves, in a database. This has
already been done with Python and MySQL or PostgreSQL: there is mail2db
(ftp://ftp.tummy.com/pub/tummy/Mail2DB/), and I think there is another
project called something like PySQLmail or SQLmail (I still need to
check), but neither has an IMAP server interface. I am willing to try
to write one myself, but I would prefer to use an existing solution, if
one is available.
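A minimal sketch of the idea in Python, under stated assumptions: each message is parsed with the standard email package, and only the searchable headers go into an indexed table. SQLite (the standard sqlite3 module) stands in for MySQL or PostgreSQL here, and the table, column and function names (headers, open_index, index_message, search) are hypothetical, not taken from mail2db or any other project.

```python
import sqlite3
from email import message_from_string
from email.utils import parsedate_to_datetime

# Hypothetical schema: one row per message, holding only the headers we
# expect to search on. SQLite stands in for MySQL/PostgreSQL.
SCHEMA = """
CREATE TABLE IF NOT EXISTS headers (
    uid     INTEGER PRIMARY KEY,  -- IMAP UID of the message
    sender  TEXT, rcpt TEXT, cc TEXT, bcc TEXT,
    sent    TEXT,                 -- 'YYYY-MM-DD' taken from the Date: header
    subject TEXT
);
CREATE INDEX IF NOT EXISTS idx_sent ON headers(sent);
"""


def open_index(path=":memory:"):
    """Open (or create) the header index database."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def sent_day(msg):
    """Return the Date: header's calendar day as 'YYYY-MM-DD', or None."""
    try:
        return parsedate_to_datetime(msg["Date"]).date().isoformat()
    except (TypeError, ValueError):  # missing or unparseable Date:
        return None


def index_message(conn, uid, raw):
    """Parse one RFC 822 message and store its searchable headers."""
    msg = message_from_string(raw)
    conn.execute(
        "INSERT OR REPLACE INTO headers VALUES (?, ?, ?, ?, ?, ?, ?)",
        (uid, msg.get("From", ""), msg.get("To", ""), msg.get("Cc", ""),
         msg.get("Bcc", ""), sent_day(msg), msg.get("Subject", "")),
    )


def search(conn, since=None, sender=None, rcpt=None):
    """Return UIDs matching all given criteria (substring match on names)."""
    where, params = [], []
    if since:
        where.append("sent >= ?")
        params.append(since)
    if sender:
        where.append("sender LIKE ?")  # LIKE is ASCII case-insensitive
        params.append("%" + sender + "%")
    if rcpt:
        where.append("rcpt LIKE ?")
        params.append("%" + rcpt + "%")
    sql = "SELECT uid FROM headers"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return [row[0] for row in conn.execute(sql + " ORDER BY uid", params)]


if __name__ == "__main__":
    conn = open_index()
    index_message(conn, 1, "From: Jan <jan@example.org>\nTo: jurgen@example.org\n"
                  "Date: Tue, 2 Oct 2001 10:00:00 +0200\nSubject: hi\n\nbody\n")
    index_message(conn, 2, "From: bob@example.org\nTo: jurgen@example.org\n"
                  "Date: Sun, 30 Sep 2001 10:00:00 +0200\nSubject: old\n\nbody\n")
    print(search(conn, since="2001-10-01", sender="jan", rcpt="jurgen"))
```

A substring match such as LIKE '%jan%' cannot use a B-tree index, so SQLite still scans the headers table for it; the gain would come from scanning a small table of headers instead of the full 30MB of mail, while the date condition can use idx_sent.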

> As far as caching mail headers go, again, a real life investigation would
> uncover some interesting surprises: such as the fact that the majority of
> IMAP mail clients already cache headers.  Name them:  Outlook/Outlook
> Express, Netscape/Mozilla.  All of them cache headers.  And that takes care

I agree that caching the headers of mails read in the same session is
a good thing. However, I do not think the mail client should cache all
the headers of all the mail you have ever received, which is what I would
need when searching for some old mail; there are just far too
many. As far as I understand the IMAP philosophy, this should be the
job of the server.

I agree that you can (and should) divide your mail into folders. But
over time, you will end up with either too many folders or too many
messages in each of them anyway. Moreover, I sometimes want to search
by subject, sometimes by person, sometimes by date; folders alone
cannot offer this flexibility.


> The only potential savings (as opposed to technically sophisticated
> algorithms that were written only to provide material for a dissertation
> paper, but serve no useful real-world purpose otherwise) from caching mail
> headers can come about if the server has sufficient intelligence to notice
> which mail headers the IMAP client usually asks for, and automatically
> cache those headers, in advance, on all newly arrived mail, anticipating
> that the mail client will request them in the future.

I do not require any artificial intelligence. The set of headers to
search on can be fixed in advance, for example: From, To, Cc,
Bcc, Date, Subject. I am willing to accept a slow search in the
unlikely event that I want to search on other criteria.
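To make the fallback concrete, here is a hedged sketch of how a server front end might translate simple SEARCH criteria, such as 'SINCE 1-Oct-2001 FROM jan TO jurgen', into an SQL WHERE clause over a table of the fixed headers, and give up (returning None) as soon as a key falls outside that set, so the caller can run the ordinary slow search instead. The function to_sql and the column names are hypothetical. Only unquoted 'KEY value' pairs joined by an implicit AND are handled, not the full RFC 3501 grammar (quoted strings, OR, NOT, parentheses). Also, IMAP SINCE/BEFORE/ON compare the server's internal date, while SENTSINCE/SENTBEFORE/SENTON use the Date: header; this sketch maps both to one Date:-derived column, which is only an approximation for the former.

```python
from datetime import datetime

# Hypothetical mapping from IMAP SEARCH text keys to header-table columns.
TEXT_KEYS = {"FROM": "sender", "TO": "rcpt", "CC": "cc",
             "BCC": "bcc", "SUBJECT": "subject"}
# Approximation: SINCE/BEFORE/ON really refer to the internal date, but here
# they share the Date:-header column ('sent', 'YYYY-MM-DD') with SENT*.
DATE_KEYS = {"SINCE": ">=", "SENTSINCE": ">=",
             "BEFORE": "<", "SENTBEFORE": "<",
             "ON": "=", "SENTON": "="}


def to_sql(criteria):
    """Translate simple SEARCH criteria into (where_clause, params).

    Returns None if any key is not indexed, so the caller can fall back
    to the slow message-by-message search. Raises ValueError on a
    malformed date such as '31-Foo-2001'.
    """
    tokens = criteria.split()
    where, params = [], []
    i = 0
    while i < len(tokens):
        key = tokens[i].upper()
        if i + 1 == len(tokens):
            return None  # trailing key with no value (e.g. SEEN)
        value = tokens[i + 1]
        if key in TEXT_KEYS:
            # IMAP text keys are substring matches.
            where.append(TEXT_KEYS[key] + " LIKE ?")
            params.append("%" + value + "%")
        elif key in DATE_KEYS:
            # IMAP dates look like 1-Oct-2001 (%b assumes the C locale).
            day = datetime.strptime(value, "%d-%b-%Y").date().isoformat()
            where.append("sent " + DATE_KEYS[key] + " ?")
            params.append(day)
        else:
            return None  # BODY, TEXT, OR, NOT, SEEN, ...: not indexed
        i += 2
    return " AND ".join(where) or "1", params
```

For the example query above, to_sql returns the clause "sent >= ? AND sender LIKE ? AND rcpt LIKE ?" with parameters ["2001-10-01", "%jan%", "%jurgen%"]; a query such as 'BODY invoice' returns None and would go to the slow path.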

Jan

-- 
-------------------------------------------------------------------------
Jan Kybic <kybic at ieee.org>      Robotvis, INRIA, Sophia-Antipolis, France
       or <Jan.Kybic at sophia.inria.fr>,tel. work +33 492 38 7589, fax 7845
                    http://www-sop.inria.fr/robotvis/personnel/Jan.Kybic/


