Re: [Mailman-Developers] Requirements for a new archiver

Oct. 29, 2003

On Wed, 29 Oct 2003 16:12:50 -0800
Chuq Von Rospach <chuqui@plaidworks.com> wrote:
...
On Oct 29, 2003, at 2:28 PM, Peter C. Norton wrote:
...
...
I may not have made it clear, but I'm focusing on the metadata.  Once
you've parsed rfc822/2822, then it may become easier to have things
in the database that can manipulate those types.  I.e. to be able
to do simple searches for a property of given arbitrary headers (w/o
having to have a database schema that consists of a few known headers
and "others" which you then have to treat as a blob or as text).
...
my only real worry is that from what I've seen, 99.99% of the time,
the user is going to want content searches. header stuff is fine, but
of really low priority in the scheme of things (necessary to put
useful things together, meaningless if you can't content/context
search in fulltext).
I see two needs, for significantly different populations.  The first
wants a browsing interface keyed and indexed by date, thread, and
author.  The second wants full text search with rapid location and
retrieval of matching messages.  Often a single user will move between
the access methods, reading by thread, bouncing over to a search, then
reading all an author has written that matches, then searching again, etc.
As such two distinct sets of indexes seem called for: full text and
message meta-data.
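
To make the meta-data side concrete, here is a minimal sketch in Python
of browsing indexes keyed by author, date, and thread.  It is an
illustration only, not a proposed design: the function name
build_metadata_index is hypothetical, and it assumes every message
carries From, Date, and Message-ID headers and that replies arrive
after the message named in their In-Reply-To header.

```python
from collections import defaultdict
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime


def build_metadata_index(raw_messages):
    """Build browsing indexes keyed by author, date, and thread root.

    Assumption: raw_messages is an iterable of RFC 2822 message strings,
    ordered so that a reply appears after its parent.
    """
    by_author = defaultdict(list)   # address -> [Message-ID, ...]
    by_date = defaultdict(list)     # "YYYY-MM-DD" -> [Message-ID, ...]
    by_thread = defaultdict(list)   # root Message-ID -> [Message-ID, ...]
    root_of = {}                    # Message-ID -> root of its thread

    for raw in raw_messages:
        msg = message_from_string(raw)
        msg_id = msg["Message-ID"]
        parent = msg["In-Reply-To"]  # None when the header is absent

        # A message with no parent starts its own thread; a reply joins
        # its parent's thread (or the parent itself if we never saw it).
        root = root_of.get(parent, parent) if parent else msg_id
        root_of[msg_id] = root
        by_thread[root].append(msg_id)

        address = parseaddr(msg["From"])[1].lower()
        by_author[address].append(msg_id)

        day = parsedate_to_datetime(msg["Date"]).date().isoformat()
        by_date[day].append(msg_id)

    return by_author, by_date, by_thread
```

None of these indexes touches the message body, which is why they can
live apart from whatever handles full text.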
...
that's why I'm leaning, blob issues or no, towards full-text storage
in MySQL 4. Because if you can't easily chop up the message body
content and find the messages you want to deal with, elegant storage
of the headers is irrelevant...
True.  However, this seems to conflate two distinct problems.  If
you're going to do unindexed searches then this makes sense, but
except for minimal cases that's an uninteresting space.  It scales like
crap and has an even worse feature set.  It is more interesting to split
storage and indexing into distinct solution designs, and to build or
pick something tailored for that smaller problem.  That way you don't do
full text searching, you do full text indexing and then search the
indexes.
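
As a rough sketch of what "index, then search the indexes" means, here
is a toy inverted index in Python.  The names tokenize, build_index,
and search are hypothetical, the tokenizer is a naive lowercase
alphanumeric split (no stemming or language sensitivity), and
everything is held in memory; a real archiver would need far more than
this.

```python
import re
from collections import defaultdict

# Assumption: terms are runs of lowercase ASCII letters and digits.
TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Split text into lowercase alphanumeric terms."""
    return TOKEN.findall(text.lower())


def build_index(bodies):
    """Map each term to the set of message ids whose body contains it.

    bodies: dict of message id -> body text.
    """
    index = defaultdict(set)
    for msg_id, body in bodies.items():
        for term in tokenize(body):
            index[term].add(msg_id)
    return index


def search(index, query):
    """Return the ids of messages containing every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)
```

The indexing cost is paid once per message at archive time; a query
then only intersects posting sets and never rescans message bodies.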
...
I think you need that, too. But until you get a reasonable context
search for the message body, designing the rest is silly.
Is searching message bodies really interesting, or is building indexes
of message bodies such that you can later search those indexes the
actually interesting point?
...
And it seems to me there are few better methods than dumping the text
into MySQL and letting it do the work. Compromises, tradeoffs, etc.
notwithstanding...
How does MySQL help you in building language-sensitive rapid response
indexes of large text blobs?
--
J C Lawrence

---------(*)                Satan, oscillate my metallic sonatas.
claw@kanga.nu               He lived as a devil, eh?		

http://www.kanga.nu/~claw/  Evil is a name of a foeman, as I live.