[Mailman-Developers] Want to Code... need some feedback
J C Lawrence
claw@2wire.com
Thu, 05 Jul 2001 14:32:47 -0700
On Thu, 05 Jul 2001 14:16:53 -0700
Charles Iliya Krempeaux <tnt@linux.ca> wrote:
> Hello, J C Lawrence <claw@kanga.nu> wrote:
>>> To do this, I think that e-mail messages should be dumped into a
>>> database. (Since I have MySQL and PostgreSQL at my disposal,
>>> those are what I'll be able to support myself.)
>> I have some early proofs of concept done on having MHonArc
>> generate scripts which, when executed, insert their respective
>> message contents into PostgreSQL with the appropriate threading
>> links. The code is based on the PHP and template-based
>> archiving I already do at Kanga.Nu, merely taking the PHP
>> variable assignments already produced in the current system and
>> instead having the back end use them as the values to insert into
>> the DB.
>>
>> It works. Kinda. It's not pretty. The reliance on PHP as an
>> intermediate layer should be removed (slightly messy as MHonArc
>> insists on inserting HTML-style comments). Thread handling
>> and generation need to be improved (they shouldn't rely on MHonArc
>> but should be dynamically generated), etc.
> My way of thinking, of having it designed, is that Mailman (using
> Python) directly dumps the e-mail messages into the database.
<nod>
> (Are there standard [or de facto standard] Python modules for
> accessing databases?... For accessing MySQL and PostgreSQL?)
Yes.
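As a rough illustration of what those modules look like in use, here is a minimal sketch of the Python DB-API 2.0 (PEP 249) calling pattern. It uses the standard library sqlite3 module purely as a stand-in so it runs without a server; MySQL and PostgreSQL drivers follow the same connect/cursor/execute shape, though their placeholder style may differ. The table, column, and function names are hypothetical.

```python
import sqlite3


def open_archive(path=":memory:"):
    """Open a DB-API connection and create a hypothetical messages table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "message_id TEXT PRIMARY KEY, subject TEXT, body TEXT)"
    )
    return conn


def store_message(conn, message_id, subject, body):
    """Insert one message row, letting the driver bind the parameters."""
    cur = conn.cursor()
    cur.execute(
        "INSERT INTO messages (message_id, subject, body) VALUES (?, ?, ?)",
        (message_id, subject, body),
    )
    conn.commit()
```

Parameter binding (rather than string formatting) matters here because subjects and bodies are arbitrary user-supplied text.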
> Then, standard PHP (and whatever other languages)
> bindings/libraries, to the database, can be provided. That way,
> the database is the middle man. And Mailman, and the PHP
> binding/library (and any other language binding/library) only
> depend on the database. (And better still, Mailman is completely
> independent of the PHP [and vice versa]. Only the database
> structure matters.)
The reasons I don't want to do this:
1) MIME
2) national (and other) character sets
3) content types (really a subset of MIME but a large enough
problem to be unique)
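To make those three points concrete, here is a small sketch using Python's standard email package. It shows that a subject may arrive RFC 2047 encoded in a national character set, and that every MIME part carries its own content type and charset, all of which anything reading the database would otherwise have to handle itself. The helper names are hypothetical.

```python
from email import message_from_string
from email.header import decode_header, make_header


def decoded_subject(msg):
    """Return the Subject header with RFC 2047 encoded words decoded."""
    return str(make_header(decode_header(msg.get("Subject", ""))))


def describe_parts(msg):
    """List (content type, charset) for every part, containers included."""
    return [
        (part.get_content_type(), part.get_content_charset())
        for part in msg.walk()
    ]
```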
> (To get a little deeper into the design...) the important things I
> see, to extract from each message (and also store), is:
Minimally:
To: (multiple)
From:
CC:
To: GECOS (multiple)
From: GECOS
CC: GECOS (multiple)
Date:
Receipt date
Receipt address(es) (multiple)
MessageID
References: (multiple)
In-Reply-To: (computed if missing, flagged)
Subject:
Prior subject (was: (...) matching, opportunistic history match)
Message Body
MIME Key (if any)
MIME structure
Indexes to external MIME items
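A minimal sketch of pulling the header-level fields above out of a raw message with Python's standard email package. The dictionary keys and the function name are hypothetical, and falling back to the last References entry when In-Reply-To is missing is only one possible heuristic, assumed here for illustration. Receipt data, the body, and the MIME structure are left out.

```python
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime


def extract_fields(raw):
    """Return a dict of header fields parsed from a raw message string."""
    msg = message_from_string(raw)

    def addrs(name):
        # Each entry is a (GECOS/real name, address) pair.
        return getaddresses(msg.get_all(name, []))

    references = msg.get("References", "").split()
    in_reply_to = msg.get("In-Reply-To")
    computed = False
    if in_reply_to is None and references:
        # Assumed heuristic: the last reference is the direct parent.
        in_reply_to = references[-1]
        computed = True

    date_hdr = msg.get("Date")
    return {
        "to": addrs("To"),
        "from": addrs("From"),
        "cc": addrs("Cc"),
        "date": parsedate_to_datetime(date_hdr) if date_hdr else None,
        "message_id": msg.get("Message-ID"),
        "references": references,
        "in_reply_to": in_reply_to,
        "in_reply_to_computed": computed,  # the "flagged" part
        "subject": msg.get("Subject"),
    }
```

Note that parsedate_to_datetime raises an exception on a malformed Date header, so real archive input would need that case handled.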
--
J C Lawrence claw@kanga.nu
---------(*) http://www.kanga.nu/~claw/
The pressure to survive and rhetoric may make strange bedfellows