[Mailman-Developers] Want to Code... need some feedback

Charles Iliya Krempeaux tnt@linux.ca
Thu, 05 Jul 2001 14:16:53 -0700


Hello,

J C Lawrence <claw@kanga.nu> wrote:

>>      To do this, I think that e-mail messages should be dumped
>> into a database.  (Since I have MySQL and PostgreSQL at my
>> disposal, those are what I'll be able to support myself.)
> 
> I have some early proofs of concept done on having MHonArc generate
> scripts which, when executed, insert their respective message contents
> into PostgreSQL with the appropriate threading links.  The code is
> based on the PHP and template-based archiving I already do at
> Kanga.Nu, merely taking the PHP variable assignments the current
> system already produces and instead having the back end use them
> as the values to insert into the DB.
> 
> It works.  Kinda.  It's not pretty.  The reliance on PHP as an
> intermediate layer should be removed (slightly messy, as MHonArc
> insists on inserting HTML-style comments).  Proper thread handling
> and generation needs to be improved (it shouldn't rely on MHonArc but
> should be dynamically generated).  Etc.
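
As an illustration of generating threads dynamically instead of relying on MHonArc: once each message is stored with a pointer to the message it replies to, the thread view can be rebuilt on demand. This is a minimal sketch in modern Python 3, assuming rows of (message_id, parent_id, subject) have already been read from the database; the function name `build_threads`, the row layout, and the sample message IDs are hypothetical.

```python
from collections import defaultdict


def build_threads(rows):
    """Turn (message_id, parent_id, subject) rows into an indented thread list.

    parent_id is None for a thread root.  A reply whose parent is not in the
    archive (e.g. it was never received) is also treated as a root.
    """
    known = {mid for mid, _, _ in rows}
    subjects = {}
    children = defaultdict(list)
    roots = []
    for mid, parent, subject in rows:
        subjects[mid] = subject
        if parent is None or parent not in known:
            roots.append(mid)
        else:
            children[parent].append(mid)

    def render(mid, depth):
        # Depth-first walk: each reply is indented under its parent.
        lines = ["  " * depth + subjects[mid]]
        for child in children[mid]:
            lines.extend(render(child, depth + 1))
        return lines

    out = []
    for root in roots:
        out.extend(render(root, 0))
    return out


# Hypothetical rows, in arrival order.
rows = [
    ("<1@example.org>", None, "Want to Code"),
    ("<2@example.org>", "<1@example.org>", "Re: Want to Code"),
    ("<3@example.org>", "<2@example.org>", "Re: Re: Want to Code"),
    ("<4@example.org>", "<1@example.org>", "Re: Want to Code"),
]
print("\n".join(build_threads(rows)))
```

Treating a reply with an unarchived parent as a new root is one reasonable choice; another would be to show a placeholder for the missing parent.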


The way I am thinking of designing it, Mailman (using Python) would
dump the e-mail messages directly into the database.
(Are there standard [or de facto standard] Python modules for accessing
databases?  For accessing MySQL and PostgreSQL?)
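
On the question of standard Python database modules: Python has a standard database interface specification, DB-API 2.0 (PEP 249), and drivers for MySQL (e.g. MySQLdb) and PostgreSQL (e.g. psycopg) implement it. The sketch below uses the `sqlite3` module from the modern Python 3 standard library only because it needs no server; the table and the message ID are hypothetical. With a MySQL or PostgreSQL driver, mainly the `connect()` call and the parameter placeholder style (reported by the module's `paramstyle` attribute) should need to change.

```python
import sqlite3

# connect() -> cursor() -> execute() -> commit() is the DB-API 2.0 pattern
# shared by compliant drivers; only the driver module and its arguments differ.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE message (message_id TEXT PRIMARY KEY, subject TEXT)")

# Parameters are passed separately, never pasted into the SQL string.
# sqlite3 uses "?" placeholders; other drivers may use e.g. "%s".
cur.execute(
    "INSERT INTO message (message_id, subject) VALUES (?, ?)",
    ("<example-1@example.org>", "Want to Code... need some feedback"),
)
conn.commit()

cur.execute("SELECT subject FROM message WHERE message_id = ?",
            ("<example-1@example.org>",))
row = cur.fetchone()
print(sqlite3.paramstyle, row[0])  # qmark Want to Code... need some feedback
```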

Then standard PHP (and other language) bindings/libraries for the
database can be provided.  That way, the database is the middleman:
Mailman, the PHP binding/library, and any other language binding/library
depend only on the database.  (Better still, Mailman is completely
independent of the PHP [and vice versa].  Only the database structure
matters.)
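
To make "only the database structure matters" concrete, here is one possible schema covering the fields listed next, with a person table kept separate from e-mail addresses and join tables for messages sent to several lists or answering several messages. It is a sketch only: every table and column name, the second address, and the message IDs are hypothetical, and it uses Python's `sqlite3` so it runs without a server. The final query is plain SQL, so a PHP (or any other) client could issue the same statement without depending on Mailman.

```python
import sqlite3

# Hypothetical schema: all table and column names are illustrative only.
SCHEMA = """
CREATE TABLE person (
    person_id    INTEGER PRIMARY KEY,
    display_name TEXT
);
CREATE TABLE address (
    email     TEXT PRIMARY KEY,
    person_id INTEGER REFERENCES person(person_id)  -- many addresses, one person
);
CREATE TABLE mailing_list (
    list_name TEXT PRIMARY KEY
);
CREATE TABLE message (
    message_id    TEXT PRIMARY KEY,                 -- the Message-ID: header
    from_email    TEXT REFERENCES address(email),
    subject       TEXT,
    date_sent     TEXT,                             -- from Date:, may be wrong
    date_received TEXT NOT NULL                     -- when the list got it
);
CREATE TABLE message_list (                         -- cross-posted messages
    message_id TEXT REFERENCES message(message_id),
    list_name  TEXT REFERENCES mailing_list(list_name),
    PRIMARY KEY (message_id, list_name)
);
CREATE TABLE reply (                                -- a message may answer several
    message_id TEXT REFERENCES message(message_id),
    parent_id  TEXT,                                -- parent may not be archived
    PRIMARY KEY (message_id, parent_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# One person with two addresses (the second address is made up).
conn.execute("INSERT INTO person VALUES (1, 'Charles Iliya Krempeaux')")
conn.executemany("INSERT INTO address VALUES (?, 1)",
                 [("tnt@linux.ca",), ("charles@example.org",)])
conn.executemany(
    "INSERT INTO message (message_id, from_email, subject, date_received)"
    " VALUES (?, ?, ?, ?)",
    [("<1@example.org>", "tnt@linux.ca", "Want to Code", "2001-07-05T21:16:53Z"),
     ("<2@example.org>", "charles@example.org", "Re: Want to Code",
      "2001-07-06T09:00:00Z")])

# Plain SQL: a PHP front end could run exactly the same query.
rows = conn.execute("""
    SELECT m.subject FROM message AS m
    JOIN address AS a ON a.email = m.from_email
    WHERE a.person_id = 1
    ORDER BY m.date_received
""").fetchall()
print(rows)  # [('Want to Code',), ('Re: Want to Code',)]
```

Making `address.email` a primary key allows only one person per address; supporting a shared address would need a many-to-many table between `person` and `address` instead.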


(To get a little deeper into the design...) the important things I see
to extract from each message (and store) are:

      The author of the message.  (This will probably be based on the
      e-mail address.  But, IMO, it would be better to design it so
      that a person/author is treated as an entity separate from an
      e-mail address.  That way, a person/author could have more than
      one e-mail address and still be recognized as the same person/author.
      There is one problem, though: what happens if more than one person
      uses the same e-mail address?)

      The message (or possibly messages) that the e-mail message
      is a response to.

      The mailing list (or mailing lists) that it was sent to.

      The date & time it was received by the mailing list.

      The date & time it was supposedly sent.  (As I understand it, this
      can be inaccurate when someone has not set their clock correctly.)

      The subject of the message.
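
The fields listed above can be pulled out of a raw message with Python's standard `email` package. This is a minimal sketch for modern Python 3: the function name `extract_fields`, the `known_lists` mapping (lower-cased list posting address to list name), and the sample message IDs are hypothetical. It takes the direct parent(s) from In-Reply-To, falling back to the last entry of References, and leaves the sent date as `None` when the Date: header is missing or unparseable. RFC 2047-encoded subjects are not decoded here.

```python
import re
from datetime import datetime, timezone
from email import message_from_string
from email.utils import getaddresses, parseaddr, parsedate_to_datetime

MSGID_RE = re.compile(r"<[^<>\s]+>")


def extract_fields(raw, known_lists, received=None):
    """Pull the archive fields out of one raw RFC 2822 message."""
    msg = message_from_string(raw)
    _, author = parseaddr(msg.get("From", ""))

    # Direct parents: every id in In-Reply-To; failing that, the last id in
    # References (References lists the whole ancestry, oldest first).
    parents = MSGID_RE.findall(msg.get("In-Reply-To", ""))
    if not parents:
        parents = MSGID_RE.findall(msg.get("References", ""))[-1:]

    # Lists: any known list address found among the To: and Cc: recipients.
    recipients = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
    lists = sorted({known_lists[addr.lower()] for _, addr in recipients
                    if addr.lower() in known_lists})

    date_sent = None
    if msg["Date"]:
        try:
            date_sent = parsedate_to_datetime(msg["Date"])
        except (TypeError, ValueError):
            pass  # unparseable Date: header; keep None rather than guess

    return {
        "message_id": msg.get("Message-ID", "").strip(),
        "author": author.lower(),
        "parents": parents,
        "lists": lists,
        "date_received": received or datetime.now(timezone.utc),
        "date_sent": date_sent,
        "subject": msg.get("Subject", ""),
    }


raw = """\
From: Charles Iliya Krempeaux <tnt@linux.ca>
To: mailman-developers@python.org
Subject: Re: Want to Code... need some feedback
Date: Thu, 05 Jul 2001 14:16:53 -0700
Message-ID: <reply-2@example.org>
In-Reply-To: <orig-1@example.org>
References: <orig-1@example.org>

Hello,
"""
fields = extract_fields(raw, {"mailman-developers@python.org": "mailman-developers"})
print(fields["author"], fields["parents"], fields["lists"])
```

Passing `received` in explicitly lets the caller record the time the list server actually accepted the message, which is more trustworthy than the sender's Date: header.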

Other things we might want to store (maybe for statistical reasons,
maybe for other reasons):

      The delivery history of the e-mail message.

      All the other headers found in the message.

      If a message is sent from the web interface, maybe store stuff
      like the IP address of the sender, etc.
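
Storing all the other headers is simplest as one row per header, keeping their order and duplicates (a message usually carries several Received: lines). Those Received: lines also give at least part of the delivery history: each relay prepends one, so the newest hop comes first. A minimal sketch with Python's `sqlite3` and `email` modules; the `header` table, the `store_headers` name, and the host names and IDs in the sample are hypothetical.

```python
import sqlite3
from email import message_from_string


def store_headers(conn, raw):
    """Store every header of one message, in original order, duplicates kept."""
    msg = message_from_string(raw)
    message_id = msg.get("Message-ID", "").strip()
    conn.executemany(
        "INSERT INTO header (message_id, position, name, value) VALUES (?, ?, ?, ?)",
        [(message_id, pos, name, value)
         for pos, (name, value) in enumerate(msg.items())],
    )
    return message_id


conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE header (
    message_id TEXT, position INTEGER, name TEXT, value TEXT,
    PRIMARY KEY (message_id, position))""")

# Hypothetical message: host names and IDs are made up.
raw = """\
Received: from relay.example.org by mail.example.org; Thu, 05 Jul 2001 21:17:02 +0000
Received: from linux.ca by relay.example.org; Thu, 05 Jul 2001 21:16:58 +0000
From: tnt@linux.ca
Message-ID: <2@example.org>
Subject: Re: Want to Code

Body text.
"""
mid = store_headers(conn, raw)

# Received: lines, newest hop first, as a rough inbound delivery history.
hops = conn.execute(
    "SELECT value FROM header WHERE message_id = ? AND lower(name) = 'received'"
    " ORDER BY position",
    (mid,),
).fetchall()
for (hop,) in hops:
    print(hop)
```

This only covers the path into the list server; deliveries from Mailman out to subscribers would have to be recorded separately.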


Does that sound reasonable?  Have I missed anything?  Your insight into
the workings of Mailman would be much appreciated.


See ya

      Charles Iliya Krempeaux