[Tutor] Newbie - Simple mailing list archiver

Barnaby Scott bds@waywood.co.uk
Sat Mar 29 16:25:02 2003


First off, I have to confess to being a complete novice - apart from a
'Hello World' type of thing and a bit of reading, I am trying to write my
first ever Python script. However, the only way I have ever found of
learning anything is to jump in the deep end and try to do something that I
actually need to do, rather than half-heartedly wading through exercises
(which I'm sure would be good for the soul)!

I manage a mailing list and want to archive it, in a very basic form, to a
website - which happens to consist of a wiki (the engine for which is
PikiePikie - which I'm very impressed by and which is itself written in
Python). The great thing about this exercise is that the wiki uses only
text files, and the HTML is generated on the fly, so my archive will be all
plain text.

Below I have sketched out a skeleton for what the archiver needs to do, as I
see it. It would be a bit cheeky just to come here and ask someone to fill
in all the Python code for me! However, I would be extremely grateful for 2
things:
1: Any comments about the strategic 'skeleton' I have laid out
2: As much or as little of the code as anyone feels able/inclined to fill
in. It's not that I'm lazy, it's just that when you are starting literally
from scratch, it is really hard to know if you are getting EVERYTHING
completely wrong, and wasting time disappearing up blind alleys. Even
incredibly simple-sounding things sometimes take weeks to discover unless
you are shown the way!

As I say, ANY amount of guidance would be gratefully received. Here's my
skeleton...

#Read mail message from STDIN

#Get these header values:
#     From
#     Subject (or specify 'no subject')
#     Message-ID
#     Date (in a short, consistent format)
#     In-Reply-To, or failing that the last value in References, if either
#     exists
#     Content-Type
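
A minimal sketch of the two steps above, assuming Python 3 and its standard
email package. The function names, the dictionary keys and the
'%Y-%m-%d %H:%M' date format are illustrative choices, not part of the
skeleton.

```python
import email
import email.utils
import sys


def read_message(stream=None):
    """Parse one mail message from a text stream (STDIN by default)."""
    return email.message_from_file(stream or sys.stdin)


def extract_headers(msg):
    """Collect the header values listed in the skeleton into a dict."""
    # Date: reduce to a short, consistent form; keep the raw text if it
    # cannot be parsed.
    raw_date = msg.get('Date', '')
    try:
        parsed = email.utils.parsedate_to_datetime(raw_date)
        short_date = parsed.strftime('%Y-%m-%d %H:%M')
    except (TypeError, ValueError):
        short_date = raw_date.strip()

    # Parent: In-Reply-To if present, otherwise the last entry in References.
    parent = (msg.get('In-Reply-To') or '').strip()
    if not parent:
        references = (msg.get('References') or '').split()
        parent = references[-1] if references else None

    return {
        'from': msg.get('From', ''),
        'subject': msg.get('Subject') or 'no subject',
        'message_id': (msg.get('Message-ID') or '').strip(),
        'date': short_date,
        'parent': parent,
        'content_type': msg.get_content_type(),
    }
```

In a real run this would be called as extract_headers(read_message()). Note
that get_content_type() falls back to 'text/plain' when the message has no
Content-Type header at all.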

#If Content-Type is multipart/*, get only the body section that is
#text/plain
#Else if Content-Type is text/plain, get the body text
#Else give up now!
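
The body selection above might look like the following sketch, again assuming
Python 3's standard email package. plain_text_body and _decode are
hypothetical names; returning None stands in for "give up now".

```python
import email


def _decode(part):
    """Turn one non-multipart part into text, undoing any transfer encoding."""
    payload = part.get_payload(decode=True)      # bytes
    charset = part.get_content_charset() or 'ascii'
    try:
        return payload.decode(charset, errors='replace')
    except LookupError:                           # unknown charset name
        return payload.decode('ascii', errors='replace')


def plain_text_body(msg):
    """Return the text/plain body, or None if the message has none."""
    if msg.get_content_maintype() == 'multipart':
        # walk() visits the message and every nested part in order; take
        # the first text/plain section found.
        for part in msg.walk():
            if part.get_content_type() == 'text/plain':
                return _decode(part)
        return None
    if msg.get_content_type() == 'text/plain':
        return _decode(msg)
    return None
```

This takes the first text/plain part it meets, so a plain-text attachment
that came before the real body would be picked up instead. That seems
unlikely enough for a first version.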

#Open my 'messageIDs' file (a lookup file which stores, from previous
#messages, the value pairs: An integer messageID, original Message-ID)

#Find there the highest existing integer messageID and generate a new one by
#adding 1

#Append to the 'messageIDs' file:
#    Our newly generated integer messageID, this message's Message-ID
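
A possible sketch of the 'messageIDs' handling in Python 3. The file layout
is an assumption (one pair per line, integer and Message-ID separated by a
tab), and the function names are made up for illustration.

```python
def parse_ids(lines):
    """Map each original Message-ID to its integer messageID."""
    ids = {}
    for line in lines:
        number, sep, message_id = line.rstrip('\n').partition('\t')
        if sep:                          # skip blank or malformed lines
            ids[message_id] = int(number)
    return ids


def next_id(ids):
    """Highest existing integer plus 1, or 1 for an empty lookup."""
    return max(ids.values(), default=0) + 1


def register_message(path, message_id):
    """Append a new pair to the file; return (new integer id, old pairs)."""
    try:
        with open(path, encoding='utf-8') as f:
            ids = parse_ids(f)
    except FileNotFoundError:            # first message ever archived
        ids = {}
    new_id = next_id(ids)
    with open(path, 'a', encoding='utf-8') as f:
        f.write(f'{new_id}\t{message_id}\n')
    return new_id, ids
```

The returned dictionary can be reused later to turn an In-Reply-To value into
the parent's integer with ids.get(parent). Nothing here locks the file, so
two messages arriving at the same instant could be given the same number.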

#Open my 'ArchiveIndex' file (a wiki page which lists all messages in
#threads, with a specific method of indenting)

#If there is an In-Reply-To or References value
#    Look in the 'messageIDs' file for this Message-ID and return that
#    message's corresponding integer messageID
#    Look in the 'ArchiveIndex' file to find this integer at the beginning
#    of a line (save for preceding spaces and '*')
#    Add a new line immediately after this line we have found
#    Pad it with one more leading space than the preceding line had and...
#Else
#    Add a new line at beginning of file
#    Pad it with 1 leading space and...

#...now write on that line:
#    '*' + integer messageID + ': ' + Subject + ' ' + From + ' ' + Date + ' [' +
#    'ArchivedMessage' + integer messageID + ' ' + 'View]'

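The threading step above could be sketched in Python 3 like this. It works on
the index as a list of lines (without trailing newlines) so that reading and
writing the 'ArchiveIndex' file stay separate; format_entry and add_to_index
are hypothetical names. If the parent cannot be found in the index, this
sketch starts a new thread at the top, which is my assumption and not
something the skeleton specifies.

```python
import re


def format_entry(indent, msg_id, subject, sender, date):
    """Build one index line with the given number of leading spaces."""
    return (f"{' ' * indent}*{msg_id}: {subject} {sender} {date} "
            f"[ArchivedMessage{msg_id} View]")


def add_to_index(lines, msg_id, subject, sender, date, parent_id=None):
    """Return a new list of index lines with the message inserted."""
    if parent_id is not None:
        # Match the parent's integer at the start of a line, allowing only
        # leading spaces and '*' in front. The ': ' keeps 1 from matching 12.
        pattern = re.compile(r'^( *)\*' + str(parent_id) + r': ')
        for i, line in enumerate(lines):
            found = pattern.match(line)
            if found:
                indent = len(found.group(1)) + 1
                entry = format_entry(indent, msg_id, subject, sender, date)
                return lines[:i + 1] + [entry] + lines[i + 1:]
    # No parent, or parent not in the index: new thread at the top.
    return [format_entry(1, msg_id, subject, sender, date)] + lines
```

Because a reply goes immediately after its parent, the newest reply to a
message appears above its older siblings.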
#Close the 'ArchiveIndex' and 'messageIDs' files

#Create and open a new file called 'ArchivedMessage' + integer messageID

#Write to this file:
#    From
#    Subject
#    Date
#    plain text body

#Close the 'ArchivedMessage?' file
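
The final step might be sketched as below, assuming Python 3. The page layout
(three labelled header lines, a blank line, then the body) is a guess, since
the skeleton does not say how the wiki page should look, and both function
names are illustrative.

```python
import os


def archive_page_text(sender, subject, date, body):
    """Lay out the archived message as plain text."""
    return f'From: {sender}\nSubject: {subject}\nDate: {date}\n\n{body}'


def write_archive_page(directory, msg_id, sender, subject, date, body):
    """Create 'ArchivedMessage' + integer messageID in the given directory."""
    path = os.path.join(directory, f'ArchivedMessage{msg_id}')
    # The with-block closes the file even if the write fails.
    with open(path, 'w', encoding='utf-8') as f:
        f.write(archive_page_text(sender, subject, date, body))
    return path
```

Any characters that the wiki engine treats as markup would still need
escaping, which this sketch does not attempt.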