[PYTHON META-SIG] Organized web archives of the PSA SIG mailing lists - review

Andrew Kuchling amk@magnet.com
Fri, 11 Oct 1996 18:11:38 -0400 (EDT)

>  - *If* it's easy to do, it'd be nice to have the archives subdivided
>    only when the size of the yearly collection exceeds a certain
>    threshold - say 200 messages.  (I really don't know what's
>    appropriate, but my suspicion is that 200 is not too big.)

	Ummm... think think... while it's feasible, I think it would
be a bit kludgy.  (I'll consider it further, though.)  If disk space
isn't too big a problem, why not index both _en masse_ and by

	A digression about how Pipermail works: the base pipermail.T
class handles formatting, and has abstract methods like
get_archives(A), which returns a list of archives where article A
should be filed.  Each archive is then a subdirectory.  get_archives()
has access to the article's headers (and even its body), so it can
make quite complex decisions.
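
	To make the shape of that interface concrete, here is a minimal
sketch.  It is not Pipermail's actual code: the class names
BaseArchiver and YearlyArchiver, the directories_for() helper, and the
choice of filing by year are all assumptions made for illustration.
Only the idea of an abstract get_archives(article) returning a list of
archive names comes from the description above.

```python
from email.message import Message
from email.utils import parsedate_tz


class BaseArchiver:
    """Hypothetical stand-in for the base class described above."""

    def get_archives(self, article):
        # Abstract: subclasses decide where an article is filed.
        raise NotImplementedError

    def directories_for(self, article):
        # Each archive name maps to one subdirectory (assumed layout).
        return [name + "/" for name in self.get_archives(article)]


class YearlyArchiver(BaseArchiver):
    """Files each article under the year taken from its Date header."""

    def get_archives(self, article):
        date = article.get("Date")
        parsed = parsedate_tz(date) if date else None
        if parsed is None:
            # Missing or unparseable date: fall back to a catch-all archive.
            return ["undated"]
        return [str(parsed[0])]


msg = Message()
msg["Date"] = "Fri, 11 Oct 1996 18:11:38 -0400"
print(YearlyArchiver().directories_for(msg))
```

	Because the method receives the whole article, a subclass
could just as easily look at the subject or the body instead of the
date.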

	An article can be put in multiple archives; for example, we
could automatically put postings by Guido, or postings where the
subject line begins with "ANNOUNCE:", in a separate archive.  (Any
suggestions for such special archives?)
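
	A rough sketch of how such special archives might be chosen
from the headers.  Everything here is hypothetical: the function name
special_archives(), the archive names "announce" and "guido", and the
placeholder address (a real setup would list the poster's actual
addresses).

```python
from email.message import Message
from email.utils import parseaddr

# Placeholder address for illustration only; not a real mailbox.
SPECIAL_POSTERS = {"guido@example.org": "guido"}


def special_archives(article):
    """Return the extra archives an article belongs in, if any."""
    archives = []

    # Subject lines beginning with "ANNOUNCE:" go to an announcements archive.
    subject = article.get("Subject", "")
    if subject.lstrip().startswith("ANNOUNCE:"):
        archives.append("announce")

    # Postings from selected authors get an archive of their own.
    _, address = parseaddr(article.get("From", ""))
    archive = SPECIAL_POSTERS.get(address.lower())
    if archive is not None:
        archives.append(archive)

    return archives
```

	The returned list would be appended to whatever the regular
archive selection produces, so one article can land in several
archives at once.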

	Currently, a copy of the article is made in each archive
directory; my fuzzy reasoning behind this is that you might want
articles formatted differently depending on where they're going.
(Consider keeping a verbatim copy of postings, and an HTML-formatted
version.)  This will eat disk space quickly if articles are placed in
lots of different archives all the time.

	An alternative would be to have a single directory for
formatted articles, and each different archive would point into that
single repository.  This means we can't format articles differently
for each archive, but it's a lot easier on disk space.  

>    (Sectioning of the archives will be less disruptive when there is
>    an archive search interface, for which andrew is also seeking
>    comments.)

	One note: the search isn't available on www.magnet.com because
I can't run CGI scripts there.  I've prototyped a search using swish
on amarok, but it's hidden behind a firewall.  We can worry about that
after the archives are up.

	Another big problem: Python code's indentation gets mangled by
HTML formatting.  I'd like to magically recognize inclusions of code,
and add <PRE>...</PRE> around them.  Any suggestions for how to do
this fairly reliably?  I consider this critical for making the
archives usable.  (We could just always put the entire article inside
<PRE></PRE>, but that's ugly and not very readable.)
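
	As a starting point for discussion, here is one possible
heuristic, not a reliable solution: split the body into blank-line
separated blocks and wrap a block in <PRE> when at least half of its
lines look like Python.  The CODE_LINE pattern, the one-half threshold,
and the format_body() name are all assumptions, and prose that happens
to resemble code will be misclassified.

```python
import html
import re

# Lines that "look like" Python; a deliberately small, guessable set.
CODE_LINE = re.compile(
    r"""^\s*(?:
        (?:>>>|\.\.\.)\s
      | (?:def|class|if|elif|else|for|while|try|except|finally)\b.*:\s*$
      | (?:import\s+[\w.]+|from\s+[\w.]+\s+import\b.*)\s*$
      | (?:return|raise|print)\s+\S
      | (?:pass|break|continue)\s*$
      | [\w.\[\]]+\s*=\s*\S
    )""",
    re.VERBOSE,
)


def format_body(text):
    """Wrap code-like blocks in <PRE>, everything else in <P>."""
    parts = []
    for block in re.split(r"\n\s*\n", text.strip("\n")):
        lines = block.split("\n")
        hits = sum(1 for line in lines if CODE_LINE.match(line))
        escaped = html.escape(block, quote=False)
        if hits * 2 >= len(lines):
            # Mostly code: keep the whitespace exactly as posted.
            parts.append("<PRE>\n" + escaped + "\n</PRE>")
        else:
            parts.append("<P>\n" + escaped + "\n</P>")
    return "\n".join(parts)
```

	Working block by block keeps ordinary paragraphs readable
while preserving indentation only where it seems to matter.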

	Andrew Kuchling

META-SIG  - SIG on Python.Org SIGs and Mailing Lists

send messages to: meta-sig@python.org
administrivia to: meta-sig-request@python.org