[Mailman-Developers] [PATCH] Port HyperArch/pipermail to mimelib

Fri, 12 Oct 2001 16:39:25 -0400

>>>>> "BG" == Ben Gertzfield <che@debian.org> writes:

    BG> Here's a port of HyperArch and pipermail to mimelib.  This
    BG> allows proper parsing of multipart messages, and will make
    BG> i18n handling much easier.  This is a big step forward, I
    BG> think, because now we no longer have two very different
    BG> Message classes in Mailman.

I'm still looking at this patch, and I have some qualms about it.  If
I commit it, we'll still need to do the mimelib->email conversion
afterwards, but that shouldn't be hard.

First...

    BG> This also patches pythonlib/mailbox.py to use mimelib instead
    BG> of rfc822.  This is the last use of rfc822 in Mailman, so we
    BG> can now remove pythonlib/rfc822.py completely from the
    BG> archives -- now we use mimelib entirely!

It also modifies pythonlib/cgi.py to use mimelib.  Neither change is a
good idea, because our copies would drift further out of sync with
Python's and we would always have to carry them around.

The purpose of the Mailman/pythonlib directory is to allow us to defer
requiring newer versions of Python.  Right now, Mailman should work
with Python 2.0, but some of the modules that have been patched since
then have useful stuff we need now.  So I put copies of the latest
standard library files in Mailman/pythonlib as a form of forward
compatibility.  Eventually, I can remove these once I require a
version of Python that includes those patches.

An example is Cookie.py.  When MM required only Py1.5.2, I had to
provide a Cookie.py, but because Py2.0 has its own Cookie.py, we can
use that and forget about our own copy.  Similarly with cgi.py,
rfc822.py, and others (I do need to do a bit of cleaning up here
though).
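
As a rough illustration of the forward-compatibility idea above, here
is a minimal sketch in current Python (not the 2.0-era code under
discussion).  The helper name import_with_fallback and its parameters
are hypothetical; only the Mailman.pythonlib package name comes from
this thread.

```python
import importlib
import sys


def import_with_fallback(name, min_version, fallback_package):
    """Return the standard library module `name` when the running
    interpreter is at least `min_version`; otherwise import the
    bundled copy from `fallback_package`.

    Hypothetical helper for illustration only.
    """
    if sys.version_info[:2] >= tuple(min_version):
        # The interpreter is new enough, so its own copy has the fixes.
        return importlib.import_module(name)
    # Too old: fall back to the copy shipped alongside the application.
    return importlib.import_module(fallback_package + "." + name)


# On any interpreter newer than 2.0 this picks the standard library
# module and never touches the bundled Mailman.pythonlib copy.
mailbox = import_with_fallback("mailbox", (2, 0), "Mailman.pythonlib")
```

Once the minimum supported Python version is raised past min_version,
the fallback branch is dead code and the bundled copy can be deleted.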

Fortunately, I think your changes to cgi.py aren't necessary, and we
can accomplish your mailbox.py changes by changing Mailman/Mailbox.py
instead.  We do still need rfc822.py (I think) because the
email/mimelib package in some cases just wraps rfc822.py code instead
of reimplementing or cutting-and-pasting the source.

    BG> This patch depends on the mimelib patch I just sent; it uses
    BG> the get_decoded_payload() function I added to get a nice text
    BG> representation of even a multi-part message.  This will let us
    BG> even display a message for non-text parts of a message, and
    BG> eventually will let HyperArch display attachments inline.  And
    BG> of course, as I mentioned in my previous mail, this will
    BG> prevent base64 gobbledygook from showing up in the archives.

    BG> This patch even deals with multiple text/* attachments to a
    BG> message, and will include them all in the archive even if
    BG> they're base64 or quoted-printable encoded.

I think this is a decent patch, and I'm probably going to commit it
after I rewrite it for the email package.
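
For reference, the text extraction BG describes could look roughly like
this with today's email package.  This is only a sketch: it does not
use the get_decoded_payload() function from the mimelib patch, and the
name collect_text_parts and the us-ascii default are assumptions.

```python
from email import message_from_string


def collect_text_parts(msg, default_charset="us-ascii"):
    """Return the decoded body of every text/* part in `msg`.

    Hypothetical helper; base64 and quoted-printable transfer
    encodings are undone by get_payload(decode=True).
    """
    parts = []
    for part in msg.walk():
        # walk() also yields multipart containers; skip anything
        # that is not a text/* leaf.
        if part.get_content_maintype() != "text":
            continue
        raw = part.get_payload(decode=True)
        if raw is None:
            continue
        charset = part.get_content_charset() or default_charset
        try:
            parts.append(raw.decode(charset, "replace"))
        except LookupError:
            # Unknown charset label: fall back to the default.
            parts.append(raw.decode(default_charset, "replace"))
    return parts
```

Non-text parts are simply skipped here; an archiver would presumably
emit a placeholder for them instead.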

    BG> It currently does not deal with replacing high-ASCII
    BG> characters with HTML entities in HyperArch; I'm going to deal
    BG> with that next by taking the htmlentitydefs module's hash,
    BG> inverting it, and using that as a big global
    BG> search-and-replace, if the charset is undefined or iso-8859-1.
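
A minimal sketch of the inversion BG describes, written against the
modern html.entities module (the successor to htmlentitydefs).  The
function name escape_high_chars is hypothetical, and the numeric
reference fallback for characters without a named entity is an
assumption, not something the patch is known to do.

```python
from html.entities import name2codepoint

# Invert the name -> code point table into code point -> name.
ENTITY_BY_CODEPOINT = {cp: name for name, cp in name2codepoint.items()}


def escape_high_chars(text):
    """Replace non-ASCII characters with HTML entities.

    ASCII characters, including markup characters such as '<' and
    '&', are passed through untouched; escaping those is a separate
    step.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)
        elif cp in ENTITY_BY_CODEPOINT:
            out.append("&%s;" % ENTITY_BY_CODEPOINT[cp])
        else:
            # Assumption: use a numeric character reference when no
            # named entity exists.
            out.append("&#%d;" % cp)
    return "".join(out)
```

This works on already-decoded text, so the charset check mentioned
above would have to happen before the function is called.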

My biggest question here is why you took most of the code out of
Article._get_body() in HyperArch.py.  IIRC, Jeremy added all this
stuff so that charset handling would be saner.  The idea is that if
there is a single charset for the message, that would be the charset
used for the web archive page.  But if the page had multiple charsets,
then it would pick the most common one.  AFAIK, there's no way to
represent multiple charsets in a single HTML page.  An example of the
latter is an index page for a list that has Subject: fields with many
different charsets.  Which one do you pick?
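
The "pick the most common one" rule can be sketched as follows.  This
is not the code from Article._get_body(); the name pick_page_charset
and the us-ascii default are assumptions made for illustration.

```python
from collections import Counter


def pick_page_charset(charsets, default="us-ascii"):
    """Return the charset that occurs most often in `charsets`.

    Missing values (None or empty strings) are ignored, and labels
    are compared case-insensitively.
    """
    counts = Counter(c.lower() for c in charsets if c)
    if not counts:
        return default
    return counts.most_common(1)[0][0]
```

For an index page, charsets would be the charset of each Subject:
field on the page; text in any other charset would still need to be
converted or escaped before it is written out.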

In your patch, it seems like everything comes out iso-8859-1, and that
doesn't seem right.

-Barry