[Mailman-Developers] Fix for bug "incoming/64"--the prefix replication problem

Dan Delaney Dionysos@Dionysia.org
Sun, 11 Jul 1999 23:56:37 -0400 (EDT)


Hi all.
   I just transferred all of my Majordomo lists over to Mailman and
I must say, it is quite nice. There are a few things I'd like to see
in the future, but they would just be icing on the cake. There is
only one actual problem I've had. When someone replies to a message
and his program prepends "Re: " to the subject, Mailman then adds a
NEW list prefix to the subject line every time, even if there is
already one somewhere in the subject line, so we end up with
something like this:

   Subject: [MyList] Re: [MyList] Re: [MyList] A few observations

After a while, the subject lines accumulate quite a few of
those prefixes, which I think is a BIG problem.

   I was looking at the code in the MailList.py program, and it
looks like the code only checks to see if the prefix is
already present AT THE BEGINNING of the line before it adds the
prefix, thus ignoring the fact that it might be there after a "Re: "
or "Fwd: ". Here are the relevant lines from MailList.py:

--------------------------------------------------------------------
 1267: # Prepend the subject_prefix to the subject line.
 1268: subj = msg.getheader('subject')
 1269: prefix = self.subject_prefix
 1270: if not subj:
 1271:     msg.SetHeader('Subject', '%s(no subject)' % prefix)
*1272: elif prefix and not re.match(re.escape(self.subject_prefix),
*1273:                              subj, re.I):
 1274:     msg.SetHeader('Subject', '%s%s' % (prefix, subj))
--------------------------------------------------------------------

The simplest solution to this problem is to replace the "re.match"
in line 1272 with "re.search". The difference between the two is
that re.match only matches the pattern if it occurs at the BEGINNING
of the string, whereas re.search will match the pattern anywhere in
the string. I've already tried this in my installation and it works
quite well. In this case, after someone sends a message with a
subject something like this...

   Subject: [MyList] Hello there

...Mailman will never add another [MyList] prefix to any of the
replies, no matter how many there are, because it is already present
SOMEWHERE in the subject. Thus all subsequent replies will have the
following as their subject:

   Subject: Re: [MyList] Hello there
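
   To make the difference concrete, here is a minimal sketch in
present-day Python. The helper names and the "[MyList] " prefix are
made up for illustration and are not part of Mailman:

```python
import re

PREFIX = "[MyList] "          # hypothetical list prefix
PATTERN = re.escape(PREFIX)   # escape the brackets so they match literally

def needs_prefix_match(subject):
    # Original check: only notices a prefix at the very start.
    return not re.match(PATTERN, subject, re.I)

def needs_prefix_search(subject):
    # Proposed check: notices the prefix anywhere in the subject.
    return not re.search(PATTERN, subject, re.I)

reply = "Re: [MyList] Hello there"
print(needs_prefix_match(reply))    # True: a second prefix would be added
print(needs_prefix_search(reply))   # False: the subject is left alone
```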

   Now that would be sufficient to at least fix the problem of
"subject prefix replication". However, I still am not satisfied with
the result because now the prefix is not the FIRST thing in the
subject. Being a graphic designer, I am particularly sensitive to
the visual aspect of all of this, and to me it would be much more
aesthetically pleasing to have the list name tags all lined up in my
inbox list of messages, rather than some of them offset four places
from the left edge.

   One way to get around this would be to use "re.sub" to strip
out all occurrences of the list tag in the subject line before adding
another one to the beginning. That would be great except for one
thing: that would lead to the buildup of "Re: "s!
After a while you would end up with:

   Subject: [MyList] Re: Re: RE: Re: Hello there
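
   A rough simulation of that buildup, assuming every reply simply
prepends "Re: " to the subject it received (real mail programs vary).
The names strip_and_prepend and simulate_thread are hypothetical:

```python
import re

PREFIX = "[MyList] "  # hypothetical list prefix

def strip_and_prepend(subject):
    # Remove every copy of the list prefix, then put one back in front.
    return PREFIX + re.sub(re.escape(PREFIX), "", subject)

def simulate_thread(subject, replies):
    for _ in range(replies):
        subject = strip_and_prepend(subject)  # the list delivers the message
        subject = "Re: " + subject            # a subscriber replies to it
    return strip_and_prepend(subject)         # the list delivers the last reply

print(simulate_thread("Hello there", 3))
# [MyList] Re: Re: Re: Hello there
```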

So, what you have to do is remove all of the extra "Re: "s as well.
Which brings me to another problem: "RE: and FWD: replication",
which is a big problem with email in general (especially with all
this damn chain-letter spam!). And since the email programs aren't
doing their job by cleaning up that crapola, why not have the
mailing list manager do it? I wrote a mailing list archive program a
while back which does this and it really helps with "threading" the
messages based on the subject lines. (In fact, I store the subject
without ANY 'Re:'s or 'Fwd:'s so that the subjects will be exactly
the same, and I save the fact that it is a reply in a separate
field. You can check out the archive if you like at
www.LouisvilleTimes.org/harmonet/). Anyway, this would make the
following subject line...

   Subject: Re: [MyList] Re: Fwd: [MyList] Re: Hello there (fwd)

...end up looking like this:

   Subject: [MyList] Re: Hello there

The result is that your list messages always have readable,
comprehensible subjects that don't take 5 minutes to decode, AND
your list archives are much more functional and easier to browse
because messages which REALLY have the same subject won't be bogged
down with 'Re:'s and 'Fwd:'s.

   This is the basic procedure:

   1. First, with a regex substitution, strip the subject line of
      all occurrences of the list's subject prefix.
   2. Then, using a case-insensitive regex, check to see if the
      subject BEGINS with "fw: " or "fwd: ", or if it ends
      with a "(fwd)". If so, set a second prefix variable (prefix2) 
      to "Fwd: " to be added after the first prefix (the one from
      the list).
   3. Next, check to see if the subject line begins with "re: ".
      If it does, this takes precedence over the "Fwd: ", so just go
      ahead and set prefix2 to "Re: " even if it already contains
      "Fwd: ".
   4. Next, with a case-insensitive regex substitution, remove ALL
      instances of "re: ", "fw: ", "fwd: ", and "(fwd)" from the
      subject.
   5. Finally, add both the list prefix and the new prefix2 to the
      beginning of the subject.
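
   The five steps can be sketched as a standalone function in
present-day Python. This is an illustration under assumptions, not
Mailman code: clean_subject is a hypothetical name, and its patterns
are deliberately a little stricter than the steps require (the prefix
is stripped case-insensitively, and word boundaries keep "re: " from
matching inside a word such as "Where: ").

```python
import re

def clean_subject(subject, prefix):
    """Return subject with one list prefix and at most one Re:/Fwd: marker."""
    if not subject:
        return prefix + "(no subject)"
    # Step 1: strip every copy of the list prefix.
    if prefix:
        subject = re.sub(re.escape(prefix), "", subject, flags=re.I)
    subject = subject.strip()
    # Step 2: a leading "fw: "/"fwd: " or a trailing "(fwd)" marks a forward.
    prefix2 = ""
    if re.search(r"^fwd?: |\(fwd\)$", subject, re.I):
        prefix2 = "Fwd: "
    # Step 3: a leading "re: " takes precedence over the forward marker.
    if re.match(r"re: ", subject, re.I):
        prefix2 = "Re: "
    # Step 4: remove all reply and forward markers from the subject.
    subject = re.sub(r"\b(?:re|fwd?): |\s*\(fwd\)", "", subject, flags=re.I)
    # Step 5: list prefix first, then the single marker, then the subject.
    return prefix + prefix2 + subject.strip()

print(clean_subject(
    "Re: [MyList] Re: Fwd: [MyList] Re: Hello there (fwd)", "[MyList] "))
# [MyList] Re: Hello there
```

Returning the new subject instead of setting the header directly keeps
the logic easy to try out on sample subjects before wiring it into a
list manager.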

So, here's a replacement for the segment of code that I quoted
earlier from the MailList.py program which accomplishes this task
quite well (I'm using it in my Mailman installation right now):

--------------------------------------------------------------------
        # Prepend the subject_prefix to the subject line.
        subj = msg.getheader('subject')
        prefix = self.subject_prefix
        # If there's no subject, give it one
        if not subj:
            msg.SetHeader('Subject', '%s(no subject)' % prefix)
        else:
            # Delete all occurrences of the prefix from the subject
            if prefix:
                subj = re.sub(re.escape(self.subject_prefix), '', subj)
            # Check to see if the message is a reply or forward
            prefix2 = ''
            if re.search(r'^fwd*: |\(fwd\)', subj, re.I):
                prefix2 = 'Fwd: '
            if re.match('re: ', subj, re.I):
                prefix2 = 'Re: '
            # Clean up all that 're:' and 'fwd: ' garbage
            subj = re.sub(r'[rR][eE]: | *\(*[fF][wW][dD]*\)*:* *', '', subj)
            # Set the new subject line in place
            msg.SetHeader('Subject', '%s%s%s' % (prefix, prefix2, subj))
--------------------------------------------------------------------

   Give it a try for a while on your own mailing lists. I think
you'll like the results.

   Cheers.
-- Dan
________________________________________________________________________
 Dionysos@Dionysia.org                                Daniel G. Delaney
 www.Dionysia.org/~dionysos/
 PGP Public Key: /~dionysos/pgp.html