Fix for bug "incoming/64"--the prefix replication problem
Hi all. I just transferred all of my majordomo lists over to Mailman and I must say, it is quite nice. There are a few things I'd like to see in the future, but they would just be icing on the cake. There is only one actual problem I've had. When someone replies to a message and his program prepends "Re: " to the subject, Mailman then adds a NEW list prefix to the subject line every time, even if there is already one somewhere in the subject line, so we end up with something like this:
Subject: [MyList] Re: [MyList] Re: [MyList] A few observations
After a while, the subject lines accumulate quite a few of those prefixes, which I think is a BIG problem.
I was looking at the code in the MailList.py program, and it looks like the code only checks to see if the prefix is already present AT THE BEGINNING of the line before it adds the prefix, thus ignoring the fact that it might be there after a "Re: " or "Fwd: ". Here are the relevant lines from MailList.py:
 1267:     # Prepend the subject_prefix to the subject line.
 1268:     subj = msg.getheader('subject')
 1269:     prefix = self.subject_prefix
 1270:     if not subj:
 1271:         msg.SetHeader('Subject', '%s(no subject)' % prefix)
*1272:     elif prefix and not re.match(re.escape(self.subject_prefix),
*1273:                                  subj, re.I):
 1274:         msg.SetHeader('Subject', '%s%s' % (prefix, subj))
The simplest solution to this problem is to replace the "re.match" in line 1272 with "re.search". The difference between the two is that re.match only matches the pattern if it occurs at the BEGINNING of the line, whereas re.search will match the pattern anywhere in the line. I've already tried this in my installation and it works quite well. In this case, after someone sends a message with a subject something like this...
Subject: [MyList] Hello there
...all of the following replies, no matter how many there are, will never add another [MyList] prefix because it is already present SOMEWHERE in the subject. Thus all subsequent replies will have the following as their subject:
Subject: Re: [MyList] Hello there
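To see the difference concretely, here is a small standalone check. The prefix value and the two subjects are just the examples from this message (the trailing space in the prefix is an assumption), not taken from a real list configuration.

```python
import re

prefix = '[MyList] '            # example list prefix (assumed to end in a space)
pattern = re.escape(prefix)

first_post = '[MyList] Hello there'
reply = 'Re: [MyList] Hello there'

# re.match only looks at the start of the string.
match_first = re.match(pattern, first_post, re.I) is not None
match_reply = re.match(pattern, reply, re.I) is not None

# re.search scans the whole string.
search_reply = re.search(pattern, reply, re.I) is not None

print(match_first, match_reply, search_reply)
```

With re.match the reply looks like it has no prefix, so another one gets added; with re.search the existing prefix is found and the subject is left alone.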
Now that would be sufficient to at least fix the problem of "subject prefix replication". However, I still am not satisfied with the result because now the prefix is not the FIRST thing in the subject. Being a graphic designer, I am particularly sensitive to the visual aspect of all of this, and to me it would be much more aesthetically pleasing to have the list name tags all lined up in my inbox list of messages, rather than some of them offset four places from the left edge.
One way to get around this would be to use "re.sub" to strip out all occurrences of the list tag in the subject line before adding another one to the beginning. That would be great except for one thing: it would lead to a buildup of "Re: "s! After a while you would end up with:
Subject: [MyList] Re: Re: RE: Re: Hello there
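A quick sketch of why that happens, assuming each reply's mail program prepends "Re: " whenever the subject does not already start with one, and the list then strips and re-adds its prefix. The helper name strip_and_prefix is made up for illustration.

```python
import re

prefix = '[MyList] '   # example prefix, assumed to end in a space

def strip_and_prefix(subj):
    # Remove every copy of the prefix, then put a single one at the front.
    return prefix + re.sub(re.escape(prefix), '', subj)

subj = strip_and_prefix('Hello there')
for _ in range(3):
    # Assumed client behavior: the subject starts with "[MyList] ",
    # not "Re: ", so the client adds another "Re: " on every reply.
    subj = strip_and_prefix('Re: ' + subj)

print(subj)
```

After three replies the subject carries three "Re: " markers behind a single list prefix.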
So, what you have to do is remove all of the extra "Re: "s as well. Which brings me to another problem: "RE: and FWD: replication", which is a big problem with email in general (especially with all this damn chain-letter spam!). And since the email programs aren't doing their job by cleaning up that crapola, why not have the mailing list manager do it? I wrote a mailing list archive program a while back which does this and it really helps with "threading" the messages based on the subject lines. (In fact, I store the subject without ANY 'Re:'s or 'Fwd:'s so that the subjects will be exactly the same, and I save the fact that it is a reply in a separate field. You can check out the archive if you like at www.LouisvilleTimes.org/harmonet/). Anyway, this would make the following subject line...
Subject: Re: [MyList] Re: Fwd: [MyList] Re: Hello there (fwd)
...end up looking like this:
Subject: [MyList] Re: Hello there
The result is that your list messages always have readable, comprehensible subjects that don't take 5 minutes to decode, AND your list archives are much more functional and easier to browse, because messages which REALLY have the same subject won't be bogged down with 'Re:'s and 'Fwd:'s.
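As a rough illustration of the archive idea mentioned above (storing the bare subject and keeping the reply status in a separate field), here is a sketch. It is not the archive program's actual code: the function name thread_key and the patterns are assumptions made for the example, and unlike the Mailman patch it only strips markers at the start and end of the subject.

```python
import re
from collections import defaultdict

# Leading "Re: ", "Fw: ", "Fwd: " markers, and a trailing "(fwd)".
LEADING = re.compile(r'^\s*(re|fwd?):\s*', re.I)
TRAILING = re.compile(r'\s*\(fwd\)\s*$', re.I)

def thread_key(subject):
    """Return (bare_subject, is_reply) for grouping messages (hypothetical helper)."""
    is_reply = False
    subject = TRAILING.sub('', subject)
    while True:
        m = LEADING.match(subject)
        if not m:
            break
        if m.group(1).lower() == 're':
            is_reply = True
        subject = subject[m.end():]
    return subject.strip(), is_reply

# Group a few example subjects by their bare form.
threads = defaultdict(list)
for s in ['Hello there', 'Re: Hello there', 'RE: Re: Hello there',
          'Fwd: Hello there (fwd)']:
    bare, is_reply = thread_key(s)
    threads[bare].append((s, is_reply))
```

All four example subjects end up under the single key 'Hello there', with the reply status kept alongside each message.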
This is the basic procedure:
- First, with a regex substitution, strip the subject line of all occurrences of the list's subject prefix.
- Then, using a case-insensitive regex, check to see if the subject BEGINS with "fw: " or "fwd: ", or if it ends with a "(fwd)". If so, set a second prefix variable (prefix2) to "Fwd: " to be added after the first prefix (the one from the list).
- Next, check to see if the subject line begins with "re: ". If it does, this takes precedence over the "Fwd: ", so just go ahead and set prefix2 to "Re: " even if it already contains "Fwd: ".
- Next, with a case-insensitive regex substitution, remove ALL instances of "re: ", "fw: ", "fwd: ", and "(fwd)" from the subject.
- Finally, add both the list prefix and the new prefix2 to the beginning of the subject.
So, here's a replacement for the segment of code that I quoted earlier from the MailList.py program which accomplishes this task quite well (I'm using it in my Mailman installation right now):
# Prepend the subject_prefix to the subject line.
subj = msg.getheader('subject')
prefix = self.subject_prefix
# If there's no subject, give it one
if not subj:
    msg.SetHeader('Subject', '%s(no subject)' % prefix)
else:
    # Delete all occurrences of the prefix from the subject
    if prefix:
        subj = re.sub(re.escape(self.subject_prefix), '', subj)
    # Check to see if the message is a reply or forward
    prefix2 = ''
    if re.search(r'^fwd*: |\(fwd\)', subj, re.I):
        prefix2 = 'Fwd: '
    if re.match(r're: ', subj, re.I):
        prefix2 = 'Re: '
    # Clean up all that 're:' and 'fwd: ' garbage
    subj = re.sub(r'[rR][eE]: | *\(*[fF][wW][dD]*\)*:* *', '', subj)
    # Set the new subject line in place
    msg.SetHeader('Subject', '%s%s%s' % (prefix, prefix2, subj))
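To experiment with this outside Mailman, the same steps can be wrapped in a plain function. This is only a sketch: normalize_subject is a made-up name, not part of Mailman, and it returns the new subject instead of calling msg.SetHeader. The prefix '[MyList] ' (with a trailing space) is the example value used throughout this message.

```python
import re

def normalize_subject(subj, prefix):
    """Standalone version of the steps above (hypothetical helper, not Mailman API)."""
    # If there's no subject, give it one.
    if not subj:
        return '%s(no subject)' % prefix
    # Delete all occurrences of the prefix from the subject.
    if prefix:
        subj = re.sub(re.escape(prefix), '', subj)
    # Decide whether the message is a forward or a reply; reply wins.
    prefix2 = ''
    if re.search(r'^fwd*: |\(fwd\)', subj, re.I):
        prefix2 = 'Fwd: '
    if re.match(r're: ', subj, re.I):
        prefix2 = 'Re: '
    # Remove every "re: ", "fw: ", "fwd: " and "(fwd)" marker.
    subj = re.sub(r'[rR][eE]: | *\(*[fF][wW][dD]*\)*:* *', '', subj)
    return '%s%s%s' % (prefix, prefix2, subj)

messy = 'Re: [MyList] Re: Fwd: [MyList] Re: Hello there (fwd)'
print(normalize_subject(messy, '[MyList] '))
```

One thing to watch when trying it: the cleanup pattern is not anchored to word boundaries, so it also fires inside ordinary words (a subject beginning "Hardware: " loses its "re: ").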
Give it a try for a while on your own mailing lists. I think you'll like the results.
Cheers. -- Dan
Dionysos@Dionysia.org Daniel G. Delaney www.Dionysia.org/~dionysos/ PGP Public Key: /~dionysos/pgp.html
"DD" == Dan Delaney <Dionysos@Dionysia.org> writes:
DD> The simplest solution to this problem is to replace the
DD> "re.match" in line 1272 with "re.search".
Harald already made this change for 1.0rc3.
The rest sounds interesting, but it'll definitely have to wait until after 1.0 final.
-Barry
Participants (2): Barry A. Warsaw, Dan Delaney