[Mailman-Developers] Random HTML archiving failures possibly solved
Georg Mischler
schorsch@schorsch.com
Mon, 18 Dec 2000 12:40:31 -0500 (EST)
Hi all,
There have been a number of reports of HTML archiving failing
mysteriously, which the experts were apparently unable to
reproduce. I think I have just found a bug in Mailbox.py from
2.0 that can cause this behaviour. Since I'm CVS challenged, I
am unable to check whether it has already been fixed since then,
but here it goes anyway.
The pattern that checks for the Unix-style "From " lines
fails when it encounters a negative timezone:
    _fromlinepattern = r'From \s*\S+\s+\w\w\w\s+\w\w\w\s+\d\d?\s+' \
                       r'\d\d?:\d\d(:\d\d)?(\s+\S+)?\s+\d\d\d\d\s*$'
The consequence is that when a mailbox file begins with a message
from such a timezone, Mailman will think it contains no messages
at all. A more robust approach (assuming that a plus sign in front
of the timezone is also legal) would probably look similar to this:
    _fromlinepattern = r'From \s*\S+\s+\w\w\w\s+\w\w\w\s+\d\d?\s+' \
                       r'\d\d?:\d\d(:\d\d)?(\s+\S+)?\s+[+-]?\d\d\d\d\s*$'
At least this fixes the problem on my system here...
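To make the difference concrete, here is a minimal sketch that runs both patterns against a few "From " lines. The sample lines are made up for illustration (they are not taken from a real mailbox), and the names OLD_PATTERN, NEW_PATTERN, SAMPLES and matches() are hypothetical. The sketch assumes the signed offset is the last field on the line, since that is the only position where the added [+-]? makes a difference.

```python
import re

# Pattern as quoted above from Mailbox.py 2.0.
OLD_PATTERN = (r'From \s*\S+\s+\w\w\w\s+\w\w\w\s+\d\d?\s+'
               r'\d\d?:\d\d(:\d\d)?(\s+\S+)?\s+\d\d\d\d\s*$')

# Proposed pattern: the final four-digit field may carry a sign.
NEW_PATTERN = (r'From \s*\S+\s+\w\w\w\s+\w\w\w\s+\d\d?\s+'
               r'\d\d?:\d\d(:\d\d)?(\s+\S+)?\s+[+-]?\d\d\d\d\s*$')

# Made-up sample lines (assumptions for illustration only).
SAMPLES = [
    'From user@example.com Mon Dec 18 12:40:31 2000',
    'From user@example.com Mon Dec 18 12:40:31 EST 2000',
    'From user@example.com Mon Dec 18 12:40:31 2000 -0500',
    'From user@example.com Mon Dec 18 12:40:31 2000 +0100',
]


def matches(pattern, line):
    """Return True if the line matches the pattern from its start."""
    return re.match(pattern, line) is not None


if __name__ == '__main__':
    for line in SAMPLES:
        print(matches(OLD_PATTERN, line), matches(NEW_PATTERN, line), line)
```

The first two samples match both patterns; the last two, with a signed offset after the year, match only the proposed one.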
On second thought, wouldn't it be even better to use
rfc822.parsedate_tz() here as well? I realize this implies
some processing overhead, but I'd prefer robustness over
the last two percent of performance.
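A rough sketch of what the parsedate_tz() idea could look like, not a drop-in replacement for the Mailbox.py code. It uses email.utils.parsedate_tz(), since the rfc822 module no longer exists in Python 3. The helper name is_from_line() and the sample lines are hypothetical, and the sketch assumes the envelope sender contains no whitespace.

```python
from email.utils import parsedate_tz


def is_from_line(line):
    """Guess whether a line is an mbox "From " separator.

    Assumption: the line looks like "From <sender> <date>", where
    <sender> contains no whitespace and <date> is something that
    parsedate_tz() can parse.
    """
    if not line.startswith('From '):
        return False
    parts = line.split(None, 2)   # ['From', sender, date text]
    if len(parts) < 3:
        return False
    # parsedate_tz() returns None when it cannot parse the date.
    return parsedate_tz(parts[2]) is not None


if __name__ == '__main__':
    # Made-up sample lines (assumptions for illustration only).
    print(is_from_line('From user@example.com Mon Dec 18 12:40:31 2000 -0500'))
    print(is_from_line('From now on, things changed.'))
```

This trades the single regular expression for a split plus a full date parse on every candidate line, which is the processing overhead mentioned above.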
Have fun!
-schorsch
--
Georg Mischler -- simulations developer -- schorsch at schorsch.com
+schorsch.com+ -- lighting design tools -- http://www.schorsch.com/