Random HTML archiving failures possibly solved
Hi all,
There have been a number of reports of HTML archiving failing mysteriously, which the experts apparently could not reproduce. I think I have just found a bug in Mailbox.py from 2.0 that can cause this behaviour. Since I'm CVS challenged, I am unable to check whether it has already been fixed since then, but here goes anyway.
The pattern that checks for Unix-style "From " lines fails when it encounters a negative numeric timezone (such as -0800) after the year:
_fromlinepattern = (r'From \s*\S+\s+\w\w\w\s+\w\w\w\s+\d\d?\s+'
                    r'\d\d?:\d\d(:\d\d)?(\s+\S+)?\s+\d\d\d\d\s*$')
As a consequence, when the first message in a mailbox file comes from such a timezone, Mailman concludes that the file contains no messages at all. A more robust pattern (assuming that a plus sign in front of the timezone is also legal) would probably look something like this:
_fromlinepattern = (r'From \s*\S+\s+\w\w\w\s+\w\w\w\s+\d\d?\s+'
                    r'\d\d?:\d\d(:\d\d)?(\s+\S+)?\s+[+-]?\d\d\d\d\s*$')
At least this fixes the problem on my system here...
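For anyone who wants to check this without a mailbox handy, here is a minimal sketch that runs both patterns against a few separator lines. It assumes modern Python 3 and uses made-up sample lines (the sender address is hypothetical); only the two regular expressions are taken from above.

```python
import re

# The two patterns quoted above; adjacent string literals inside the
# parentheses are concatenated into a single pattern.
OLD_PATTERN = re.compile(r'From \s*\S+\s+\w\w\w\s+\w\w\w\s+\d\d?\s+'
                         r'\d\d?:\d\d(:\d\d)?(\s+\S+)?\s+\d\d\d\d\s*$')
NEW_PATTERN = re.compile(r'From \s*\S+\s+\w\w\w\s+\w\w\w\s+\d\d?\s+'
                         r'\d\d?:\d\d(:\d\d)?(\s+\S+)?\s+[+-]?\d\d\d\d\s*$')

# Hypothetical separator lines that differ only in the trailing timezone.
SAMPLES = [
    "From someone@example.com Wed Jan 17 11:02:03 2001",
    "From someone@example.com Wed Jan 17 11:02:03 2001 0100",
    "From someone@example.com Wed Jan 17 11:02:03 2001 +0100",
    "From someone@example.com Wed Jan 17 11:02:03 2001 -0800",
]


def classify(line):
    """Return (old pattern matches, new pattern matches) for one line."""
    return (OLD_PATTERN.match(line) is not None,
            NEW_PATTERN.match(line) is not None)


for line in SAMPLES:
    print(classify(line), line)
```

The old pattern should reject only the two lines with a signed offset. Note that the fixed pattern accepts them by letting the year fall into the optional (\s+\S+)? group and matching the offset as the final four digits, so it accepts the layout without checking which field is the year.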
Come to think of it, wouldn't it be even better to use rfc822.parsedate_tz() here as well? I realize this implies some processing overhead, but I'd rather have robustness than the last two percent of performance.
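A rough sketch of the parsedate_tz() idea might look like the following. The rfc822 module was removed in Python 3, so this assumes email.utils.parsedate_tz(), its modern counterpart; the helper name is_unix_from_line() is hypothetical and not part of Mailman or the standard library.

```python
from email.utils import parsedate_tz


def is_unix_from_line(line):
    """Hypothetical check for an mbox "From " separator line.

    Instead of a hand-written regex, hand everything after the sender
    to parsedate_tz(), which already copes with ctime-style dates with
    or without a trailing numeric timezone such as -0800 or +0100.
    """
    if not line.startswith("From "):
        return False
    # Expected layout: "From <sender> <date...>"; keep the date part whole.
    parts = line.rstrip("\r\n").split(None, 2)
    if len(parts) < 3:
        return False
    # parsedate_tz() returns None when it cannot make sense of the date.
    return parsedate_tz(parts[2]) is not None


print(is_unix_from_line("From someone@example.com Wed Jan 17 11:02:03 2001 -0800"))
print(is_unix_from_line("From the desk of John Smith"))
```

As a bonus, parsedate_tz() reports the offset in seconds (-28800 for -0800). The trade-off is that it is more lenient than the regex, so it may accept some oddly formatted lines the pattern would reject.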
Have fun!
-schorsch
-- 
Georg Mischler -- simulations developer -- schorsch at schorsch.com
+schorsch.com+ -- lighting design tools -- http://www.schorsch.com/