
Hi!
I have a list that has to have its archives rebuilt. The problem is that the first two years of archives do not exist in mbox format, as someone trimmed the mbox file way back when. This portion of the archives exists only in the regular text files that the Mailman archiver created.
I tried appending those two years of files together and changing the initial "From user@foo.bar" to "^N^NFrom user@foo.bar" to get Mailman to see the files as mbox format files. My local unix mail program (mailx) sees the file as having 2200 or so messages for these two years, but the archiver does not. It only sees two messages in total from it.
When I concatenated the old file with the current mbox file (the old archives first ;-), the archiver saw the two messages from the first part of the file and then correctly processed the current mbox part.
How can I either a) persuade the archiver to properly handle these messages, or b) modify the pipermail.pck file so that it sees these two years' worth of archives and properly updates the index file?
We are using Mailman 2.0b2. While I want to upgrade to 2.0.3, I'd rather not screw up too much at once.
Any advice would be appreciated.
reb

At 12:15 PM 3/26/2001 -0500, Phydeaux wrote:
For those with any interest in this, the problem seems to be that Mailman sees the break between messages a bit differently than most mail packages. A simple "^N^NFrom " isn't enough to convince Mailman that a new message has started. Instead it also looks for the date/time on the "From" line. Inside Mailbox.py is this nifty chunk of code that appears to cause this behaviour:
_fromlinepattern = r'From \s*\S+\s+\w\w\w\s+\w\w\w\s+\d\d?\s+' \
r'\d\d?:\d\d(:\d\d)?(\s+\S+)?\s+\d\d\d\d\s*$'
I have no idea what the actual RFC for mbox format states, but for now I have solved the mbox format part of the problem...
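To illustrate the point about the date/time, here is a minimal sketch that runs the pattern quoted above against two sample separator lines. The sample lines and the helper name `is_separator` are made up for illustration; only the pattern itself comes from Mailbox.py.

```python
import re

# Pattern copied from the message above (Mailbox.py, Mailman 2.0.x).
_fromlinepattern = (r'From \s*\S+\s+\w\w\w\s+\w\w\w\s+\d\d?\s+'
                    r'\d\d?:\d\d(:\d\d)?(\s+\S+)?\s+\d\d\d\d\s*$')
from_re = re.compile(_fromlinepattern)


def is_separator(line):
    """Return True if the strict pattern accepts the line as a message start."""
    return from_re.match(line) is not None


# Hypothetical sample lines, not taken from the real archive.
bare = "From user@foo.bar"
dated = "From user@foo.bar Mon Mar 26 12:15:00 2001"

print(is_separator(bare))   # False: no weekday/month/day/time/year
print(is_separator(dated))  # True: full timestamp is present
```

So a "From " line seems to need a weekday, month, day, time and four-digit year after the address before the 2.0.x code will treat it as the start of a new message.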
Now, when I get to the HTML generation, I get this wonderful set of messages indicating that it has a problem with November 1999. I've looked at the messages and can't find anything different about them. October's index (and all the other files created) appears fine. When it gets to November 1999 I get this:
Updating HTML for article 3265
Updating index files for archive [1999-November]
Traceback (innermost last):
  File "bin/arch", line 128, in ?
    main()
  File "bin/arch", line 118, in main
    archiver.close()
  File "/usr/local/mailman/Mailman/Archiver/HyperArch.py", line 911, in close
    self.update_dirty_archives()# Update all changed archives
  File "/usr/local/mailman/Mailman/Archiver/HyperArch.py", line 874, in update_dirty_archives
    self.update_archive(i)
  File "/usr/local/mailman/Mailman/Archiver/pipermail.py", line 303, in update_archive
    parameters=self.__set_parameters(archive)
  File "/usr/local/mailman/Mailman/Archiver/pipermail.py", line 224, in __set_parameters
    firstdate=self.database.firstdate(archive)
  File "/usr/local/mailman/Mailman/Archiver/HyperDatabase.py", line 188, in firstdate
    self.__openIndices(archive)
  File "/usr/local/mailman/Mailman/Archiver/HyperDatabase.py", line 237, in __openIndices
    self.__closeIndices()
  File "/usr/local/mailman/Mailman/Archiver/HyperDatabase.py", line 259, in __closeIndices
    index.close()
  File "/usr/local/mailman/Mailman/Archiver/HyperDatabase.py", line 167, in close
    fp.write(marshal.dumps(self.dict))
MemoryError
As always, any help/advice appreciated!
reb

"P" == Phydeaux <reb@taco.com> writes:
P> For those with any interest in this, the problem seems to be
P> that Mailman sees the break between messages a bit differently
P> than most mail packages. A simple "^N^NFrom " isn't enough to
P> convince Mailman that a new message has started. Instead it
P> also looks for the date/time on the "From" line. Inside
P> Mailbox.py is this nifty chunk of code that appears to cause
P> this behaviour:
P> _fromlinepattern = r'From \s*\S+\s+\w\w\w\s+\w\w\w\s+\d\d?\s+' \
P> r'\d\d?:\d\d(:\d\d)?(\s+\S+)?\s+\d\d\d\d\s*$'
So far, all correct, at least for Mailman 2.0.x.
P> I have no idea what the actual RFC for mbox format states, but
P> for now I have solved the mbox format part of the problem...
There is no RFC, just "standard" practices. The best description of the issue I've found is at this URL:
http://home.netscape.com/eng/mozilla/2.0/relnotes/demo/content-length.html
Note that this is really a Python issue, since Mailman just uses the mailbox module to split the mbox up. I've actually refactored this code in Python 2.1 to be more conformant with the note above and by default Mailman 2.1 will use just 'From ' at the beginning of the line to separate messages (making it exactly '\n\nFrom ' is harder given the mailbox.py code, but the current implementation should be Good Enough).
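The difference between the two rules can be sketched roughly as follows. This is not the code from the mailbox module, just an illustration under stated assumptions: `count_messages` is a hypothetical helper, and the sample text is invented. It counts separators under the strict 2.0.x pattern and under the looser "any line beginning with 'From '" rule.

```python
import re

# Strict rule: the Mailman 2.0.x pattern quoted earlier in the thread.
STRICT = re.compile(r'From \s*\S+\s+\w\w\w\s+\w\w\w\s+\d\d?\s+'
                    r'\d\d?:\d\d(:\d\d)?(\s+\S+)?\s+\d\d\d\d\s*$')


def count_messages(text, strict):
    """Count message separators in mbox-style text (hypothetical helper)."""
    count = 0
    for line in text.splitlines():
        if strict:
            if STRICT.match(line):
                count += 1
        elif line.startswith("From "):
            # Loose rule: any line beginning with 'From ' starts a message.
            count += 1
    return count


# Invented sample: three messages, only the last has a dated separator.
sample = (
    "From user@foo.bar\n"
    "Subject: first\n\nbody one\n\n"
    "From user@foo.bar\n"
    "Subject: second\n\nbody two\n\n"
    "From user@foo.bar Mon Mar 26 12:15:00 2001\n"
    "Subject: third\n\nbody three\n"
)

print(count_messages(sample, strict=True))   # 1
print(count_messages(sample, strict=False))  # 3
```

Note that with the loose rule, a body line that happens to begin with "From " would also be counted, which is why mbox writers conventionally escape such lines as ">From ".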
P> MemoryError
Well, now this might be a different problem though. Pipermail via bin/arch slurps the entire archive into memory, so if the archive is big you could have this problem. Have I mentioned that Pipermail could use a good rewrite? Volunteers are welcome! :)
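For contrast with slurping everything at once, here is a minimal sketch of reading an mbox one message at a time, so that only the current message is held in memory. This is not how Pipermail or bin/arch works; `iter_messages` is a hypothetical helper, and it assumes the loose "line begins with 'From '" separator rule.

```python
import io


def iter_messages(fp):
    """Yield messages one at a time from a file-like object.

    A new message starts at each line beginning with 'From '. Only the
    lines of the current message are buffered. Any text before the first
    'From ' line is yielded as a message of its own.
    """
    current = []
    for line in fp:
        if line.startswith("From ") and current:
            yield "".join(current)
            current = []
        current.append(line)
    if current:
        yield "".join(current)


# Usage with an in-memory stand-in for a real mbox file.
fake_mbox = io.StringIO(
    "From a@foo.bar Mon Mar 26 12:15:00 2001\nSubject: one\n\nhello\n"
    "From b@foo.bar Mon Mar 26 12:20:00 2001\nSubject: two\n\nworld\n"
)
for number, message in enumerate(iter_messages(fake_mbox), start=1):
    print(number, len(message))
```

A real file opened with open(path) could be passed in place of the StringIO object, since both are iterated line by line.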
-Barry
