[ mailman-Feature Requests-1059566 ] Message number assignment during archiving with pipermail...

SourceForge.net noreply at sourceforge.net
Wed Nov 3 16:33:49 CET 2004


Feature Requests item #1059566, was opened at 2004-11-03 15:33
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=350103&aid=1059566&group_id=103

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Brad Knowles (shub)
Assigned to: Nobody/Anonymous (nobody)
Summary: Message number assignment during archiving with pipermail...

Initial Comment:
Folks,

Currently, pipermail will generate a number for each message as it 
is processing the archive, including importing mbox-format 
archives from other sources.  Unfortunately, the numbers are 
specific to each installation, so importing an archive from 
somewhere else causes the message numbers after import to be 
different.

Among other things, this makes it difficult to move a list from one 
server to another and to set up a simple redirect from the old 
location to the new one, since the message numbers could well be 
different between the two sites.


It would be really, really nice if pipermail would use something like 
an md5 hash of the message headers to generate a unique archive 
id that could be used instead, so that the id could be consistent 
across systems.

Since the "Message-id:" header should be unique for every 
message, and the date/time stamps and queue-ids used by the 
various servers (and logged within the "Received:" headers) will 
almost certainly be slightly different for every message, an md5 
hash of those headers should make it all but certain that the 
resulting pipermail archive id is likewise unique.
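
As a rough illustration of the idea above, here is a minimal Python sketch 
that hashes a message's header block with md5.  The function name 
header_digest and the sample message are hypothetical, and the choice to 
hash every header in its original order is an assumption; this is not how 
pipermail currently assigns ids.

```python
import hashlib
from email import message_from_string
from email.message import Message


def header_digest(msg: Message) -> str:
    """Return the hex md5 digest of a message's header block.

    Hypothetical helper: headers are hashed as "Name: value" lines in
    the order they appear; the message body is ignored.
    """
    h = hashlib.md5()
    for name, value in msg.items():
        h.update(f"{name}: {value}\n".encode("utf-8", "surrogateescape"))
    return h.hexdigest()


# Made-up sample message, for illustration only.
raw = (
    "Message-ID: <example-1@example.org>\n"
    "Date: Wed, 3 Nov 2004 16:33:49 +0100\n"
    "Subject: test\n"
    "\n"
    "body text\n"
)
archive_id = header_digest(message_from_string(raw))
```

Because only the headers stored in the mbox are hashed, the same mbox 
file would yield the same id on any system that imports it.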


You could use the lower 16 hex characters of the 128-bit, 
32-character md5 hex digest that is typically generated, and the 
probability of a collision between any two messages would be 
vanishingly small in the resulting 64-bit space.
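
To put a number on "vanishingly small", the following sketch truncates 
the digest to 64 bits and estimates the collision risk with the usual 
birthday approximation.  The names short_id and collision_probability 
are hypothetical, and reading "lower 16 hex characters" as the last 16 
characters of the hex digest is an assumption.

```python
import hashlib
import math


def short_id(header_bytes: bytes) -> str:
    """Return the last 16 hex characters (64 bits) of the md5 digest."""
    return hashlib.md5(header_bytes).hexdigest()[-16:]


def collision_probability(n_messages: int, bits: int = 64) -> float:
    """Birthday approximation: chance that any two of n ids collide."""
    pairs = n_messages * (n_messages - 1) / 2
    # 1 - exp(-pairs / 2**bits), computed stably for tiny values.
    return -math.expm1(-pairs / 2.0 ** bits)


sid = short_id(b"Message-ID: <example-1@example.org>\n")
p_million = collision_probability(1_000_000)
```

Under this approximation, an archive of one million messages has a 
chance of roughly 3 in 100 million of containing any colliding pair.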

If you're concerned about filename length, represent the data in 
base64 format (6 bits per ASCII character instead of 4), and get 
the whole 128-bit hash encoded in 22 output characters.  
You could then take a smaller slice and still get more bits of hash 
output than the same number of hex characters would give you.
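
A short sketch of the base64 variant follows.  One swap to note: it uses 
the URL-safe base64 alphabet rather than standard base64, because the 
standard alphabet includes "/", which cannot appear in a filename.  The 
name base64_id is hypothetical.

```python
import base64
import hashlib


def base64_id(header_bytes: bytes, length: int = 22) -> str:
    """Return up to `length` URL-safe base64 characters of the md5 digest."""
    digest = hashlib.md5(header_bytes).digest()  # 16 bytes = 128 bits
    # 16 bytes encode to 24 characters, the last two being "=" padding;
    # stripping the padding leaves the 22 characters mentioned above.
    encoded = base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")
    return encoded[:length]


full_id = base64_id(b"Message-ID: <example-1@example.org>\n")
short_b64 = base64_id(b"Message-ID: <example-1@example.org>\n", 16)
```

A 16-character slice here carries 96 bits of the hash, compared with 64 
bits for 16 hex characters.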


If you don't like md5 for personal reasons, then maybe sha-1?


But please, whatever is used, please, please, please let it be 
something that could be derived from the headers of the messages 
themselves and guaranteed to be consistent across systems.

----------------------------------------------------------------------
