[Mailman-Users] Efficient handling of cross-posting
brad at shub-internet.org
Wed Jan 30 03:11:02 CET 2008
On 1/29/08, Mikhail T. wrote:
> May I suggest, you underestimate the importance of this feature?
> may often be justified from the end-user perspective, but is discouraged by
> the admins exactly because it increases the archival-storage requirements...
I've never once heard an admin discourage cross-posting because of
archival storage requirements. In my experience, the objections to
cross-posting have much more to do with political control, with some
admins not wanting their lists to be well-publicised, and with no
admin wanting to deal with the fallout when hundreds of users from
other lists do a "reply-all" that includes their list, and those
messages then get rejected or put in the hold queue because the
senders are not subscribers to the list.
> Brad, I brought up a particular IMAP-server's implementation as /an example/
> of how a single message can appear in multiple mailboxes, while only one copy of
> it is stored. You refer to this as "single instance store".
We don't really have mailboxes at all. We have mail archives. The
raw mail archives are kept in 7th edition "mbox" format. The "cooked"
archives are broken down by month (or other archive rotation policy
as set by the listowner) and are either stored as something akin to
7th edition mbox format files (for the plain text archives) or split
up into multiple *.html files (for the HTML format archives). In none
of these cases are any of these files what you would call a "mailbox"
per se.
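To make the raw-versus-cooked distinction concrete, here is a minimal
sketch (not Mailman's actual archiver code) that reads one raw mbox
file with Python's standard mailbox module and groups the messages by
month. The function name split_by_month and the choice to key on the
Date header are assumptions made for illustration only.

```python
import mailbox
from collections import defaultdict
from email.utils import parsedate_to_datetime


def split_by_month(mbox_path):
    """Group the messages of a raw mbox file by (year, month).

    Hypothetical helper: it keys on the Date header, which is only one
    possible way to decide which monthly bucket a message belongs to.
    """
    buckets = defaultdict(list)
    for msg in mailbox.mbox(mbox_path):
        try:
            when = parsedate_to_datetime(msg["Date"])
            key = (when.year, when.month)
        except (TypeError, ValueError):
            key = None  # missing or unparseable Date header
        buckets[key].append(msg)
    return dict(buckets)
```

Each bucket is still just a list of messages cut out of one flat
file; nothing here behaves like a per-user mailbox.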
> IMAP-server developers are just more affected by the same issue -- people
> CC-ing multiple addressees results in the same message getting to multiple
> mailboxes. IMAP-server admins also don't have the "luxury" of prohibiting
> CC-ing, as mailing-list admins often do. So IMAP-servers already implement
> the "single instance store", and it would be nice (and logical) if mailing
> list software did too -- starting with the recognized leader of the pack...
UW-IMAP certainly doesn't do single instance store, and I'm pretty
sure that Courier-IMAP and Dovecot don't do single instance store by
default. There are enough problems that come along with single
instance store that people are not likely to turn on such features by
default.
> And yet Google does just that -- de-duplication -- in its search
> will display a warning at the bottom of the page, saying that duplicate
> results were suppressed...
That's just search results. They're not actually storing the
original copies of those objects, and they give you the option of
turning off that feature if you like.
That's completely different from doing an Internet-wide
de-duplication of all data.
> Well, this is more important -- I was under the (mistaken) impression, that it
> does. There is no point arguing, how a good search-engine should do things on
> a Mailman forum, if Mailman implements no search function.
We don't do forums, either.
We do provide hooks that other people have used to implement such
features, but none of that has been incorporated into the baseline
version of Mailman.
> I hope, you'll give the idea of "single instance storage" another thought.
> There is already an option to archive in "Maildir" format. Optionally storing
> hardlinks instead of copies of cross-posts can't be too difficult...
I believe you'll find it a lot harder than you think to convert the
entire archive storage mechanism to use Maildir as an option, and
then to integrate single instance store on top of that. Once you do
that, you're welcome to contribute the code, and then it becomes a
matter of when one of the core developers can take a look at that
code and decide whether or not to actually incorporate that into a
future version of Mailman.
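
For anyone who wants to experiment, the hardlink idea on its own can
be sketched in a few lines of Python. This is an illustration under
stated assumptions, not Mailman code: store_once, pool_dir and
list_dirs are hypothetical names, one file per message is assumed
(as in a Maildir-style layout), and every directory must live on the
same filesystem for os.link() to work.

```python
import hashlib
import os


def store_once(raw_message, pool_dir, list_dirs):
    """Write raw_message (bytes) once, then hard-link it into each list dir.

    Hypothetical sketch: files are named after the SHA-256 of the raw
    bytes, so only byte-identical copies end up sharing storage.
    """
    digest = hashlib.sha256(raw_message).hexdigest()
    os.makedirs(pool_dir, exist_ok=True)
    pool_path = os.path.join(pool_dir, digest)
    if not os.path.exists(pool_path):
        tmp_path = pool_path + ".tmp"
        with open(tmp_path, "wb") as fh:
            fh.write(raw_message)
        os.replace(tmp_path, pool_path)  # atomic rename into place
    for list_dir in list_dirs:
        os.makedirs(list_dir, exist_ok=True)
        link_path = os.path.join(list_dir, digest)
        if not os.path.exists(link_path):
            os.link(pool_path, link_path)  # same inode, no second copy
    return digest
```

Note what the sketch leaves out. The copy of a cross-post that each
list receives normally differs in its headers, so hashing the raw
bytes would rarely find a match; a real implementation would have to
decide what counts as "the same message". It also does nothing for
the existing mbox and HTML archives.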
Personally, I think we have much higher priorities elsewhere, but
then I don't assign tasks to guys like Mark Sapiro, Tokio Kikuchi, or
Brad Knowles <brad at shub-internet.org>
LinkedIn Profile: <http://tinyurl.com/y8kpxu>