Efficient handling of cross-posting
Hello!
Our site uses mailman to host many mailing lists. Sometimes a person finds it necessary to post to several of the lists at once, which irritates subscribers for the following reasons:
- People subscribed to more than one list will get the same message multiple times.
- The same message will be stored in multiple archives increasing their size.
- The same message will come up multiple times in subsequent searches through the archives.
Can our mailman installation be tweaked to eliminate all or some of the above undesirable effects? Thanks,
-mi
On 1/28/08, Mikhail T. wrote:
Can our mailman installation be tweaked to eliminate all or some of the above undesirable effects? Thanks,
Not without source-code level modifications, no.
The elimination of duplicates being sent to individuals is something that might be implemented in a future version of Mailman, but that would require that each list have complete knowledge of who all the overlapping subscribers are for all the other lists that are known recipients of the message. And that would still break down if the sender created one message and sent it to one list, then took the same message and sent it separately to another list, etc....
Moreover, the list can't know that fredjbloggs@example.com is the same person as fred@example.net on another list, although future versions of Mailman will allow people the option of registering multiple different e-mail addresses that are all associated with the same identity, so if they choose to make use of that function, you would at least have a better chance.
Either way, I wouldn't look for these features to arrive before the mythical Mailman3 that we occasionally hear about.
OTOH, I'm not sure that this is something that should ever be the responsibility of the mailing list software. The complete suppression of duplicates is something that can only be done conclusively by the receiver, and not the sender.
As for the rest, the message was sent to multiple lists, and therefore it should definitely show up in multiple archives. I don't think that you're going to find any way around that one.
-- Brad Knowles <brad@shub-internet.org> LinkedIn Profile: <http://tinyurl.com/y8kpxu>
Brad Knowles wrote:
On 1/28/08, Mikhail T. wrote:
Can our mailman installation be tweaked to eliminate all or some of the above undesirable effects? Thanks,
Not without source-code level modifications, no.
In Mailman 2.1.10, there is a new sibling lists feature that can be used to reduce or eliminate duplicates sent to members of multiple cross-posted lists.
You can for example put listb@example.com in the regular_exclude_lists attribute of lista@example.com. Then, if a post addresses both lista@example.com and listb@example.com, those addresses which are regular members of both lista@example.com and listb@example.com with delivery enabled will not be sent the post from lista@example.com.
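The effect of that setting can be sketched in a few lines of Python. This is only a simplified model of the behaviour described above, not Mailman's own code: the function names `addressed_lists` and `recipients_for` and the membership data are made up for illustration, and the real implementation may differ in details such as how list addresses are matched.

```python
from email.message import EmailMessage
from email.utils import getaddresses


def addressed_lists(msg):
    """Return the lower-cased addresses found in the To and Cc headers."""
    headers = msg.get_all("To", []) + msg.get_all("Cc", [])
    return {addr.lower() for _name, addr in getaddresses(headers) if addr}


def recipients_for(list_members, exclude_lists, all_members, msg):
    """Members of one list, minus members of any excluded list the post also addresses."""
    recipients = set(list_members)
    targets = addressed_lists(msg)
    for other in exclude_lists:
        if other.lower() in targets:
            recipients -= all_members.get(other, set())
    return recipients


# Hypothetical membership data: regular members with delivery enabled.
members = {
    "lista@example.com": {"ann@example.org", "bob@example.org"},
    "listb@example.com": {"bob@example.org", "cat@example.org"},
}

# A post cross-posted to both lists.
post = EmailMessage()
post["To"] = "lista@example.com"
post["Cc"] = "listb@example.com"

# lista lists listb in its (modelled) regular_exclude_lists; listb excludes nothing.
lista_recipients = recipients_for(
    members["lista@example.com"], ["listb@example.com"], members, post
)
listb_recipients = recipients_for(members["listb@example.com"], [], members, post)
```

In this model bob@example.org is dropped from lista's delivery and receives the post once, via listb. A post addressed to lista alone would still go to every lista member.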
The rest of Brad's reply is right on.
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
On Monday 28 January 2008 02:15 pm, Brad Knowles wrote:
The elimination of duplicates being sent to individuals is something that might be implemented in a future version of Mailman, but that would require that each list have complete knowledge of who all the overlapping subscribers are for all the other lists that are known recipients of the message.
Yes, of course. But a single mailman installation hosting multiple mailing lists already has the complete knowledge, does not it?
And that would still break down if the sender created one message and sent it to one list, then took the same message and sent it separately to another list, etc....
Yes, absolutely -- the only key is Message-Id. Even if a hash of the message body /could/ be used as the key, I think, a different Message-Id means, the message should be sent again.
although future versions of Mailman will allow people the option of registering multiple different e-mail addresses that are all associated with the same identity, so if they choose to make use of that function, you would at least have a better chance.
I think, this function is already here. When I first sent message to this list, it bounced, because I was not a subscriber. The bounce suggested, that I subscribe ALL of my addresses and mark some of them as "NOMAIL". This would seem to indicate, that multiple addresses-per-person feature is already established.
OTOH, I'm not sure that this is something that should ever be the responsibility of the mailing list software. The complete suppression of duplicates is something that can only be done conclusively by the receiver, and not the sender.
This is true. But if /most/ duplicates are eliminated by this, then the remaining /few/ may be acceptable to allow cross-posting to relevant mailing lists.
As for the rest, the message was sent to multiple lists, and therefore it should definitely show up in multiple archives. I don't think that you're going to find any way around that one.
AFAIU, the message will appear in the search results multiple times -- once per mailing list. That is not justified -- the results should contain no repetitions...
Yours,
-mi
Mikhail T. wrote:
Yes, of course. But a single mailman installation hosting multiple mailing lists already has the complete knowledge, does not it?
See my reply to Brad's reply.
<snip>
I think, this function is already here. When I first sent message to this list, it bounced, because I was not a subscriber. The bounce suggested, that I subscribe ALL of my addresses and mark some of them as "NOMAIL". This would seem to indicate, that multiple addresses-per-person feature is already established.
This is not true 'multiple addresses per person'. This is just a bunch of addresses, some of which don't receive list mail. Mailman currently has no way to know which if any of these addresses belong to the same user.
The concept to be implemented in Mailman 3.0 is a separate user database which has an entry per person with perhaps multiple email addresses and various roles such as member of list1, owner of list2 and moderator of list3
<snip>
AFAIU, the message will appear in the search results multiple times -- once per mailing list. That is not justified -- the results should contain no repetitions...
For as many people who feel as you do, I'll wager that there are many more who feel it is absolutely wrong to not archive a post to lista in lista's archive just because it also was possibly archived in listb's archive.
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
On Jan 28, 2008, at 5:16 PM, Mark Sapiro wrote:
The concept to be implemented in Mailman 3.0 is a separate user database which has an entry per person with perhaps multiple email addresses and various roles such as member of list1, owner of list2 and moderator of list3
Without getting too deep into the technical weeds, this exists in Mailman 3.0
right now: there are users, addresses, and members. An address is
just an email address, which can have state like "validated" and can
be associated with a user. A user is a person to Mailman, with a
name, possibly preferences, and multiple associated addresses. A
member is a user who is associated with a mailing list under a
particular role (or multiple roles), such as "subscribed list member",
"administrator", or "moderator", with a particular address.
There's an intermediate concept not directly represented at the
database layer called a "roster" which is just a set of members. So
when Mailman delivers a message to a mailing list for example, it asks
for the roster of all enabled subscribed member addresses, and it sends
the message to them.
What this means is that in Mailman 3.0, there is knowledge of
subscriptions across mailing lists, so that we could do better cross-
posting, though this isn't implemented yet. For example, you could
say that the 'musicians' mailing list roster is composed of the
rosters for the 'guitar-players' mailing list and the 'bass-players'
mailing list, plus a bunch of directly subscribed multi-
instrumentalists. Mailman figures all that out when it decides who
the recipients of the message are.
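A rough way to picture the users, addresses, members and rosters described above is the following Python sketch. It is not the Mailman 3 API: the class names, fields and the `subscribe` helper are assumptions chosen to mirror the description, and the composed 'musicians' roster is modelled simply as a roster that includes other rosters.

```python
from dataclasses import dataclass, field


@dataclass
class Address:
    email: str
    validated: bool = False


@dataclass
class User:
    name: str
    addresses: list = field(default_factory=list)


@dataclass
class Member:
    user: User
    address: Address
    role: str = "member"  # could also be "owner" or "moderator"
    delivery_enabled: bool = True


@dataclass
class Roster:
    members: list = field(default_factory=list)
    includes: list = field(default_factory=list)  # other rosters folded into this one

    def delivery_addresses(self):
        """Addresses of enabled subscribed members, including those from included rosters."""
        found = {
            m.address.email.lower()
            for m in self.members
            if m.role == "member" and m.delivery_enabled
        }
        for roster in self.includes:
            found |= roster.delivery_addresses()
        return found


def subscribe(user, roster, role="member"):
    """Hypothetical helper: subscribe a user with their first address."""
    roster.members.append(Member(user, user.addresses[0], role))


gina = User("Gina", [Address("gina@example.org", True)])
bo = User("Bo", [Address("bo@example.org", True)])
max_ = User("Max", [Address("max@example.org", True)])

guitar = Roster()
bass = Roster()
musicians = Roster(includes=[guitar, bass])

subscribe(gina, guitar)
subscribe(gina, bass)      # Gina plays both instruments
subscribe(bo, bass)
subscribe(max_, musicians)  # directly subscribed multi-instrumentalist

musician_recipients = musicians.delivery_addresses()
```

Because the recipients are collected into a set, Gina's address appears once in `musician_recipients` even though she is on both included rosters.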
<snip>
AFAIU, the message will appear in the search results multiple times -- once per mailing list. That is not justified -- the results should contain no repetitions...
For as many people who feel as you do, I'll wager that there are many more who feel it is absolutely wrong to not archive a post to lista in lista's archive just because it also was possibly archived in listb's archive.
How the message is accessible is up to the archiver used, and
archivers will be (more easily) pluggable in Mailman 3. Pipermail
will probably not be made any smarter unless a volunteer comes forward
to make it so, but it would be easier to plug in MHonArc or even mail-
archive.com/gmane support. It's then up to the specific archiver
whether it will reduce duplicates by comparing Message-Ids. I'm not
aware of a free archiver (or free archiving service) that does this
though.
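As an illustration of what comparing Message-Ids could look like on the archiver or search side, here is a minimal Python sketch. It does not describe any existing archiver: `dedupe_hits` and the sample messages are hypothetical, and it assumes Message-ID values are trustworthy, which not every mail client guarantees. Each copy stays in its own list's archive; only the presentation of search hits is collapsed.

```python
from email import message_from_string


def dedupe_hits(hits):
    """Collapse (list_name, raw_message) search hits that share a Message-ID."""
    grouped = {}
    for list_name, raw in hits:
        msg = message_from_string(raw)
        key = (msg.get("Message-ID") or "").strip()
        if not key:
            # Without a Message-ID there is nothing safe to merge on,
            # so give the hit a key of its own.
            key = ("no-id", len(grouped))
        entry = grouped.setdefault(
            key, {"subject": msg.get("Subject", ""), "lists": []}
        )
        if list_name not in entry["lists"]:
            entry["lists"].append(list_name)
    return list(grouped.values())


CROSS_POST = "Message-ID: <abc@example.org>\nSubject: New picking technique\n\nBody\n"
OTHER_POST = "Message-ID: <def@example.org>\nSubject: Rehearsal room\n\nBody\n"

hits = [
    ("guitar-players", CROSS_POST),
    ("musicians", CROSS_POST),
    ("musicians", OTHER_POST),
]
results = dedupe_hits(hits)
```

Here the cross-posted message yields a single result that records both lists it was found in, so the reader can still reach either archive copy.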
Cheers,
-Barry
On Jan 28, 2008, at 15:40, Barry Warsaw wrote:
What this means is that in Mailman 3.0, there is knowledge of subscriptions across mailing lists, so that we could do better cross-posting, though this isn't implemented yet. For example, you could say that the 'musicians' mailing list roster is composed of the rosters for the 'guitar-players' mailing list and the 'bass-players' mailing list, plus a bunch of directly subscribed multi-instrumentalists. Mailman figures all that out when it decides who the recipients of the message are.
This is quite interesting. I don't believe in duplication and as such
not in cross-posting or even better cross-posting, and so the above is
interesting to me because I admittedly was thinking along these lines,
as I started reading this thread.
So I enjoy the roster concept which is outlined. Still, it seems odd
to me that the list server software can adequately decide on the
process of eliminating duplicates. To me, the roster concept implies
that duplicates should not have been sent by list members in the first
place.
In other words, a proper roster structure discourages any need for
cross-posting. It's all about expectation, I think. The expectation
for instance that if a guitar-players list exists, a guitar-players
discussion should not take place, just or too, on the general
musicians list.
But when the expectation is not in place, an approach such as a list
server's elimination of duplicates appears to be an awkward uphill
battle. In short, I wonder if the suggested feature will do more harm
than good.
Mikael
Mikael Hansen writes:
So I enjoy the roster concept which is outlined. Still, it seems odd
to me that the list server software can adequately decide on the
process of eliminating duplicates. To me, the roster concept implies
that duplicates should not have been sent by list members in the first
place.
As I pointed out elsewhere (on the mailman-developers list) this is just reinventing Usenet as a push medium. Barry has already asked that this conversation go there; Reply-To set.
However, it is often the case that even when themes apparently nest (as "guitarist" nests into "musicians"), some people are interested in a collection of "broad" threads from "musicians", some people want a collection of "narrow" threads from "guitarists", and there is often an overlap where it's not obvious whether it is of narrow interest or broad interest, eg a new technique for guitarists that might have been known by cellists, or maybe trombonists -- you don't know -- since antiquity. Cross-posting can't be eliminated; you can only help users to recognize and organize the duplication.
On 1/28/08, Mikhail T. wrote:
Yes, of course. But a single mailman installation hosting multiple mailing lists already has the complete knowledge, does not it?
Nope. All each list knows is a bunch of e-mail addresses that are subscribed. When sending out a message that has been cross-posted to multiple lists, one thing you could do is an equivalent to "sort | uniq" for all of the recipient e-mail addresses, but you have no way of knowing if a single person has multiple different addresses that are subscribed to one or more lists.
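For illustration, the "sort | uniq" idea might look like this in Python. The function name and the sample addresses are hypothetical, and treating addresses as case-insensitive is an assumption made here for simplicity.

```python
def unique_recipients(*rosters):
    """Merge several address lists, dropping case-insensitive duplicates, and sort them."""
    seen = {}
    for roster in rosters:
        for address in roster:
            cleaned = address.strip()
            # Keep the first spelling seen for each lower-cased address.
            seen.setdefault(cleaned.lower(), cleaned)
    return sorted(seen.values(), key=str.lower)


lista = ["Fred@example.net", "ann@example.org"]
listb = ["fred@example.net", "fredjbloggs@example.com"]

merged = unique_recipients(lista, listb)
```

The two spellings of fred@example.net collapse into one recipient, but fredjbloggs@example.com stays separate: nothing in the address data says both belong to the same person.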
And I believe that interactive mail messages are handled separately from digests.
Yes, absolutely -- the only key is Message-Id. Even if a hash of the message body /could/ be used as the key, I think, a different Message-Id means, the message should be sent again.
Message-id is not really good enough. There have been many examples of clients that do not create sufficiently unique message-ids for different messages.
But if the recipient wants to take that risk within their mail system, that's their choice.
I think, this function is already here. When I first sent message to this list, it bounced, because I was not a subscriber. The bounce suggested, that I subscribe ALL of my addresses and mark some of them as "NOMAIL". This would seem to indicate, that multiple addresses-per-person feature is already established.
Uh, no. I think I may have written that bounce message. I guarantee you that this feature does not yet exist in Mailman.
This is true. But if /most/ duplicates are eliminated by this, then the remaining /few/ may be acceptable to allow cross-posting to relevant mailing lists.
Most of the anti-duplication features can't be delivered until Mailman3. The sister-list concept that Mark has introduced with Mailman 2.1.10 is the best we're likely to be able to see for a long time.
AFAIU, the message will appear in the search results multiple times -- once per mailing list. That is not justified -- the results should contain no repetitions...
Mailman does not incorporate any search function, therefore which searches return which messages is totally and completely irrelevant to Mailman.
Moreover, searches across multiple lists should most definitely return multiple hits for the same message, if it was posted to multiple lists. If you want any other kind of behaviour, then that would be up to you and how you configure your particular search query.
No search engine author in their right mind should ever consider doing de-duplication on their own, although they might be willing to provide that feature to customers who demand the option.
-- Brad Knowles <brad@shub-internet.org> LinkedIn Profile: <http://tinyurl.com/y8kpxu>
Brad Knowles wrote:
Most of the anti-duplication features can't be delivered until Mailman3. The sister-list concept that Mark has introduced with Mailman 2.1.10 is the best we're likely to be able to see for a long time.
Actually, Tokio deserves the credit for bringing the concept and implementation to Mailman.
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
Google does not provide all the storage for all the content in
question, although they may have a cache of it.
-- Brad Knowles <brad@shub-Internet.org>
Sent from my iPhone
On Jan 29, 2008, at 1:21 AM, "Stephen J. Turnbull"
<stephen@xemacs.org> wrote:
Brad Knowles writes:
No search engine author in their right mind should ever consider doing de-duplication on their own, although they might be willing to provide that feature to customers who demand the option.
Google does.
participants (6)
- Barry Warsaw
- Brad Knowles
- Mark Sapiro
- Mikael Hansen
- Mikhail T.
- Stephen J. Turnbull