Re: [Mailman-Developers] URGENT: Google Summer of Code status report and code due

Note: CC'ing Mailman Developers. I think this was private, so not trimming.
At the bottom there's a discussion of the newsgroup naming scheme, which is an important bikeshed to paint. If you're not interested in a bunch of "me too" comments, skip right on down there!
Barry Warsaw writes:
Hi Alex.
On Jun 28, 2012, at 02:00 AM, Alexander Sulfrian wrote:
Currently I am using twisted-news-11.0.0, but that's just because it is the current stable version in Gentoo. Debian has twisted-10.1.0 in stable and twisted-12.0.0 in testing and unstable. I would use 12.0.0 if you have no objections. In addition, I would like to continue using the package manager's version. This has the advantage that dependencies are installed automatically.
I think this is fine. We've got 11.1.0 in Ubuntu 12.04 and 12.0.0 in Quantal. Debian Wheezy (now frozen) has 12.0.0. I can easily spin up Debian or Gentoo instances (it's been *years* since I ran Gentoo). Go with whichever of those will give you the easiest path to success and we should be able to reproduce your environment.
+1. I can also do both Debian and Gentoo easily. I would say stick to Gentoo for now.
I have already looked at the code examples that come with twisted-news. As far as I can see, twisted-news is mainly a reactor-based server that does some string parsing to handle the protocol. There are some sample back ends that use pickle files, the Twisted dirdbm, or the Twisted DB-API. But for the project it will probably be necessary to write a custom back end.
I expected a new backend might be necessary. Again, since this isn't the most important part of the project, do only what's necessary to show off the Mailman integration work.
+1
My first intention would be to implement it like a runner. The LMTP runner is also a daemon, and this could be implemented in a similar way. But the NNTP archive daemon is not primarily a runner, so it might be better to put it under the archiving module while still implementing it like a runner. It should be possible to configure it via the configuration file like the other available modules. But I do not know where you would prefer to put it.
Daemon/server control should be implemented as a runner,
+1. Unless some showstopper crops up, I think all Mailman daemons should be controlled via runners.
but it can be pretty simple. The IRunner interface really just needs to implement run() and stop() and even forking a subprocess should be fairly easy to do.
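The claim that a runner only needs run() and stop(), and that forking a subprocess is easy, can be sketched as below. This is a minimal, hypothetical stand-alone class, not Mailman's actual IRunner machinery: the class name, the placeholder command (a sleeping Python process standing in for something like a twistd invocation), and the choice to have run() block until the child exits are all assumptions.

```python
import subprocess
import sys
import threading

# Placeholder child process (assumption): a real deployment would launch the
# NNTP server here instead of a sleeping Python interpreter.
PLACEHOLDER_SERVER_COMMAND = [sys.executable, "-c", "import time; time.sleep(30)"]


class NNTPServerRunner:
    """Hypothetical runner supervising an NNTP server subprocess.

    Mirrors only the two methods mentioned for IRunner: run() and stop().
    """

    def __init__(self, command):
        self._command = command
        self._proc = None
        self._started = threading.Event()

    def run(self):
        # Start the server and block until it exits, like a runner's main
        # loop; the child's exit status is returned.
        self._proc = subprocess.Popen(self._command)
        self._started.set()
        return self._proc.wait()

    def stop(self):
        # Wait until run() has started the child, then ask it to terminate
        # (SIGTERM on POSIX); run() returns once the child has exited.
        self._started.wait()
        if self._proc.poll() is None:
            self._proc.terminate()
```

In this sketch stop() is called from a different thread (or a signal handler) than run(), which matches the usual shape of a long-running daemon loop.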
You have to worry about two things. First, you need a server to vend the messages through NNTP. Second, you need some way to get new messages into the twisted-news (or whatever) backend when they show up on the mailing list. This is the part that you'll need an IArchiver implementation for. Note that the only IArchiver method you *must* implement is archive_message(). You'll have to figure out whether you need the twisted-news server running or not for that.
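One possible answer to the "does the server need to be running" question is to decouple the two halves with a spool. The sketch below is hypothetical throughout: the class only borrows the archive_message() method name from IArchiver, it assumes the mailing list object exposes a `list_id` attribute, and it drops each post into a per-list Maildir (Python's standard `mailbox` module) that the NNTP server or an import step could ingest later.

```python
import email
import mailbox
import os
import tempfile
from types import SimpleNamespace


class NNTPSpoolArchiver:
    """Hypothetical IArchiver-style class providing only archive_message().

    Posts go into a per-list Maildir spool, so the NNTP server does not have
    to be running when mail arrives.
    """

    name = "nntp"  # hypothetical archiver name

    def __init__(self, spool_root):
        self.spool_root = spool_root

    def spool_for(self, mlist):
        # Assumes the mailing list object exposes its List-Id as mlist.list_id.
        return mailbox.Maildir(os.path.join(self.spool_root, mlist.list_id),
                               create=True)

    def archive_message(self, mlist, msg):
        self.spool_for(mlist).add(msg)
        return None  # no permalink until the NNTP side has numbered the post


# Usage with stand-ins for a real mailing list and message.
archiver = NNTPSpoolArchiver(tempfile.mkdtemp())
mlist = SimpleNamespace(list_id="mailman-developers.python.org")
archiver.archive_message(
    mlist, email.message_from_string("Subject: hello\n\nbody\n"))
```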
IMO, make that *three* things. It ought to be possible to fire up the NNTP runner on an existing archive, and vend messages. I suppose this means a separate database for news properties like the message to message number mapping, and perhaps for newsgroup names.
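The "separate database for news properties" could be as small as a table mapping each Message-ID to an article number per newsgroup. This is a sketch under assumptions: the `ArticleNumberMap` class, its schema, and the use of SQLite are illustrative choices, not something the thread settled on. Assigning numbers idempotently means the runner can be started on an existing archive and re-scan it without renumbering articles.

```python
import sqlite3


class ArticleNumberMap:
    """Hypothetical store: (newsgroup, Message-ID) <-> article number."""

    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS articles ("
            " newsgroup TEXT NOT NULL,"
            " number INTEGER NOT NULL,"
            " message_id TEXT NOT NULL,"
            " PRIMARY KEY (newsgroup, number),"
            " UNIQUE (newsgroup, message_id))")

    def number_for(self, newsgroup, message_id):
        """Return the article number, assigning the next free one if new."""
        with self._db:  # commits on success, rolls back on error
            row = self._db.execute(
                "SELECT number FROM articles"
                " WHERE newsgroup = ? AND message_id = ?",
                (newsgroup, message_id)).fetchone()
            if row is not None:
                return row[0]
            (high,) = self._db.execute(
                "SELECT COALESCE(MAX(number), 0) FROM articles"
                " WHERE newsgroup = ?",
                (newsgroup,)).fetchone()
            self._db.execute(
                "INSERT INTO articles (newsgroup, number, message_id)"
                " VALUES (?, ?, ?)",
                (newsgroup, high + 1, message_id))
            return high + 1

    def message_id_for(self, newsgroup, number):
        """Inverse lookup for NNTP ARTICLE <number>; None if unknown."""
        row = self._db.execute(
            "SELECT message_id FROM articles"
            " WHERE newsgroup = ? AND number = ?",
            (newsgroup, number)).fetchone()
        return row[0] if row else None
```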
Newsgroup names are an issue here. It seems to me that (if not gateway'd to Usenet) they should be something like (pseudo-code)
"mailman." + join(reverse(split(list-id,".")),".")
Eg, this list would be "mailman.org.python.mailman-developers". I know that's considered ugly-out-the-wazoo, but these need to be UUIDs (consider mirrors), and mailman@python.org should not be in the same subtree as mailman@python.net (ie,
mailman.net.python.mailman vs mailman.org.python.mailman
not
mailman.python.net.mailman vs mailman.python.org.mailman)
The top-level maybe shouldn't be "mailman", but rather something like "list-archive".
Another issue with newsgroup names is that some lists *are* registered in the news hierarchy, so provision for such aliases should be allowed.
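The naming pseudo-code and the alias provision could fit together roughly as follows. This is a minimal sketch, not an agreed design: `GroupNames`, its `prefix` and `aliases` parameters, and the example alias are hypothetical. The reverse lookup is included because an NNTP server has to map a requested group name back to a list.

```python
class GroupNames:
    """Hypothetical two-way mapping between List-Ids and newsgroup names."""

    def __init__(self, prefix="mailman", aliases=None):
        self.prefix = prefix
        # list-id -> newsgroup already registered in the news hierarchy
        self.aliases = dict(aliases or {})
        self._alias_owner = {group: lid for lid, group in self.aliases.items()}

    def group_for(self, list_id):
        """prefix + "." + join(reverse(split(list_id, ".")), "."), unless aliased."""
        if list_id in self.aliases:
            return self.aliases[list_id]
        return ".".join([self.prefix, *reversed(list_id.split("."))])

    def list_id_for(self, group):
        """Inverse lookup, e.g. for an NNTP GROUP command; None if unknown."""
        if group in self._alias_owner:
            return self._alias_owner[group]
        head = self.prefix + "."
        if not group.startswith(head) or len(group) == len(head):
            return None
        return ".".join(reversed(group[len(head):].split(".")))
```

For example, `GroupNames().group_for("mailman-developers.python.org")` gives `"mailman.org.python.mailman-developers"`, while a list given an alias keeps its registered Usenet name.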
Let the bikeshed-painting begin!
The new code should be integrated into the existing test suite. I know that passing tests are not sufficient, but they are a good start. In addition, we should define some acceptance tests.
+1
I was going to write something here, so remind me. (Must run off to torture would-be grad students.)

On 5 Jul 2012, at 01:24, Stephen J. Turnbull wrote:
Eg, this list would be "mailman.org.python.mailman-developers". I know that's considered ugly-out-the-wazoo, but these need to be UUIDs (consider mirrors), and mailman@python.org should not be in the same subtree as mailman@python.net (ie,
mailman.net.python.mailman vs mailman.org.python.mailman
not
mailman.python.net.mailman vs mailman.python.org.mailman)
The top-level maybe shouldn't be "mailman", but rather something like "list-archive".
Another issue with newsgroup names is that some lists *are* registered in the news hierarchy, so provision for such aliases should be allowed.
Let the bikeshed-painting begin!
OK. Where do these two email addresses sit?
foo@bar.example.com
foo.bar@example.com
They're distinct addresses, but they clash in this naming scheme because of the collapse of (@,.) to (.).
If '@' can't be mapped to itself, or another unused character, then it needs to be mapped to a rarely used string, or it needs to be mapped to a configurable string. I suppose the default could be '.' if the string were configurable.
-- Ian Eiloart Postmaster, University of Sussex +44 (0) 1273 87-3148
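The clash described above can be reproduced with a hypothetical address-based naming function that swaps "@" for a configurable string and then reverses the dot-separated pieces. The function name and the `at_replacement` parameter are assumptions for illustration; "+" is used as the alternative only because RFC 5536 permits it in newsgroup name components.

```python
def address_to_group(address, at_replacement="."):
    """Hypothetical address-based naming: swap '@' for at_replacement,
    then reverse the dot-separated pieces."""
    flattened = address.replace("@", at_replacement)
    return ".".join(reversed(flattened.split(".")))
```

With the default ".", both `foo@bar.example.com` and `foo.bar@example.com` map to `com.example.bar.foo`; with `at_replacement="+"` they become `com.example.foo+bar` and `com.bar+example.foo`, which no longer collide, though they are hardly pretty.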

Ian Eiloart writes:
OK. Where do these two email addresses sit?
Addresses aren't relevant. I proposed using List-Ids, which have to be unique (RFC 2919). If an administrator specifies List-Ids that collide, that's not our problem. (The author of RFC 2919 was aware of similar problems, though not this particular one AFAICS.)
Such an administrator will be able to work around it using whatever aliasing mechanism we develop for lists that are gatewayed to and from Usenet.
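Because the scheme keys on List-Ids rather than addresses, the archiver needs a List-Id for each post. A minimal sketch, assuming the header has the RFC 2919 shape of an optional phrase followed by `<list-id>`; the function name is hypothetical, and a real implementation would more likely take the List-Id from the mailing list object than parse the header.

```python
import re
from email import message_from_string

# The list-id itself is the part inside angle brackets.
_LIST_ID_RE = re.compile(r"<([^<>]+)>")


def list_id_from_message(msg):
    """Return the List-Id of an email.message.Message, or None if absent."""
    header = msg.get("List-Id")
    if header is None:
        return None
    match = _LIST_ID_RE.search(str(header))
    return match.group(1).strip() if match else None


sample = message_from_string(
    "List-Id: Mailman Developers <mailman-developers.python.org>\n\nbody\n")
```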

On Jul 05, 2012, at 09:24 AM, Stephen J. Turnbull wrote:
IMO, make that *three* things. It ought to be possible to fire up the NNTP runner on an existing archive, and vend messages.
+1
I suppose this means a separate database for news properties like the message to message number mapping, and perhaps for newsgroup names.
Newsgroup names are an issue here. It seems to me that (if not gateway'd to Usenet) they should be something like (pseudo-code)
"mailman." + join(reverse(split(list-id,".")),".")
Eg, this list would be "mailman.org.python.mailman-developers". I know that's considered ugly-out-the-wazoo, but these need to be UUIDs (consider mirrors), and mailman@python.org should not be in the same subtree as mailman@python.net (ie,
mailman.net.python.mailman vs mailman.org.python.mailman
not
mailman.python.net.mailman vs mailman.python.org.mailman)
The top-level maybe shouldn't be "mailman", but rather something like "list-archive".
Why is the prefix needed at all, especially since you qualified this as "not gatewayed to Usenet"? If all the messages are local to the server, there should be no collisions on reverse list-id newsgroup names. I've always kind of felt that Gmane's 'gmane.' prefix was a bit superfluous. I'm sure there's a good reason for it (probably buried in some Gmane FAQ).
+1 for the reversed list-ids.
Another issue with newsgroup names is that some lists *are* registered in the news hierarchy, so provision for such aliases should be allowed.
Maybe that's the reason for the prefix?
Let the bikeshed-painting begin!
blue, no yellow! aaarrrggg!
-Barry

I don't think Terri needs a cc if she's not on mm-d.
Barry Warsaw writes:
The top-level maybe shouldn't be "mailman", but rather something like "list-archive".
Why is the prefix needed at all, especially since you qualified this as "not gatewayed to Usenet"? If all the messages are local to the server, there should be no collisions on reverse list-id newsgroup names.
It's not clear to me how the MUA would know that. Many MUAs (well, at least one, and you know which one I'm talking about! Reindeer and mooses and Gnus, oh my!) can handle multiple feeds simultaneously. Suppose you have multiple archives (mirrors!) which happen to have the same list archived? How does the MUA de-dupe lists (and/or merge them, although I don't know if any MUAs can do that) if we don't have a unique public name?
Another issue with newsgroup names is that some lists *are* registered in the news hierarchy, so provision for such aliases should be allowed.
Maybe that's the reason for the prefix?
Part of it, yes.

On Jul 06, 2012, at 02:36 PM, Stephen J. Turnbull wrote:
It's not clear to me how the MUA would know that. Many MUAs (well, at least one, and you know which one I'm talking about! Reindeer and mooses and Gnus, oh my!) can handle multiple feeds simultaneously. Suppose you have multiple archives (mirrors!) which happen to have the same list archived? How does the MUA de-dupe lists (and/or merge them, although I don't know if any MUAs can do that) if we don't have a unique public name?
Another issue with newsgroup names is that some lists *are* registered in the news hierarchy, so provision for such aliases should be allowed.
Maybe that's the reason for the prefix?
Part of it, yes.
Should the prefix be configurable then?
-Barry

Barry Warsaw writes:
Should the prefix [for the "archived news hierarchy"] be configurable then?
Yes, but once we figure out what it "should" be, admins should be strongly advised not to change the prefix if they allow mirrors.
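A configurable prefix with a built-in nudge against changing it might look like the sketch below. Everything specific here is hypothetical: the `[archiver.nntp]` section, the `prefix` and `allow_mirrors` options, and the placeholder default, since the thread had not settled between "mailman" and "list-archive". It uses only the standard-library `configparser` and `warnings` modules.

```python
import configparser
import warnings

# Placeholder default: the thread had not settled on "mailman" vs "list-archive".
DEFAULT_PREFIX = "mailman"


def load_group_prefix(config_text):
    """Read a hypothetical [archiver.nntp] prefix, warning if mirrors may break."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    prefix = parser.get("archiver.nntp", "prefix", fallback=DEFAULT_PREFIX)
    allow_mirrors = parser.getboolean("archiver.nntp", "allow_mirrors",
                                      fallback=True)
    if prefix != DEFAULT_PREFIX and allow_mirrors:
        warnings.warn("a non-default newsgroup prefix will not match mirrors")
    return prefix
```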

At Thu, 05 Jul 2012 09:24:33 +0900, Stephen J. Turnbull wrote:
[...] Newsgroup names are an issue here. It seems to me that (if not gateway'd to Usenet) they should be something like (pseudo-code)
"mailman." + join(reverse(split(list-id,".")),".")
Eg, this list would be "mailman.org.python.mailman-developers". I know that's considered ugly-out-the-wazoo, but these need to be UUIDs (consider mirrors), and mailman@python.org should not be in the same subtree as mailman@python.net (ie,
mailman.net.python.mailman vs mailman.org.python.mailman
not
mailman.python.net.mailman vs mailman.python.org.mailman)
The top-level maybe shouldn't be "mailman", but rather something like "list-archive".
Currently I am thinking about the naming scheme. I think we should not split and reverse the list name. So currently I have implemented something like this (pseudo-code):
join(reverse(split(mail_host, '.')), '.') + '.' + list_name
If list_name were also reversed, it could lead to surprising subtree clashes. For example, web2.0 would end up in the same subtree as something1.0 (people sometimes use strange list names...). Even with the current implementation, the group names are ugly. Maybe we should eliminate the dots from the list names by default and only allow separate groups with the alias mechanism?
Alex
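The difference between reversing the whole List-Id and reversing only the host can be checked with a short sketch. `full_reversal` and `host_reversal` are hypothetical names; `mail_host` and `list_name` follow the pseudo-code above, and `dot_replacement` is an assumed option illustrating the "eliminate the dots" idea (the "_" used in the example is equally arbitrary).

```python
def full_reversal(list_id):
    """Reverse every dot-separated piece of the List-Id (no prefix)."""
    return ".".join(reversed(list_id.split(".")))


def host_reversal(mail_host, list_name, dot_replacement=None):
    """Reverse only mail_host and keep list_name as a single unit.

    dot_replacement is hypothetical: if given, dots inside list_name are
    replaced so the name stays one newsgroup component.
    """
    if dot_replacement is not None:
        list_name = list_name.replace(".", dot_replacement)
    return ".".join([*reversed(mail_host.split(".")), list_name])
```

Full reversal puts `web2.0.example.com` at `com.example.0.web2` and `something1.0.example.com` at `com.example.0.something1`, both under the surprising `com.example.0` subtree. Host-only reversal gives `com.example.web2.0`, which still splits the name at its dot unless something like `dot_replacement="_"` turns it into `com.example.web2_0`.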

Alexander Sulfrian writes:
If list_name were also reversed, it could lead to surprising subtree clashes. For example, web2.0 would end up in the same subtree as something1.0 (people sometimes use strange list names...).
I agree that list_name should *not* be reversed; it is an atom.
This "atomicity" is a problem. We have three different namespaces and syntaxes to deal with here: RFC 5322 email addresses, RFC 2919 List-Ids, and RFC 5536 newsgroup names. In RFC 5322, there's a special class, the "dot-atom", which may be used in the local part of an address (and thus denotes an atomic resource). But not in RFC 5536, where dots aren't allowed in newsgroup name components. I think this is a problem for post-GSoC, though.
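To make the RFC 5536 side concrete, here is a small validator, assuming the grammar in section 3.1.4 of that RFC: a newsgroup name is one or more components separated by dots, and each component is one or more of ALPHA, DIGIT, "+", "-", "_". The function name is hypothetical, and the sketch ignores the RFC's additional SHOULD-level recommendations.

```python
import re

# newsgroup-name = component *( "." component )
# component      = 1*component-char
# component-char = ALPHA / DIGIT / "+" / "-" / "_"
_COMPONENT = r"[A-Za-z0-9+_-]+"
_NEWSGROUP_NAME = re.compile(rf"{_COMPONENT}(?:\.{_COMPONENT})*")


def is_valid_newsgroup_name(name):
    """True if name matches the dotted-component grammar above."""
    return _NEWSGROUP_NAME.fullmatch(name) is not None
```

`com.example.web2.0` is valid, but only as four components (`web2` and `0` are separate), which is exactly the lost "atomicity" of the list name; `com.example.web2_0` keeps it as one component.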
Even with the current implementation the group names are ugly.
I would expect that MUA presentations will deal with this. For example, exploiting the hierarchy, the dots could appear as breadcrumbs:
mailman > org > python > mailman-developers
MAILMAN-DEVELOPERS
[summary lines]
[current message header info such as author, subject, date]
[current message body]
Maybe we should eliminate the dots from the list names by default and only allow separate groups with the alias mechanism?
Quite possibly, but don't worry about it for the purposes of GSoC I think. The worst that would happen is that a few, relatively unusual lists would be inaccessible. But I think dealing with this requires some thought, so let's not get committed to a hasty design. Document that dotted names may show strange behavior (including being inaccessible), and move on for now.

At Thu, 12 Jul 2012 17:44:37 +0900, Stephen J. Turnbull wrote:
Alexander Sulfrian writes:
If list_name were also reversed, it could lead to surprising subtree clashes. For example, web2.0 would end up in the same subtree as something1.0 (people sometimes use strange list names...).
I agree that list_name should *not* be reversed; it is an atom.
This "atomicity" is a problem. We have three different namespaces and syntaxes to deal with here: RFC 5322 email addresses, RFC 2919 List-Ids, and RFC 5536 newsgroup names. In RFC 5322, there's a special class, the "dot-atom", which may be used in the local part of an address (and thus denotes an atomic resource). But not in RFC 5536, where dots aren't allowed in newsgroup name components. I think this is a problem for post-GSoC, though.
Even with the current implementation the group names are ugly.
I would expect that MUA presentations will deal with this. For example, exploiting the hierarchy, the dots could appear as breadcrumbs:
mailman > org > python > mailman-developers
MAILMAN-DEVELOPERS
[summary lines]
[current message header info such as author, subject, date]
[current message body]
Yes, it currently works this way. The ugly names I referred to occur, for example, with "web2.0@example.com":
com > example > web2 > 0
Thunderbird is even uglier: it shortens the name in the overview to show only the first letter of each subtree:
c.e.w.0
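Both presentations of a group name discussed here can be approximated in a few lines. The function names are hypothetical, and `abbreviated` keeps the last component whole, which is an assumption about Thunderbird's exact rule (it reproduces `c.e.w.0` either way, since the last component is a single character).

```python
def breadcrumbs(group):
    """Render a dotted group name as a breadcrumb trail."""
    return " > ".join(group.split("."))


def abbreviated(group):
    """First letter of every parent component; the last one kept whole."""
    *parents, leaf = group.split(".")
    return ".".join([part[:1] for part in parents] + [leaf])
```

For `com.example.web2.0` these give `com > example > web2 > 0` and `c.e.w.0`, showing how the dot inside the list name leaks into both views.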
But as you said, I will leave it in this state for now and keep in mind that we should find a better solution in the future.
Maybe we should eliminate the dots from the list names by default and only allow separate groups with the alias mechanism?
Quite possibly, but don't worry about it for the purposes of GSoC I think. The worst that would happen is that a few, relatively unusual lists would be inaccessible. But I think dealing with this requires some thought, so let's not get committed to a hasty design. Document that dotted names may show strange behavior (including being inaccessible), and move on for now.
Despite their unusual names, all lists should be accessible: the current implementation should not lead to inaccessible groups. So I think it is acceptable for now.
participants (4)
- Alexander Sulfrian
- Barry Warsaw
- Ian Eiloart
- Stephen J. Turnbull