Re: [Mailman-Developers] any interest in a new built-in web-archive? (i.e. pipermail replacement)
On Mon, Mar 26, 2012 at 03:20:05PM -0700, David Jeske wrote:
I'm writing to find out the state of and philosophy surrounding pipermail in mailman, to see if there is a productive way to provide some code/development-time to that part of mailman.
I know there are several decent third-party archivers out there, but many of the mailing list archives I browse regularly are using pipermail because it's the mailman default.... one less thing to install, administer, and upgrade. Unfortunately, pipermail doesn't do a good job formatting messages for html.. (messages with no line-breaks are the most annoying problem I regularly run into)
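The no-line-breaks problem can be handled when the message body is rendered. What follows is only a minimal sketch, not pipermail's actual code: the function name `render_body_as_html` and the 80-column default are assumptions made for illustration. It escapes the body for HTML and re-wraps only the lines that exceed the width.

```python
import html
import textwrap


def render_body_as_html(body: str, width: int = 80) -> str:
    """Escape a plain-text message body and wrap over-long lines.

    Hypothetical helper for illustration; not part of pipermail or Mailman.
    """
    wrapped_lines = []
    for line in body.splitlines():
        if len(line) > width:
            # Re-wrap only lines longer than `width`; shorter lines keep
            # the line breaks the author typed.
            wrapped_lines.extend(textwrap.wrap(line, width=width) or [""])
        else:
            wrapped_lines.append(line)
    # Escape after wrapping so entities such as &lt; are never split.
    return "<pre>" + html.escape("\n".join(wrapped_lines)) + "</pre>"
```

A limitation of this sketch is that continuation lines of a wrapped quoted line lose their "> " prefix. A static archiver could also leave the text alone and style the `<pre>` element with the CSS rule `white-space: pre-wrap`, which lets the browser do the wrapping.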
I've written code for this a number of times (eGroups, Yahoo Groups, Google Groups). I also released an open-source python/clearsilver/sqlite based archiver with redundant text-eliding, a few different thread views, and search... ( http://www.clearsilver.net/archive/ ) which is hardly used both because I don't try to popularize it, and because many sites just leave the default (pipermail).
If there is some code (and time) I can contribute to mailman/pipermail, I'd like to do so. I'm writing this message to "take a temperature" and find out what, if any, contributions would be appreciated by the mailman development team. I could imagine answers like:
a) pipermail is fine... if you want to fix a bug or two submit a patch, but we don't want to improve it
b) we're ditching pipermail entirely... in the future sites will have to choose and install an external archiver
c) we'd love pipermail to be improved... but we still want it to be simple, static-html, and dependency free
d) we'd love a dynamic-ui replacement for pipermail... as long as it uses the same cgi/templating model as mailman ui
Thoughts? Is there any help I can offer up here?
We'd love to have work done on the archiver! I know that we're ditching pipermail entirely and that archivers are becoming separate from the core mailman. What I don't know is whether mailman3 will eventually have a standard archiver which lives outside of the core mailman but is recommended for installation along with it.
At PyCon a few weeks ago, I demoed hyperkitty which showed some of the things that a next generation archiver could do. hyperkitty is continuing to be developed. As I was talking about hyperkitty we touched briefly on what I think is one of the central conundrums about having only unofficial third party archivers: how to have a consistent programmatic interface available over http. Grackle is another archiver for mailman that doesn't have the UI bells and whistles of hyperkitty but it does make an effort to expose a REST UI to the world. I think that's a beautiful thing. But I don't like that a site that wanted both would need to run two archivers that were saving mail into two sets of storage.
I think there's several ways we could go about this.
- We could create a standard REST API that archivers were free but encouraged to implement.
- We could expand the python API that archivers needed to expose and then implement the REST API inside of mailman Core (or a re-envisioned, lightweight Grackle).
- We could promote a standard archiver much as we're going to promote Postorius as the standard admin front-end and that archiver would provide a standard REST API that others could then emulate.
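To make the second option concrete, here is a rough sketch of how a small Python API exposed by each archiver could sit behind one REST-style layer in the core. Everything named here is hypothetical: `ArchiverAPI`, `InMemoryArchiver`, `handle_get`, and the `/lists/<name>/messages` paths are invented for illustration and are not Mailman, hyperkitty, or Grackle interfaces.

```python
import json
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Tuple


class ArchiverAPI(ABC):
    """Hypothetical Python API an archiver would expose to the core."""

    @abstractmethod
    def list_message_ids(self, list_name: str) -> List[str]:
        """Return the Message-IDs archived for one list."""

    @abstractmethod
    def get_message(self, list_name: str, message_id: str) -> Optional[str]:
        """Return the raw message text, or None if it is not archived."""


class InMemoryArchiver(ArchiverAPI):
    """Toy archiver used only to exercise the dispatch function below."""

    def __init__(self) -> None:
        self._store: Dict[str, Dict[str, str]] = {}

    def add(self, list_name: str, message_id: str, raw: str) -> None:
        self._store.setdefault(list_name, {})[message_id] = raw

    def list_message_ids(self, list_name: str) -> List[str]:
        return sorted(self._store.get(list_name, {}))

    def get_message(self, list_name: str, message_id: str) -> Optional[str]:
        return self._store.get(list_name, {}).get(message_id)


def handle_get(archiver: ArchiverAPI, path: str) -> Tuple[int, str]:
    """Map a GET path onto archiver calls; return (status, JSON body)."""
    parts = [p for p in path.split("/") if p]
    if len(parts) >= 3 and parts[0] == "lists" and parts[2] == "messages":
        list_name = parts[1]
        if len(parts) == 3:
            ids = archiver.list_message_ids(list_name)
            return 200, json.dumps({"messages": ids})
        if len(parts) == 4:
            raw = archiver.get_message(list_name, parts[3])
            if raw is not None:
                return 200, json.dumps({"id": parts[3], "raw": raw})
    return 404, json.dumps({"error": "not found"})
```

Because the HTTP-facing code only talks to `ArchiverAPI`, any archiver that implements those two methods would present the same interface over http, without the site running a second archiver just for REST access.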
hyperkitty:
Project page: https://fedorahosted.org/hyperkitty/
Code: http://bzr.fedorahosted.org/bzr/hyperkitty/
Demo: http://mm3test.fedoraproject.org/2
grackle: https://launchpad.net/grackle
(One thing I notice about grackle now is that it's AGPL... that's going to be unpleasant for some sites to run. Perhaps we can change that or get some changes added to the AGPL.)
-Toshio
On Mon, Mar 26, 2012 at 5:11 PM, Toshio Kuratomi <a.badger@gmail.com> wrote:
We'd love to have work done on the archiver! I know that we're ditching pipermail entirely and that archivers are becoming separate from the core mailman. What I don't know is whether mailman3 will eventually have a standard archiver which lives outside of the core mailman but is recommended for installation along with it.
I see.. that sounds like option-b.
I highly recommend reconsidering this and including a standard archiver with mailman. If the number of sites that use pipermail is any indication, I think failing to include something will basically mean lots of lists without any archives.
At PyCon a few weeks ago, I demoed hyperkitty which showed some of the things that a next generation archiver could do.
I recommend you take a closer look at ClearsilverListArchive <http://www.clearsilver.net/archive/>. It's written in Python, Clearsilver, SQLite.. is "real open-source" (BSD License), and hits most of the features on your ModernArchiving wishlist plus a bunch you didn't list (author pages, redundant text elimination, cookie preferences).
As for the features it doesn't have from your list: Editing would be easy to add because it's sqlite (deciding on the auth system is probably more of an issue than the editing). Anti-Crawl code is really an issue of configuration for cheap in-memory state-management. NNTP is, well... that would be a big job that I doubt will be bitten off by something as "small" as a list archiver.
Sadly I can't point to any lists using it at the moment, because, well, it's hidden under a rock. I'll ingest an archive of the mailman list so you can take it for a spin.
As I was talking about hyperkitty we touched briefly on what I think is one of the central conundrums about having only unofficial third party archivers: how to have a consistent programmatic interface available over http.
What is the REST UI used by? CSLA supports RSS. When it comes to a more involved REST UI, what software would be hitting it? I don't think I'll understand your other API/REST points until I see an answer to this.
Grackle is another archiver for mailman that doesn't have the UI bells and whistles of hyperkitty but it does make an effort to expose a REST UI to the world. I think that's a beautiful thing. But I don't like that a site that wanted both would need to run two archivers that were saving mail into two sets of storage.
I think here you are entering into a catch-22. If you have a single storage system, then you have a single storage schema, which means you have a single set of things you can do fast and most other things become impractical (because they would require synchronizing state).
I'm quite sure, for example, that the Grackle schema is not the same as the CSLA schema, and that many CSLA features would be impossible with the Grackle REST API. (short of just using it to suck everything down, but then you're just duplicating).
Why is message duplication an issue?
On Tue, Mar 27, 2012 at 9:11 AM, Toshio Kuratomi <a.badger@gmail.com> wrote:
On Mon, Mar 26, 2012 at 03:20:05PM -0700, David Jeske wrote:
As I was talking about hyperkitty we touched briefly on what I think is one of the central conundrums about having only unofficial third party archivers: how to have a consistent programmatic interface available over http.
I don't understand why this is an issue. Third party archives (no "r"!) are going to want to have their own copies of the posts, and will be heavily duplicated in any case (if only because they store stuff on RAID and backup regularly :-). Communicating with them via SMTP will be fine, and LMTP provides a reasonable way to handle local archives efficiently -- you can even just use a virtual user and a standard MDA for the purpose.
If a third party wants to provide archiving service, but allow the list to supply authentication and user configuration information, that's another kettle of fish, but that's not a question of the archiver itself, that's going to be part of the Mailman 3 REST API per se.
I don't like that a site that wanted both would need to run two archivers that were saving mail into two sets of storage.
Agreed, but the storage issue is easy to solve for the Mailman site itself. Just provide a simple Handler to stuff posts into a maildir format mail folder (as noted above, this can communicate with a standard MDA via LMTP), and "require" that all third party archive UI software support that format. Maybe if space is that big an issue, provide a maildir-in-zipfile backend. If they want to do something else (eg, access an IMAP store or stuff it into a PostgreSQL database) they can provide their own handler for that ... surely this is trivial?
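For a sense of how small such a handler could be, here is a sketch that uses only the Python standard library `mailbox` module. The function name `archive_to_maildir` is hypothetical, and the sketch leaves out how the function would be registered in Mailman's handler pipeline, since that interface isn't described in this thread.

```python
import mailbox
from email.message import EmailMessage


def archive_to_maildir(maildir_path: str, msg: EmailMessage) -> str:
    """Append one post to a maildir folder and return its maildir key.

    Hypothetical helper for illustration; not an actual Mailman handler.
    """
    # create=True builds the tmp/, new/ and cur/ subdirectories if needed.
    box = mailbox.Maildir(maildir_path, create=True)
    try:
        key = box.add(msg)
        box.flush()
    finally:
        box.close()
    return key
```

Any archive UI that can read maildir would then work from the same folder, which is the single-storage arrangement suggested above.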
(One thing I notice about grackle now is that it's AGPL... that's going to be unpleasant for some sites to run. Perhaps we can change that or get some changes added to the AGPL.)
Yes, let's stay away from copyleft, period, to get a standard archiver that commercial sites and developers will be comfortable with extending. I know people are disgusted with Plesk and especially cPanel, but (a) GPL hasn't stopped those folks keeping their sources away from the general public, and even with AGPL, *we* would have to keep track of *their* releases to get our hands on their changes in any timely fashion -- which half the time we don't want anyway! -- and (b) what we're doing is all about UI in some sense. Even the pipeline architecture of core Mailman is about allowing users (the script-able but not always program-able list and site admins) to easily make changes to their lists' configurations. UI design is necessarily visible, and it's unlikely to be that much of a challenge to reproduce their changes. The hard part will be getting the internal design past Barry, anyway (which is one reason why some of the more frequently-requested features provided by cPanel Mailman, like duplicate list names across virtual servers, haven't been added in the past).
On Mar 26, 2012, at 05:11 PM, Toshio Kuratomi wrote:
We'd love to have work done on the archiver! I know that we're ditching pipermail entirely and that archivers are becoming separate from the core mailman. What I don't know is whether mailman3 will eventually have a standard archiver which lives outside of the core mailman but is recommended for installation along with it.
Yeah, who knows? :)
At PyCon a few weeks ago, I demoed hyperkitty which showed some of the things that a next generation archiver could do. hyperkitty is continuing to be developed. As I was talking about hyperkitty we touched briefly on what I think is one of the central conundrums about having only unofficial third party archivers: how to have a consistent programmatic interface available over http. Grackle is another archiver for mailman that doesn't have the UI bells and whistles of hyperkitty but it does make an effort to expose a REST UI to the world. I think that's a beautiful thing. But I don't like that a site that wanted both would need to run two archivers that were saving mail into two sets of storage.
Really excellent points Toshio.
I think there's several ways we could go about this.
- We could create a standard REST API that archivers were free but encouraged to implement.
- We could expand the python API that archivers needed to expose and then implement the REST API inside of mailman Core (or a re-envisioned, lightweight Grackle).
- We could promote a standard archiver much as we're going to promote Postorius as the standard admin front-end and that archiver would provide a standard REST API that others could then emulate.
And very good suggestions too. I'm not sure what the best thing to do is right now, but I've long thought that the core needs a basic "message store" (as you'll see in the IMessageStore interface, which probably sucks ;). It's possible that the prototype archiver morphs into this thing and that as we expose the IMessageStore to the core's REST API, we'll start to define what we need from an archiver. I agree that having such an API in front of the archiver is a truly beautiful thing.
(One thing I notice about grackle now is that it's AGPL... that's going to be unpleasant for some sites to run. Perhaps we can change that or get some changes added to the AGPL.)
It may be difficult to change. Canonical's default license on FLOSS it releases is AGPL. If we can make a good case for wanting something different, it may be possible to change. On a separate track, at Pycon you made a persuasive argument about the AGPL's flaws, and since that's an official FSF license, it would be good if "someone" would explore addressing those problems with them. That probably won't be me <wink>.
-Barry
On Mon, 2012-03-26 at 17:11 -0700, Toshio Kuratomi wrote:
Grackle is another archiver for mailman that doesn't have the UI bells and whistles of hyperkitty but it does make an effort to expose a REST UI to the world. I think that's a beautiful thing.
I started a small thing on hyperkitty there: http://mm3test.fedoraproject.org/api/
Pierre
participants (5)
- Barry Warsaw
- David Jeske
- Pierre-Yves Chibon
- Stephen J. Turnbull
- Toshio Kuratomi