Chuq Von Rospach wrote:
At 11:23 PM -0500 11/21/00, Bill Bumgarner wrote:
- archival of messages is a lot more than just writing the bodies to a web server and then generating some kind of automatic TOC/index.
agreed completely. I'd take it a step further and say it probably shouldn't generate indexes at all, but that indexes should be generated when a user wants to access the archives, dynamically. That's probably the single major weakness of mhonarc.
I'd take it a step beyond THAT and say that this is really almost a per-list issue. I have mailing lists that:
- are professional in nature, where I wouldn't mind even PAYING for a realtime
solution that was very user friendly
- low-bandwidth lists where on-demand indexing would not be an issue
- high-bandwidth free lists where once-a-night indexing would be ideal.
[bunch of useful stuff that I agree with deleted]
- for the archival of plain text messages, WebDAV is overkill [as Chuq mentions]. However, as soon as you move to archiving MIME attachments, it quickly becomes extremely advantageous to archive various properties along with the archived message pieces.
but you can do that with a lot less overhead in MySQL by doing a focused database. In fact, you could program a system to do this via DBI that'd work in any DBI-capable environment, so users could roll their own based on what they've already adopted. Unless WebDAV gives us enough extra capabilities to be worthy of the specialization, my argument is (and will be) that we program to a more general API (like DBI), so that we work in many environments, and if someone wants, they can program a DBI->WebDAV interface to attach to it. This way, we get DB, MySQL, Postgres, Oracle, ODBC, yadayada more or less for free, giving us functionality across multiple environments that users can tailor. If we program just to WebDAV, we don't get that.
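To make the "program to a general API" argument concrete, here is a rough sketch in Python, whose DB-API (PEP 249) plays the role DBI plays in Perl. sqlite3 stands in for any compliant driver; the `archive` table and its columns are hypothetical, not anything Mailman defines, and note that the parameter placeholder style (`?` here) actually varies between DB-API drivers, so full portability takes a bit more care:

```python
import sqlite3  # any PEP 249 (DB-API) driver could be swapped in here

def archive_message(conn, list_name, message_id, raw_text):
    """Store one raw message via a generic DB-API connection.

    `conn` can come from any DB-API driver; the `archive` table is a
    hypothetical schema used purely for illustration.  The '?'
    placeholder is sqlite3's paramstyle -- other drivers use '%s' etc.
    """
    cur = conn.cursor()
    cur.execute(
        "INSERT INTO archive (list_name, message_id, raw_text) "
        "VALUES (?, ?, ?)",
        (list_name, message_id, raw_text),
    )
    conn.commit()

# demo with an in-memory database
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE archive (list_name TEXT, message_id TEXT, raw_text TEXT)")
archive_message(conn, "mailman-developers", "<abc@example.com>",
                "Subject: hi\n\nbody")
```

The point of the sketch is that nothing above names a particular backend; swapping the `connect()` call swaps the database.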
This is where I disagree *very* strongly-- maybe not with the implementation choice [DBI], but with the reasoning behind it.
I don't think archival should be treated as a database-centric operation. The concept of archival falls very naturally into a static hierarchy of collections/directories containing resources/files with a bit of additional meta information associated with some resources. This is exactly the kind of information archive that a web server is *designed* to serve optimally. Adding extra layers here or abstracting to a DBI really doesn't buy us much.
Alone, a basic filesystem-backed web server gives us:
- efficient access to archives
- basic per-site, per-list authentication
- [with little addition] unified access/passwords between lists, etc...
- almost *zero* overhead with *very little* implementation cost
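A minimal sketch of that kind of filesystem-centric archiving in Python-- the directory layout (`<list>/<YYYY-MM>/<seq>.txt` plus a `.meta.json` sidecar for the extra meta information) is purely an assumption for illustration, not Mailman's actual scheme:

```python
import json
import os
import tempfile

def archive_to_fs(root, list_name, year_month, seq, raw_bytes, meta):
    """Write a message plus a sidecar metadata file into a plain
    directory tree that any web server can serve directly.

    Hypothetical layout (not Mailman's real scheme):
        <root>/<list>/<YYYY-MM>/<seq>.txt        raw message
        <root>/<list>/<YYYY-MM>/<seq>.meta.json  extra properties
    Returns the path of the stored message.
    """
    d = os.path.join(root, list_name, year_month)
    os.makedirs(d, exist_ok=True)
    base = os.path.join(d, "%05d" % seq)
    with open(base + ".txt", "wb") as f:
        f.write(raw_bytes)
    with open(base + ".meta.json", "w") as f:
        json.dump(meta, f)
    return base + ".txt"

# demo in a throwaway directory
root = tempfile.mkdtemp()
path = archive_to_fs(root, "mailman-developers", "2000-11", 1,
                     b"Subject: hi\n\nbody",
                     {"from": "bbum@example.com", "subject": "hi"})
```

Pointing a web server's document root at `root` then gives the "almost zero overhead" serving described above for free.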
WebDAV adds the ability to do advanced locking, easy meta information storage, etc... but-- most importantly-- does not take away the very efficient presentation of data naturally present within a filesystem of stuff served by a web server.
As well, a filesystem-centric storage/presentation solution-- WebDAV or raw filesystem-- solves *most* people's archiving problems *most* of the time.
I feel *very strongly* that the archival solution-- whether it be raw messages or decoded messages-- should center on storing files in directories and serving files from directories.
The second reason I feel strongly that moving to a DBI-based interface wouldn't present much of an advantage is that most people who need to actually store the data in a database are going to have their own requirements surrounding decoding, storage, indexing, and presentation of that database-related content. There are few *real* standards for storing multimedia [MIME] content in a database environment and, as such, the developer will likely have to rip the data out of whatever our implementation prefers and into their own storage subsystem.
In my experience [storing email into a database was actually a problem we had to solve-- this is the implementation we successfully/effectively used], it is far more convenient to provide an HTTP [replaced with a MODULAR in a real implementation] gateway that delivers the processed, but still relatively raw, messages to some other subsystem for subsequent parsing and storage. In our case, we used HTTP to deliver inbound messages to a WebObjects application that parsed the message into EOs [enterprise objects] and persisted those via the various APIs included with WebObjects.
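A rough sketch of that kind of HTTP hand-off in Python-- the `/archive` endpoint and the throwaway in-process receiver below are hypothetical stand-ins for whatever downstream application (a WebObjects app, in the anecdote above) actually parses and persists the messages:

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

def deliver_via_http(gateway_url, raw_message):
    """POST one raw message to a downstream archiver over HTTP.

    `gateway_url` is whatever the site administrator configures; the
    receiving application does its own parsing and storage.
    """
    req = urllib.request.Request(
        gateway_url,
        data=raw_message,
        headers={"Content-Type": "message/rfc822"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

# demo: a throwaway receiver standing in for the real downstream app
received = []

class Receiver(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers["Content-Length"])
        received.append(self.rfile.read(length))  # "persist" the message
        self.send_response(200)
        self.end_headers()

    def log_message(self, *args):  # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), Receiver)
threading.Thread(target=server.serve_forever, daemon=True).start()
status = deliver_via_http(
    "http://127.0.0.1:%d/archive" % server.server_port,
    b"Subject: test\r\n\r\nhello")
server.shutdown()
```

In a real deployment the transport itself would be one of the swappable modules, with HTTP as just the default choice.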
Another way of looking at this: as soon as most developers want to work with the data in the context of a true database, they are also going to want to use their tool-du-jour [WebObjects, ASP, EJB, PHP, Zope, etc...] to process that information. Taking a two-pronged approach to archival-- a simple [and modular] filesystem-esque mechanism that stores the data in a more traditional manner, be it directly to the filesystem or via a WebDAV adaptor (since WebDAV is very filesystem-like, just with HTTP as the protocol of choice), plus an equally simple modular gateway that lets the developer/administrator easily configure the system to deliver the data to their server of choice via the protocol of their choice-- will likely reduce the complexity of our implementation and make it more attractive, in that our codebase is that much simpler and more approachable.
This is *not* to say that the DBI approach isn't the right way to go; if a generic DBI->filesystem, DBI->WebDAV, DBI->DB capable API were put together and was relatively hidden from the user and casual developer, it might be a huge win.
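A generic API of that sort, hidden from the user and casual developer, might look something like this sketch in Python-- an abstract store with swappable backends. The names (`ArchiveStore`, `put`/`get`) are hypothetical, and only the filesystem backend is shown; a WebDAV or DB-API backend would implement the same two methods:

```python
import os
import tempfile
from abc import ABC, abstractmethod

class ArchiveStore(ABC):
    """The narrow interface under discussion: callers see only
    put/get, and a backend (filesystem, WebDAV, DB-API, ...) is
    chosen by configuration.  All names here are hypothetical."""

    @abstractmethod
    def put(self, key, data):
        """Store bytes under a path-like key."""

    @abstractmethod
    def get(self, key):
        """Return the bytes stored under key."""

class FilesystemStore(ArchiveStore):
    """Default backend: plain files under a root directory."""

    def __init__(self, root):
        self.root = root

    def _path(self, key):
        return os.path.join(self.root, key)

    def put(self, key, data):
        path = self._path(key)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)

    def get(self, key):
        with open(self._path(key), "rb") as f:
            return f.read()

# demo: the caller never learns which backend it is talking to
store = FilesystemStore(tempfile.mkdtemp())
store.put("mylist/2000-11/00001.txt", b"Subject: hi\n\nbody")
```

The win bbum describes would come precisely from this being the *only* surface most of the codebase ever touches.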
So it's choosing what the appropriate interfaces are that's as important as having interfaces. you don't program to a technology unless you have to -- you program to an interface that enables technologies. (image: this is chuqui. this is a dead horse. This is chuqui holding a whip...)
And bbum following with a club.... :-) Agreed.
- ....restoring decoded attachments and reencoding back to their original state with their original headers is an extremely cool feature.
Truly. And if we can support BLOBs in DBI, well, we don't have to write anything to disk and can generate an entire message out of a DBI database -- portable to any decent database.
But an order of magnitude less efficient than downloading the BLOB off of disk via a webserver!
Generic access with simple access control is what *most* users/administrators want *most* of the time. More complex/abstracted/portable access is less of a requirement and *a lot* of the people with such requirements also have other issues-- real or imagined-- that dictate that they really just want Mailman to hand the stuff to them as quickly/easily as possible and be done with it.
- if we are to manage the complexity associated with the integration of numerous technologies, it is only going to happen through well refined and highly modular APIs....
agreed. and to make it clear, I'm not arguing against WebDAV. I'm arguing that for something like this, you define the interface and see if you can build it in a way that you don't JUST get WebDAV, but support at a more abstracted level that gets you a range of supported technologies (and future capability for those not yet discovered) for an incrementally greater amount of work. the trick is to find the right abstractions and the proper technology layer to attach them to.
Totally-- and I hope no one thought I was advocating WebDAV as the end all, be all, only solution!
I feel strongly that abstraction is key, but that we should also provide decent, production-quality implementations of solutions to the very same set of problems for which we build the generic abstracted/modularized APIs.
If Mailman is not fully functional "out of the box", then people will ignore it. However, if it isn't also flexible enough to be integrated into their weird environments (because every server on the web has weirdness), they'll bitch and moan until they find something else that doesn't solve their problem to B&M about....
b.bum