Login / User Identification Issues in MM3
I am encountering a problem with login on Postorius. There is no mechanism to keep the MM user database synchronized to the one which django creates. As a result, if you log in with BrowserID, or are otherwise present in the django user database but were not previously defined in the MM database, you currently get an access error when you attempt to access the MM interface.
Further, in implementing extensions to the login function, having the password on a separate system from the rest of the user profile information (including alternate credential specifications) greatly complicates the handling of login.
It would greatly simplify things if all of the user information, including the profile and whatever information the "social media" folks want to keep, were stored in one place.
If we are not going to have a single database, to which all components of the system have equal access, then I suggest that we treat the user information as a separate realm from both the UI and the "core" and require both of them to access that information through the same REST interface.
Richard Wackerbarth writes:
I am encountering a problem with login on Postorius. There is no mechanism to keep the MM user database synchronized to the one which django creates.
Since Postorius is the facility by which admins will expect to access the user database, I have to consider that a bug in Postorius.
Yes, it is a bug in Postorius. However, that does not negate the fact that the present design, by forcing a split database, makes it difficult to accomplish the desired behavior.
As far as I am concerned, if Postorius is corrected to handle this situation robustly, it will be in spite of, and not with the help of, the design being forced upon it.
Since by far the most complicated, and most used, logins will be made from the web interface, it is much more logical to allow Postorius to be responsible for all of the user profile information and have the core, on those rare occasions where it requires it, obtain password confirmation from that source.
Richard
On Jul 10, 2012, at 9:14 PM, Stephen J. Turnbull wrote:
Richard Wackerbarth writes:
I am encountering a problem with login on Postorius. There is no mechanism to keep the MM user database synchronized to the one which django creates.
Since Postorius is the facility by which admins will expect to access the user database, I have to consider that a bug in Postorius.
Richard Wackerbarth writes:
Since by far the most complicated, and most used, logins will be made from the web interface, it is much more logical to allow Postorius to be responsible for all of the user profile information and have the core, on those rare occasions where it requires it, obtain password confirmation from that source.
From the point of view of Postorius implementers, no question about it.<wink/>
But isn't that going to take us a long way down the road where we anoint Postorius the one-and-only admin interface? If that really needs to be, OK, but I don't much like it. Among other things, it will make the design and detailed UI of Postorius a focus of discussion for everybody concerned with Mailman 3. And it makes the option to "build one to throw away" much more difficult -- the design decisions already made, and those that will be made in the near future, will probably live as long as Pipermail has (and Pipermail will continue for several more years, at least!)
If that doesn't scare you....<wink/>
On 12-07-10 11:12 PM, Stephen J. Turnbull wrote:
But isn't that going to take us a long way down the road where we anoint Postorius the one-and-only admin interface? If that really needs to be, OK, but I don't much like it. Among other things, it will make the design and detailed UI of Postorius a focus of discussion for everybody concerned with Mailman 3. And it makes the option to "build one to throw away" much more difficult -- the design decisions already made, and those that will be made in the near future, will probably live as long as Pipermail has (and Pipermail will continue for several more years, at least!)
I think it may be possible that the core authentication stuff can be pushed into REST without tying us to postorius forever, but I haven't got it quite set in my head how that will go yet.
Right now, Postorius can do logins based on email/password pairs in REST.
We'd like to do BrowserID, which only needs the email (and we're trusting the browser to do the authentication) so that shouldn't be a problem. BrowserID was not completely implemented when I last played in there... unless someone else has finished the hookup, please do not assume that it's fully working and feel free to file bugs so what's not working is clearly indicated somewhere other than my head. ;) Right now, it generates a login, but has no useful interaction with REST settings.
We'd also like to do openid, which means we need to somehow associate an openid token with an email address.
So right now, postorius needs email address, username (for direct authentication), and potentially a list of openid or other tokens. That's a small enough list that we may be able to justify making mailman core aware of a small token list (or a single openid token?), or we can let postorius handle that and have core only understand "I am the owner of this email address -- let me see the associated settings of me." I think my preference would be to have mailman understand more than email/password authentication, because I think it'll make things easier and not have us duplicating data in hyperkitty etc, though.
The messy part, IMO, is what to do with the non-authentication user data. I'm guessing we'll probably want some sort of theme preference data (possibly shared between postorius/hyperkitty/others?). Not sure what else. That stuff... really doesn't have much place in core, but probably will need to be shared between several web components... do we have a second rest server for user data?
On Jul 11, 2012, at 1:55 AM, Terri Oda wrote:
We'd also like to do openid, which means we need to somehow associate an openid token with an email address.
So right now, postorius needs email address, username (for direct authentication), and potentially a list of openid or other tokens. That's a small enough list that we may be able to justify making mailman core aware of a small token list (or a single openid token?), or we can let postorius handle that and have core only understand "I am the owner of this email address -- let me see the associated settings of me." I think my preference would be to have mailman understand more than email/password authentication, because I think it'll make things easier and not have us duplicating data in hyperkitty etc, though.
The messy part, IMO, is what to do with the non-authentication user data. I'm guessing we'll probably want some sort of theme preference data (possibly shared between postorius/hyperkitty/others?). Not sure what else. That stuff... really doesn't have much place in core, but probably will need to be shared between several web components... do we have a second rest server for user data?
If we are going to use REST interfaces to tie components together, then, in the design, we should do a few things differently.
First, we should define all of the data storage in terms of the REST access points to the data (and not, as presently done, the other way around). Next, we should access all of the REST interface URLs indirectly, so that functionality can be moved around simply by changing a single reference definition. This is the kind of scheme that Django uses with its {% url %} tag (like DNS vs. IP addresses). Finally, we should "black box encapsulate" the access to the data, requiring that EVERY module utilize this common interface, and only this interface, to the data.
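To make the indirection concrete, here is a rough Django-flavored sketch (the URL name, path, and view are hypothetical, not anything in Postorius): callers resolve a symbolic name, so the actual location can move by changing a single definition.

# urls.py -- "user-profile" is the single reference definition; the path
# itself can change without breaking any caller (all names hypothetical).
from django.urls import path, reverse

from . import views

urlpatterns = [
    path("accounts/<int:user_id>/profile/", views.profile, name="user-profile"),
]

# Any other module resolves the symbolic name instead of hard-coding the
# location, just as {% url "user-profile" user_id=42 %} does in a template
# -- and much as a DNS name decouples clients from an IP address.
def profile_link(user_id):
    return reverse("user-profile", kwargs={"user_id": user_id})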
On Jul 11, 2012, at 09:27 AM, Richard Wackerbarth wrote:
First, we should define all of the data storage in terms of the REST access points to the data (and not, as presently done, the other way around). Next, we should access all of the REST interface URLs indirectly, so that functionality can be moved around simply by changing a single reference definition. This is the kind of scheme that Django uses with its {% url %} tag (like DNS vs. IP addresses). Finally, we should "black box encapsulate" the access to the data, requiring that EVERY module utilize this common interface, and only this interface, to the data.
If there were a separate user database that exposed all sorts of data via REST, including stuff the core doesn't care about, then all the things I described before about re-implementing various Zope interfaces in the core holds true. They would just do REST calls instead of database queries.
Both implementations in fact can live side-by-side, and choosing one or the other is something you do when you install and configure the system.
-Barry
On Jul 11, 2012, at 9:53 PM, Barry Warsaw wrote:
On Jul 11, 2012, at 09:27 AM, Richard Wackerbarth wrote:
First, we should define all of the data storage in terms of the REST access points to the data (and not, as presently done, the other way around). Next, we should access all of the REST interface URLs indirectly, so that functionality can be moved around simply by changing a single reference definition. This is the kind of scheme that Django uses with its {% url %} tag (like DNS vs. IP addresses). Finally, we should "black box encapsulate" the access to the data, requiring that EVERY module utilize this common interface, and only this interface, to the data.
If there were a separate user database that exposed all sorts of data via REST, including stuff the core doesn't care about, then all the things I described before about re-implementing various Zope interfaces in the core holds true. They would just do REST calls instead of database queries.
Agreed.
Both implementations in fact can live side-by-side, and choosing one or the other is something you do when you install and configure the system.
I'm not sure that I agree with attempting to maintain two implementations "side-by-side".
"separate, but equal" comes to mind. And we all know how that worked.
My preference remains to use the REST interface to define the object contract. By that, I mean that every action performed on the object is isomorphic to a REST interaction.
To do otherwise creates an inequality between those accessing the object.
-Barry
On Jul 12, 2012, at 01:42 PM, Richard Wackerbarth wrote:
On Jul 11, 2012, at 9:53 PM, Barry Warsaw wrote:
Both implementations in fact can live side-by-side, and choosing one or the other is something you do when you install and configure the system.
I'm not sure that I agree with attempting to maintain two implementations "side-by-side".
"separate, but equal" comes to mind. And we all know how that worked.
The question is whether we want to support running the core without having this user service available, i.e. in standalone mode. Are we going to require that this service be running in order to run the core? I think we shouldn't be so strict.
I'm not too concerned about multiple implementations getting out of sync. I think we'll be able to devise sufficient tests to ensure that, as far as the core is concerned, both implementations are providing all the necessary information.
-Barry
Barry Warsaw writes:
The question is whether we want to support running the core without having this user service available, i.e. in standalone mode. Are we going to require that this service be running in order to run the core? I think we shouldn't be so strict.
But *some* user service needs to be available. This is to my mind a core component of Mailman in any case. I don't see why having a single interface (with perhaps a REST "binding" for IPC and a direct zope.interfaces binding for within-process use) and a default implementation of that service as a core component (but separate from the post routing and delivery and admin routing and handling services) would be very costly in implementation and maintenance, but it ensures the flexibility to deal with unknown future requirements everybody says is important.
Stephen,
From this posting to which I am replying, and other recent ones from Barry, etc., I conclude that it is highly likely that the three of us have somewhat similar conceptual models of an extensible MM system. However, because of the terms that we have chosen in our descriptions, we have failed to communicate. As a result, we have misinterpreted one another's descriptions.
For example, in earlier postings, I used the term "component". Rather than the characteristics that Stephen seemed to apply to the term, my intended concept is one which, in the text quoted, I would now call a "service" -- a "user service" for information related to the persons participating in the lists, a "post routing and delivery service" that interfaces with the MTA, an "admin handling service" for changing the state of the stored choices, etc. (My exact grouping of functionality may not be correct. But that is a separate issue.)
Now, when it comes to interfaces, in that context, I have been referring to "service interfaces". Barry has acknowledged that the (IUser, et al.) zope interfaces are, collectively, an implementation-level interface which, presumably, provides capability sufficient to implement the portion of the service-level interfaces that are required by the "core" modules provided. They may also provide some additional functionality that someone anticipated to be of use by other services which might wish to interact with these interfaced objects.
Similarly, I now believe that Stephen's use of "core database" was more a reference to the union of all of these service-level interfaces rather than the interpretation which I had placed on the term, namely a "persistent store running in conjunction with the post routing service".
Now, hopefully, we can agree that it is necessary to have a "user service". Further it is reasonable to attempt to have this service distinct from the "post routing service". The question is then "How will these services interface to each other?"
Stephen says:
... having a single interface (with perhaps a REST "binding" for IPC and a direct zope.interfaces binding for within-process use)
I agree with the "single interface" aspect of it, particularly if that interface is viewed at the functional level rather than at an implementation level. However, I feel that there should be a complete REST binding for that functional interface. This interface can then be implemented by a zope.interface, and/or in any other format appropriate for the implementing code. However, the implementation interface must not provide any inter-service capability that is not reflected in the "interface". Intra-service capabilities are quite useful, particularly when they provide alternate functional signatures which accept local proxies for interacting objects.
Further, as an inter-service design constraint, it should be assumed that the implementation of any service might be "remote" and accessible only by way of the REST interface to it. In addition it should be assumed that the implementation of that service will be in an arbitrary programming language other than python. Designs which violate either of these assumptions should not be accepted because they reduce the ductility of the interface.
The implementation interfaces provide only a proxy for the conceptual objects defined in the service interface and any action on those proxies needs to be conveyed to the conceptual object by means of an access specified in the system interface.
Richard
On Jul 12, 2012, at 10:03 PM, Stephen J. Turnbull wrote:
Barry Warsaw writes:
The question is whether we want to support running the core without having this user service available, i.e. in standalone mode. Are we going to require that this service be running in order to run the core? I think we shouldn't be so strict.
But *some* user service needs to be available. This is to my mind a core component of Mailman in any case. I don't see why having a single interface (with perhaps a REST "binding" for IPC and a direct zope.interfaces binding for within-process use) and a default implementation of that service as a core component (but separate from the post routing and delivery and admin routing and handling services) would be very costly in implementation and maintenance, but it ensures the flexibility to deal with unknown future requirements everybody says is important.
On Jul 13, 2012, at 11:56 AM, Richard Wackerbarth wrote:
Now, hopefully, we can agree that it is necessary to have a "user service". Further it is reasonable to attempt to have this service distinct from the "post routing service". The question is then "How will these services interface to each other?"
I think we're definitely converging on a common language for all the moving parts.
I agree with the "single interface" aspect of it, particularly if that interface is viewed at the functional level rather than at an implementation level. However, I feel that there should be a complete REST binding for that functional interface.
WADL is, afaik, the definition language for RESTful web services:
http://en.wikipedia.org/wiki/Web_Application_Description_Language
It is an XML-based format.
Some REST implementation libraries provide a way to generate the WADL from the code, but frankly the ones I've used have enough downsides to far outweigh automatic WADL generation. I'm not aware of anything that takes WADL as the input and creates the REST implementation for you, so all things considered, I think WADL is fairly useless to us.
I would be happy enough with human readable <wink> documentation that described this user service in terms of resource locations and contents. Although it's been years since I read it, Leonard Richardson's definitive O'Reilly book RESTful Web Services talks about how this description can even (or maybe *should*) be HTML, with links to the actual resources being described.
Or, thought about a different way: mm3's REST API only supports JSON representation for resources. You could implement a parallel representation using HTML that included these descriptions. I bet you could even do this in Sphinx in a very human-friendly format, with "base-url" templating so that the resource locations are resolvable wherever the user service is deployed.
Note that this service can have varying levels of compatibility. E.g. we could say that level 1 is required, and define this as the minimal service required by the core. Things like Facebook id would not be in level 1. Higher levels would provide additional information that could be used to enhance the user experience. I have no idea whether more than two levels is necessary, but I do think there needs to be some discussion of extensibility.
This interface can then be implemented by a zope.interface, and/or in any other format appropriate for the implementing code. However, the implementation interface must not provide any inter-service capability that is not reflected in the "interface". Intra-service capabilities are quite useful, particularly when they provide alternate functional signatures which accept local proxies for interacting objects.
Further, as an inter-service design constraint, it should be assumed that the implementation of any service might be "remote" and accessible only by way of the REST interface to it. In addition it should be assumed that the implementation of that service will be in an arbitrary programming language other than python. Designs which violate either of these assumptions should not be accepted because they reduce the ductility of the interface.
The implementation interfaces provide only a proxy for the conceptual objects defined in the service interface and any action on those proxies needs to be conveyed to the conceptual object by means of an access specified in the system interface.
I think the use of the term "proxy" is informative here.
Let's say the core only cares about level 1 user service compliance. Maybe that's implemented as an external (to the core) service. In this case, the implementation of interfaces like IUser and IUserManager would simply be proxies to this service. The service itself doesn't care about these zope.interfaces, it only cares about providing a level 1 compatible user service via REST (probably with a JSON representation).
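A minimal sketch of such a proxy, assuming a hypothetical level 1 user service (the interface, resource path, and JSON shape here are illustrative, not the actual Mailman 3 API):

import json
from urllib.request import urlopen

from zope.interface import Interface, implementer

class IUser(Interface):
    """The subset of user data the core cares about (level 1)."""

    def addresses():
        """Return the email addresses linked to this user."""

@implementer(IUser)
class RESTUserProxy:
    """Satisfies IUser by delegating every call to the user service."""

    def __init__(self, base_url, user_id):
        self._base_url = base_url
        self._user_id = user_id

    def addresses(self):
        # Resource location and JSON shape are hypothetical.
        url = "{0}/users/{1}/addresses".format(self._base_url, self._user_id)
        with urlopen(url) as response:
            return json.load(response)["addresses"]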
Cheers, -Barry
On Jul 13, 2012, at 2:02 PM, Barry Warsaw wrote:
On Jul 13, 2012, at 11:56 AM, Richard Wackerbarth wrote:
Now, hopefully, we can agree that it is necessary to have a "user service". Further it is reasonable to attempt to have this service distinct from the "post routing service". The question is then "How will these services interface to each other?"
I think we're definitely converging on a common language for all the moving parts.
That, in itself, is significant.
I agree with the "single interface" aspect of it, particularly if that interface is viewed at the functional level rather than at an implementation level. However, I feel that there should be a complete REST binding for that functional interface.
WADL is, afaik, the definition language for RESTful web services:
From your description, I also question what we would gain by attempting to use WADL as our formalization language.
I would be happy enough with human readable <wink> documentation that described this user service in terms of resource locations and contents.
I wouldn't even go that far. The service does not need to be DESCRIBED in terms of locations at all.
Certainly, somewhere in the implementation, we would have to provide that CONFIGURATION information, but from the perspective of a consumer of the "user service", my interface is akin to (expressed as meta-language, not code)
services.user_service.change_password(user_identifier, new_password, ... )
It does not matter where, or how, the user_service processes that information except to the extent that I, and the other services, expect it to persist and can later enquire
password_is_valid = services.user_service.verify_password(user_identifier, submitted_password, ... )
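Rendered as a Python sketch purely to make the shape concrete (every name here is hypothetical, and nothing about it implies where the service lives):

class UserService:
    """Consumer-facing binding; the transport is a configuration detail."""

    def __init__(self, transport):
        # transport might make in-process calls, speak REST over HTTP,
        # or do anything else -- the consumer never knows.
        self._transport = transport

    def change_password(self, user_identifier, new_password):
        self._transport.call("change_password",
                             user=user_identifier, password=new_password)

    def verify_password(self, user_identifier, submitted_password):
        return self._transport.call("verify_password",
                                    user=user_identifier,
                                    password=submitted_password)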
Although it's been years since I read it, Leonard Richardson's definitive O'Reilly book RESTful Web Services talks about how this description can even (or maybe *should*) be HTML, with links to the actual resources being described.
This is a useful technique for publishing documentation where the service is being exposed to "outside" consumers. For example, I may learn that there is a RESTful stock quotation service at http:// ..... I know nothing more about it. Since I will be using http to access the service, it is convenient to publish usage documentation along with data.
However, in a controlled environment, documenting the interface "python style" will, IMHO, be just as useful.
Or, thought about a different way: mm3's REST API only supports JSON representation for resources. You could implement a parallel representation using HTML that included these descriptions. I bet you could even do this in Sphinx in a very human-friendly format, with "base-url" templating so that the resource locations are resolvable wherever the user service is deployed.
I'm not opposed to having the documentation published in this, or any other, manner. I would only hope that doing so can be done following the DRY principle.
Note that this service can have varying levels of compatibility. E.g. we could say that level 1 is required, and define this as the minimal service required by the core. Things like Facebook id would not be in level 1. Higher levels would provide additional information that could be used to enhance the user experience. I have no idea whether more than two levels is necessary, but I do think there needs to be some discussion of extensibility.
I have some ideas about ways by which we might even make the interface "auto-configurable" -- like passing a query of requirements (or reading a list of capabilities). For example, through the services manager, "ask" it whether a webUI service is available. If it is, ask the webUI service for the URLs that go into a welcome message. (Or, the alternative: always treat things as if there is a webUI. If the installer has not configured one, then a default stub would return null strings for those requests.)
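A toy sketch of that stub alternative (all names hypothetical): callers never test whether a webUI exists, and an unconfigured installation simply answers with empty strings.

class NullWebUI:
    """Default stub installed when no webUI service is configured."""

    def welcome_urls(self):
        return {"login_url": "", "options_url": ""}

def compose_welcome_urls(services):
    # The stub makes the "no webUI" case indistinguishable from an
    # ordinary, if empty, answer.
    webui = services.get("webui", NullWebUI())
    return webui.welcome_urls()

print(compose_welcome_urls({}))  # {'login_url': '', 'options_url': ''}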
This interface can then be implemented by a zope.interface, and/or in any other format appropriate for the implementing code. However, the implementation interface must not provide any inter-service capability that is not reflected in the "interface". Intra-service capabilities are quite useful, particularly when they provide alternate functional signatures which accept local proxies for interacting objects.
Further, as an inter-service design constraint, it should be assumed that the implementation of any service might be "remote" and accessible only by way of the REST interface to it. In addition it should be assumed that the implementation of that service will be in an arbitrary programming language other than python. Designs which violate either of these assumptions should not be accepted because they reduce the ductility of the interface.
The implementation interfaces provide only a proxy for the conceptual objects defined in the service interface and any action on those proxies needs to be conveyed to the conceptual object by means of an access specified in the system interface.
I think the use of the term "proxy" is informative here.
Let's say the core only cares about level 1 user service compliance. Maybe that's implemented as an external (to the core) service. In this case, the implementation of interfaces like IUser and IUserManager would simply be proxies to this service. The service itself doesn't care about these zope.interfaces, it only cares about providing a level 1 compatible user service via REST (probably with a JSON representation).
Yes, but it goes farther than that. Even if the user service is local, in the core, IUser is still a proxy for the user service. However, for those calls, rather than serializing the data, converting internal object identifiers to external ones, etc., the information is just copied into the corresponding internal structure that the user service builds as it translates a request for REST services.
On Jul 13, 2012, at 05:43 PM, Richard Wackerbarth wrote:
I would be happy enough with human readable <wink> documentation that described this user service in terms of resource locations and contents.
I wouldn't even go that far. The service does not need to be DESCRIBED in terms of locations at all.
Certainly, somewhere in the implementation, we would have to provide that CONFIGURATION information, but from the perspective of a consumer of the "user service", my interface is akin to (expressed as meta-language, not code)
services.user_service.change_password(user_identifier, new_password, ... )
It does not matter where, or how, the user_service processes that information except to the extent that I, and the other services, expect it to persist and can later enquire
password_is_valid = services.user_service.verify_password(user_identifier, submitted_password, ... )
Actually, I think it's essential that we describe the service in terms of resources and locations. The pseudo-code above is an example of a language binding, but isn't enough because the implementer of that binding would not know where to point the proxy, or how to do the HTTP calls, etc. In fact, the pseudo-code would look significantly different depending on what implementation language is used (the above looks Pythonic, but what about Ruby, or JavaScript?).
If we're using REST then the description of the service must be in terms of the API that REST exposes, e.g. URLs and JSON (for example). Then authors of the language bindings can expose them to client code in whatever way makes sense.
Take a look at the work done in Postorius for accessing the core's REST API. There's a mailman.client package which is essentially this Python language binding to the REST API. Then the Django code can call the mailman.client, or it could do straight up HTTP itself, or the JavaScript on AJAX-y dynamic pages could do their own calls, etc.
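For flavor, roughly what those two access styles look like; the import name, endpoint, credentials, and attribute names are assumptions for illustration:

# 1. Through the Python binding:
from mailmanclient import Client

client = Client("http://localhost:8001/3.0", "restadmin", "restpass")
for mlist in client.lists:
    print(mlist.fqdn_listname)

# 2. Straight-up HTTP, no binding at all (auth header omitted for brevity):
import json
from urllib.request import urlopen

with urlopen("http://localhost:8001/3.0/lists") as response:
    print(json.load(response))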
Although it's been years since I read it, Leonard Richardson's definitive O'Reilly book RESTful Web Services talks about how this description can even (or maybe *should*) be HTML, with links to the actual resources being described.
This is a useful technique for publishing documentation where the service is being exposed to "outside" consumers. For example, I may learn that there is a RESTful stock quotation service at http:// ..... I know nothing more about it. Since I will be using http to access the service, it is convenient to publish usage documentation along with data.
However, in a controlled environment, documenting the interface "python style" will, IMHO, be just as useful.
No, we really need the resource locations and definitions because consumers of the user manager will not just be Python code. And even if it were, that Python code has to be translated to dozens or scores of HTTP calls, and those cannot all be configuration options. The base-url to the service will be configurable, yes.
I have some ideas about ways by which we might even make the interface "auto-configurable" -- like passing a query of requirements (or reading a list of capabilities). For example, through the services manager, "ask" it whether a webUI service is available. If it is, ask the webUI service for the URLs that go into a welcome message. (Or, the alternative: always treat things as if there is a webUI. If the installer has not configured one, then a default stub would return null strings for those requests.)
I think you're right that the level of compatibility can be auto-configured. The alternative of course is just EAFP (easier to ask forgiveness, i.e. prepare to catch any errors or handle any missing data).
Yes, but it goes farther than that. Even if the user service is local, in the core, IUser is still a proxy for the user service. However, for those calls, rather than serializing the data, converting internal object identifiers to external ones, etc., the information is just copied into the corresponding internal structure that the user service builds as it translates a request for REST services.
Kind of. Note that IUser doesn't describe the service, so really the use of that zope.interface, or how the calls are proxied or not is purely an implementation detail of the consumer of the service. I wouldn't even expect all consumers to use IUser, or even Python.
Cheers, -Barry
I think that you have missed a level of abstraction.
The service, per se, does not REQUIRE a REST interface in order to operate. It only requires an implementation (in any language) and mutually agreed intra-service bindings in a mutually acceptable language.
Because we have chosen a RESTful mechanism as a communications medium for services hosted on separate processors, we will need a REST interface specification. But, that REST interface is only one implementation of the service interface. The REST particulars are not really a part of the service.
Including the locations, etc. are details of a REST IMPLEMENTATION of the service interface in the same way that Python classes and methods would be used to define a python interface. Or C++ class headers, etc.
Richard
On Jul 13, 2012, at 7:02 PM, Barry Warsaw wrote:
On Jul 13, 2012, at 05:43 PM, Richard Wackerbarth wrote:
I wouldn't even go that far. The service does not need to be DESCRIBED in terms of locations at all.
Certainly, somewhere in the implementation, we would have to provide that CONFIGURATION information, but from the perspective of a consumer of the "user service", my interface is akin to (expressed as meta-language, not code)
services.user_service.change_password(user_identifier, new_password, ... )
It does not matter where, or how, the user_service processes that information except to the extent that I, and the other services, expect it to persist and can later enquire
password_is_valid = services.user_service.verify_password(user_identifier, submitted_password, ... )
Actually, I think it's essential that we describe the service in terms of resources and locations. The pseudo-code above is an example of a language binding, but isn't enough because the implementer of that binding would not know where to point the proxy, or how to do the HTTP calls, etc. In fact, the pseudo-code would look significantly different depending on what implementation language is used (the above looks Pythonic, but what about Ruby, or JavaScript?).
If we're using REST then the description of the service must be in terms of the API that REST exposes, e.g. URLs and JSON (for example). Then authors of the language bindings can expose them to client code in whatever way makes sense.
Richard Wackerbarth writes:
I think that you have missed a level of abstraction.
Why do we need it? AFAICS, at this point we have a bunch of services that need to be specified somehow. We *need* a RESTful interface for some functions because they make the most sense if accessed remotely, and there doesn't seem to be a good reason to go beyond HTTP and specify individual transport protocols for these remote services.
Given that, and that REST is a pretty severe restriction, it seems likely to me that if we actually need to go beyond REST (e.g., for performance reasons), that can easily enough be done later. We may as well specify using REST now, and have everything target that protocol until we need something more powerful or efficient.
We need it because we need to maintain that separation of services even for service characteristics that do not need to be exposed to an actual REST implementation. It is this conceptual separation that keeps the "post routing service" separable from the "admin-by-mail service" and the "mailing list archive service".
Remember that, as Barry has acknowledged, the IXxx interfaces are an implementation-level interface. If we don't maintain the separation at the "service level", interactions between the services which happen to share a common implementation interface MIGHT end up bypassing the service level by using capabilities which are available in the implementation level, but not in the service level. If they do so, it will not be possible to provide an alternate implementation of one of those services without also having to provide the alternate for the other service.
At present, there is only an implementation level specification of the interface between the "user service" and the other services presently in "core". In order to create an alternate implementation of the "user service", it will be necessary to unravel these services and assure that they are not interacting via any method which is not reflected in the service level.
We should not be required to repeat this in the future. Therefore, we need to have a service level interface for ALL inter-service interactions, even if we do not require the RESTful implementation of those interactions at this time.
On Jul 14, 2012, at 12:00 AM, Stephen J. Turnbull wrote:
Richard Wackerbarth writes:
I think that you have missed a level of abstraction.
Why do we need it? AFAICS, at this point we have a bunch of services that need to be specified somehow. We *need* a RESTful interface for some functions because they make the most sense if accessed remotely, and there doesn't seem to be a good reason to go beyond HTTP and specify individual transport protocols for these remote services.
Perhaps you still fail to understand what I mean by a service-level interface.
The service-level interface describes the functional interaction between objects and controllers. It deals with conceptual objects and operations. It is not language or protocol specific.
ALL transport protocols are below this level.
Consider what is involved when someone wants to change a password:
The storage of the passwords and the later verification of them is handled by the "user service". (That statement is a part of the service-level specification.)
There is another service, which is processing the request (hands waved about how we get to that point in the data flow, but it is in a service other than the user service). How does it accomplish its task?
For discussion, I will call this service the "admin service". After doing whatever it needs to do to decide that it is ready,
The (admin) service presents a "user identifier" and the "new password" to the "user service" in a "request to change password". This is another service level specification.
Note that there is no mention of REST interface, or python objects, etc.
It is an implementation detail as to the format in which the "user identifier" is presented. It might be a string with the user's Id, a python object of class IUser, etc. In fact, the implementation MAY provide multiple signatures for this action. The important thing is that the ADMIN service DOES NOT "change the password field in the user object".
"change the password field in the user object" might be an internal method within the user service, but it is not visible to the admin object.
Given that, and that REST is a pretty severe restriction, it seems likely to me that if we actually need to go beyond REST (e.g., for performance reasons), that can easily enough be done later. We may as well specify using REST now, and have everything target that protocol until we need something more powerful or efficient.
Conceptually, I agree. All along, I have been advocating that the service level interface be organized in a manner which would allow it to be implemented in a RESTful manner.
I suspect that we will find some inefficiencies that would result if every operation were actually carried out in that manner and, as a result, in a particular implementation, we will group some services and allow them to bypass the strict requirement of interacting with each other only through the methods exposed in the service-level interface. However, in doing so, we must acknowledge that any replacement for one service of that implementation will have to replace the entire service group.
Richard
Richard Wackerbarth writes:
We should not be required to repeat this in the future. Therefore, we need to have a service level interface for ALL inter-service interactions, even if we do not require the RESTful implementation of those interactions at this time.
I guess I think of "components" as "interfaces with implementations *potentially independent* of other components", and would argue that these could (and will) be implemented as separate processes, even as remote processes. These require RESTful implementations AIUI.
I would certainly expect that authentication services, the user database, the mailflow handlers, owner/site webUI, and user webUI would be (conceptually) five separate components, although Postorius might provide both owner/site UI and user UI.
Intracomponent APIs might be defined as zope.interfaces, and even "more internal" APIs might be simply Python classes, etc. But it seems to me requiring RESTful interfaces for all intercomponent communication is the way to go.
On Jul 15, 2012, at 12:35 AM, Stephen J. Turnbull wrote:
Richard Wackerbarth writes:
We should not be required to repeat this in the future. Therefore, we need to have a service level interface for ALL inter-service interactions, even if we do not require the RESTful implementation of those interactions at this time.
I guess I think of "components" as "interfaces with implementations *potentially independent* of other components", and would argue that these could (and will) be implemented as separate processes, even as remote processes. These require RESTful implementations AIUI.
I think that "require" is too strong. What is REQUIRED is AN implementation that supports separate processes. Don't mis-understand, I fully expect that the implementation which we develop will be a RESTful one. However, I could, just as well, develop one that used, for example, SNMP. A specification provides REQUIREMENTS. Since there is no fundamental requirement that the inter-service communication be RESTful, that I object to describing the service-level interface in terms of a RESTful implementation. The service level specification should be more general.
I would certainly expect that authentication services, the user database, the mailflow handlers, owner/site webUI, and user webUI would be (conceptually) five separate components, although Postorius might provide both owner/site UI and user UI.
Even here, I would start by describing the "site admin UI" and the "user UI" as separate services. From there, we might describe a "service group" wherein a group of services are implemented in a manner intended to run on a single host. Within this service group, we could allow intra-group communications that bypass the inter-service interface. However, if we do so, we need to be careful that such communications are only alternate implementations for efficiency and do not represent fundamental communications that could not be handled strictly through the inter-service interface.
Intracomponent APIs might be defined as zope.interfaces, and even "more internal" APIs might be simply Python classes, etc. But it seems to me requiring RESTful interfaces for all intercomponent communication is the way to go.
As I indicated above, I consider RESTful interfaces to be only one implementation of a conceptual interface.
Richard
On Jul 13, 2012, at 07:34 PM, Richard Wackerbarth wrote:
I think that you have missed a level of abstraction.
I guess it's "turtles all the way down" :)
The service, per se, does not REQUIRE a REST interface in order to operate. It only requires an implementation (in any language) and mutually agreed intra-service bindings in a mutually acceptable language.
I think my vision of the architectural organization is very similar to Stephen's, and we only need the abstractions at a particular level to describe how other components at that same level will interoperate.
Within the core engine, which is all one self contained Python application (logically, if not actually, a single address space), then I think zope.interfaces and the Zope Component Architecture are as abstract as we need to get. If a component within the core, say the bounce detector, needs information about mailing lists, then it uses the ZCA abstractions to get Python objects (which may contain concrete implementation, or simply be proxies) which provide the services promised by those interfaces. I don't see any value in getting any more abstract than that, within the core, at least not right now.
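As a sketch of that lookup pattern (the interface and implementation here are hypothetical stand-ins, not actual Mailman interfaces):

from zope.component import getUtility, provideUtility
from zope.interface import Interface, implementer

class IMailingListManager(Interface):
    def get(fqdn_listname):
        """Return the mailing list, or None if there is none."""

@implementer(IMailingListManager)
class InMemoryListManager:
    def __init__(self):
        self._lists = {}

    def get(self, fqdn_listname):
        return self._lists.get(fqdn_listname)

# Registration happens once, at configuration time...
provideUtility(InMemoryListManager(), IMailingListManager)

# ...and the bounce detector (or anything else) looks the service up by
# interface, never caring whether the object is concrete or a proxy.
manager = getUtility(IMailingListManager)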
As we talk about separating out components into separate processes, possibly residing on different machines, then the ZCA isn't an appropriate abstraction. We're really talking about IPC here, and while there is no end to IPC protocols, it seems to me that HTTP/REST is a good one to choose, because it's built on fundamental, well-supported protocols, is easy to understand, document, and implement, and is discoverable.
I'll note though that there's no reason why other IPC protocols couldn't be added later. An important implementation principle with the mm3 core, based on my experiences with its violation in mm2, is that the REST implementation contains no functional logic. All it does is translate input from HTTP into Python objects on the way in, and Python objects to HTTP on the way out. This principle is replicated by the shell scripts and the email commands, which, actually, are alternative parallel forms of IPC into the core. So if someone wanted to add XMLRPC, CORBA, or whatever new protocol all the kids are using next year, it can be done probably with the same ease with which REST support was added.
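A sketch of that principle (framework, paths, and the core call are illustrative assumptions): the handler only translates between HTTP and Python objects, delegating all decisions to the core.

import json
from http.server import BaseHTTPRequestHandler

def subscribe(fqdn_listname, email):
    """Stand-in for the core's real subscription logic."""
    return {"list": fqdn_listname, "member": email}

class RESTHandler(BaseHTTPRequestHandler):
    # No functional logic lives here: translate in, delegate, translate out.
    def do_POST(self):
        length = int(self.headers["Content-Length"])
        data = json.loads(self.rfile.read(length))       # HTTP -> Python
        result = subscribe(data["list"], data["email"])  # delegate to core
        body = json.dumps(result).encode("utf-8")        # Python -> HTTP
        self.send_response(201)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

# Wire-up (not shown running): HTTPServer(("", 8001), RESTHandler).serve_forever()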
Also, these abstraction layers are not mutually exclusive. Internally, the core could use the ZCA abstraction to get a proxy to a user object which communicates to the user manager over LDAP and the core wouldn't blink an eye.
Because we have chosen a RESTful mechanism as a communications medium for services hosted on separate processors, we will need a REST interface specification. But, that REST interface is only one implementation of the service interface. The REST particulars are not really a part of the service.
I see what you're saying, and this 1000ft abstraction of the service may be useful in the sense that it's documentation for anyone else wanting to implement said service using a different IPC mechanism, but right now where the project is, this is a bit too abstract to be meaningful *to me*. I have no problem with this kind of service description being defined before the user manager service is implemented, but it still has to be made more concrete one level down into an IPC protocol definition before it can be useful.
Cheers, -Barry
On Jul 17, 2012, at 6:38 PM, Barry Warsaw wrote:
On Jul 13, 2012, at 07:34 PM, Richard Wackerbarth wrote:
But, that REST interface is only one implementation of the service interface. The REST particulars are not really a part of the service.
I see what you're saying, and this 1000ft abstraction of the service may be useful in the sense that it's documentation for anyone else wanting to implement said service using a different IPC mechanism, but right now where the project is, this is a bit too abstract to be meaningful *to me*. I have no problem with this kind of service description being defined before the user manager service is implemented, but it still has to be made more concrete one level down into an IPC protocol definition before it can be useful.
I think that you are finally beginning to understand the perspective that I am trying to provide.
Yes, to implement anything, we will need the concrete descriptions at the protocol level. For example, we will need some definition of the services implemented as a RESTful interface. Also, at least for the message handler, because you have chosen to implement it that way in Python, an object description such as ZCA for the implementation interface will be required.
However, and this is the reason that I object to the bypassing of the service level description, we also need to assure that each of these implementations respects the requirement that ALL inter-service interactions be representable strictly in terms of service-level interfaces. If you define only the implementation interface, then you have no standard against which to judge adherence, short of one that REQUIRES ALL implementations to exactly duplicate all aspects of that particular implementation.
Specifically, one interface may provide multiple implementations which accomplish the same system service. (Many ways to skin the cat) Alternate implementations are required to provide just one of them.
On Jul 18, 2012, at 05:53 PM, Richard Wackerbarth wrote:
However, and this is the reason that I object to the bypassing of the service level description, we also need to assure that each of these implementations respects the requirement that ALL inter-service interactions be representable strictly in terms of service-level interfaces. If you define only the implementation interface, then you have no standard against which to judge adherence, short of one that REQUIRES ALL implementations to exactly duplicate all aspects of that particular implementation.
Specifically, one interface may provide multiple implementations which accomplish the same system service. (Many ways to skin the cat) Alternate implementations are required to provide just one of them.
Things are getting a bit too abstract for me. Why don't you take a crack at this service level description for the user manager so we can have something concrete to further the discussion? You can post it here or in the wiki.
Cheers, -Barry
Richard Wackerbarth writes:
I think that you are finally beginning to understand the perspective that I am trying to provide.
I don't have any trouble understanding the perspective you're trying to provide in the abstract; it's just Software Systems Design 301.
What I have trouble seeing is why you want to go to this level of abstraction for Mailman 3, and what implications it has for design and implementation of Mailman services.
Yes, to implement anything, we will need the concrete descriptions at the protocol level.
But AIUI, REST isn't a concrete description. Wikipedia, for example, calls it an "architectural style". Specifically, that article defines it as a service meeting 5 constraints (and there's an optional 6th).[1] If your abstract service doesn't satisfy those constraints, then it can't be implemented RESTfully, which in turn constrains implementer designs (eg, it means changing multiple modules to implement a new service).
So for this purpose, I see the HTTP verbs used in REST, such as GET, as being primitives obeying certain constraints, but not as necessarily implemented by trivial references to the concrete HTTP implementation. Similarly, the REST approach to addressing resources via URLs puts some constraints on the structure of Mailman's data. IOW, REST is a language for describing interactions among Mailman components, not a specific implementation of such interactions.
Specifically, one interface may provide multiple implementations which accomplish the same system service. (Many ways to skin the cat) Alternate implementations are required to provide just one of them.
I don't understand. How is one component that speaks REST supposed to communicate with another specified in terms of the ZCA? They satisfy different constraints.
Footnotes: [1] http://en.wikipedia.org/wiki/REST
On Jul 13, 2012, at 12:03 PM, Stephen J. Turnbull wrote:
Barry Warsaw writes:
The question is whether we want to support running the core without having this user service available, i.e. in standalone mode. Are we going to require that this service be running in order to run the core? I think we shouldn't be so strict.
But *some* user service needs to be available.
Agreed, but my question is whether the limited user manager service that the core currently provides is enough for standalone deployments where all the fancy stuff about Facebook ids and reputations doesn't matter. I have no problem with providing a more limited set of functionality in that case (i.e. "just the messages, ma'am"[*]).
Cheers, -Barry
[*] w/apologies to Jack Webb/Joe Friday, who never actually uttered that paraphrase exactly. http://www.snopes.com/radiotv/tv/dragnet.asp
Stephen,
I'm not advocating that Postorius be THE keeper of the user information. But it would be a good candidate for the job.
What I am advocating is that the "core" message handler NOT be the keeper of ONLY PART of it.
I am perfectly happy to have the user info handled by an independent stand-alone module which is willing to take responsibility for ALL of the user profile info. It should provide a REST interface that allows other modules the ability to perform authentications, manipulate the profile, etc.
Another acceptable methodology would be to store all of the data in a real relational database and allow each of the modules direct access to the database engine.
Richard
On Jul 11, 2012, at 12:12 AM, Stephen J. Turnbull wrote:
Richard Wackerbarth writes:
Since by far the most complicated, and most used, logins will be made from the web interface, it is much more logical to allow Postorius to be responsible for all of the user profile information and have the core, on those rare occasions where it requires it, obtain password confirmation from that source.
From the point of view of Postorius implementers, no question about it.<wink/>
But isn't that going to take us a long way down the road where we anoint Postorius the one-and-only admin interface? If that really needs to be, OK, but I don't much like it. Among other things, it will make the design and detailed UI of Postorius a focus of discussion for everybody concerned with Mailman 3. And it makes the option to "build one to throw away" much more difficult -- the design decisions already made, and those that will be made in the near future, will probably live as long as Pipermail has (and Pipermail will continue for several more years, at least!)
If that doesn't scare you....<wink/>
Richard Wackerbarth writes:
What I am advocating is that the "core" message handler NOT be the keeper of ONLY PART of it.
What I'm advocating (mildly, because somebody else is going to have to do the work) is that the core be the keeper of ALL of it. The core is not just a "message handler". It is also a database, containing both list information and subscriber information. Since a minimum of subscriber information is absolutely essential to the core job, all of it may as well be in there. In some configurations we will want the subscribers to be authenticated, so we may as well keep all such information in the core's database.
Steve
On Jul 11, 2012, at 8:14 AM, Stephen J. Turnbull wrote:
Richard Wackerbarth writes:
What I am advocating is that the "core" message handler NOT be the keeper of ONLY PART of it.
What I'm advocating (mildly, because somebody else is going to have to do the work) is that the core be the keeper of ALL of it. The core is not just a "message handler". It is also a database, containing both list information and subscriber information.
OK, so we agree that ALL of the information SHOULD be stored in one place. That means that this database will need a lot more information, such as access control specifications, etc. Further, it needs to be extensible so that various users can add whatever customizations and extensions they need.
And each of those functions will need supporting views, etc.
Pretty soon, you will find that what you need approaches something that already exists -- a relational database. Rather than "reinventing the wheel", we should just use an already existing database system and make all of the data directly accessible.
Since a minimum of subscriber information is absolutely essential to the core job, all of it may as well be in there.
This does not follow logically. Since only a minimum of information is essential to the core job, it may well be more appropriate for it to get that information from another source as needed.
In some configurations we will want the subscribers to be authenticated, so we may as well keep all such information in the core's database.
Steve
Applying your previous argument, I could equally say "since the web user needs to be authenticated, we may as well keep all such information in the webUI's database"
Richard
Hi there,
On 11.07.12 15:34, Richard Wackerbarth wrote:
On Jul 11, 2012, at 8:14 AM, Stephen J. Turnbull wrote:
Richard Wackerbarth writes:
What I am advocating is that the "core" message handler NOT be the keeper of ONLY PART of it.
What I'm advocating (mildly, because somebody else is going to have to do the work) is that the core be the keeper of ALL of it. The core is not just a "message handler". It is also a database, containing both list information and subscriber information.
If we're only talking authentication data, I agree.
But I also agree with Terri that there might be a good amount of user data used by Postorius, Hyperkitty or any other web ui/client that just doesn't have anything to do with mailman's core tasks. And I don't see why something like "preferred ui theme" or profile-related stuff like "irc nick" should be stored in the core db.
Isn't it very common that applications combine information from different sources (databases, webservices,...) in one place (with or without caching them locally)? I don't see anything unusual in the concept of having some mailman-related user data managed by the mailman core and other kinds of data handled by the database/file-structure/key-value-store/web-service(s) that a web application is using.
If Postorius and HyperKitty decide to share some information in one place, because the projects are so closely related, that's of course a fine idea. But I wouldn't try to cram everything into the core db just for the sake of having it all in one place.
Florian
The problem in splitting data is that links between the various related entries need to be maintained. This means having the ability to go from one entry to the related ones, and back again. When the data is stored in multiple tables, foreign keys provide the link. When both tables reside in the same database, joins can be performed, etc. However, when the tables are in separate databases, at a minimum, each database needs to "know about" (store the inverse key) the other entries. They also need to be able to assure that the corresponding entries in the foreign tables get created, and likely require that they be unique. Because of thread locking considerations, this may not be so easy to accomplish.
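To make that fragility concrete, here is a minimal sketch of the two-database case (the file names, tables, and columns are invented for illustration; this is not Mailman's actual schema):

import sqlite3

core = sqlite3.connect("core.db")   # hypothetical core store
web = sqlite3.connect("web.db")     # hypothetical webUI store

core.execute("CREATE TABLE IF NOT EXISTS member ("
             "id INTEGER PRIMARY KEY, email TEXT UNIQUE, "
             "web_profile_id INTEGER)")
web.execute("CREATE TABLE IF NOT EXISTS profile ("
            "id INTEGER PRIMARY KEY, irc_nick TEXT, "
            "core_member_id INTEGER)")

def create_user(email, irc_nick):
    # Two separate commits: if the second insert fails, the two
    # databases disagree, and there is no cross-database transaction
    # to roll both back together.
    member_id = core.execute(
        "INSERT INTO member (email) VALUES (?)", (email,)).lastrowid
    profile_id = web.execute(
        "INSERT INTO profile (irc_nick, core_member_id) VALUES (?, ?)",
        (irc_nick, member_id)).lastrowid
    # Each side must also store the inverse key by hand.
    core.execute("UPDATE member SET web_profile_id = ? WHERE id = ?",
                 (profile_id, member_id))
    core.commit()
    web.commit()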
Florian Fuchs writes:
But I also agree with Terri that there might be a good amount of user data used by Postorius, Hyperkitty or any other web ui/client that just doesn't have anything to do with mailman's core tasks. And I don't see why something like "preferred ui theme" or profile-related stuff like "irc nick" should be stored in the core db.
Because there may be multiple clients wanting to access that data.
Isn't it very common that applications combine information from different sources (databases, webservices,...) in one place (with or without caching them locally)?
Yes, in fact it is a common RFE ("I'd like to get general information about our organization members from our organization's membership database and combine it with Mailman-specific information.")
Nevertheless, it would be preferable if that is wrapped in an API that makes it look like it's all coming from one place, and which manages the linkages so that data is not stored redundantly in different places, or inaccessible to certain clients. Even UI theme, which you might think would be very specific to each app, is likely to be unified site-wide. At least, site designers will probably want that to be possible.
I just don't think there's all that much data that nobody will ever find a use for sharing cross-app. So we may as well provide APIs for that sharing from the start.
I don't see anything unusual in the concept of having some mailman-related user data managed by the mailman core and other kinds of data handled by the database/file-structure/key-value-store/web-service(s) that a web application is using.
Common, yes, but the cross-DB linkage problems that arise in that architecture are both predictable and already being observed.
If Postorius and HyperKitty decide to share some information in one place, because the projects are so closely related, that's of course a fine idea. But I wouldn't try to cram everything into the core db just for the sake of having it all in one place.
I don't see any "cramming" here. The point of using an RDBMS (or a derivative such as an ORM or Django itself) is to give the database a natural structure in multiple tables or objects, while accessing it through a common API.
If the core wants to delegate some DB maintenance to another module or an external app such as an organizational member DB, that's OK, but I think that there should be a central point of contact for all of the data for a given object such as a user.
Steve
On Jul 12, 2012, at 02:50 AM, Stephen J. Turnbull wrote:
Nevertheless, it would be preferable if that is wrapped in an API that makes it look like it's all coming from one place, and which manages the linkages so that data is not stored redundantly in different places, or inaccessible to certain clients.
Then it has to be outside the core, and the core must be extended to access that external information, using one of the many mechanisms I've outlined.
-Barry
Richard Wackerbarth writes:
Pretty soon, you will find that what you need approaches something that already exists -- a relational database. Rather than "reinventing the wheel", we should just use an already existing database system and make all of the data directly accessible.
We're already doing that, with ORMs overlying the RDBMS.
Since only a minimum of information is essential to the core job, it may well be more appropriate for it to get that information from another source as needed.
True, but we've already agreed that the user information should be kept in one place, based on your experience.
Applying your previous argument, I could equally say "since the web user needs to be authenticated, we may as well keep all such information in the webUI's database"
It's not the same argument. A mailing list needs a message distribution agent; it doesn't *need* a webUI.
There may be implementation reasons why it's better to handle database requirements in the webUI or a new daemon, but nobody's given any yet.
On Jul 11, 2012, at 12:50 PM, Stephen J. Turnbull wrote:
Richard Wackerbarth writes:
Pretty soon, you will find that what you need approaches something that already exists -- a relational database. Rather than "reinventing the wheel", we should just use an already existing database system and make all of the data directly accessible.
We're already doing that, with ORMs overlying the RDBMS.
No! Although you are making available (some/most/all) of the data values, you are not making available the ability to make arbitrary SQL-type queries to view the data.
Since only a minimum of information is essential to the core job, it may well be more appropriate for it to get that information from another source as needed.
True, but we've already agreed that the user information should be kept in one place, based on your experience.
Applying your previous argument, I could equally say "since the web user needs to be authenticated, we may as well keep all such information in the webUI's database"
It's not the same argument. A mailing list needs a message distribution agent; it doesn't *need* a webUI.
In this day and age, try selling that one! Only those of us, and that especially includes me, who were around "way back when", before http even existed, know any other way. :)
Additionally, look at what other MM developers are doing. For example, Hyperkitty is not oriented to retrieving archived messages by email. I think we MUST assume that there will be a web interface. If someone wants to have a mailing list without one, it should be easy enough to "stub off" the limited amount of user information that is needed. But I think that this use case will be the exception rather than the rule.
There may be implementation reasons why it's better to handle database requirements in the webUI or a new daemon, but nobody's given any yet.
As far as the complexity of access is concerned, the mail handler probably has the lowest requirements. The present architecture would have to be extended significantly to provide for appropriate handling of ALL of the data. That includes a much richer query capability.
Presently, the message handling is integrally tied to the database implementation. Customization extensions will intrude into parts of the system which they should not affect.
As far as I am concerned, those are more than adequate reasons.
I view your argument as the message handler claiming "I'm special! Everyone else has to do things my way. I get special privileges." -- IMHO, the tail is wagging the dog.
Let's split shared data storage from the message handler, and from any "admin-by-mail" component, treating each of them as separate logical components. Provide each of them EXACTLY the same accesses as those provided to the WebUI, Message Archiver, etc.
Richard Wackerbarth writes:
No! Although you are making available (some/most/all) of the data values, you are not making available the ability to make arbitrary SQL-type queries to view the data.
AFAIK the plan is to do that via the REST interface. I don't see why they need to be "arbitrary SQL-type" queries; you need to be a bit more explicit about that.
It's not the same argument. A mailing list needs a message distribution agent; it doesn't *need* a webUI.
In this day and age, try selling that one! Only those of us, and that especially includes me, who were around "way back when", before http even existed, know any other way. :)
So what? The fact that a webUI is very convenient is not the point. The point is that the message distribution agent is mission-critical; if it goes down you are well-and-truly screwed. If the web UI goes down, it might not even be noticed for weeks.
I think we MUST assume that there will be a web interface.
Of course there will be a web interface. As you point out, we couldn't even give away the product without it. But there might be half a dozen different ones, too.
As far as the complexity of access is concerned, the mail handler probably has the lowest requirements. The present architecture would have to be extended significantly to provide for appropriate handling of ALL of the data. That includes a much richer query capability.
So what? This extension needs to be done *somewhere*; you aren't going to be able to avoid it by throwing it out of the core. What you are going to be able to do by throwing it out of the core is ensure that each non-core module will do it differently and incompatibly. In fact, as you point out, they already are different and incompatible. I don't see any reason for that to change unless the DB API is centralized somewhere.
I see no reason for that somewhere to be anywhere but the core. The core is the only component that we *know* *has* to be there, and there will be only one implementation of it. The various UIs have substitutes, and one reason for splitting them out was to make it possible to have multiple implementations (and you've been at pains to point that out).
Presently, the message handling is integrally tied to the database implementation. Customization extensions will intrude into parts of the system which they should not affect.
Why? Parts of the system that don't need them just ignore them. Where's the problem?
I view your argument as the message handler claiming "I'm special!
It is. First, it is mission-critical; nothing else is. Second, it does need to ensure good performance, which I doubt is true of other components. Whether that justifies bypassing the REST API or whatever, I don't know.
Everyone else has to do things my way. I get special privileges."
Well, you've misunderstood me, then. My intention is that the message handler use the same APIs available to everybody else, except that other applications might need to use the REST interface where the core might have more direct access.
-- IMHO, the tail is wagging the dog.
Call it "claim" if you like, but the message handler is the dog.
Let's split shared data storage from the message handler,
I don't have a problem with that as a matter of design, but for distribution it will be bundled with the message handler. That's not necessarily true of other components.
Steve
On Jul 11, 2012, at 3:29 PM, Stephen J. Turnbull wrote:
Richard Wackerbarth writes:
No! Although you are making available (some/most/all) of the data values, you are not making available the ability to make arbitrary SQL-type queries to view the data.
AFAIK the plan is to do that via the REST interface. I don't see why they need to be "arbitrary SQL-type" queries; you need to be a bit more explicit about that.
As an example, suppose that I want to have an "intelligent" ToDo indicator. As a minimum, I need to be able to get from the data store a list of MLs that have pending requests AND for which I am authorized to do that work. Typically, this would be some kind of join. Or the "social media" guys might want to add some "respected author" value and incorporate that into the way that messages get displayed. That data doesn't even exist in the "stock" distribution. I don't pretend to know just what our users will want to add. But they should be allowed to write an SQL-type description of their needs and they shouldn't "muck" with the inner workings of the message handling schema to do so.
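For instance, the ToDo indicator might boil down to a query like the following (the heldmessage and moderator tables are invented for the example; they are not Mailman's real schema):

import sqlite3

def lists_with_pending_work(conn, user_email):
    # Lists that have pending requests AND for which this user is
    # authorized to do moderation work -- a join across two tables.
    return conn.execute(
        "SELECT DISTINCT h.list_name "
        "FROM heldmessage AS h "
        "JOIN moderator AS m ON m.list_name = h.list_name "
        "WHERE m.email = ?",
        (user_email,)).fetchall()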
The point is that the message distribution agent is mission-critical; if it goes down you are well-and-truly screwed. If the web UI goes down, it might not even be noticed for weeks.
I don't buy that. If you advertise a subscribe URL, or any other function, that is just as much a "mission critical" component as any other.
I don't see user passwords providing much direct use in the mail distribution system. They might be critical to a list that requires moderation. But, even there, I suspect that the moderators are likely to utilize the web interface.
I think we MUST assume that there will be a web interface.
Of course there will be a web interface. As you point out, we couldn't even give away the product without it. But there might be half a dozen different ones, too.
As far as the complexity of access is concerned, the mail handler probably has the lowest requirements. The present architecture would have to be extended significantly to provide for appropriate handling of ALL of the data. That includes a much richer query capability.
So what? This extension needs to be done *somewhere*; you aren't going to be able to avoid it by throwing it out of the core.
No, but I will "compartmentalize" it.
What you are going to be able to do by throwing it out of the core is ensure that each non-core module will do it differently and incompatibly.
No, I am suggesting that either you implement the functionality by specifying that some particular structure be set in a standard database (and provide a reference implementation of doing so) or that you specify REST interfaces that implement the appropriate functions and REQUIRE that all components manipulate that data ONLY through those interfaces.
The REST interface is not a single entity, but a collection of components that inter-operate. Each of them is "mission critical".
Further, "each non-core module will do it differently and incompatibly" is a red herring. There MUST be a SPECIFICATION of the interface and EVERY implementation MUST meet those REQUIREMENTS. Whatever else it does will not affect any other part of the system.
In fact, as you point out, they already are different and incompatible.
That is because you have not followed the principles and allow "someone else" to provide that service.
I don't see any reason for that to change unless the DB API is centralized somewhere.
I see no reason for that somewhere to be anywhere but the core. The core is the only component that we *know* *has* to be there, and there will be only one implementation of it. The various UIs have substitutes, and one reason for splitting them out was to make it possible to have multiple implementations (and you've been at pains to point that out).
Presently, the message handling is integrally tied to the database implementation. Customization extensions will intrude into parts of the system which they should not affect.
Why? Parts of the system that don't need them just ignore them. Where's the problem?
Spaghetti. In any system of complexity, there will be parts that couldn't care less about the details of most other parts. Good modular design practices separate those parts and provide limited access between them.
I view your argument as the message handler claiming "I'm special!
It is. First, it is mission-critical; nothing else is.
And the underlying RDBMS, the MTA, etc. are not?
Second, it does need to ensure good performance, which I doubt is true of other components. Whether that justifies bypassing the REST API or whatever, I don't know.
The question is whether particular data is local or shared. If I, as one of the data consumers, don't need access to some of the data, there is no reason to attempt to expose it via REST. Therefore, even if "maintain ..... (certain data)" is a part of the specification, how it is maintained is an internal detail.
Everyone else has to do things my way. I get special privileges."
Well, you've misunderstood me, then. My intention is that the message handler use the same APIs available to everybody else, except that other applications might need to use the REST interface where the core might have more direct access.
This is my objection. IF some particular data is exposed, then it should be maintained by one handler, without back doors. If that handler is local, then the interface need not serialize the data and transmit it, but the access should be isomorphic to doing so.
-- IMHO, the tail is wagging the dog.
Call it "claim" if you like, but the message handler is the dog.
With respect to messages, yes. But not with respect to credentials.
Credentials should be kept in a separate box. And that box should be kept wherever it best fits in the overall data flow. From a design perspective, it should be easy to place it anywhere the installer desires.
Let's split shared data storage from the message handler,
I don't have a problem with that as a matter of design, but for distribution it will be bundled with the message handler. That's not necessarily true of other components.
For distribution, a reference implementation of EVERY interface should be included.
And substituting a different implementation should be as simple as excluding the distribution version and dropping in the alternate.
Steve
On Jul 11, 2012, at 05:04 PM, Richard Wackerbarth wrote:
As an example, suppose that I want to have an "intelligent" ToDo indicator. As a minimum, I need to be able to get from the data store a list of MLs that have pending requests AND for which I am authorized to do that work. Typically, this would be some kind of join.
This should be hidden behind a REST API.
Or the "social media" guys might want to add some "respected author" value and incorporate that into the way that messages get displayed. That data doesn't even exist in the "stock" distribution.
Nor, probably, should it.
I don't pretend to know just what our users will want to add. But they should be allowed to write an SQL-type description of their needs and they shouldn't "muck" with the inner workings of the message handling schema to do so.
Then, for sites that want that feature, the data should live outside the core. As I hope I've explained, that could mean a separate database component with an alternative IUser (et al) implementation to make the external queries. Or it could mean a REST API to push relevant changes in the separate database component back into the core.
The point is that the message distribution agent is mission-critical; if it goes down you are well-and-truly screwed. If the web UI goes down, it might not even be noticed for weeks.
I don't buy that. If you advertise a subscribe URL, or any other function, that is just as much a "mission critical" component as any other.
But maybe you don't have such a subscribe URL. I've described several use cases where such a thing isn't needed. It must be possible to support both situations.
I don't see user passwords providing much direct use in the mail distribution system. They might be critical to a list that requires moderation. But, even there, I suspect that the moderators are likely to utilize the web interface.
Users do make changes to their subscriptions, or moderate their mailing lists through email commands today. Of course, even that is optional.
No, I am suggesting that either you implement the functionality by specifying that some particular structure be set in a standard database (and provide a reference implementation of doing so) or that you specify REST interfaces that implement the appropriate functions and REQUIRE that all components manipulate that data ONLY through those interfaces.
The REST interface is not a single entity, but a collection of components that inter-operate. Each of them is "mission critical".
Agreed. But the core's REST API should *only* reflect the data model that is critical to its operations.
In fact, the core already calls out to a REST API to retrieve some information. Did you know that the header and footer added to messages is a resource described by a URL? Mailman does a GET on that URL with a defined format to retrieve the appropriate decoration, which it then caches for a little while. That's the core making a REST call for data living in an external system[*].
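The pattern is roughly the following; this sketch is illustrative only, not the actual implementation, and the five-minute cache policy is an assumed value:

import time
import urllib.request

_cache = {}        # url -> (fetched_at, body)
CACHE_TTL = 300    # seconds; assumed for the example

def get_decoration(url):
    # Fetch a list's header/footer decoration from a URL, caching
    # the result for a little while.
    now = time.time()
    hit = _cache.get(url)
    if hit is not None and now - hit[0] < CACHE_TTL:
        return hit[1]
    with urllib.request.urlopen(url) as resp:
        body = resp.read().decode("utf-8")
    _cache[url] = (now, body)
    return body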
Further, "each non-core module will do it differently and incompatibly" is a red herring. There MUST be a SPECIFICATION of the interface and EVERY implementation MUST meet those REQUIREMENTS. Whatever else it does will not affect any other part of the system.
Right. But I would strongly object to that specification in any way being tied to SQL or a specific database implementation. HTTP is the language of the web and REST is a not-perfect but good enough convention built on HTTP.
Cheers, -Barry
[*] Possibly. The data may live on an internal-to-the-core resource, which are defined as mailman:// schemes. The core doesn't care where the resource lives, it just does a GET which by definition is REST.
Richard Wackerbarth writes:
As an example, suppose that I want to have an "intelligent" ToDo indicator. As a minimum, I need to be able to get from the data store a list of MLs that have pending requests AND for which I am authorized to do that work. Typically, this would be some kind of join.
OK. But as I see it, Python is a dynamic language, and we should be able to use the ORM to dynamically revise the DB schema and access such complexly specified data.
I don't pretend to know just what our users will want to add. But they should be allowed to write an SQL-type description of their needs and they shouldn't "muck" with the inner workings of the message handling schema to do so.
So by "SQL-type" do you mean they *must* have access to the RDBMS so they can actually write SQL, or just that the provided interface needs to allow queries with logical operators? The "-type" suggests you mean the latter.
But you've also suggested the former. I don't like the idea of direct access to the underlying database, because there isn't necessarily going to be just one, and it may be that Mailman needs certain kinds of access to component DBs (eg, updating email addresses) but the organization would like to have access controls on them based on another component database (authorized admins, say). Also, we're not in a position to require that all databases be kept in, say, Postgresql. They may not even be RDBMSes (LDAP member databases, sendmail alias files). So we need a layer of abstraction.
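As a sketch of what such a layer might look like, the same small interface can front very different stores; the interface and the (much simplified) alias-file parsing below are hypothetical:

from abc import ABC, abstractmethod

class AddressBackend(ABC):
    @abstractmethod
    def addresses(self, list_name):
        """Return the delivery addresses for a list."""

class AliasFileBackend(AddressBackend):
    # Reads "listname: addr1, addr2" lines from a sendmail-style
    # alias file. Real alias files have more syntax than this handles.
    def __init__(self, path):
        self.path = path

    def addresses(self, list_name):
        with open(self.path) as f:
            for line in f:
                if line.startswith(list_name + ":"):
                    _, _, rest = line.partition(":")
                    return [a.strip() for a in rest.split(",")]
        return []

An LDAP- or SQL-backed class implementing the same interface could then be dropped in without the callers noticing.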
The point is that the message distribution agent is mission-critical; if it goes down you are well-and-truly screwed. If the web UI goes down, it might not even be noticed for weeks.
I don't buy that. If you advertise a subscribe URL, or any other function, that is just as much a "mission critical" component as any other.
We'll have to agree to disagree.
I don't see user passwords providing much direct use in the mail distribution system.
I don't understand what you're thinking. You started this thread with the observation that various components are keeping data in different places, and that this data is often redundant but not synced, or inaccessible. To me this suggests a design principle: a single conceptual database managed by a core component (i.e., one that is present in every Mailman 3 system).
The implementation of that database may very well include multiple database systems (eg, the organization's LDAP directory, a Postgresql database for the tables related to list configurations, and an MTA alias file for the list addresses). However, these need to be managed via a single common API, and the data must not be private to any non-core component.
The fact that some data are not useful to all components seems to me to be a red herring. The point of a DBMS in general is that you can flexibly access only the data you need for the job at hand, in a form optimized for the job at hand.
So what? This extension needs to be done *somewhere*; you aren't going to be able to avoid it by throwing it out of the core.
No, but I will "compartmentalize" it.
You mean "as a single entity in the distribution of core components", or "as per-component entities containing what each component needs"?
No, I am suggesting that either you implement the functionality by specifying that some particular structure be set in a standard database (and provide a reference implementation of doing so) or
I think that's a non-starter. We are not in a position to specify that there even *be* a standard database backing our API, unless we're willing to push the redundancy/inaccessibility problems to the next higher level by copying databases from organizational sources *outside* of Mailman *into* Mailman-only databases. I consider that unacceptable; use of external databases for subscriber lists is a high-frequency RFE, and it would be *way* higher if it weren't for the extremely high quality of MM-U participants, most of whom check the FAQ/tracker and notice that there already is an RFE on file. AIUI, Barry does too.
that you specify REST interfaces that implement the appropriate functions and REQUIRE that all components manipulate that data ONLY through those interfaces.
The REST interface is not a single entity, but a collection of components that inter-operate.
This makes no sense to me. I see the architecture as
+--------------+ +-------+
| Message | | |
| Distribution | . . . . . | WebUI |
+--------------+ +-------+
\ | /
\ | /
\ | /
+-----------------------+
| REST API |
+-----------------------+
/ | \
/ | \
/ | \
+------------+ +------------+
| Subscriber | . . . . . | Social |
| List | | Networking |
| | | Data |
+------------+ +------------+
where the "MD" component may perceive a member in terms of only subscriber data (i.e., something on the order of (FullName, Email, BounceCount)), while the "WUI" component might be interested in something like (Avatar, FullName, Email, IsATroll). (Of course the lower ellipsis also include a site config DB and a list config DB.)
To my mind a Pythonic base REST API would return MemberObjects with appropriate properties, and the properties would be turned into DB queries on access.
For performance-critical cases there would be a separate .query() method on MemberObjects that would look up a vector of attributes in one DB query. Also a .select() method on the MailmanDB object which would return a list of MemberObjects with specified properties, optionally as a (MemberObject, *values_of_requested_properties) tuple or dict.
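A toy version of that design, with a dict standing in for the real backend (everything here is a design illustration, not an existing Mailman API):

class ToyStore:
    # Stand-in backend: member_id -> dict of attributes.
    def __init__(self, rows):
        self.rows = rows

    def fetch_one(self, member_id, name):
        return self.rows[member_id][name]

    def fetch_many(self, member_id, names):
        row = self.rows[member_id]
        return {n: row[n] for n in names}

    def select(self, **conditions):
        # Return MemberObjects whose attributes match the conditions.
        return [MemberObject(self, mid)
                for mid, row in self.rows.items()
                if all(row.get(k) == v for k, v in conditions.items())]

class MemberObject:
    def __init__(self, db, member_id):
        self._db = db
        self._id = member_id

    def __getattr__(self, name):
        # Property access becomes one backend lookup per attribute.
        return self._db.fetch_one(self._id, name)

    def query(self, names):
        # Fetch a vector of attributes in a single backend call.
        return self._db.fetch_many(self._id, names)

db = ToyStore({1: {"FullName": "Anne", "Email": "anne@example.com",
                   "BounceCount": 0}})
member = db.select(Email="anne@example.com")[0]
print(member.FullName)                       # per-property access
print(member.query(["FullName", "Email"]))   # one call, two attributes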
Further, "each non-core module will do it differently and incompatibly" is a red herring. There MUST be a SPECIFICATION of the interface and EVERY implementation MUST meet those REQUIREMENTS. Whatever else it does will not affect any other part of the system.
Have you ever told a baby to stop sucking their thumb, and use the pacifier? You have to pull the thumb out to get the point across. In the same way, there's going to have to be one implementation, and that implementation will be distributed with the core. Otherwise there WILL be a SPECIFICATION of the interface and EVERY implementation WILL meet those REQUIREMENTS (except where the implementer finds it inconvenient), and we're back where we started.
That is because you have not followed the principles and allow "someone else" to provide that service.
True. (I wish you'd stop using "you" in this kind of statement; it isn't true, I didn't code any of this. And it doesn't matter who did.) Announcing principles isn't going to help enough, though. Python operates on the basis of "consenting adults" and can't force anybody to write a program in a particular way. Unless the API is actually provided in *every* Mailman 3 distribution, and is well-enough designed to be TOOWTDI, implementers will work around it.
I view your argument as the message handler claiming "I'm special!
It is. First, it is mission-critical; nothing else is.
And the underlying RDBMS, the MTA, etc. are not?
This confounds levels of architecture.
This is my objection. IF some particular data is exposed, then it should be maintained by one handler, without back doors. If that handler is local, then the interface need not serialize the data and transmit it, but the access should be isomorphic to doing so.
That's not an objection, that's a somewhat more precise restatement of what I wrote.
Credentials should be kept in a separate box. And that box should be kept where ever it best fits in the overall data flow.
Precisely. Since databases will be needed by all components when present, they should be kept in or with a component that will always be present. That's what "core" means.
From a design perspective, it should be easy to place it anywhere the installer desires.
No. That exposes an implementation detail. As far as installers are concerned, the database *is* the API. Where it is located is none of their business.
There will need to be a little leakage here, because admins will want to link the Mailman DB to existing organizational DBs. So the possibility of specifying an existing external database needs to be considered. But this is only slightly more than the amount of information required to configure Mailman's own PostgreSQL or MySQL database, and these are not going to be "placed" by a Mailman admin, but rather configured and accessed from a provided installation (whether by the user organization, or by an OS distro). So I don't see a need to make a big distinction here, except that the "own" database will have a schema designed for Mailman, but an external database will need some kind of "adapter" to match schema.
For distribution, a reference implementation of EVERY interface should be included.
I don't see how that's possible in your design, since you propose to allow components to implement their own databases.
And substituting a different implementation should be a simple as excluding the distribution version and dropping in the alternate.
Sure, but is there a reason why this might be difficult? ISTM that Python's orientation to duck-typing will make this happen naturally. (I don't mean to ignore the possibility of problems, but if you have something specific in mind we can be careful to avoid that in the design process.)
... taken out of sequence ...
That is because you have not followed the principles and allow "someone else" to provide that service.
True. (I wish you'd stop using "you" in this kind of statement; it isn't true, I didn't code any of this. And it doesn't matter who did.)
I apologize for my terminology. Rather than using the second person, I should use the third. However, nowhere do I mean "you" to reflect on you personally. I tend to use personal pronouns as a short form representing "the argument/proposal/design that, in the context of the message, is being described by a particular person", as distinguished from one which another person has described/advocated. Similarly, if I were to use "Barry" in that context, I would be referring to an idea that he has described. I do not know, necessarily, if any methodology is the preferred approach of the individual describing it.
There seem to be two fundamental design strategies being discussed. One of them has a monolithic data store and the other has a distributed store. Barry has expressed some reservations about overloading a monolithic data store with data extraneous to the fundamental mission of message handling.
I have expressed concern about requiring any implementation to maintain related data in a split format. I recognize that there will be cases where this is necessary (for example the Launchpad case as described by Barry in another message). But, as he notes, such implementations tend to be "brittle", especially where there are multiple components which can alter the data. But, unless it is a constraint external to MM, I do not believe that such a restriction should be introduced.
There is also an issue of what the term "core" means. Perhaps you have been referring to a distribution package. I have been referring to one component of such a package, in particular that component which interacts with the MTAs and redistributes messages. I consider the processing of administrative messages to be a separate component. And I consider the storage of configuration information to be yet another component. In my view, each component extends only as far as its parts interact with the same private data representation.
On Jul 12, 2012, at 2:05 AM, Stephen J. Turnbull wrote:
Richard Wackerbarth writes:
I don't pretend to know just what our users will want to add. But they should be allowed to write an SQL-type description of their needs and they shouldn't "muck" with the inner workings of the message handling schema to do so.
So by "SQL-type" do you mean they *must* have access to the RDBMS so they can actually write SQL, or just that the provided interface needs to allow queries with logical operators? The "-type" suggests you mean the latter.
I do mean the latter. But, if the real underlying database is an RDBMS, then, within the "black box", these queries probably should be implemented by translating them to real SQL queries and passing those to the RDBMS. There has been a lot of work, by many far more qualified than we, to handle such details within the RDBMS. We should not attempt to reinvent their wheel.
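A toy illustration of that "translate, don't reinvent" point: the helper below turns a dict of field/value conditions into a parameterized query and lets the RDBMS do the real work (the table and field names are invented):

def to_sql(table, conditions):
    # conditions: {field: value}, combined with AND.
    where = " AND ".join(f"{field} = ?" for field in conditions)
    sql = f"SELECT * FROM {table} WHERE {where}"
    return sql, tuple(conditions.values())

# to_sql("member", {"list_name": "dev", "role": "moderator"})
# -> ("SELECT * FROM member WHERE list_name = ? AND role = ?",
#     ("dev", "moderator"))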
But you've also suggested the former.
I have suggested it as an alternative implementation. I do so only because that strategy exposes a powerful resource and avoids the burden of adding a new mechanism in order to meet the requirement for customized extensions and access that need to interact efficiently with the data that MM needs to have maintained.
I don't like the idea of direct access to the underlying database, because there isn't necessarily going to be just one, and it may be that Mailman needs certain kinds of access to component DBs (eg, updating email addresses) but the organization would like to have access controls on them based on another component database (authorized admins, say). Also, we're not in a position to require that all databases be kept in, say, Postgresql. They may not even be RDBMSes (LDAP member databases, sendmail alias files). So we need a layer of abstraction.
I agree that we need the abstraction.
I don't see user passwords providing much direct use in the mail distribution system.
I don't understand what you're thinking.
First, we seem to have a different conceptual model of MM. I view that which is being called "core", not as a single entity, but a collection of components, most of which are critical to the operation of the system. Among those components, I distinguish message routing and distribution, configuration storage, and processing of administrative messages.
The first is what I have been calling the message handling. It interacts with the MTAs, maintains queues of partially completed work, etc.
The second is critical in that it provides the customization information which causes each mailing list to be distinctive. It can be further subdivided into structural configuration (the location of various interfaces, the parameters defining the lists, etc.), rosters of subscribers, and subscription preferences.
The third component implements the processing of messages which are designed to alter the state of the configuration storage and/or the state of messages queued in the message handler. I do not consider this element "critical" in that the messages which it will process can be queued and handled later, or the component can be omitted entirely in a system that utilizes a webUI as an alternative access to handle those administrative functions.
You started this thread with the observation that various components are keeping data in different places, and that this data is often redundant but not synced, or inaccessible. To me this suggests a design principle: a single conceptual database managed by a core component (i.e., one that is present in every Mailman 3 system).
Yes, that is how I started the thread. However, you misinterpret that as a requirement for a monolithic database. Certainly a monolithic database would be one way to accomplish DRY storage of the data, but it can also be accomplished in a distributed manner. What I am suggesting is that in a distributed system, no component of the system has the right to demand that it have the exclusive right to be the keeper of certain shared data. But, further, that any component taking on that responsibility should also be responsible for the storage of any related items.
The implementation of that database may very well include multiple database systems (eg, the organization's LDAP directory, a Postgresql database for the tables related to list configurations, and an MTA alias file for the list addresses).
Agreed.
However, these need to be managed via a single common API, and the data must not be private to any non-core component.
I would agree only if you drop the "non-core". Each component may have "private" data. But that data cannot include any data that needs to be exposed by the API.
The fact that some data are not useful to all components seems to me to be a red herring. The point of a DBMS in general is that you can flexibly access only the data you need for the job at hand, in a form optimized for the job at hand.
This is the reason that utilizing, and exposing, the database engine is an attractive way to implement the storage.
So what? This extension needs to be done *somewhere*; you aren't going to be able to avoid it by throwing it out of the core.
No, but I will "compartmentalize" it.
You mean "as a single entity in the distribution of core components", or "as per-component entities containing what each component needs"?
No, I am suggesting that either you implement the functionality by specifying that some particular structure be set in a standard database (and provide a reference implementation of doing so) or
I think that's a non-starter. We are not in a position to specify that there even *be* a standard database backing our API, unless we're willing to push the redundancy/inaccessibility problems to the next higher level by copying databases from organizational sources *outside* of Mailman *into* Mailman-only databases. I consider that unacceptable; use of external databases for subscriber lists is a high-frequency RFE, and it would be *way* higher if it weren't for the extremely high quality of MM-U participants, most of whom check the FAQ/tracker and notice that there already is an RFE on file. AIUI, Barry does too.
I agree. I would make the ability to store "rosters" and/or user information in databases that are not under MM's control a design requirement. But, going along with that, use of such external storage also relieves MM of the responsibility to provide management for that data.
Further, "each non-core module will do it differently and incompatibly" is a red herring. There MUST be a SPECIFICATION of the interface and EVERY implementation MUST meet those REQUIREMENTS. Whatever else it does will not affect any other part of the system.
Have you ever told a baby to stop sucking their thumb, and use the pacifier? You have to pull the thumb out to get the point across. In the same way, there's going to have to be one implementation, and that implementation will be distributed with the core. Otherwise there WILL be a SPECIFICATION of the interface and EVERY implementation WILL meet those REQUIREMENTS (except where the implementer finds it inconvenient), and we're back where we started.
There is going to have to be one REFERENCE implementation and that implementation will be sufficient to get a minimal system operational. But, because that implementation will not meet the operational needs of most users, there will be alternate implementations. You cannot stop that. You can only hope that those implementations will meet the specification.
Richard Wackerbarth writes:
There seem to be two fundamental design strategies being discussed. One of them has a monolithic data store and the other has a distributed store. Barry has expressed some reservations about overloading a monolithic data store with data extraneous to the fundamental mission of message handling.
I have expressed concern about requiring any implementation to maintain related data in a split format. I recognize that there will be cases where this is necessary (for example the Launchpad case as described by Barry in another message). But, as he notes, such implementations tend to be "brittle", especially where there are multiple components which can alter the data. But, unless it is a constraint external to MM, I do not believe that such a restriction should be introduced.
I don't think that anybody is considering requiring a monolithic store, in the sense of putting everything into a single backend DBMS, because we all agree that it should be possible to take member lists from an external database and augment them with Mailman-specific properties that we may not be permitted to store in the external database.
(Aside: I don't think we should assume that external databases are necessarily read-only. For example, I can imagine informal organizations that would allow Mailman to add new subscribers to the member directory, or sales organizations that would allow people to subscribe to product announcement lists and automatically add them to the CRM database.)
What I propose as a requirement is that any data added to any databases used by Mailman be accessible via standard Python introspection techniques. (In principle and by default, that is; Mailman already hides some data from some interfaces, such as the member list.) For example, if we use the "user as Python object" model, then the introspection method would simply be the 'dir' function. Other possibilities would be to have components register mutators and accessors for "their" data.
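A minimal sketch of what that buys us, assuming the "user as Python object" model (the attribute names are examples only):

class User:
    def __init__(self, email):
        self.email = email

u = User("anne@example.com")
u.irc_nick = "anne_"    # some web-UI component attaches its own datum

# Any other component can discover what is stored, with no schema
# knowledge, using plain introspection:
print([name for name in dir(u) if not name.startswith("_")])
# ['email', 'irc_nick']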
There is also an issue of what the term "core" means. Perhaps you have been referring to a distribution package. I have been referring to one component of such a package, in particular that component which interacts with the MTAs and redistributes messages.
I find that highly unintuitive. The core is the set of functions that are essential. The "distribution package" description is a heuristic, i.e., "these are the functions that would completely stop the show if you installed Mailman and discovered any one of them was not present."
I consider the processing of administrative messages to be a separate component. And I consider the storage of configuration information to be yet another component. In my view, each component extends only as far as its parts interact with the same private data representation.
I don't think that's a useful definition, to be honest. On the one hand, most functions have local variables, but surely that doesn't make them components by themselves. On the other, pretty much everything in Mailman interacts with mailing lists in one way or another, but surely none of us thinks of Mailman as a one-component application.
I think of "component" as a concept that belongs to the art of programming, and not having a technical definition. A component is any body of code and content that is a convenient unit of creation, maintenance, and administration. Of course issues of coherence and coupling will help determine what is "convenient", but I don't think they're sufficient in themselves.
I do mean the latter. But, if the real underlying database is an RDBMS, then, within the "black box", these queries probably should be implemented by translating them to real SQL queries and passing those to the RDBMS.
Sure. But this is more likely if we have a good ORM (which is a more Pythonic way of thinking about things) as an interface to the RDBMS, and all of that is wrapped in a convenient powerful API that allows the programmer to delegate data persistence to some component of Mailman.
First, we seem to have a different conceptual model of MM. I view that which is being called "core", not as a single entity, but a collection of components, most of which are critical to the operation of the system.
That's not what you said above; above you restrict it to the message routing and distribution component. I believe that is the definition you have been pretty consistently using throughout the thread. No?
Anyway, I find this one very close to my own thinking.
You started this thread with the observation that various components are keeping data in different places, and that this data is often redundant but not synced, or inaccessible. To me this suggests a design principle: a single conceptual database managed by a core component (i.e., one that is present in every Mailman 3 system).
Yes, that is how I started the thread. However, you misinterpret that as a requirement for a monolithic database.
I think you're misinterpreting my words, actually, though I'm open to correction by a third party. By a "single conceptual database", I mean that there is a single API for accessing persistent Mailman data, and that you don't have to specify a connection to a database to access data. The implementation knows where all the data is stored, whether that happens to be a single humongous ZODB, or a heterogeneous array of LDAP, SQL, and flatfile data stores.
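In code, that "single conceptual database" might look like a facade that routes each kind of data to whatever store actually holds it; the backend objects here are duck-typed placeholders for the sketch:

class MailmanData:
    # Callers use one API and never name a connection; the facade
    # knows which store holds which slice of the data.
    def __init__(self, ldap_backend, sql_backend, alias_backend):
        self._ldap = ldap_backend
        self._sql = sql_backend
        self._aliases = alias_backend

    def members(self, list_name):
        # Membership might live in the organization's LDAP directory.
        return self._ldap.members(list_name)

    def list_config(self, list_name):
        # List configuration might live in an RDBMS.
        return self._sql.config(list_name)

    def list_addresses(self):
        # List addresses might live in an MTA alias file.
        return self._aliases.all_addresses()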
Certainly a monolithic database would be one way to accomplish DRY storage of the data, but it can also be accomplished in a distributed manner. What I am suggesting is that in a distributed system, no component of the system has the right to demand that it have the exclusive right to be the keeper of certain shared data. But, further, that any component taking on that responsibility should also be responsible for the storage of any related items.
I question whether the pain of having an (explicitly) distributed system is worth the gain. As you've explained it here, I see it as setting us up for a situation where each component (including components that are substitutes performing the same conceptual function) will make its own decisions about what to store and where, and what is private and what is public, so that components will continually need to negotiate with each other over who has authority and responsibility for certain data.
From the responses of several people in this thread, I strongly suspect that most implementers will decide that most of the data they use is not interesting to other modules and make it private, rather than spend the effort needed to generalize. So I think the costs will be higher and the amount of shared data lower than for a system where one component is responsible for all connections to databases.
I would agree only if you drop the "non-core". Each component may have "private" data. But that data cannot include any data that needs to be exposed by the API.
And how do we know what "needs" to be exposed? We don't.
I'm sure we can make a killer MLM with a distributed database and each component storing private data. What I don't think we can do that way is make an MLM that's capable of killing web fora and Usenet, too. I think it's worth the extra effort to keep things general.
On Jul 11, 2012, at 02:22 PM, Richard Wackerbarth wrote:
No! Although you are making available (some/most/all) of the data values, you are not making available the ability to make arbitrary SQL-type queries to view the data.
Which frankly, I don't think we should do, in the core. The core is not a generalized database engine.
If this is absolutely required for the design of Postorius, then in my mind there has to be a separate database component, and an implementation layer on the bottom of the core which talks to that component.
In this day and age, try selling that one! Only those of us, and that especially includes me, who were around "way back when", before http even existed, know any other way. :)
Don't forget too that there are use cases where you interact with the system solely by email commands. Sure it might feel antiquated, but people who want to run their systems this way are passionate about it. This is something else that we must keep possible.
Presently, the message handling is integrally tied to the database implementation. Customization extensions will intrude into parts of the system which they should not affect.
Then *that's* a bug, but I think it's a bug of lack-of-implementation rather than design, as I've described in previous messages.
As far as I am concerned, those are more than adequate reasons.
I view your argument as the message handler claiming "I'm special! Everyone else has to do things my way. I get special privileges." -- IMHO, the tail is wagging the dog.
Well, I don't think that's the case, but of course message handling is kind of the whole point, isn't it? I mean, without that, what are we building? :)
-Barry
On Jul 12, 2012, at 02:50 AM, Stephen J. Turnbull wrote:
It's not the same argument. A mailing list needs a message distribution agent; it doesn't *need* a webUI.
+1
Or I might rephrase that as: it doesn't need *a* webUI :).
Postorius rocks, and it is going to be a fantastic piece of the default Mailman story. It will in fact be the most visible piece, so it has to rock. Think of all the thousands of average-Joe mailing lists out there. They will be ecstatic to see the improvement over the current circa-1996 webUI. It's not even funny how much better Postorius will be.
We just have to keep the design open enough that the webUI can be changed or discarded and the mailing list system will continue to function.
-Barry
On Jul 11, 2012, at 05:39 AM, Richard Wackerbarth wrote:
I am perfectly happy to have the user info handled by an independent stand-alone module which is willing to take responsibility for ALL of the user profile info. It should provide a REST interface that allows other modules to perform authentication, manipulate the profile, etc.
Another acceptable methodology would be to store all of the data in a real relational database and allow each of the modules direct access to the database engine.
I think I've outlined how the core would be extended so that either of the above implementation approaches could work.
Just for the record, as long as it's done in a modular way (i.e. so that people who don't want Postorius can ignore it), I have no problem shipping well tested components that make integrating the core with Postorius easier. It just can't be the only way to do it.
Cheers, -Barry
On Jul 11, 2012, at 02:12 PM, Stephen J. Turnbull wrote:
But isn't that going to take us a long way down the road where we anoint Postorius the one-and-only admin interface? If that really needs to be, OK, but I don't much like it.
I don't either; it's unacceptable.
Many, maybe most, sites will be fine just using the default Postorius, but plenty of sites will not. They will want to make mailing list administration, or user subscriptions, fit in with their own site design. Think of the code-hosting example where you join teams and that automatically puts you on that team's mailing list. Maybe through the code-hosting's regular web-ui you can manage your subscriptions, but maybe they make all the other decisions for you, in order to keep the web-ui really really simple. They don't want Postorius, they just want the core. We must keep that as a possibility.
Cheers, -Barry
On Jul 10, 2012, at 10:51 PM, Richard Wackerbarth wrote:
Yes, it is a bug in Postorius. However, that does not negate the fact that the present design, by forcing a split database, makes it difficult to accomplish the desired behavior.
Why exactly is it difficult? I'm not trolling, I really want to know. Is it because there's too much to keep in sync? Or is it because the REST API calls to read or write the necessary information are missing? Or is it just a pain to chase down everywhere in Postorius that things can change and need to be pushed to Mailman?
The core has machinery to push or pull information, or for information to be pushed to or pulled from it. Let's use those and expand them where necessary. Or please help me understand why those can't possibly work, or at least why they are not working right now.
Since this thread started with logins, walk me through what Postorius has to do to log a user in when the password is in the core. Where are you stuck?
-Barry
Thanks for starting this discussion. Since the thread's already long, I'm just going to answer randomly with my own thoughts.
One thing I have a real problem with is defining the database query layer as the interface between components. To me that just unacceptably ties us to a specific database, and/or a specific protocol. For example, I do not want to *require* Postgres in order to run Mailman, or to integrate *a* web ui with the core. I just think that as convenient as that might seem today, it will lock us into a system design we're going to regret somewhere down the road.
So let's say for the moment that we agree that all the user data should live in one place. I don't have a problem with that conceptually, and I actually don't care whether that's part of the core or in a separate component. The other problem I have is extending the core's data model to include things it doesn't care about. When you realize all of that has to be documented and tested, that just seems like it's adding lots of extra baggage to the core.
For example, today you might want Twitter and Facebook ids in that database. Five years ago maybe you also wanted an AIM id in there. Do you today? Will you still want Google+ ids in there, or BrowserIDs, or OpenIDs five years from now? Yet, if it's part of the core's data model, we have to support it, test it, document it, go through deprecation cycles, etc. etc.
One of the important design decisions I made was using Zope interfaces to formally define the touch points between the different components of the system. This isn't just for the fun of it; instead, it gives us great implementation flexibility.
For example, if you need to know what email addresses a user has registered, you access that through the IUser interface. Rosters are another great example: you access them through the IRoster interface and nothing else. Nothing except the implementation even cares that rosters are realized as queries and don't exist as *real* objects in the system. They can return whatever they want, as long as it conforms to IUser or whatever the relevant interface is.
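For illustration, here is a minimal sketch of that pattern; the attribute names are simplified stand-ins, not the core's actual IUser definition:

    from zope.interface import Interface, Attribute, implementer

    class IUser(Interface):
        """A deliberately simplified stand-in for the core's IUser."""
        user_id = Attribute('The stable identifier for this user.')
        addresses = Attribute('An iterable of registered email addresses.')

    @implementer(IUser)
    class SQLiteUser:
        """One possible backing implementation; callers never see this
        class, only objects conforming to IUser."""
        def __init__(self, user_id, addresses):
            self.user_id = user_id
            self.addresses = addresses

The point of the sketch is that consumers depend only on IUser; the class behind it can be swapped without touching them.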
This all might lead to inefficiencies, but I don't think that matters right now. It probably will some day, but let's worry about that if and when we need to. What we care about now is the *flexibility* and the *stability* of the system.
For the sake of argument, let's say that all the user information should be stored in Postorius. What kind of changes would be needed in the core to keep its view of the user world in sync with Postorius's view of the world? No matter how you slice it, you are going to have two separate processes that need to be kept in sync.
You actually could, as I think Richard advocates, just expose the SQL queries to both processes. You would in theory have to only re-implement a handful of interfaces to keep the rest of the system humming. IOW, when the IUserManager needs to look up a user by their email address, instead of running a query against the local SQLite database, you would run it against the Postorius database. But - and here's the key thing - you would *still* return some object that implements the IUser interface. If you do that, you've localized the changes you have to make to the core and everything else Just Works (again, in theory ;).
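A minimal sketch of that localized swap, with a deliberately simplified one-method IUserManager; the table and column names are invented for illustration:

    from zope.interface import Interface, implementer

    class IUserManager(Interface):
        """Simplified stand-in for the core's IUserManager."""
        def get_user(address):
            """Return the user controlling this email address, or None."""

    class PostoriusUser:
        """Hypothetical IUser-conforming object built from Postorius data."""
        def __init__(self, user_id, display_name):
            self.user_id = user_id
            self.display_name = display_name

    @implementer(IUserManager)
    class PostoriusUserManager:
        """Looks users up in the Postorius database instead of the core's."""
        def __init__(self, connection):
            self._conn = connection  # a DB-API connection to Postorius's db

        def get_user(self, address):
            row = self._conn.execute(
                'SELECT id, display_name FROM postorius_user WHERE email = ?',
                (address,)).fetchone()
            # Callers still get an IUser-shaped object back, no matter
            # which database actually stores the data.
            return None if row is None else PostoriusUser(*row)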
One of the things I've tried to do, with unknown success because nobody's tried it, is to keep in mind three broad slices of data: the user data, the list data, and the message data. So for example, an IMember associates an IAddress with an IMailingList. The standard implementation of that doesn't use a foreign key between the member table and the mailinglist table. Instead it uses the fqdn_listname, i.e. a string. What that *should* mean is that you could move the user data anywhere you want and not have to also move the list data and message store data.
There *should* be enough hooks in the system already for a system administrator to say "I want to use Postorius, so I must enable the Postorius IUserManager implementation". For global objects like this, we use the Zope Component Architecture (ZCA), so in a Postorius-owns-the-world scenario, what has to happen is that
usermanager = getUtility(IUserManager)
must return the PostoriusUserManager instance and not the SQLite based UserManager instance. Once you've done that, you have to change *nothing* else in the system because everything talks to that object through the interface, and as long as that keeps its contract, the rest of the system should, again Just Work.
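Concretely, the swap might look like this sketch; the registration calls are real zope.component API, while PostoriusUserManager and IUserManager reuse the hypothetical names from the sketch above:

    import sqlite3
    from zope.component import getGlobalSiteManager, getUtility

    # Configuration time: register the chosen implementation as *the*
    # utility for IUserManager (the connection here is a stand-in).
    connection = sqlite3.connect('postorius.db')
    getGlobalSiteManager().registerUtility(
        PostoriusUserManager(connection), IUserManager)

    # Everywhere else in the system, nothing changes:
    usermanager = getUtility(IUserManager)
    user = usermanager.get_user('anne@example.com')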
I have no idea whether the above will be easy or not, since nobody's tried it. But the system design should allow you to do it this way, and I would be very open to the right hooks, fixes, and extensions to make this possible. I hope you can see how this approach lets someone run Mailman in many different configurations, from a core-only system, to Postorius, to a system where all the user database is in ZODB or already locked up in a proprietary closed database.
There is another approach of course, which may end up being simpler, if more brittle. You could just try to keep the two databases in sync. It doesn't matter too much which is the master, you just have to decide. This is essentially how Launchpad's integration with Mailman 2 works. Launchpad is the master database, and whenever something in that database changes that could affect Mailman, that information is communicated to the Mailman server. The details are mostly unimportant, and yes, it does work. It's been brittle in the past, but now with enough logging, monitoring, and fail-safes it works great.
How would you keep these two in sync? First, if something changes in the core, the idea is that an event is triggered. Other components of the system watch for those events and react to the ones they care about. For example, let's say a user changes their password via email command. Once the core acts on that change, it will trigger a PasswordChangeEvent which has a reference to the user effecting that change. If Postorius was the master database for passwords, we'd have to add a little event subscriber which, when it got a PasswordChangeEvent, then talked to Postorius to make that change. Or maybe it updated the shared user data component, or made the appropriate SQL UPDATE call. The key thing again, is that the core just fires the PasswordChangeEvent, and other things react to it.
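A sketch of such a subscriber, assuming a zope.event-style notification mechanism; PasswordChangeEvent comes from the description above, and everything else here is invented:

    import zope.event

    class PasswordChangeEvent:
        """Hypothetical event carrying a reference to the affected user."""
        def __init__(self, user):
            self.user = user

    def push_password_to_postorius(user):
        """Stand-in for the real REST PATCH or SQL UPDATE against Postorius."""
        print('would sync new password for', user)

    def on_event(event):
        # Subscribers see every event; react only to the ones we care about.
        if isinstance(event, PasswordChangeEvent):
            push_password_to_postorius(event.user)

    zope.event.subscribers.append(on_event)

    # The core's only responsibility is to announce that the change happened:
    zope.event.notify(PasswordChangeEvent('anne@example.com'))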
Alternatively, let's say a user changes their password through the web ui. I think this case is already covered, because the way to keep that in sync with the core is to make the appropriate REST call, probably PATCHing the user's password.
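On the wire, that call might look something like this sketch; the port, API version, resource path, credentials, and attribute name are all assumptions, not the documented API:

    import requests

    response = requests.patch(
        'http://localhost:8001/3.0/users/anne@example.com',
        data={'cleartext_password': 'new-secret'},
        auth=('restadmin', 'restpass'))
    response.raise_for_status()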
Very likely we don't have enough events defined to cover all the actions that the core must take (e.g. through email commands). But events are easy to add and again, I'm not opposed to adding any events which make the integration easier.
It's also likely that the REST API doesn't yet cover every bit of information Postorius wants to get into or out of the core. Again, it's easy to extend the REST API, so let's fill in what's missing.
I hope this lays out the basic design constraints that I want to follow. Maybe it sparks some thoughts about different possibilities.
Cheers, -Barry
On Jul 11, 2012, at 9:05 PM, Barry Warsaw wrote:
Thanks for starting this discussion. Since the thread's already long, I'm just going to answer randomly with my own thoughts.
And thanks for your reply. I just spotted something in it that makes me feel that rather than fundamental disagreement, we may simply be saying things differently. I'll dig through the rest of your remarks in detail in due course.
One thing I have a real problem with is defining the database query layer as the interface between components.
Note that I expressed the preference that the interface be defined in terms of the REST access.
As I say in another post, it is not the exact expression of the interface that is important, but that all of the descriptions are isomorphic and that no component accesses shared data by "private" methods.
There *should* be enough hooks in the system already for a system administrator to say "I want to use Postorius, so I must enable the Postorius IUserManager implementation".
And, by saying this, you imply that the storage of passwords is no longer the responsibility of "core".
You have delegated it to IUserManager. And the default Postorius-compatible IUserManager could store those passwords wherever it wishes. In particular, it can store them in the Postorius user database, along with all of the other user profile information that the site wants to keep.
because everything talks to that object through the interface, and as long as that keeps its contract, the rest of the system should, again Just Work.
I have no idea whether the above will be easy or not, since nobody's tried it. But the system design should allow you to do it this way, and I would be very open to the right hooks, fixes, and extensions to make this possible. I hope you can see how this approach lets someone run Mailman in many different configurations, from a core-only system, to Postorius, to a system where all the user database is in ZODB or already locked up in a proprietary closed database.
There is another approach of course, which may end up being simpler, if more brittle. You could just try to keep the two databases in sync. It doesn't matter too much which is the master, you just have to decide.
Yes, but from my experience, only with great care. As you note that arrangement is brittle. I would prefer to avoid it.
Once the core acts on that change, it will trigger a PasswordChangeEvent which has a reference to the user effecting that change. If Postorius was the master database for passwords, we'd have to add a little event subscriber which, when it got a PasswordChangeEvent, then talked to Postorius to make that change. Or maybe it updated the shared user data component, or made the appropriate SQL UPDATE call. The key thing again, is that the core just fires the PasswordChangeEvent, and other things react to it.
Alternatively, let's say a user changes their password through the web ui. I think this case is already covered, because the way to keep that in sync with the core is to make the appropriate REST call, probably PATCHing the user's password.
I think this is the wrong approach to the extent that "core" (and here, by "core", I think you mean the "admin-by-mail" component) does anything other than call the same REST interface that the web UI would use.
The implementation of that interface item could reside anywhere in the system and it would manipulate the database through the IUserManager.
So for example, an IMember associates an IAddress with an IMailingList. The standard implementation of that doesn't use a foreign key between the member table and the mailinglist table. Instead it uses the fqdn_listname, i.e. a string.
I don't agree. Your string which contains the fqdn_listname IS a foreign key to the mailinglist table. It may not be the primary key in some installations, but it is one-to-one with that key and could be used directly as the primary key.
On Jul 12, 2012, at 12:25 AM, Richard Wackerbarth wrote:
On Jul 11, 2012, at 9:05 PM, Barry Warsaw wrote:
There *should* be enough hooks in the system already for a system administrator to say "I want to use Postorius, so I must enable the Postorius IUserManager implementation".
And, by saying this, you imply that the storage of passwords is no longer the responsibility of "core".
You have delegated it to IUserManager. And the default Postorius-compatible IUserManager could store those passwords wherever it wishes. In particular, it can store them in the Postorius user database, along with all of the other user profile information that the site wants to keep.
because everything talks to that object through the interface, and as long as that keeps its contract, the rest of the system should, again Just Work.
Note that when I say IUserManager above I'm talking specifically about the zope.interface that is used in the core. You're talking about a conceptual independent component called a "user manager". The former is a coding specification that contains only the things the core is interested in, it wouldn't be used as an inter-system definition. The latter would be specified by its REST API (let's say). Then the core would have an implementation of IUserManager which maps its data and functionality to REST calls against this external component.
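To make that distinction concrete, a sketch of a core-internal implementation that satisfies the coding specification by calling the external component; it reuses the hypothetical IUserManager and PostoriusUser names from earlier, and every URL and JSON field below is an assumption:

    import requests
    from zope.interface import implementer

    @implementer(IUserManager)
    class RESTUserManager:
        """Maps the core's user lookups onto the external service's REST API."""
        def __init__(self, base_url, auth):
            self._base_url = base_url
            self._auth = auth

        def get_user(self, address):
            response = requests.get(self._base_url + '/users',
                                    params={'email': address}, auth=self._auth)
            if response.status_code == 404:
                return None
            response.raise_for_status()
            data = response.json()
            return PostoriusUser(data['user_id'], data.get('display_name'))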
I have no idea whether the above will be easy or not, since nobody's tried it. But the system design should allow you to do it this way, and I would be very open to the right hooks, fixes, and extensions to make this possible. I hope you can see how this approach lets someone run Mailman in many different configurations, from a core-only system, to Postorius, to a system where all the user database is in ZODB or already locked up in a proprietary closed database.
There is another approach of course, which may end up being simpler, if more brittle. You could just try to keep the two databases in sync. It doesn't matter too much which is the master, you just have to decide.
Yes, but from my experience, only with great care. As you note that arrangement is brittle. I would prefer to avoid it.
It definitely means data kept in two places, and you're right that great care has to be taken. What if the core is down for maintenance when a user changes her password through Postorius? These are the kinds of things that the Launchpad integration work addressed, but that was in a highly controlled environment, with lots of operational assurances. In our case, we can't just hand-wave that brittleness away.
I think this is the wrong approach to the extent that "core" (and here, by "core", I think you mean the "admin-by-mail" component) does anything other than call the same REST interface that the web UI would use.
It could, though internally, that would be hidden behind the implementation of IUserManager, IUser, and associated objects. We're saying the same thing but in a different way. The advantage of hiding this behind the zope.interfaces is that we can have multiple alternative implementations of those interfaces, so that whether it's REST or a SQL call doesn't matter to most of the core.
So for example, an IMember associates an IAddress with an IMailingList. The standard implementation of that doesn't use a foreign key between the member table and the mailinglist table. Instead it uses the fqdn_listname, i.e. a string.
I don't agree. Your string which contains the fqdn_listname IS a foreign key to the mailinglist table. It may not be the primary key in some installations, but it is one-to-one with that key and could be used directly as the primary key.
The point is that it's not a foreign key constraint in the members table. Doing that would, I think, prevent you from putting the mailinglist table in one database and the member table in another. It makes the associated queries less efficient, but provides for a useful measure of flexibility.
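For illustration, the difference in a declarative ORM sketch; SQLAlchemy is used here purely for familiarity, and this is not the core's actual schema code:

    from sqlalchemy import Column, Integer, String
    from sqlalchemy.orm import declarative_base

    Base = declarative_base()

    class Member(Base):
        __tablename__ = 'member'
        id = Column(Integer, primary_key=True)
        address = Column(String)
        # Deliberately NOT Column(Integer, ForeignKey('mailinglist.id')).
        # A plain string holding the fqdn_listname imposes no database-level
        # constraint, so the mailinglist table can live in another database.
        mailing_list = Column(String)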
Cheers, -Barry
P.S. FWIW, I think this separate "user database" component with REST calls in the IUserManager (et al) implementation would not actually be difficult to mock up. I don't have the time to do the coding, but would be happy to discuss details if someone wanted to take a crack at it.
In many respects I think that we are in agreement.
If I understand you, we both envision a distributed database whereby any of the components might actually provide the data storage for some specific class of information. For example, without specifying where in the system it will reside, there is a "UserManager" that maintains information about the users. It would also be responsible for registering ALL user instances and providing lists of users that meet certain constraints. In addition, it provides an access mechanism for properties associated with particular users.
However, I think that we differ in terms of the public interface specifications.
The specification that you suggest produces a particular python object.
The specification that I suggest contains only those characteristics which are exposed on the REST interface. It does not specify the exact format in which those characteristics are presented, but only what characteristics are available and what operations can be performed. In this form, no component external to the UserManager is permitted to rely on any aspect of the representation which falls outside of the available public characteristics.
The place where I am unclear about your intentions concerns this python object. In theory, it may have characteristics that are not exposed to the REST interface. Do you permit your functional modules (the message handler or the administration interface, for example) to rely on any characteristics that are not properly exposed by the REST interface? If so, why?
On Jul 12, 2012, at 10:07 AM, Barry Warsaw wrote:
I think this is the wrong approach to the extent that "core" (and here, by "core", I think you mean the "admin-by-mail" component) does anything other than call the same REST interface that the web UI would use.
It could, though internally, that would be hidden behind the implementation of IUserManager, IUser, and associated objects. We're saying the same thing but in a different way. The advantage of hiding this behind the zope.interfaces is that we can have multiple alternative implementations of those interfaces, so that whether it's REST or a SQL call doesn't matter to most of the core.
So for example, an IMember associates an IAddress with an IMailingList. The standard implementation of that doesn't use a foreign key between the member table and the mailinglist table. Instead it uses the fqdn_listname, i.e. a string.
I don't agree. Your string which contains the fqdn_listname IS a foreign key to the mailinglist table. It may not be the primary key in some installations, but it is one-to-one with that key and could be used directly as the primary key.
The point is that it's not a foreign key constraint in the members table.
OK, I think we are in agreement WRT this concept. You seem to have taken "foreign key" to imply a specific implementation of the concept to which I was referring, namely some key which designates a unique instance in the referenced table.
Doing that would, I think, prevent you from putting the mailinglist table in one database and the member table in another. It makes the associated queries less efficient, but provides for a useful measure of flexibility.
Were this not Python, but C++, I would suggest that we consider offering multiple signatures for the exposed functionality. For example, if the function were to deliver a list of MLs to which an Address is subscribed, the address might be provided as a string (the email address in canonical form), as the pk of the entry in the underlying Address database, or as an object of the Address class.
Although the same could be done in python through introspection, or similar techniques, I don't know to what extent doing so would introduce inefficiencies.
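For what it's worth, later Python versions grew a standard tool for exactly this kind of type-based dispatch. A sketch using functools.singledispatch (Python 3.4+), with all helper names invented:

    from functools import singledispatch

    class Address:
        """Hypothetical stand-in for the core's address object."""
        def __init__(self, email):
            self.email = email

    def _lookup_by_email(email):
        """Stand-in for the real roster query."""
        return []

    def _email_for_pk(pk):
        """Stand-in: resolve a primary key to its canonical address."""
        return 'user-%d@example.com' % pk

    @singledispatch
    def lists_for(designator):
        raise TypeError('unsupported designator: %r' % (designator,))

    @lists_for.register(str)
    def _(designator):
        return _lookup_by_email(designator)                   # email string

    @lists_for.register(int)
    def _(designator):
        return _lookup_by_email(_email_for_pk(designator))    # database pk

    @lists_for.register(Address)
    def _(designator):
        return _lookup_by_email(designator.email)             # Address object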
P.S. FWIW, I think this separate "user database" component with REST calls in the IUserManager (et al) implementation would not actually be difficult to mock up. I don't have the time to do the coding, but would be happy to discuss details if someone wanted to take a crack at it.
I definitely will take a crack at it, since I suspect that doing so would allow Postorius to have all of its user information handled in one place and still allow "admin by mail" to work within its restricted expectations.
On Jul 12, 2012, at 11:39 AM, Richard Wackerbarth wrote:
If I understand you, we both envision a distributed database whereby any of the components might actually provide the data storage for some specific class of information. For example, without specifying where in the system it will reside, there is a "UserManager" that maintains information about the users. It would also be responsible for registering ALL user instances and providing lists of users that meet certain constraints. In addition, it provides an access mechanism for properties associated with particular users.
However, I think that we differ in terms of the public interface specifications.
Actually, I wouldn't say that IUserManager is a public interface specification. It's an internal implementation specification meaning, it's exactly the API - no more, no less - that the core requires from the "user manager service".
The specification that you suggest produces a particular python object.
The specification that I suggest contains only those characteristics which are exposed on the REST interface. It does not specify the exact format in which those characteristics are presented, but only what characteristics are available and what operations can be performed. In this form, no component external to the UserManager is permitted to rely on any aspect of the representation which falls outside of the available public characteristics.
Sure, and if this independent user manager service existed, then the IUserManager implementation inside the core would have to map the attributes and actions it requires to this external service via REST calls.
The place where I am unclear about your intentions concerns this python object. In theory, it may have characteristics that are not exposed to the REST interface. Do you permit your functional modules (the message handler or the administration interface, for example) to rely on any characteristics that are not properly exposed by the REST interface? If so, why?
The core's current REST API has grown organically, and has been filled out only to the extent that it provides information other services (like Postorius) have needed. We've never yet needed, let alone figured out, what the REST API for the "user manager service" would look like. I guess that's what we're doing here in this thread! :)
Some of the IUserManager API has been implicitly exposed on the REST API, e.g. creating users, searching for users, and accessing properties of those individual users. But the IUserManager API hasn't yet been fully necessary.
I don't see any reason why it couldn't be mapped to a REST API; it's just a lack of need, and nobody's written it yet. The question is whether IUserManager is actually the right API for a REST interface - in general I don't think you need a 1-to-1 mapping of internal model or APIs to REST.
What would a specification of a RESTful user manager look like?
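To seed that discussion, one purely hypothetical shape such a specification might take; none of these endpoints exist today, and both the paths and the interface mappings are assumptions:

    USER_MANAGER_API = {
        'POST  /users':                     'create a user (IUserManager)',
        'GET   /users?email=...':           'search for users (IUserManager)',
        'GET   /users/<user_id>':           'profile: display_name, created_on, ...',
        'PATCH /users/<user_id>':           'update profile attributes (IUser)',
        'POST  /users/<user_id>/login':     'verify a password; never expose hashes',
        'GET   /users/<user_id>/addresses': 'linked email addresses (IAddress)',
        'GET   /lists/<list_id>/roster':    'membership queries (IRoster, IMember)',
    }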
OK, I think we are in agreement WRT this concept. You seem to have taken "foreign key" to imply a specific implementation of the concept to which I was referring, namely some key which designates a unique instance in the referenced table.
Yep, because I was talking about the implementation of the relationships in the core's current table layout.
Doing that would, I think, prevent you from putting the mailinglist table in one database and the member table in another. It makes the associated queries less efficient, but provides for a useful measure of flexibility.
Were this not Python, but C++, I would suggest that we consider offering multiple signatures for the exposed functionality. For example, if the function were to deliver a list of MLs to which an Address is subscribed, the address might be provided as a string (the email address in canonical form), as the pk of the entry in the underlying Address database, or as an object of the Address class.
What I described above, though, is at the database/SQL layer, below any language mappings. If Mailman's database maintained the mailinglist table, but the user table lived in some external Postgres database, you couldn't use foreign key constraints to express the relationship between those two in the member table. We've just planned ahead and made the relationship between members and the mailing lists based on the fqdn_listname of the mailing list[*].
Although the same could be done in python through introspection, or similar techniques, I don't know to what extent doing so would introduce inefficiencies.
P.S. FWIW, I think this separate "user database" component with REST calls in the IUserManager (et al) implementation would not actually be difficult to mock up. I don't have the time to do the coding, but would be happy to discuss details if someone wanted to take a crack at it.
I definitely will take a crack at it, since I suspect that doing so would allow Postorius to have all of its user information handled in one place and still allow "admin by mail" to work within its restricted expectations.
Excellent! What I'd recommend is starting by looking at the following zope.interfaces in the core, and thinking about how the "user manager service" (or whatever it's called[**]) could provide the necessary functionality. Of course this service will probably be a superset of what the core needs, which is fine, since the core will just ignore the parts of the REST resource tree it doesn't care about. This service also doesn't have to directly express the zope.interfaces, just that we have to be able to implement the methods and properties defined in these interfaces using REST calls.
Probably a good start would be:
- IAddress
- IMember
- IRegistrar
- IRoster
- ISubscriptionService
- IUser
- IUserManager
Cheers, -Barry
[*] Of course, now that I think about it, this will probably cause us problems later when we support list renaming. The relationship should probably be between the List-ID and the user, since the former won't change even if the mailing list is renamed. Sigh.
[**] Hey Florian, continuing on a theme, maybe "Vicious". Get it? :)
Barry Warsaw writes:
I don't see any reason why it couldn't be mapped to a REST API; it's just a lack of need, and nobody's written it yet. The question is whether IUserManager is actually the right API for a REST interface - in general I don't think you need a 1-to-1 mapping of internal model or APIs to REST.
My suspicion is that in particular (ie, when addressing a specific set of requirements) you don't need a 1-1 map of internal APIs to REST. But if you want World Domination (and you have expressed that ambition on occasion), you need a system that grows new complex functionality organically. That, I think, means that we want various parts of the system to cooperatively build a single database that is offered to all components.
If you want to say, OK, I'm not really serious about that, that's fine by me: we *need* a killer MLM, and Mailman 2 isn't it (any more). But if you want to try for World Domination, I think the database (ie, the "social network engine") is the one component that's going to make or break that enterprise.
We've just planned ahead and made the relationship between members and the mailing lists based on the fqdn_listname of the mailing list[*].
This should be the List-Id. RFC 2919 essentially says that the mapping of "mailing lists" (whatever *they* "really" are) to List-Ids is 1-1 by definition. I think that in practice this is probably going to work well, in the sense that people who thoughtlessly initialize a new List-Id when they migrate domains probably don't think (much) about the fact that it's a new list, while people who take the care to keep the same List-Id surely do care about the list's identity.
This service also doesn't have to directly express the zope.interfaces, just that we have to be able to implement the methods and properties defined in these interfaces using REST calls.
Probably a good start would be:
- IAddress
- IMember
- IRegistrar
- IRoster
- ISubscriptionService
- IUser
- IUserManager
This list kind of worries me -- it's very specific and fragmented. Is there a higher-level way of doing this mapping?
[**] Hey Florian, continuing on a theme, maybe "Vicious". Get it? :)
I see ....
On Jul 13, 2012, at 11:57 AM, Stephen J. Turnbull wrote:
Barry Warsaw writes:
I don't see any reason why it couldn't be mapped to a REST API; it's just a lack of need, and nobody's written it yet. The question is whether IUserManager is actually the right API for a REST interface - in general I don't think you need a 1-to-1 mapping of internal model or APIs to REST.
My suspicion is that in particular (ie, when addressing a specific set of requirements) you don't need a 1-1 map of internal APIs to REST.
I agree.
But if you want World Domination (and you have expressed that ambition on occasion), you need a system that grows new complex functionality organically. That, I think, means that we want various parts of the system to cooperatively build a single database that is offered to all components.
If you want to say, OK, I'm not really serious about that, that's fine by me: we *need* a killer MLM, and Mailman 2 isn't it (any more). But if you want to try for World Domination, I think the database (ie, the "social network engine") is the one component that's going to make or break that enterprise.
I don't necessarily disagree with any of that, but it doesn't follow that the API exposed by the core needs to provide that database. OTOH, a separate user database component certainly could expose that stuff, and where the two systems overlap in their support for the uber-model, certainly the resource locations and semantics should be as close as possible.
As an example, the core currently exposes just a few attributes at the user resource in REST: user_id, created_on, password (optional), and display_name (optional). These live at <baseurl>/users/<userid>. If the separate user database also kept track of things like Facebook and Twitter ids, these would also be available at the same location (different base-url of course) and the returned JSON would just contain additional entries for that extra data.
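In code terms, the overlap might look like this sketch; the core's attribute names are the ones listed above, while the URLs, credentials, and the twitter_id/facebook_id entries are invented for illustration:

    import requests

    # Against the core (attributes per the message above):
    core_user = requests.get('http://localhost:8001/3.0/users/1',
                             auth=('restadmin', 'restpass')).json()
    # e.g. {'user_id': 1, 'created_on': '...', 'display_name': 'Anne'}

    # Against a hypothetical separate user database: same location and
    # semantics, just extra entries in the returned JSON.
    ext_user = requests.get('http://userdb.example.com/users/1').json()
    # e.g. {'user_id': 1, 'created_on': '...', 'display_name': 'Anne',
    #       'twitter_id': '@anne', 'facebook_id': 'anne.person'}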
We've just planned ahead and made the relationship between members and the mailing lists based on the fqdn_listname of the mailing list[*].
This should be the List-Id. RFC 2919 essentially says that the mapping of "mailing lists" (whatever *they* "really" are) to List-Ids is 1-1 by definition. I think that in practice this is probably going to work well, in the sense that people who thoughtlessly initialize a new List-Id when they migrate domains probably don't think (much) about the fact that it's a new list, while people who take the care to keep the same List-Id surely do care about the list's identity.
LP: #1024509
This service also doesn't have to directly express the zope.interfaces, just that we have to be able to implement the methods and properties defined in these interfaces using REST calls.
Probably a good start would be:
- IAddress
- IMember
- IRegistrar
- IRoster
- ISubscriptionService
- IUser
- IUserManager
This list kind of worries me -- it's very specific and fragmented. Is there a higher-level way of doing this mapping?
There's not, unfortunately.
[**] Hey Florian, continuing on a theme, maybe "Vicious". Get it? :)
I see ....
I think we'll have to start calling the core engine "Lee", since as we all know, it should be the center of attention. :)
-Barry