Hi from a student interested in a GSoC project
Hi,
My name is Elias Assarsson and I am interested in doing a GSoC project for Mailman. The idea I found most interesting is "Scripts for migrating from Mailman 2.1 to Mailman 3" http://wiki.list.org/display/DEV/Google+Summer+of+Code+2013#GoogleSummerofCo... . Where can I find more information about this idea?
I do not know if it is a good idea but I am thinking that using Augeas http://augeas.net might be a way to handle the migration of configuration from one format to another. For instance I found the following https://www.redhat.com/archives/augeas-devel/2012-January/msg00050.html when researching the idea. I have no experience with Augeas but this might be a good opportunity to learn it. I do however have experience with Python (and other scripting languages such as Ruby). Any input or help in developing this idea would be welcome.
Some things about me. I have been a GNU/Linux user for almost ten years and am a computer science student in Sweden. If you want to know more, please ask.
Regards, Elias Assarsson
In addition to Augeas, I would encourage you to look at Config::Model. I think you will need the pair of them to do upgrades. There are a number of blog posts on Planet Debian about it and I'm sure the author would be willing to help out.
http://planet-search.debian.org/cgi-bin/search.cgi?terms=Config::Model
-- bye, pabs
On Apr 10, 2013, at 11:05 AM, Elias Assarsson wrote:
I do not know if it is a good idea but I am thinking that using Augeas http://augeas.net might be a way to handle the migration of configuration from one format to another.
Definitely investigate these tools, although my suspicion is that they won't help, or at least won't help enough to make accepting a new dependency worth it.
There are a few problems to consider:
MM2's configuration file is a Python file which really must be imported in order to get a valid set of values. MM3's configuration file is a stack of .ini-style files.
There is not always a 1-to-1 correspondence between MM2 values and MM3 values. Some configurations have been merged, some removed, etc. You will pretty much have to go through each set of MM2 variables and decide if and how to transform them into something meaningful for MM3.
The configuration files are only for system-wide settings. We really also want to be able to upgrade MM2 lists to MM3 lists, and that involves unpickling the config.db state and again, mapping the MM2 variables to MM3 variables, which are stored in a database.
What about importing archives?
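To make the "no 1-to-1 correspondence" point concrete, here is a minimal sketch of a rule table that renames, transforms, or drops MM2 values and reports anything it has no rule for. It is only an illustration: the dotted MM3 key names, the OBSOLETE_FLAG variable, and the convert_settings helper are all hypothetical and are not taken from the real MM3 schema.

```python
# Sentinel marking MM2 variables that have no MM3 equivalent.
DROP = object()

# Hypothetical rule table: MM2 name -> (MM3 key, transform) or DROP.
# The MM3 key names are placeholders, not the real schema.
RULES = {
    "SMTPHOST": ("mta.smtp_host", str),
    # Fall back to port 25 when the MM2 value is 0 or unset.
    "SMTPPORT": ("mta.smtp_port", lambda value: str(value or 25)),
    "DEFAULT_SERVER_LANGUAGE": ("site.language", str),
    "OBSOLETE_FLAG": DROP,
}


def convert_settings(mm2):
    """Return (mm3_settings, unmapped_names) for a dict of MM2 values."""
    mm3 = {}
    unmapped = []
    for name, value in sorted(mm2.items()):
        rule = RULES.get(name)
        if rule is None:
            # No decision has been made for this variable yet.
            unmapped.append(name)
        elif rule is DROP:
            continue
        else:
            key, transform = rule
            mm3[key] = transform(value)
    return mm3, unmapped


mm3, unmapped = convert_settings(
    {"SMTPHOST": "localhost", "SMTPPORT": 0, "OBSOLETE_FLAG": 1, "MYSTERY": 2}
)
```

Returning the unmapped names separately keeps the "decide if and how to transform each variable" work visible instead of silently losing settings.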
There are some modest beginnings in the 3.0 tree, but in all likelihood none of it is functional. Take a look at src/mailman/bin/export.py and src/mailman/commands/cli_import.py.
Cheers, -Barry
Thanks for an informative answer!
2013-04-10 20:14, Barry Warsaw wrote:
Definitely investigate these tools, although my suspicion is that they won't help, or at least won't help enough to make accepting a new dependency worth it.
There are a few problems to consider:
- MM2's configuration file is a Python file which really must be imported in order to get a valid set of values. MM3's configuration file is a stack of .ini-style files.
I am trying to find and understand the configuration files so that I know what needs to be migrated and to what form. Is the MM2 configuration you refer to mainly Mailman/mm_cfg.py and MM3 configuration files src/mailman/config/* and /src/mailman/<listname>/*?
- There is not always a 1-to-1 correspondence between MM2 values and MM3 values. Some configurations have been merged, some removed, etc. You will pretty much have to go through each set of MM2 variables and decide if and how to transform them into something meaningful for MM3.
- The configuration files are only for system-wide settings. We really also want to be able to upgrade MM2 lists to MM3 lists, and that involves unpickling the config.db state and again, mapping the MM2 variables to MM3 variables, which are stored in a database.
Given the two points above it seems that handling the migration from within Python is the best choice (rather than using Augeas which is C based or Config::Model which is Perl based). Obviously one wants to use whatever feature of Python that can ease the process. You seem to be using configparser in MM3. I dunno if there are any other Python tools one should look at in helping migration. Maybe one should investigate this further.
- What about importing archives?
I have tried to find information on how archives are stored in MM2 and MM3 but failed to find any. What is a good source to learn about this?
There are some modest beginnings in the 3.0 tree, but in all likelihood none of it is functional. Take a look at src/mailman/bin/export.py and src/mailman/commands/cli_import.py.
So those are the files that are supposed to handle migration for which the project is to make them handle a complete migration from MM2 to MM3?
On another note I have been able to set up a web UI, although it took far longer than 5 minutes as I struggled with problems due to having both Python 2.7 and 3.2 on my system.
Elias Assarsson writes:
On another note I have been able to set up a web UI, although it took far longer than 5 minutes as I struggled with problems due to having both Python 2.7 and 3.2 on my system.
Did you try a virtualenv? That usually helps with such problems.
On Apr 11, 2013, at 02:22 PM, Elias Assarsson wrote:
- MM2's configuration file is a Python file which really must be imported in order to get a valid set of values. MM3's configuration file is a stack of .ini-style files.
I am trying to find and understand the configuration files so that I know what needs to be migrated and to what form. Is the MM2 configuration you refer to mainly Mailman/mm_cfg.py and MM3 configuration files src/mailman/config/* and /src/mailman/<listname>/*?
Yes, in a deployed Mailman 2, mm_cfg.py will contain the system configuration settings. These override the settings from Defaults.py so a good way to explore is to use bin/withlist and import mm_cfg at the Python prompt.
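As a rough illustration of why the MM2 settings have to be imported rather than parsed as text, the sketch below executes a small stand-in settings file and collects its upper-case names. The file contents, the MAX_MESSAGE_KB name, and the load_python_settings helper are made up for this example. A real mm_cfg.py pulls in Defaults.py and runs under MM2's own Python 2, so in practice this would be done from bin/withlist instead.

```python
import importlib.util
import os
import tempfile

# Stand-in for a settings file. One value is computed by an expression,
# which a plain text parser would not evaluate.
SAMPLE_CFG = (
    "DEFAULT_URL_HOST = 'lists.example.com'\n"
    "MAX_MESSAGE_KB = 40 * 1024\n"
    "helper = 'lower-case names are ignored'\n"
)


def load_python_settings(path):
    """Execute a Python settings file and return its upper-case names."""
    spec = importlib.util.spec_from_file_location("sample_cfg", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return {
        name: getattr(module, name)
        for name in dir(module)
        if name.isupper()
    }


with tempfile.TemporaryDirectory() as tmp:
    cfg_path = os.path.join(tmp, "sample_cfg.py")
    with open(cfg_path, "w") as fp:
        fp.write(SAMPLE_CFG)
    settings = load_python_settings(cfg_path)
```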
In MM3, we use lazr.config, which is essentially a configparser type package that allows for stacks of configurations, with pushing and popping values on this stack.
http://pythonhosted.org/lazr.config/
The src/mailman/config/schema.cfg file is at the bottom of the stack and defines the schema, obviously :). From there, src/mailman/config/mailman.cfg essentially instantiates this schema, providing all the defaults. In the testing environment, src/mailman/testing/testing.cfg gets pushed on top of that (and many tests push and pop micro-overrides). In a deployed system, the site's mailman.cfg is on the top of the stack and so can override anything. There are various places this mailman.cfg file is searched; see src/mailman/core/initialize.py for all the gory details.
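The stacking idea can be approximated with the standard library alone. This sketch does not use lazr.config and does not show its API; it only uses configparser to show a later layer overriding an earlier one. The section and key names and the build_stack helper are invented for the example, and there is no schema checking or popping here.

```python
import configparser

# Bottom layer: defaults for every key.
DEFAULTS = """
[mta]
smtp_host = localhost
smtp_port = 25
"""

# Top layer: a site file that overrides a single key.
SITE = """
[mta]
smtp_port = 587
"""


def build_stack(*layers):
    """Read layers in order; values in later layers replace earlier ones."""
    parser = configparser.ConfigParser()
    for layer in layers:
        parser.read_string(layer)
    return parser


config = build_stack(DEFAULTS, SITE)
smtp_host = config["mta"]["smtp_host"]  # inherited from DEFAULTS
smtp_port = config["mta"]["smtp_port"]  # overridden by SITE
```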
List-specific configurations live in the various config.db files for MM2. These are pickles. In MM3, everything's in the database.
Given the two points above it seems that handling the migration from within Python is the best choice (rather than using Augeas which is C based or Config::Model which is Perl based).
I think so. Pickles in particular are Python-specific. You could generate an intermediate format, but I'm not sure it's worth it.
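If an intermediate format were wanted anyway, one possible shape is sketched below: load the pickled state and dump it as JSON. The sample dictionary stands in for a list's pickled state and its keys are only examples. The pickle_to_json helper is hypothetical. Passing encoding="latin-1" is the usual way to read Python 2 pickles from Python 3, and default=repr is a crude fallback for values JSON cannot represent. Unpickling runs arbitrary code, so this should only ever be pointed at files from a trusted installation.

```python
import io
import json
import pickle

# Stand-in for the pickled per-list state; the keys are illustrative only.
SAMPLE_STATE = {
    "real_name": "Test",
    "max_message_size": 40,
    "owner": ["owner@example.com"],
}


def pickle_to_json(fp):
    """Load a pickled dict from a binary file object and return JSON text."""
    state = pickle.load(fp, encoding="latin-1")
    # repr() is a lossy fallback for objects JSON has no type for.
    return json.dumps(state, sort_keys=True, default=repr)


# Protocol 2 is the highest protocol Python 2 can read or write.
buffer = io.BytesIO(pickle.dumps(SAMPLE_STATE, protocol=2))
exported = pickle_to_json(buffer)
```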
Obviously one wants to use whatever feature of Python that can ease the process. You seem to be using configparser in MM3. I dunno if there are any other Python tools one should look at in helping migration. Maybe one should investigate this further.
- What about importing archives?
I have tried to find information on how archives are stored in MM2 and MM3 but failed to find any. What is a good source to learn about this?
Importing archives will either be trivial or impossible :). At the lowest level, parsing the MM2 .mbox file and generating maildirs would work, but there are existing tools to do that, so probably little for GSoC to worry about. Much more interesting would be to parse the Pipermail database files and try to build a reverse mapping from URL to Message-ID so that these could be preserved when the archive is regenerated for HyperKitty or whatever.
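A very rough sketch of the reverse-mapping idea, using only the standard mailbox module: walk an .mbox file in order and pair each message's Message-ID with an archive URL. The big assumption, which would have to be checked against the real Pipermail database files, is that the URL can be derived from the message's position in the mbox. The "{index:06d}.html" template and the url_to_message_id helper are hypothetical.

```python
import mailbox
import os
import tempfile

# Two-message stand-in for an archive mbox.
SAMPLE_MBOX = (
    "From alice@example.com Wed Apr 10 11:05:00 2013\n"
    "From: alice@example.com\n"
    "Subject: first\n"
    "Message-ID: <one@example.com>\n"
    "\n"
    "Hello.\n"
    "\n"
    "From bob@example.com Thu Apr 11 14:22:00 2013\n"
    "From: bob@example.com\n"
    "Subject: second\n"
    "Message-ID: <two@example.com>\n"
    "\n"
    "Hi.\n"
)


def url_to_message_id(mbox_path, url_template="{index:06d}.html"):
    """Map an assumed archive URL to the Message-ID at that position."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return {
            url_template.format(index=index): message["Message-ID"]
            for index, message in enumerate(box)
        }
    finally:
        box.close()


with tempfile.TemporaryDirectory() as tmp:
    mbox_path = os.path.join(tmp, "sample.mbox")
    with open(mbox_path, "w", newline="\n") as fp:
        fp.write(SAMPLE_MBOX)
    mapping = url_to_message_id(mbox_path)
```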
There are some modest beginnings in the 3.0 tree, but in all likelihood none of it is functional. Take a look at src/mailman/bin/export.py and src/mailman/commands/cli_import.py.
So those are the files that are supposed to handle migration for which the project is to make them handle a complete migration from MM2 to MM3?
Well, I'd say they were more some unfinished work in that direction.
-Barry
I encourage any GSoC candidates to actively discuss design issues on this list. Many aspects of MM3 remain only partially defined and still require design in addition to the coding that will follow. Although some might expect the mentors to "spoon feed" coding tasks, as a mentor I would prefer to work with someone who is actively involved in the design as well as the implementation.
Having looked at MM2->MM3 migration in the past (and deferred implementation because critical infrastructure was not available at that time) let me present a different perspective.
First, there are some parameters that do not relate to any particular mailing list. IMHO, these aspects need not even be addressed in a conversion. If I wish to set up a MM3 installation, I should first, manually, set up a sample list. After I do so, the configuration of any "real" lists needs to be COMPLETELY configurable using the REST interface. If that interface is not presently adequate, it needs to be revised rather than "working around" any deficiency.
Therefore, I should be able to set up my "sample list" and, thereafter, add/edit all of the real lists utilizing the same interface (using mailman.client, etc.) that is available to the Postorius interface for list creation/editing. A migration tool would, therefore, need only "simulate" those manual steps that the installer would execute on a web-based interface to create a new list and adjust its settings.
Similarly, handling the subscriptions must be something that can be done using just the access exposed via the REST interface.
The big distinction between MM2 and MM3 lies in the conceptual model of entities. In MM2, each subscription is a separate entity. In MM3, subscriptions belong to "persons" and management functions are made available for the person to affect multiple subscriptions at the same time.
In translating from MM2 to MM3, the aggregation of subscriptions under a common "person" becomes a non-trivial task. However the mechanisms required to handle reassignment are needed within MM3 implementations because there is an alternate access mechanism (admin by email) which cannot directly identify these "persons".
Therefore, I would suggest that a migration be broken into some components,
- Migrate individual list parameters
- Aggregate groups of lists
- Migrate individual subscriptions
- Aggregate subscriptions by "person".
Note that the aggregation functions for both lists and persons require similar mechanisms and that having the ability to edit those configurations within Postorius will be beneficial to both migration and routine system operation.
Richard
On Apr 11, 2013, at 3:11 PM, Barry Warsaw <barry@list.org> wrote:
On Apr 11, 2013, at 02:22 PM, Elias Assarsson wrote:
- MM2's configuration file is a Python file which really must be imported in order to get a valid set of values. MM3's configuration file is a stack of .ini-style files.
I am trying to find and understand the configuration files so that I know what needs to be migrated and to what form. Is the MM2 configuration you refer to mainly Mailman/mm_cfg.py and MM3 configuration files src/mailman/config/* and /src/mailman/<listname>/*?
Yes, in a deployed Mailman 2, mm_cfg.py will contain the system configuration settings. These override the settings from Defaults.py so a good way to explore is to use bin/withlist and import mm_cfg at the Python prompt.
In MM3, we use lazr.config, which is essentially a configparser type package that allows for stacks of configurations, with pushing and popping values on this stack.
http://pythonhosted.org/lazr.config/
The src/mailman/config/schema.cfg file is at the bottom of the stack and defines the schema, obviously :). From there, src/mailman/config/mailman.cfg essentially instantiates this schema, providing all the defaults. In the testing environment, src/mailman/testing/testing.cfg gets pushed on top of that (and many tests push and pop micro-overrides). In a deployed system, the site's mailman.cfg is on the top of the stack and so can override anything. There are various places this mailman.cfg file is searched; see src/mailman/core/initialize.py for all the gory details.
List-specific configurations live in the various config.db files for MM2. These are pickles. In MM3, everything's in the database.
On Apr 11, 2013, at 07:02 PM, Richard Wackerbarth wrote:
I encourage any GSoC candidates to actively discuss design issues on this list. Many aspects of MM3 remain only partially defined and still require design in addition to the coding that will follow. Although some might expect the mentors to "spoon feed" coding tasks, as a mentor I would prefer to work with someone who is actively involved in the design as well as the implementation.
Totally agree.
First, there are some parameters that do not relate to any particular mailing list. IMHO, these aspects need not even be addressed in a conversion. If I wish to set up a MM3 installation, I should first, manually, set up a sample list. After I do so, the configuration of any "real" lists needs to be COMPLETELY configurable using the REST interface. If that interface is not presently adequate, it needs to be revised rather than "working around" any deficiency.
The REST API is pretty easy to extend. By far the most time consuming bit (assuming you know where you want your resources to live) is writing docs and tests. For many of the GSoC tasks, extending the REST API should be considered part of the work. This of course includes both in the core and in the mailman-client wrapper.
Therefore, I should be able to set up my "sample list" and, thereafter, add/edit all of the real lists utilizing the same interface (using mailman.client, etc.) that is available to the Postorius interface for list creation/editing. A migration tool would, therefore, need only "simulate" those manual steps that the installer would execute on a web-based interface to create a new list and adjust its settings.
Perhaps. I think migrations could be done this way, but it's probably more efficient to do it against the internal API.
Similarly, handling the subscriptions must be something that can be done using just the access exposed via the REST interface.
The big distinction between MM2 and MM3 lies in the conceptual model of entities. In MM2, each subscription is a separate entity. In MM3, subscriptions belong to "persons" and management functions are made available for the person to affect multiple subscriptions at the same time.
In translating from MM2 to MM3, the aggregation of subscriptions under a common "person" becomes a non-trivial task. However the mechanisms required to handle reassignment are needed within MM3 implementations because there is an alternate access mechanism (admin by email) which cannot directly identify these "persons".
Absolutely right. It's an open question as to how to merge user records. Because the user "databases" in a multi-list MM2 site are silos, there's no connection between my settings in listA and my settings in listB. How would you combine those into a single user record for MM3?
Additionally, in MM2 you don't know that me1@example.com and me2@example.com are both owned by me. While it's probably impossible to merge these two records when migrating them to MM3, we would like to have *some* way to merge the two records once they're migrated to MM3. We'd need to work out the workflow for that, including any security guarantees, so that a user could merge those records after a migration.
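One way to picture the two steps discussed here is the sketch below. It first collects per-list memberships under each address, and then, as a separate explicit step, folds one address's record into another. Everything in it is hypothetical: the tuple layout, the preference keys, the helper names, lower-casing addresses, and the "kept user's settings win" conflict policy. None of it reflects the MM3 data model, and the confirmation workflow and security checks a real merge needs are left out.

```python
from collections import defaultdict


def group_by_address(memberships):
    """Collect (list_name, address, prefs) tuples under each address."""
    users = defaultdict(dict)
    for list_name, address, prefs in memberships:
        # Assumption: addresses differing only in case are the same user.
        users[address.lower()][list_name] = prefs
    return dict(users)


def merge_users(users, keep, absorb):
    """Fold absorb's subscriptions into keep; keep wins on conflicts."""
    merged = dict(users)
    combined = dict(merged.pop(absorb))
    combined.update(merged[keep])
    merged[keep] = combined
    return merged


memberships = [
    ("lista", "Me1@example.com", {"digest": False}),
    ("listb", "me1@example.com", {"digest": True}),
    ("lista", "me2@example.com", {"digest": True}),
    ("listc", "me2@example.com", {"digest": False}),
]
users = group_by_address(memberships)
merged = merge_users(users, keep="me1@example.com", absorb="me2@example.com")
```

Keeping the merge as its own function mirrors the point that it probably cannot happen during migration and has to be available afterwards.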
Therefore, I would suggest that a migration be broken into some components,
- Migrate individual list parameters
- Aggregate groups of lists
- Migrate individual subscriptions
- Aggregate subscriptions by "person".
-Barry
Richard Wackerbarth writes:
Therefore, I would suggest that a migration be broken into some components,
- Migrate individual list parameters
- Aggregate groups of lists
- Migrate individual subscriptions
- Aggregate subscriptions by "person".
+1, and perhaps some of these are big/error-prone enough to constitute whole GSoC projects themselves. (I don't say that they are, but we should think about it during design of an intern's project.)
I appreciate the help in trying to understand the configuration systems in MM2 and MM3.
2013-04-11 22:11, Barry Warsaw wrote:
On Apr 11, 2013, at 02:22 PM, Elias Assarsson wrote:
- MM2's configuration file is a Python file which really must be imported in order to get a valid set of values. MM3's configuration file is a stack of .ini-style files.
I am trying to find and understand the configuration files so that I know what needs to be migrated and to what form. Is the MM2 configuration you refer to mainly Mailman/mm_cfg.py and MM3 configuration files src/mailman/config/* and /src/mailman/<listname>/*?
Yes, in a deployed Mailman 2, mm_cfg.py will contain the system configuration settings. These override the settings from Defaults.py so a good way to explore is to use bin/withlist and import mm_cfg at the Python prompt.
I have had a look through bin/withlist, mm_cfg.py and Defaults.py to get a feel for the format of the configuration. Why is bin/withlist relevant for configuration migration? Is it a way to learn about the configuration?
In MM3, we use lazr.config, which is essentially a configparser type package that allows for stacks of configurations, with pushing and popping values on this stack.
http://pythonhosted.org/lazr.config/
The src/mailman/config/schema.cfg file is at the bottom of the stack and defines the schema, obviously :). From there, src/mailman/config/mailman.cfg essentially instantiates this schema, providing all the defaults. In the testing environment, src/mailman/testing/testing.cfg gets pushed on top of that (and many tests push and pop micro-overrides). In a deployed system, the site's mailman.cfg is on the top of the stack and so can override anything. There are various places this mailman.cfg file is searched; see src/mailman/core/initialize.py for all the gory details.
I have had a quick look at the lazr.config documentation and also checked out mailman.cfg, testing.cfg and initialize.py in trying to understand the system.
I guess configuration migration scripts should have tests, e.g. to test if some particular MM2 configuration is migrated to the expected MM3 form. If this is so, what would be an appropriate way to collect a MM2 configuration and the expected MM3 form?
- Elias
On Apr 15, 2013, at 12:04 PM, Elias Assarsson wrote:
I have had a look through bin/withlist, mm_cfg.py and Defaults.py to get a feel for the format of the configuration. Why is bin/withlist relevant for configuration migration? Is it a way to learn about the configuration?
Yes.
I guess configuration migration scripts should have tests, e.g. to test if some particular MM2 configuration is migrated to the expected MM3 form. If this is so, what would be an appropriate way to collect a MM2 configuration and the expected MM3 form?
Yes, *all* new code should have tests. :)
You probably should build a MM2 system, work through the site defaults and create a few mailing lists. Also build a MM3 system and see what you'd need to do to convert your MM2 lists to MM3 lists.
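Once a few real MM2 lists and their hand-made MM3 equivalents exist, they could be captured as fixture pairs and checked with a test along these lines. This is only a sketch: convert_list_settings is a hypothetical stand-in for whatever converter gets written, and the attribute names on both sides are placeholders.

```python
import unittest


def convert_list_settings(mm2):
    """Hypothetical converter under test; the key names are placeholders."""
    return {
        "display_name": mm2["real_name"],
        "max_message_size_kb": mm2["max_message_size"],
    }


# Each fixture pairs an MM2 configuration with the MM3 form a person
# arrived at by configuring the same list by hand.
FIXTURES = [
    (
        {"real_name": "Test", "max_message_size": 40},
        {"display_name": "Test", "max_message_size_kb": 40},
    ),
    (
        {"real_name": "Announce", "max_message_size": 0},
        {"display_name": "Announce", "max_message_size_kb": 0},
    ),
]


class ConversionTest(unittest.TestCase):
    def test_fixtures_convert_to_expected_form(self):
        for mm2, expected in FIXTURES:
            with self.subTest(list=mm2["real_name"]):
                self.assertEqual(convert_list_settings(mm2), expected)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ConversionTest)
result = unittest.TextTestRunner().run(suite)
```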
-Barry
participants (5)
- Barry Warsaw
- Elias Assarsson
- Paul Wise
- Richard Wackerbarth
- Stephen J. Turnbull