Bug 971013 - schema migrations BLOCKED

I've been working on and off for quite some time on bug 971013
https://bugs.launchpad.net/mailman/+bug/971013
This bug is about providing orderly schema migrations. I now admit that I'm blocked and I'm looking for suggestions.
As you know, I made a promise that after beta 1, we'd only change the schema if we can provide automatic migrations. Or IOW, I won't make incompatible schema changes that break existing mm3 installations.
We had some discussion on this list a while ago about mechanisms to do this, and trunk has some rudimentary support for it. In the meantime, I found out about a nice tool called Alembic
http://alembic.readthedocs.org/en/latest/index.html
which provides a better framework for doing migrations. Even though it's primarily geared toward SQLAlchemy, I've had some off-line discussions with Alembic's author Michael Bayer about how we could utilize Alembic without pulling in all of SQLAlchemy[1]. I experimented with a branch and it looked promising, but that's not what's blocking progress on this.
What's blocking progress is SQLite, as described in this FAQ:
http://sqlite.org/faq.html#q11
tl;dr: SQLite has a very limited ALTER TABLE command, and specifically does
not support column renaming or deleting. This really sucks for implementing
any kind of schema migration, regardless of the framework. The little example
shown in the FAQ doesn't really scale well when you have lots of relationships
between the tables, as is the case for the mailinglist
table as currently
defined. Here's where I am blocked.
There are a couple of options, none of which I like very much.
I could break the beta1 schema migration promise. I'd just make the schema changes and all your existing systems would break (all three of them <wink>). As unfortunate as that is, going this route is really a cop out because as soon as 3.0 final is out, we can almost guarantee there will be changes in 3.1 so we'll have to face this issue eventually.
We could drop support for SQLite. The other database we (semi-)officially support is Postgres (still waiting on a MySQL branch), which has a sufficiently powerful ALTER TABLE for our needs. Some people would claim that SQLite isn't really a viable database anyway, since it's prone to deadlocks in multithreaded/processing applications such as mm3. While true, especially for large, busy sites, I think SQLite can work for smaller sites, and it does make installation much easier since SQLite support comes with Python. It certainly makes testing much easier.
As has been suggested before, we could convert the mailinglist table to key/value pairs, which is actually how the pendings table works. This might be a good idea *anyway* to better support extensions, but of course I'd have to break the beta1 promise (just this once <wink>) to make this change, and it wouldn't help for any other table. I think it would also make queries trickier, and heck if we go that route why even use a relational database?
We could try to only add columns in SQLite and try to smack the migration scripts into only copying data from the old columns to the new ones. This might actually be the most feasible option, but it does leave column detritus in the SQLite database. Who cares though, right? There may be other problems with this approach I'm not yet aware of.
I'm sure there are things I've overlooked, so I welcome your comments and feedback. I'd love to break the logjam on this bug so that we can get moving again toward the 3.0 final release.
Cheers, -Barry
FTR, the two existing WIP branches:
https://code.launchpad.net/~barry/mailman/bug-971013 https://code.launchpad.net/~barry/mailman/alembic
[1] Though, if Storm never gets ported to Python 3, I might just re-evaluate the decision to use it as the ORM after mm3.0. SQLAlchemy supports Python 3.

It would be slow and consume disk, but couldn´t you copy all the data to a temporary table, delete the entire sqlite table with the old column names, create a new table with the correct column names, copy the data back, and delete the temporary table? Not like you're doing this every hour.
David Brown, dave@aasv.org (via mobile)
Barry Warsaw <barry@list.org> wrote:
I've been working on and off for quite some time on bug 971013
https://bugs.launchpad.net/mailman/+bug/971013
This bug is about providing orderly schema migrations. I now admit that I'm blocked and I'm looking for suggestions.
As you know, I made a promise that after beta 1, we'd only change the schema if we can provide automatic migrations. Or IOW, I won't make incompatible schema changes that break existing mm3 installations.
We had some discussion on this list a while ago about mechanisms to do this, and trunk has some rudimentary support for it. In the meantime, I found out about a nice tool called Alembic
http://alembic.readthedocs.org/en/latest/index.html
which provides a better framework for doing migrations. Even though it's primarily geared toward SQLAlchemy, I've had some off-line discussions with Alembic's author Michael Bayer about how we could utilize Alembic without pulling in all of SQLAlchemy[1]. I experimented with a branch and it looked promising, but that's not what's blocking progress on this.
What's blocking progress is SQLite, as described in this FAQ:
http://sqlite.org/faq.html#q11
tl;dr: SQLite has a very limited ALTER TABLE command, and specifically does
not support column renaming or deleting. This really sucks for implementing
any kind of schema migration, regardless of the framework. The little example
shown in the FAQ doesn't really scale well when you have lots of relationships
between the tables, as is the case for the mailinglist
table as currently
defined. Here's where I am blocked.
There are a couple of options, none of which I like very much.
I could break the beta1 schema migration promise. I'd just make the schema changes and all your existing systems would break (all three of them <wink>). As unfortunate as that is, going this route is really a cop out because as soon as 3.0 final is out, we can almost guarantee there will be changes in 3.1 so we'll have to face this issue eventually.
We could drop support for SQLite. The other database we (semi-)officially support is Postgres (still waiting on a MySQL branch), which has a sufficiently powerful ALTER TABLE for our needs. Some people would claim that SQLite isn't really a viable database anyway, since it's prone to deadlocks in multithreaded/processing applications such as mm3. While true, especially for large, busy sites, I think SQLite can work for smaller sites, and it does make installation much easier since SQLite support comes with Python. It certainly makes testing much easier.
As has been suggested before, we could convert the mailinglist table to key/value pairs, which is actually how the pendings table works. This might be a good idea *anyway* to better support extensions, but of course I'd have to break the beta1 promise (just this once <wink>) to make this change, and it wouldn't help for any other table. I think it would also make queries trickier, and heck if we go that route why even use a relational database?
We could try to only add columns in SQLite and try to smack the migration scripts into only copying data from the old columns to the new ones. This might actually be the most feasible option, but it does leave column detritus in the SQLite database. Who cares though, right? There may be other problems with this approach I'm not yet aware of.
I'm sure there are things I've overlooked, so I welcome your comments and feedback. I'd love to break the logjam on this bug so that we can get moving again toward the 3.0 final release.
Cheers, -Barry
FTR, the two existing WIP branches:
https://code.launchpad.net/~barry/mailman/bug-971013 https://code.launchpad.net/~barry/mailman/alembic
[1] Though, if Storm never gets ported to Python 3, I might just re-evaluate the decision to use it as the ORM after mm3.0. SQLAlchemy supports Python 3.
Mailman-Developers mailing list Mailman-Developers@python.org http://mail.python.org/mailman/listinfo/mailman-developers Mailman FAQ: http://wiki.list.org/x/AgA3 Searchable Archives: http://www.mail-archive.com/mailman-developers%40python.org/ Unsubscribe: http://mail.python.org/mailman/options/mailman-developers/dave%40aasv.org
Security Policy: http://wiki.list.org/x/QIA9

Barry Warsaw writes:
So it's the renaming that causes problems, and the issue with renaming (ie, why SQLite doesn't support it) is that not only do you change the name of the column, but you need to fix up all references in relations, is that it?
I still don't see why this is conceptually hard. Since you're working with schema changes, you have a list of renamings oldname -> newname, and row data in the tables whose values do not depend on variable names. Then you create a new database (ie, empty tables) with the new schema, you extend the list of renames with trivial renames (oldname -> oldname), and you copy the rows in each table from (*oldnames) -> (*newnames). What am I missing?
Is there an implicit requirement that this be done quickly? in minimum space? in place? Aren't all of those YAGNIs (as David points out more briefly)? Except of course for you, since you are testing the migrations. ;-) In fact, I would be tempted to say "keep the old database around, in case you need to reverse migrate" (and for you, it would allow post-mortems and in general cross-checking of the correctness of migrations since you have both old and new databases).
Two problems I can foresee. First, name collisions: it's quite possible you'll add a new column with somewhat different semantics (and therefore different values) and want to give it the name attached to a previously deleted column. Easy to workaround if you're doing things by hand, but an enthusiasm-sapper for you (wearing your DB developer hat) because I suspect the only thing an auto-migrate script can do is fatal here.
Second, if the deleted column is involved in relationships, these relationships will go unmaintained in the new schema. I don't know if this would result in an actual bug, but it's unclean.

On Jul 06, 2012, at 10:00 PM, Barry Warsaw wrote:
This bug is about providing orderly schema migrations. I now admit that I'm blocked and I'm looking for suggestions.
I actually think I've finally cracked this nut. I have it working for SQLite, and am now testing for Postgres. When I get that working and committed to trunk, it would be great for others to try it on your test lists. You shouldn't have to do anything but stop, install the new version, and restart.
Looking forward to making progress again. -Barry

On Jul 21, 2012, at 06:17 PM, Barry Warsaw wrote:
This has just landed in lp:mailman and I have it working locally for both SQLite and PostgreSQL.
If you have beta1 deployed, please do give the upgrade a try. Make backups first though! :) If you encounter bugs, please file them in Launchpad, hopefully with a way to reproduce the problem, or at least let me know whether you're using SQLite or PostgreSQL.
Cheers, -Barry

It would be slow and consume disk, but couldn´t you copy all the data to a temporary table, delete the entire sqlite table with the old column names, create a new table with the correct column names, copy the data back, and delete the temporary table? Not like you're doing this every hour.
David Brown, dave@aasv.org (via mobile)
Barry Warsaw <barry@list.org> wrote:
I've been working on and off for quite some time on bug 971013
https://bugs.launchpad.net/mailman/+bug/971013
This bug is about providing orderly schema migrations. I now admit that I'm blocked and I'm looking for suggestions.
As you know, I made a promise that after beta 1, we'd only change the schema if we can provide automatic migrations. Or IOW, I won't make incompatible schema changes that break existing mm3 installations.
We had some discussion on this list a while ago about mechanisms to do this, and trunk has some rudimentary support for it. In the meantime, I found out about a nice tool called Alembic
http://alembic.readthedocs.org/en/latest/index.html
which provides a better framework for doing migrations. Even though it's primarily geared toward SQLAlchemy, I've had some off-line discussions with Alembic's author Michael Bayer about how we could utilize Alembic without pulling in all of SQLAlchemy[1]. I experimented with a branch and it looked promising, but that's not what's blocking progress on this.
What's blocking progress is SQLite, as described in this FAQ:
http://sqlite.org/faq.html#q11
tl;dr: SQLite has a very limited ALTER TABLE command, and specifically does
not support column renaming or deleting. This really sucks for implementing
any kind of schema migration, regardless of the framework. The little example
shown in the FAQ doesn't really scale well when you have lots of relationships
between the tables, as is the case for the mailinglist
table as currently
defined. Here's where I am blocked.
There are a couple of options, none of which I like very much.
I could break the beta1 schema migration promise. I'd just make the schema changes and all your existing systems would break (all three of them <wink>). As unfortunate as that is, going this route is really a cop out because as soon as 3.0 final is out, we can almost guarantee there will be changes in 3.1 so we'll have to face this issue eventually.
We could drop support for SQLite. The other database we (semi-)officially support is Postgres (still waiting on a MySQL branch), which has a sufficiently powerful ALTER TABLE for our needs. Some people would claim that SQLite isn't really a viable database anyway, since it's prone to deadlocks in multithreaded/processing applications such as mm3. While true, especially for large, busy sites, I think SQLite can work for smaller sites, and it does make installation much easier since SQLite support comes with Python. It certainly makes testing much easier.
As has been suggested before, we could convert the mailinglist table to key/value pairs, which is actually how the pendings table works. This might be a good idea *anyway* to better support extensions, but of course I'd have to break the beta1 promise (just this once <wink>) to make this change, and it wouldn't help for any other table. I think it would also make queries trickier, and heck if we go that route why even use a relational database?
We could try to only add columns in SQLite and try to smack the migration scripts into only copying data from the old columns to the new ones. This might actually be the most feasible option, but it does leave column detritus in the SQLite database. Who cares though, right? There may be other problems with this approach I'm not yet aware of.
I'm sure there are things I've overlooked, so I welcome your comments and feedback. I'd love to break the logjam on this bug so that we can get moving again toward the 3.0 final release.
Cheers, -Barry
FTR, the two existing WIP branches:
https://code.launchpad.net/~barry/mailman/bug-971013 https://code.launchpad.net/~barry/mailman/alembic
[1] Though, if Storm never gets ported to Python 3, I might just re-evaluate the decision to use it as the ORM after mm3.0. SQLAlchemy supports Python 3.
Mailman-Developers mailing list Mailman-Developers@python.org http://mail.python.org/mailman/listinfo/mailman-developers Mailman FAQ: http://wiki.list.org/x/AgA3 Searchable Archives: http://www.mail-archive.com/mailman-developers%40python.org/ Unsubscribe: http://mail.python.org/mailman/options/mailman-developers/dave%40aasv.org
Security Policy: http://wiki.list.org/x/QIA9

Barry Warsaw writes:
So it's the renaming that causes problems, and the issue with renaming (ie, why SQLite doesn't support it) is that not only do you change the name of the column, but you need to fix up all references in relations, is that it?
I still don't see why this is conceptually hard. Since you're working with schema changes, you have a list of renamings oldname -> newname, and row data in the tables whose values do not depend on variable names. Then you create a new database (ie, empty tables) with the new schema, you extend the list of renames with trivial renames (oldname -> oldname), and you copy the rows in each table from (*oldnames) -> (*newnames). What am I missing?
Is there an implicit requirement that this be done quickly? in minimum space? in place? Aren't all of those YAGNIs (as David points out more briefly)? Except of course for you, since you are testing the migrations. ;-) In fact, I would be tempted to say "keep the old database around, in case you need to reverse migrate" (and for you, it would allow post-mortems and in general cross-checking of the correctness of migrations since you have both old and new databases).
Two problems I can foresee. First, name collisions: it's quite possible you'll add a new column with somewhat different semantics (and therefore different values) and want to give it the name attached to a previously deleted column. Easy to workaround if you're doing things by hand, but an enthusiasm-sapper for you (wearing your DB developer hat) because I suspect the only thing an auto-migrate script can do is fatal here.
Second, if the deleted column is involved in relationships, these relationships will go unmaintained in the new schema. I don't know if this would result in an actual bug, but it's unclean.

On Jul 06, 2012, at 10:00 PM, Barry Warsaw wrote:
This bug is about providing orderly schema migrations. I now admit that I'm blocked and I'm looking for suggestions.
I actually think I've finally cracked this nut. I have it working for SQLite, and am now testing for Postgres. When I get that working and committed to trunk, it would be great for others to try it on your test lists. You shouldn't have to do anything but stop, install the new version, and restart.
Looking forward to making progress again. -Barry

On Jul 21, 2012, at 06:17 PM, Barry Warsaw wrote:
This has just landed in lp:mailman and I have it working locally for both SQLite and PostgreSQL.
If you have beta1 deployed, please do give the upgrade a try. Make backups first though! :) If you encounter bugs, please file them in Launchpad, hopefully with a way to reproduce the problem, or at least let me know whether you're using SQLite or PostgreSQL.
Cheers, -Barry
participants (4)
-
Barry Warsaw
-
Barry Warsaw
-
David H. Brown
-
Stephen J. Turnbull