Dear all,
It has been some time since the last update regarding HyperKitty, but the project has made some progress.
I have implemented the database interface that I presented a month ago [1].
So now there is a standalone project/library, KittyStore [2], which provides an interface to the database. This interface can then be implemented for whatever database system you like; at the moment it covers MongoDB and PostgreSQL.
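For a rough idea of the shape of it (the method names below are made up for illustration; the real interface lives in KittyStore [2]), the abstraction boils down to an abstract base class that each backend implements:

    import abc

    class MessageStore(object):
        """Hypothetical sketch of a KittyStore-like storage interface."""
        __metaclass__ = abc.ABCMeta  # Python 2 style, matching the era

        @abc.abstractmethod
        def add_to_list(self, list_name, message):
            """Archive a message for the given mailing list."""

        @abc.abstractmethod
        def get_thread(self, list_name, thread_id):
            """Return the messages that make up one thread."""

    # Each backend then provides a concrete implementation, e.g.:
    # class PostgresStore(MessageStore): ...
    # class MongoStore(MessageStore): ...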
Once I had implemented this interface, I tried to settle the question of which database system we should primarily focus on.
I wrote a small comparison test of the two systems on an RHEL6 system [3] (I still have to publish the results for F17). The difference between the two database systems is not so large (1s on a query that already takes 6 seconds), and that was without any tuning of the servers. So I think the advantage of having only one back-end for Mailman and its archive is worth this time difference in the results (which might get even better anyway).
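For the curious, the comparison essentially boils down to timing the same store call against each backend, along these lines (get_archives is an illustrative name here; the real queries and numbers are in [3]):

    import time

    def time_query(store, list_name):
        # Illustrative micro-benchmark: run the same archive query
        # against each KittyStore backend and compare wall-clock time.
        start = time.time()
        store.get_archives(list_name)  # hypothetical method name
        return time.time() - start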
Thus we will move forward with PostgreSQL as a back-end for HyperKitty. The good news is that HyperKitty already works fine with PostgreSQL and KittyStore (if you use the correct branch [4]). However, we have to rebuild our test server, so we cannot show you how the latest version works right now.
So all is nicely getting in place for Aamir to start working on his project and for HK to get further down the road :)
I think that's about all I wanted to say, fire away if you have questions!
Best regards, Pierre
[1] http://mail.python.org/pipermail/mailman-developers/2012-April/022012.html
[2] https://github.com/pypingou/kittystore
[3] http://blog.pingoured.fr/index.php?post/2012/05/20/PostgreSQL-vs-MongoDB
[4] http://bzr.fedorahosted.org/bzr/hyperkitty/rdbms/files
On May 25, 2012, at 11:18 AM, Pierre-Yves Chibon wrote:
Thus we will move forward with PostgreSQL as a back-end for HyperKitty. The good news is that HyperKitty already works fine with PostgreSQL and KittyStore (if you use the correct branch [4]). However, we have to rebuild our test server, so we cannot show you how the latest version works right now.
I haven't looked, but is it possible to use an intermediate layer like an ORM, which would allow a user to use different databases under the hood? For example, the core is written using Storm, so it's pretty easy to use any database that Storm supports. Currently, because of minor SQL differences, this means SQLite and PostgreSQL, but it should be easy for someone to contribute MySQL code. A NoSQL database might be harder though, because that doesn't fit naturally into Storm's view of the world.
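For instance, with Storm the backend choice is just the database URI. A minimal sketch (the Message class is illustrative, not the core's actual schema):

    from storm.locals import create_database, Store, Int, Unicode

    class Message(object):
        __storm_table__ = "message"
        id = Int(primary=True)
        subject = Unicode()

    # Swapping backends is a one-line change; Storm generates the SQL.
    database = create_database("sqlite:")  # in-memory SQLite
    # database = create_database("postgres://user@localhost/mailman")
    store = Store(database)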
-Barry
On Fri, May 25, 2012 at 11:03:10AM -0400, Barry Warsaw wrote:
On May 25, 2012, at 11:18 AM, Pierre-Yves Chibon wrote:
Thus we will move forward with PostgreSQL as a back-end for HyperKitty. The good news is that HyperKitty already works fine with PostgreSQL and KittyStore (if you use the correct branch [4]). However, we have to rebuild our test server, so we cannot show you how the latest version works right now.
I haven't looked, but is it possible to use an intermediate layer like an ORM, which would allow a user to use different databases under the hood? For example, the core is written using Storm, so it's pretty easy to use any database that Storm supports. Currently, because of minor SQL differences, this means SQLite and PostgreSQL, but it should be easy for someone to contribute MySQL code. A NoSQL database might be harder though, because that doesn't fit naturally into Storm's view of the world.
We're using SQLAlchemy (a different ORM) to abstract the database layer. That means the big three open source solutions (sqlite, postgres, and mysql) and proprietary ones like oracle should all be supported. However, we're thinking about using fulltext search indexes, which would mean tying it to specific backends. It may be possible to run in a degraded mode (searches being slower, but for small to medium lists this is likely fine) so we could keep the backend abstraction while running faster on certain databases.
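For illustration, here's roughly what that looks like with SQLAlchemy's declarative layer (the Email model is made up, not HyperKitty's actual schema); only the engine URL changes between backends:

    from sqlalchemy import create_engine, Column, Integer, Unicode
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()

    class Email(Base):
        __tablename__ = "email"
        id = Column(Integer, primary_key=True)
        subject = Column(Unicode(255))

    # The backend is selected by the URL alone:
    engine = create_engine("sqlite:///archive.db")
    # engine = create_engine("postgresql://user@localhost/kittystore")
    Base.metadata.create_all(engine)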
MongoDB is not a relational database, so using an ORM seems like a bit of a hack. People have adapted SQLAlchemy for use with MongoDB, but I'm not sure what sacrifices they had to make to get that working.
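To illustrate the mismatch, here's a pymongo sketch (field names made up, using the modern pymongo API): a document store keeps a message as one nested document rather than as normalized rows with joins, which is what an ORM is built around.

    from pymongo import MongoClient

    db = MongoClient()["kittystore"]
    db.emails.insert_one({
        "list_name": "mailman-developers@python.org",
        "subject": u"HyperKitty update",
        # Thread membership is nested in the document, not a JOIN:
        "thread": {"id": "abc123", "position": 4},
    })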
Running comparisons using postgres was actually a choice dictated by our experience with the core. In our use, the sqlite backend quickly resulted in database locks that we couldn't figure our way out of. So that effectively left us with postgres as the only db for the core. If admins have to run postgres for the core, then the thinking is that it wouldn't be hard for them to also run it for the archiver. I've never had the sqlite issue with SQLAlchemy, though, so we could probably support a non-fulltext-optimized search in sqlite without doing much of anything.
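A rough sketch of that degraded mode (table and column names are illustrative): check the engine's dialect and use postgres's fulltext machinery when it's there, otherwise fall back to a portable but slower LIKE scan.

    from sqlalchemy import create_engine, text

    def search_subjects(engine, term):
        with engine.connect() as conn:
            if engine.dialect.name == "postgresql":
                # Fulltext search, postgres only.
                rows = conn.execute(text(
                    "SELECT id FROM email WHERE "
                    "to_tsvector('english', subject) "
                    "@@ plainto_tsquery(:q)"), {"q": term})
            else:
                # Portable fallback: a full scan, slower but works anywhere.
                rows = conn.execute(text(
                    "SELECT id FROM email WHERE subject LIKE :q"),
                    {"q": "%" + term + "%"})
            return [row[0] for row in rows]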
Note that pingou's current blog post is done without fulltext indexes on a moderately sized list, and we made the decision to focus on postgres from those benchmark numbers. Postgres is slower, but not enough that we think users will be upset at the speed. A mongodb backend would be nice for those who have even larger lists (or possibly for a large number of lists on the same server -- since the contents of tables are cached in memory if there's enough available, a system that hosted a large number of moderately sized lists might also benefit from a faster backend).
-Toshio
On May 25, 2012, at 08:59 AM, Toshio Kuratomi wrote:
We're using SQLAlchemy (a different ORM) to abstract the database layer.
At one point the core used SA, but this was many years ago and I had lots of problems with it. IIRC the main one was that, when things went south, it was quite difficult to debug. I'm sure things have improved since then (and SA is Python 3 compatible, whereas Storm is not, so... :).
That means the big three open source solutions (sqlite, postgres, and mysql) and proprietary ones like oracle should all be supported. However, we're thinking about using fulltext search indexes, which would mean tying it to specific backends. It may be possible to run in a degraded mode (searches being slower, but for small to medium lists this is likely fine) so we could keep the backend abstraction while running faster on certain databases.
That's totally reasonable, and mirrors our policy on JavaScript.
MongoDB is not a relational database, so using an ORM seems like a bit of a hack. People have adapted SQLAlchemy for use with MongoDB, but I'm not sure what sacrifices they had to make to get that working.
Agreed.
Running comparisons using postgres was actually a choice dictated by our experience with the core. In our use, the sqlite backend quickly resulted in database locks that we couldn't figure our way out of. So that effectively left us with postgres as the only db for the core. If admins have to run postgres for the core, then the thinking is that it wouldn't be hard for them to also run it for the archiver. I've never had the sqlite issue with SQLAlchemy, though, so we could probably support a non-fulltext-optimized search in sqlite without doing much of anything.
This is a serious question, and I know that multiproc sqlite applications are prone to database lockups. They're also notoriously difficult to debug, or even to figure out well enough to file a reasonable bug. I think in general it's because there's some path through the code of one of the processes that doesn't end in a commit or abort, and I know I've fixed some of those.
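The discipline that fixes it, sketched here with a Storm store, is to make sure every path through a transaction ends in a commit() or rollback(); otherwise a writer can hold the SQLite lock indefinitely:

    def archive_message(store, message):
        # Sketch of the commit-or-abort discipline: no code path may
        # leave the transaction open, or other processes block on the lock.
        try:
            store.add(message)
            store.commit()
        except Exception:
            store.rollback()
            raise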
It would be sad if a smallish Mailman system couldn't be effectively run on SQLite, just because it's so much easier to set up. Certainly it's much better for day-to-day development. I guess we'll find out as more real deployments happen. In any case, I'd still like to officially support PostgreSQL, and for that matter, MySQL (and derivatives), but I admit to not testing very often on the former, and no one has contributed support for the latter yet. We really need some buildbots. :)
-Barry