[Python-Dev] bsddb

Jesus Cea jcea at jcea.es
Thu Sep 4 19:40:28 CEST 2008

Oleg Broytmann wrote:
> -- SQLite is public domain; the licensing terms of Berkeley DB[1] are not
>    friendly to commercial applications: "Our open source license ...
>    permits use of Berkeley DB in open source projects or in applications
>    that are not distributed to third parties." I am not sure whether
>    using PyBSDDB in commercial applications counts as "use of Berkeley DB
>    in open source projects";

I can't comment on this. I'm not a lawyer.

> -- SQLite has a pretty stable API and a pretty stable on-disk format; for
>    bsddb one needs to do dump/reload on every major release;

Not at all. At worst you need to run "db_upgrade", which works in place.
In recent releases it is harmless and fast (for example, it may upgrade
only the log format, not the database file format).

A stable file format is useful for long-term support, but an evolving
format allows improvements. Following your reasoning, Python should have
stayed in the 1.0 era, for compatibility's sake.

> -- SQLite implements a subset of SQL - a powerful query language;

Yes, a declarative language completely unrelated to Python.

> -- SQLite is extensible - one can write one's own functions and
>    aggregates, for example; PySQLite allows writing these functions in
>    Python; PySQLite also allows writing data conversion functions that
>    convert between Python and SQL data types;

bsddb 4.7.4 (available next month) will allow subclassing the DB, DBEnv,
etc. objects, so you can implement whatever logic you wish there. Until
then, you can use proxy/delegation (that is how I'm doing 3.0
compatibility, BTW).
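
To illustrate what proxy/delegation could look like, here is a minimal
sketch. It is not the actual bsddb compatibility code: the class name
TextKeyProxy is hypothetical, and a plain dict stands in for a DB handle
so the example runs without Berkeley DB installed. The wrapper converts
str keys and values to bytes and forwards everything else untouched.

```python
class TextKeyProxy:
    """Hypothetical wrapper around a bytes-only, mapping-like store."""

    def __init__(self, db, encoding="utf-8"):
        self._db = db
        self._encoding = encoding

    def _enc(self, value):
        # Encode text to bytes; pass bytes through unchanged.
        return value.encode(self._encoding) if isinstance(value, str) else value

    def __getitem__(self, key):
        return self._db[self._enc(key)].decode(self._encoding)

    def __setitem__(self, key, value):
        self._db[self._enc(key)] = self._enc(value)

    def __getattr__(self, name):
        # Called only when normal lookup fails: delegate to the wrapped store.
        return getattr(self._db, name)


backend = {}                      # stand-in for a real DB object (assumption)
proxy = TextKeyProxy(backend)
proxy["user"] = "jcea"
```

The backend only ever sees bytes, while callers work with str. Methods
the proxy does not define, such as keys(), reach the wrapped object
through __getattr__.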

> -- a program can attach a few databases at once thus distributing loads
>    between a number of disks, including network mounts.

That is an OS issue. Any program gets the benefit.
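
For readers unfamiliar with the feature being discussed, this is roughly
what attaching a second database looks like from Python's standard
sqlite3 module. It is only a sketch: the file names, the schema name
"archive" and the table layout are made up for illustration, and both
files live in one temporary directory here rather than on separate
disks.

```python
import os
import sqlite3
import tempfile


def attach_demo():
    """Join a table in the main database with one in an attached file."""
    with tempfile.TemporaryDirectory() as tmp:
        conn = sqlite3.connect(os.path.join(tmp, "main.db"))
        try:
            # The second file could sit on another disk or mount point.
            conn.execute("ATTACH DATABASE ? AS archive",
                         (os.path.join(tmp, "archive.db"),))
            conn.execute("CREATE TABLE main.users (id INTEGER PRIMARY KEY, name TEXT)")
            conn.execute("CREATE TABLE archive.events (user_id INTEGER, what TEXT)")
            conn.execute("INSERT INTO main.users VALUES (1, 'alice')")
            conn.execute("INSERT INTO archive.events VALUES (1, 'login')")
            conn.commit()
            return conn.execute(
                "SELECT u.name, e.what FROM main.users AS u "
                "JOIN archive.events AS e ON e.user_id = u.id"
            ).fetchall()
        finally:
            conn.close()
```

Calling attach_demo() runs one query across both files through a single
connection.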

The problem is not disk capacity. Any modern machine can scale disk
without bound via iSCSI, for example (god bless ZFS!). The issue is
replication for redundancy, load sharing and high availability. These
things are available in bsddb 4.7.3 (that is, in Python 2.6).

How do you scale traffic demand in SQLite? I can keep adding machines
to serve read requests, without sharing any disk between them. I can
launch 64 bsddb processes on a single 64-CPU machine to manage
(read/write) a single shared database. I don't know if SQLite can do
that. Berkeley DB can.

>    Durus (and ZODB) has an index of all objects, the index is stored in
> memory AFAIK - a real problem if one has millions of objects. Does bsddb
> help to mitigate the problem?

The latest Durus does not have that issue, but you can always use another
project of mine:

Berkeley DB Backend Storage Engine for DURUS

This code (with some private enhancements for replication and
distributed transactions) manages a Durus repository of nearly 200
terabytes without breaking a sweat (~2^35 objects stored there).

In this particular instance, distributed transactions allow me to
partition data between several machines, with no sharing between them,
and replication provides redundancy and lets me route read requests to a
less loaded machine (writes go to the "master" machine; replication
from there is transparent).

SQLite is a good product. I just dislike the "SQL fits all" model, as I
already said in another message. Using SQL storage to save persistent
Python objects is ugly, and the SQL language is of no use there. You just
need "something" safe, scalable and configurable that can give you
opaque objects when an OID (Object ID) is presented. Compare a terabyte
Python "shelve" object, with its ease of use, transparency, etc., with
keeping the objects in a SQL database server.
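
As a small illustration of the "shelve" style of access, here is a
sketch using only the standard library. The key "oid:1" and the stored
record are invented for the example, and the default dbm backend is used
rather than Berkeley DB, so it shows the programming model, not the
scale.

```python
import os
import shelve
import tempfile


def shelve_roundtrip():
    """Store an object under an OID-like key and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "objects")
        with shelve.open(path) as store:
            # The object is pickled transparently; no schema, no SQL.
            store["oid:1"] = {"name": "alice", "tags": ["admin", "dev"]}
        with shelve.open(path) as store:
            # Present the key, get the object back.
            return store["oid:1"]
```

The calling code never sees a query language or a table definition: it
hands over a key and receives a Python object.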

Just my opinion, of course.

--
Jesus Cea Avion                         _/_/      _/_/_/        _/_/_/
jcea at jcea.es - http://www.jcea.es/     _/_/    _/_/  _/_/    _/_/  _/_/
jabber / xmpp:jcea at jabber.org         _/_/    _/_/          _/_/_/_/_/
.                              _/_/  _/_/    _/_/          _/_/  _/_/
"Things are not so easy"      _/_/  _/_/    _/_/  _/_/    _/_/  _/_/
"My name is Dump, Core Dump"   _/_/_/        _/_/_/      _/_/  _/_/
"El amor es poner tu felicidad en la felicidad de otro" - Leibniz