Which non SQL Database ?
Roy Smith
roy at panix.com
Sun Jan 23 11:12:35 EST 2011
In article <pan.2011.01.23.06.09.16 at pfln.invalid>,
Deadly Dirk <dirk at pfln.invalid> wrote:
> The same thing applies to MongoDB which is equally fast but does allow ad
> hoc queries and has quite a few options how to do them. It allows you to
> do the same kind of querying as RDBMS software, with the exception of
> joins. No joins.
Well, sort of. You can use forEach() to get some join-like
functionality. You don't get the full join optimization that SQL gives
you, but at least you get to do some processing on the server side so
you don't have to ship 40 gazillion records over the network to pick the
three you wanted.
> It also allows map/reduce queries using JavaScript and
> is not completely schema free.
What do you mean by "not completely schema free"?
> Databases have sub-objects called "collections" which can be indexed
> or partitioned across several machines ("sharding"), which is an
> excellent thing for building shared-nothing clusters.
We've been running Mongo 1.6.x for a few months. Based on our
experiences, I'd say sharding is definitely not ready for prime time.
There are two issues: stability and architecture.
First, stability. We see mongos (the sharding proxy) crash a couple of
times a week. We finally got the site stabilized by rigging upstart to
monitor and automatically restart mongos when it crashes. Fortunately,
mongos crashing doesn't cause any data loss (at least not that we've
noticed). Hopefully this is something the 10gen folks will sort out in
the 1.8 release.
The architectural issues are more complex. Mongo can enforce uniqueness
on a field, but only on non-sharded collections. Security (i.e. password
authentication) does not work in a sharded environment. If I understand
the release notes correctly, that's something which may get fixed in
some future release.
> Scripting languages like Python are
> very well supported and linked against MongoDB
The Python interface is very nice. In some ways, the JS interface is
nicer, only because you can get away with less quoting, i.e.
JS: find({inquisition: {$ne: 'spanish'}})
Py: find({'inquisition': {'$ne': 'spanish'}})
The PHP interface is (like everything in PHP) sucky:
PHP: find(array('inquisition' => array('$ne' => 'spanish')))
The common thread here is that unlike SQL, you're not feeding the
database a string which it parses, you're feeding it a data structure.
You're stuck with whatever data structure syntax the host language
supports. Well, actually, that's not true. If you wanted to, you could
write a front end which lets you execute:
"find where inquisition != spanish"
and have code to parse that and turn it into the required data
structure. The odds of anybody doing that are pretty low, however. It
would just feel wrong. In much the same way that SQLAlchemy's
functional approach to building a SQL query just feels wrong to somebody
who knows SQL.
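For what it's worth, a front end like that could be sketched in a few lines of Python. This is only an illustration: the `parse_query` function, the `OPERATORS` table, and the tiny "find where FIELD OP VALUE" grammar are all made up here and are not part of any Mongo driver. It handles one condition, and every value is kept as a string.

```python
import re

# Hypothetical mapping from comparison operators to MongoDB query operators.
OPERATORS = {
    "!=": "$ne",
    ">=": "$gte",
    "<=": "$lte",
    ">": "$gt",
    "<": "$lt",
}

# Two-character operators come first in the alternation so that ">=" is not
# matched as ">" followed by a stray "=".
_PATTERN = re.compile(r"^find where (\w+)\s*(!=|>=|<=|=|>|<)\s*(\w+)$")


def parse_query(text):
    """Turn 'find where FIELD OP VALUE' into a MongoDB-style query dict.

    Values are left as strings; a real front end would need typed literals.
    """
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError("cannot parse query: %r" % text)
    field, op, value = match.groups()
    if op == "=":
        # Plain equality needs no operator sub-document.
        return {field: value}
    return {field: {OPERATORS[op]: value}}


query = parse_query("find where inquisition != spanish")
print(query)
```

The resulting dict has the same shape as the hand-written Python query shown earlier, so it could be passed to a driver's find() unchanged.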
> I find MongoDB well suited for what is
> traditionally known as data warehousing.
I'll go along with that. It's a way to build a fast (possibly
distributed, if they get sharding to work right) network datastore with
some basic query capability. Compared to SQL, you end up doing a lot
more work on the application side, and take on a lot more of the
responsibility to enforce data integrity yourself.
> You may want to look
> at this Youtube clip entitled "MongoDB is web scale":
>
> http://www.youtube.com/watch?v=b2F-DItXtZs
That's the funniest thing I've seen in a long time. The only sad part
is that it's all true.
There are some nice things about NoSQL databases (particularly the
schema-free part). A while ago, we discovered that about 200 of the
300,000 documents in one of our collections were effectively duplicates
of other documents ("document" in mongo-speak means "record" or perhaps
"row" in SQL-speak). It was trivial to add "is_dup_of" fields to just
those 200 records, and a little bit of code in our application to check
the retrieved documents for that field and retrieve the pointed-to
document. In SQL, that would have meant adding another column, or
perhaps another table. Either way would have been far more painful than
the fix we were able to do in mongo.
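The application-side check can be very small. Here is a minimal sketch, assuming the "is_dup_of" field holds the _id of the document to use instead. The `resolve` helper and the in-memory `collection` dict are stand-ins invented for this example; with a real driver, the lookup callable would wrap something like a find_one() on _id.

```python
def resolve(doc, fetch_by_id):
    """Return the document that doc points to via is_dup_of, else doc itself.

    fetch_by_id is any callable that maps an _id to a document.
    """
    target_id = doc.get("is_dup_of")
    if target_id is None:
        # Ordinary document: nothing to follow.
        return doc
    return fetch_by_id(target_id)


# In-memory stand-in for a collection, keyed by _id (assumption for the demo).
collection = {
    1: {"_id": 1, "title": "original"},
    2: {"_id": 2, "title": "copy", "is_dup_of": 1},
}

resolved = resolve(collection[2], collection.get)
print(resolved["title"])
```

Only the handful of documents that carry the extra field pay for the second lookup; everything else is returned as-is.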