NoSQL Movement?

Duncan Booth duncan.booth at invalid.invalid
Thu Mar 4 08:10:34 EST 2010


Avid Fan <me at privacy.net> wrote:

> Jonathan Gardner wrote:
> 
>> 
>> I see it as a sign of maturity with sufficiently scaled software that
>> they no longer use an SQL database to manage their data. At some point
>> in the project's lifetime, the data is understood well enough that the
>> general nature of the SQL database is unnecessary.
>> 
> 
> I am really struggling to understand this concept.
> 
> Is it the normalised table structure that is in question or the query 
> language?
> 
> Could you give some sort of example of where SQL would not be the way
> to go? The only things I can think of are simple flat file databases.

Probably one of the best known large non-SQL databases is Google's 
Bigtable. Xah Lee of course dismissed this, as he decided to write about 
how bad non-SQL databases are without actually looking at the prime 
example.

If you look at some of the uses of Bigtable you may begin to understand 
the tradeoffs that are made with SQL. When you use Bigtable you have 
records with fields, and you have indices, but there are limitations on 
the kinds of queries you can perform: in particular you cannot do joins, 
but more subtly there is no guarantee that the index is up to date (so 
a query might miss recent updates, or even return a record whose data 
no longer matches the query).
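
To make the stale-index point concrete, here is a toy sketch in Python. 
It is not Bigtable's API: the class and its methods (LazyIndexedStore, 
put, propagate, query) are invented for illustration, and the only 
assumption is that a record write lands at once while the matching index 
update is applied some time later.

```python
# Toy model only; every name here is made up and nothing is Bigtable's API.
class LazyIndexedStore:
    def __init__(self):
        self.records = {}   # key -> dict of fields
        self.index = {}     # (field, value) -> set of keys
        self.pending = []   # index changes that have not been applied yet

    def put(self, key, **fields):
        """Write the record immediately; queue the index change for later."""
        old = self.records.get(key, {})
        self.records[key] = fields
        self.pending.append((key, old, fields))

    def propagate(self):
        """Apply queued index changes (a real system would do this in the background)."""
        for key, old, new in self.pending:
            for field, value in old.items():
                self.index.get((field, value), set()).discard(key)
            for field, value in new.items():
                self.index.setdefault((field, value), set()).add(key)
        self.pending.clear()

    def query(self, field, value):
        """Look up records through the index only, so results may be stale."""
        keys = sorted(self.index.get((field, value), ()))
        return [self.records[k] for k in keys]


store = LazyIndexedStore()
store.put("u1", city="Oxford")
store.propagate()
store.put("u1", city="Leeds")           # record changed, index not yet updated
stale = store.query("city", "Oxford")   # returns u1, although it now says Leeds
missed = store.query("city", "Leeds")   # empty: the recent update is missed
store.propagate()
fresh = store.query("city", "Leeds")    # the index has caught up
```

Between the second put and the second propagate, both failure modes 
described above show up: one query misses the update and the other 
returns a record that no longer matches.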

By sacrificing some of SQL's power, Google get big benefits: namely 
updating data is a much more localised operation. Instead of an update 
having to lock the indices while they are updated, updates to different 
records can happen simultaneously, possibly on servers on opposite 
sides of the world. You can have many, many servers all using the same 
data although they may not have identical or completely consistent views 
of that data.

Bigtable affects how you store the data: for example you need to avoid 
reducing data to normal form (no joins!); it's much better and cheaper 
just to store all the data you need directly in each record. Also, 
aggregate values need to be at least partly pre-computed and stored in 
the database.
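
As a rough sketch of what that means in practice, the Python below 
copies the author's name into each record and maintains a count at write 
time. The layout and the names (users, tweets, post_tweet, render, 
tweet_count) are hypothetical, chosen only to illustrate the idea; plain 
dicts stand in for the database.

```python
# Hypothetical layout: all names are invented, and dicts stand in for tables.
users = {"alice": {"display_name": "Alice", "tweet_count": 0}}
tweets = {}


def post_tweet(tweet_id, author, text):
    user = users[author]
    # Denormalise: copy the display name into the tweet so that reading
    # a tweet never needs a join against the users table.
    tweets[tweet_id] = {
        "author": author,
        "author_name": user["display_name"],
        "text": text,
    }
    # Pre-compute the aggregate on write instead of counting rows on read.
    user["tweet_count"] += 1


def render(tweet_id):
    """Build the display string from the single tweet record alone."""
    t = tweets[tweet_id]
    return f'{t["author_name"]}: {t["text"]}'


post_tweet("t1", "alice", "hello")
post_tweet("t2", "alice", "again")
line = render("t2")
count = users["alice"]["tweet_count"]
```

The cost is that a change of display name leaves old copies behind 
unless something rewrites them, which is the kind of tradeoff being 
accepted here.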

Boiling this down to a concrete example, imagine you wanted to implement 
a system like Twitter. Think carefully about how you'd handle a 
sufficiently high rate of new tweets reliably with an SQL database. Now 
think how you'd do the same thing with Bigtable: most tweets don't 
interact, so it becomes much easier to see how the load is spread across 
the servers: each user has the data relevant to them stored near the 
server they are using, and index changes propagate gradually to the rest 
of the system.
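
A minimal sketch of that shape, again in Python and again with invented 
names (shard_for, post, propagate_one, NUM_SHARDS): each write touches 
only the author's own shard, and followers' timelines are filled in 
later from a queue. Real systems are far more involved; this only shows 
why independent tweets need no shared lock.

```python
import zlib
from collections import defaultdict, deque

NUM_SHARDS = 4  # arbitrary number of servers for the sketch

# One dict per shard, mapping user -> that user's own tweets.
shards = [defaultdict(list) for _ in range(NUM_SHARDS)]
timelines = defaultdict(list)              # follower -> tweets seen so far
followers = {"alice": ["bob", "carol"]}    # made-up follower graph
fanout_queue = deque()                     # index changes still to propagate


def shard_for(user):
    """Pick a shard from a stable hash of the user name."""
    return zlib.crc32(user.encode("utf-8")) % NUM_SHARDS


def post(user, text):
    # The write touches a single shard; no other shard is locked or read.
    shards[shard_for(user)][user].append(text)
    fanout_queue.append((user, text))


def propagate_one():
    """Deliver one queued tweet to the author's followers."""
    user, text = fanout_queue.popleft()
    for follower in followers.get(user, []):
        timelines[follower].append((user, text))


post("alice", "hello")
before = list(timelines["bob"])   # bob has not seen the tweet yet
propagate_one()
after = list(timelines["bob"])    # now he has
```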

-- 
Duncan Booth http://kupuguy.blogspot.com


