xahlee at gmail.com
Wed Mar 3 18:36:26 CET 2010
Recently i wrote a blog article on the NoSQL Movement.
i'd like to post it somewhere public to solicit opinions, but after
20 min or so of searching, i couldn't find a newsgroup or private list
where my somewhat anti-NoSQL Movement article would fit.
So, i thought i'd post here to solicit some opinions from the programmer
community i know.
Here's the plain text version:
The NoSQL Movement
Xah Lee, 2010-01-26
In the past few years, a fashionable new line of thinking against the
relational database has emerged, now blessed with a rhyming term: NoSQL.
Basically, it considers the relational database outdated and not
“horizontally” scalable. I'm quite dubious of these claims.
According to Wikipedia's Scalability article, vertical scalability means
adding more resources to a single node, such as more cpu or memory. (You
can easily do this by running your db server on a more powerful
machine.) “Horizontal scalability” means adding more machines.
(Indeed, this is not simple with sql databases, but the situation is
the same with any software, not just databases. To add more
machines to run one single piece of software, the software must have
some sort of grid computing infrastructure built in. This is not a
problem of the software per se, it is just the way things are. It is
not a problem of databases.)
I'm quite old fashioned when it comes to computer technology. In order
to convince me of some revolutionary new-fangled technology, i must
see improvement based on a math foundation. I am an expert in SQL, and
believe that the relational database is pretty much the gist of
databases with respect to math. Sure, a tight definition of the
relations in your data may not be necessary for the many applications
that simply need to store, retrieve, and modify data without much
concern about the relations among them. But still, relational database
technology does that too. You just don't worry about normalizing when
you design your table schema.
The NoSQL movement is really a scaling movement: it is about adding more
machines, about so-called “cloud computing”, and about services with
simple interfaces. (Like so many fashionable movements in the
computing industry, it is not well defined.) It is not really
about anti-relational design of your data. It's more about adding
features for practical needs, such as providing easy-to-use APIs (so
your users don't have to know SQL or schemas), the ability to add more
nodes, commercial interface services to your database, and
parallel systems that access your data. Of course, these needs have all
been met by the big old relational database companies such as Oracle
over the years, as they constantly adapt to the industry's changing
needs and cheaper computing power. If you need any relations in your
data, you can't escape the relational database model. That is just the
cold truth of
Important data, such as that used in bank transactions, has relations.
You have to have tight relational definitions and assurance of data
Here's a second hand quote from Microsoft's Technical Fellow David
I've been doing this database stuff for over 20 years and I
remember hearing that the object databases were going to wipe out
the SQL databases. And then a little less than 10 years ago the
XML databases were going to wipe out.... We actually ... you
know... people inside Microsoft, [have said] 'let's stop working
on SQL Server, let's go build a native XML store because in five
years it's all going....'
LOL. That's exactly my thought.
Though, i'd have to get some hands-on experience with one of those
new database services to see what it's all about.
Amazon S3 and Dynamo
Look at Structured storage. That seems to be what these nosql
databases are. Most are just a key-value pair structure, or just
storage of documents with no relations. I don't see how this differs
from a sql database using one single table as its schema.
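
To make the comparison concrete, here is a minimal sketch of a key-value
store built on one single sql table. It uses Python's bundled sqlite3
module purely for convenience; the table name kv and the helper names
put, get and delete are made up for this illustration and are not taken
from any NoSQL product.

```python
import sqlite3

# One table, two columns: that is the entire "schema" of this store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")

def put(key, value):
    # INSERT OR REPLACE overwrites any existing value for the key.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value)
        )

def get(key, default=None):
    # Return the stored value, or the default when the key is absent.
    row = conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
    return row[0] if row else default

def delete(key):
    with conn:
        conn.execute("DELETE FROM kv WHERE k = ?", (key,))

put("user:1", '{"name": "alice"}')
put("user:1", '{"name": "bob"}')   # second put replaces the first
```

The value column holds an opaque document (here a JSON string), which is
roughly what a document store without relations keeps per key.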
Amazon S3 is another storage service, which uses Amazon's
Dynamo (storage system), indicated by Wikipedia to be one of those
NoSQL dbs. Looking at the S3 and Dynamo articles, it appears the db is
just a distributed hash table system with an added http access
interface. So, basically, little or no relations. Again, i don't see
how this is different from, say, MySQL with one single table of 2
columns, with distributed infrastructure added. (A distributed database
is often an integrated feature of commercial dbs, e.g. Wikipedia's
Oracle database article cites Oracle Real Application Clusters.)
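
As a rough sketch of what "one table of 2 columns plus distributed
infrastructure" could look like, the following spreads such tables over
several nodes by hashing the key. Everything here is an assumption made
for illustration: each in-memory SQLite database stands in for a
separate machine, the class name ShardedKV is invented, and the node is
picked by plain modulo hashing. It is not how Dynamo or S3 is actually
implemented; real distributed hash tables usually add replication and a
scheme such as consistent hashing so nodes can join and leave.

```python
import hashlib
import sqlite3

class ShardedKV:
    """Toy key-value store spread over several two-column SQL tables."""

    def __init__(self, n_shards=4):
        # Each in-memory database is a stand-in for one machine.
        self.shards = []
        for _ in range(n_shards):
            db = sqlite3.connect(":memory:")
            db.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
            self.shards.append(db)

    def _shard_for(self, key):
        # Use a stable hash: Python's built-in hash() of a str changes
        # between processes, so it cannot be used to locate data.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.shards[int(digest, 16) % len(self.shards)]

    def put(self, key, value):
        db = self._shard_for(key)
        with db:
            db.execute(
                "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value)
            )

    def get(self, key, default=None):
        db = self._shard_for(key)
        row = db.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        return row[0] if row else default

store = ShardedKV(n_shards=3)
store.put("img/logo.png", "binary data would go here")
```

Adding a machine here means changing n_shards, which remaps most keys.
That cost is one reason the distributed part, rather than the data
model, is where the real work lies.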
Here's an interesting quote on S3:
Bucket names and keys are chosen so that objects are addressable
using HTTP URLs:
* http://bucket/key (where bucket is a DNS CNAME record
pointing to bucket.s3.amazonaws.com)
Because objects are accessible by unmodified HTTP clients, S3 can
be used to replace significant existing (static) web hosting
So this means, for example, i can store all my images in S3, and in my
html document, the inline images are just normal img tags with normal
urls. This applies to any other type of file: pdf, audio, and html
too. So, S3 becomes the web host server as well as the file system.
Here's Amazon's instruction on how to use it as an image server. Seems
quite simple: How to use Amazon S3 for hosting web pages and media
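
A small sketch of that idea, following the http://bucket/key form from
the quote above. The function names and the bucket name
images.example.com are hypothetical, and the sketch assumes the bucket
name is a DNS name already pointed at S3 by a CNAME record. It only
builds strings and makes no network calls.

```python
from html import escape
from urllib.parse import quote

def s3_object_url(bucket, key):
    # Percent-encode the key but keep "/" so keys can look like paths.
    return "http://%s/%s" % (bucket, quote(key))

def img_tag(bucket, key, alt=""):
    # An ordinary img tag; nothing in the html is specific to S3.
    src = escape(s3_object_url(bucket, key), quote=True)
    return '<img src="%s" alt="%s">' % (src, escape(alt, quote=True))

print(img_tag("images.example.com", "photos/cat 1.jpg", alt="my cat"))
```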
Another is Google's BigTable. I can't make much comment. To make a
sensible comment, one must have some experience of actually
implementing a database. For example, a file system is a sort of
database. Suppose i created a scheme that allows me to access my data
as files in NTFS, distributed over hundreds of PCs that communicate
thru http by running Apache. This would let me access my files. To
insert or delete data, one could have cgi scripts on each machine.
Would this be considered a new fantastic NoNoSQL?
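
For what it's worth, the single-machine half of that thought experiment
fits in a few lines. This is only a sketch under stated assumptions: the
class name FileKV and its methods are invented, each key is hashed to a
file name so any string is a legal key, and the http and cgi layer that
would spread it over many PCs is left out entirely.

```python
import hashlib
import tempfile
from pathlib import Path

class FileKV:
    """Toy store that keeps each value in its own file."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, key):
        # Hash the key so characters like "/" cannot escape the root dir.
        name = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.root / name

    def insert(self, key, data):
        self._path(key).write_bytes(data)

    def fetch(self, key):
        path = self._path(key)
        return path.read_bytes() if path.exists() else None

    def delete(self, key):
        try:
            self._path(key).unlink()
        except FileNotFoundError:
            pass  # deleting a missing key is treated as a no-op

store = FileKV(tempfile.mkdtemp())
store.insert("notes/todo.txt", b"buy milk")
```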
comments can also be posted to