NoSQL Movement?

D'Arcy J.M. Cain darcy at druid.net
Sun Mar 14 09:55:13 EDT 2010


On Sat, 13 Mar 2010 23:42:31 -0800
Jonathan Gardner <jgardner at jonathangardner.net> wrote:
> On Fri, Mar 12, 2010 at 11:23 AM, Paul Rubin <no.email at nospam.invalid> wrote:
> > "D'Arcy J.M. Cain" <darcy at druid.net> writes:
> >> Just curious, what database were you using that wouldn't keep up with
> >> you?  I use PostgreSQL and would never consider going back to flat
> >> files.
> >
> > Try making a file with a billion or so names and addresses, then
> > compare the speed of inserting that many rows into a postgres table
> > against the speed of copying the file.

That's a straw man argument.  Copying an already built database to
another copy of the database won't take significantly longer than
copying an already built file.  In fact, it's the same operation.
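For what it's worth, here is a quick illustration of that point, using
Python's stdlib sqlite3 as a stand-in for a real engine (file names and
table layout are made up for the example):

```python
import os
import shutil
import sqlite3
import tempfile

tmp = tempfile.mkdtemp()
db_path = os.path.join(tmp, "people.db")

# Build a small database once.
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE people (name TEXT, address TEXT)")
conn.executemany("INSERT INTO people VALUES (?, ?)",
                 [("name%d" % i, "addr%d" % i) for i in range(1000)])
conn.commit()
conn.close()

# "Copying an already built database" is the same OS-level operation
# as copying an already built flat file: a byte-for-byte file copy.
copy_path = os.path.join(tmp, "people-copy.db")
shutil.copy(db_path, copy_path)

# The copy is a fully working database.
conn = sqlite3.connect(copy_path)
count, = conn.execute("SELECT count(*) FROM people").fetchone()
conn.close()
print(count)
```

The one-time build cost is a different question from the copy cost, of
course; once the data is on disk, moving it around is just I/O either
way.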

> Also consider how much work it is to partition data from flat files
> versus PostgreSQL tables.

Another straw man.  I'm sure you can come up with many contrived
examples that show one particular operation running faster than
another.  Benchmark writers (bad ones) do it all the time.  I'm saying
that in normal, real-world situations, where you are collecting
billions of data points and need to actually use the data, a properly
designed database running on a good database engine will generally
serve you better than flat files.
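To make the partitioning comparison concrete: with flat files you have
to write the bucketing logic yourself, while a database engine does it
declaratively (in PostgreSQL, roughly a PARTITION BY clause and the
engine routes rows for you).  A hand-rolled sketch of the flat-file
side, with an invented tab-separated layout:

```python
from collections import defaultdict

# Flat-file "rows": tab-separated name, region, value (made-up layout).
rows = [
    "alice\teast\t10",
    "bob\twest\t20",
    "carol\teast\t30",
]

# Partitioning by hand: you parse every line and bucket it yourself,
# and you own the code for adding, dropping, and querying partitions.
partitions = defaultdict(list)
for line in rows:
    name, region, value = line.split("\t")
    partitions[region].append((name, int(value)))

print(sorted(partitions))
```

That's the work the engine takes off your hands, plus it keeps the
partitions transparent to every query.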

> >> The only thing I can think of that might make flat files faster is
> >> that flat files are buffered whereas PG guarantees that your
> >> information is written to disk before returning
> >
> > Don't forget all the shadow page operations and the index operations,
> > and that a lot of these operations require reading as well as writing
> > remote parts of the disk, so buffering doesn't help avoid every disk
> > seek.

Not sure what a "shadow page operation" is, but index operations are
only needed if you need fast access when reading the data back.  If it
doesn't matter how long reads take, then don't index.  I have a hard
time believing that anyone would want to save billions of data points
and not care how quickly they can read selected parts back or organize
the data, though.
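The trade-off can be sketched in a few lines: unindexed data forces a
full scan on every read, while an index (here just a dict, standing in
for a B-tree) trades extra work at write time for fast lookups:

```python
# Unindexed data: every lookup is a linear scan, O(n).
records = [("name%d" % i, "addr%d" % i) for i in range(100000)]

def scan_lookup(name):
    for n, addr in records:
        if n == name:
            return addr
    return None

# A simple index: extra work while writing, O(1) reads afterward.
# This is the cost the database pays per INSERT to keep reads fast.
index = {n: addr for n, addr in records}

assert scan_lookup("name99999") == index["name99999"]
```

If you truly never read the data back selectively, you can skip the
index and the inserts get cheaper -- which is exactly the flat-file
case being argued for.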

> Plus the fact that your other DB operations slow down under the load.

Not with the database engines that I use.  Sure, speed and load are
connected whether you use databases or flat files, but a proper
database will scale up quite well.

-- 
D'Arcy J.M. Cain <darcy at druid.net>         |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 425 1212     (DoD#0082)    (eNTP)   |  what's for dinner.


