NoSQL Movement?

Paul Rubin no.email at nospam.invalid
Fri Mar 12 14:23:10 EST 2010


"D'Arcy J.M. Cain" <darcy at druid.net> writes:
> Just curious, what database were you using that wouldn't keep up with
> you?  I use PostgreSQL and would never consider going back to flat
> files.  

Try making a file with a billion or so names and addresses, then
compare the speed of inserting that many rows into a postgres table
against the speed of copying the file.
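
Here's a rough sketch of that comparison with psycopg2 as the driver
(the row count is scaled way down, and the file names, table, and
connection string are just placeholders):

    import shutil
    import time
    import psycopg2        # assumed driver; any DB-API module looks similar

    ROWS = 1000000         # scale this up toward a billion to see the gap widen
    LINE = b"John Doe,123 Main St,Springfield\n"

    # Build the flat file once so both measurements start from the same data.
    with open("names.csv", "wb") as f:
        for _ in range(ROWS):
            f.write(LINE)

    t0 = time.time()
    shutil.copyfile("names.csv", "names_copy.csv")     # plain file copy
    print("file copy: %.1f s" % (time.time() - t0))

    conn = psycopg2.connect("dbname=scratch")          # hypothetical scratch DB
    cur = conn.cursor()
    cur.execute("CREATE TABLE people (name text, addr text, city text)")
    t0 = time.time()
    with open("names.csv") as f:
        for row in f:
            name, addr, city = row.rstrip("\n").split(",")
            cur.execute("INSERT INTO people VALUES (%s, %s, %s)",
                        (name, addr, city))
    conn.commit()
    print("row-at-a-time inserts: %.1f s" % (time.time() - t0))

PostgreSQL's COPY bulk loader narrows the gap a great deal, but even
COPY still does more work (index maintenance, write-ahead logging)
than a straight file copy ever has to.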

> The only thing I can think of that might make flat files faster is
> that flat files are buffered whereas PG guarantees that your
> information is written to disk before returning 

Don't forget all the shadow-page operations and the index operations,
many of which require reading as well as writing distant parts of the
disk, so buffering doesn't avoid every disk seek.
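
Even the durability cost alone is easy to see.  Here is a toy sketch
(file names made up) timing buffered appends against an fsync after
every record:

    import os
    import time

    REC = b"John Doe,123 Main St,Springfield\n"

    def write_records(path, count, sync_each):
        # Append `count` records, optionally forcing each one to disk.
        start = time.time()
        with open(path, "wb") as f:
            for _ in range(count):
                f.write(REC)
                if sync_each:
                    f.flush()
                    os.fsync(f.fileno())  # wait for the disk, like a durable commit
        return time.time() - start

    print("buffered appends: %.2f s" % write_records("buffered.dat", 10000, False))
    print("fsync per record: %.2f s" % write_records("synced.dat", 10000, True))

And that still leaves the index updates and the scattered seeks out of
the picture entirely.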

Generally when faced with this sort of problem I find it worthwhile to
ask myself whether the mainframe programmers of the 1960s-70s had to
deal with the same thing, e.g. when sending out millions of phone
bills or processing credit card transactions (TPF), and then ask
myself how they did it.  Their computers had very little memory or
disk space by today's standards, so their main bulk storage medium was
mag tape.  A heck of a lot of these data processing problems can be
recast as sorting large files on tape, rather than updating a database
one record at a time on disk or in memory.  And that is still what
(e.g.) large search clusters spend a lot of their time doing (look up
the term "pennysort" for more info).
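
The shape of that pattern, as a minimal sketch (the field layout, key
position, and file names are made up; for files too big for memory the
in-process sort would become an external sort, e.g. Unix sort):

    import csv

    def sort_merge_update(master_path, updates_path, out_path):
        # Tape-style batch update: sort both inputs on the key, then make
        # one sequential merge pass.  Updates replace matching master
        # records; new keys are inserted.
        with open(master_path) as f:
            master = sorted(csv.reader(f), key=lambda r: r[0])
        with open(updates_path) as f:
            updates = sorted(csv.reader(f), key=lambda r: r[0])

        out, i, j = [], 0, 0
        while i < len(master) and j < len(updates):
            if master[i][0] < updates[j][0]:
                out.append(master[i]); i += 1
            elif master[i][0] > updates[j][0]:
                out.append(updates[j]); j += 1
            else:                                  # same key: the update wins
                out.append(updates[j]); i += 1; j += 1
        out.extend(master[i:])
        out.extend(updates[j:])

        with open(out_path, "w", newline="") as f:
            csv.writer(f).writerows(out)

Everything in that pass is sequential reads and writes: no index
lookups, no per-record seeks.  That is what made it workable on tape.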


