PostgreSQL successes

Eric Lee Green eric at estinc.com
Thu Jun 29 11:36:53 EDT 2000


Erno Kuusela wrote:
> >>>>> "Eric" == Eric Lee Green <eric at estinc.com> writes:
>     Eric> Insert speed is horrible -- inserting data takes
>     Eric> forever and a half, though PostGreSQL 7.0 now has a bulk
>     Eric> import facility that may make it faster (I haven't had a
>     Eric> chance to benchmark it yet).
> 
> hmm. there are some things that can make inserts slow:
> 
> 1) there is a switch telling postgresql whether it should flush to
> disk after each transaction. this calls the fsync() system call. its
> implementation is very inefficient on linux.  if you are running on
> linux and can live with the reduced robustness, turn that feature off.

Unfortunately, when you do this, you lose the ability to guarantee
consistency. Without some redesign of the PostgreSQL storage manager (which is
underway, BTW), it is very easy to get an inconsistent table this way. For
some applications this does not matter (I don't care TOO much if my cookies
table gets corrupted!). For other applications (like payroll :-) it definitely
DOES matter.  
 
> 2) if you are using sql insert statements without surrounding
> begin/commit (ie in autocommit mode), a transaction for each insert
> makes things slow.

Correct. I have made inserting data as much as 10 times faster by batching
inserts within a transaction. A word of warning, though: there *IS* a limit to
transaction size. That bit me once when I wanted to add 165,000 records in one
fell swoop. Sorry!
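
For what it's worth, the batched version looks roughly like this with the
classic PyGreSQL "pg" module (just a sketch; the database, table, and column
names are invented):

# Rough sketch of the batched-insert trick with the classic PyGreSQL
# "pg" module.  'testdb' and the widgets table are made-up names.
import pg

db = pg.connect('testdb')

rows = [('alpha', 1), ('beta', 2), ('gamma', 3)]

db.query("BEGIN")                   # one transaction for the whole batch...
for name, qty in rows:
    db.query("INSERT INTO widgets (name, qty) VALUES ('%s', %d)"
             % (name, qty))         # real code needs proper quoting of strings
db.query("COMMIT")                  # ...one COMMIT instead of one per INSERT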
 
> 3) using sql insert statements is not itself the fastest thing in the
> world. the pygresql connection objects have an inserttable() method for
> doing something similar (i think) to what COPY does with the "psql"
> interactive terminal.

This is new for PostgreSQL 7.0, I believe. I have not yet benchmarked it.
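
Going by the pygresql documentation, the call looks something like the
following (untested on my end, and the names are invented again):

# Untested sketch of the inserttable() call, which should go through the
# fast COPY path instead of per-row INSERT statements.
import pg

db = pg.connect('testdb')

rows = [('alpha', 1), ('beta', 2), ('gamma', 3)]
db.inserttable('widgets', rows)     # column order must match the table definition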

> try these, i bet you will have much better luck.

Yes, I am aware of all the tricks for 6.x and below. You can make insert
performance rise from totally unusable to merely horribly slow by using them.
You cannot, however, come anywhere near the performance of most commercial
databases or of MySQL, all of which can be as much as 10 times faster than
PostgreSQL at inserts (depending on indexes, etc.) even WITH all the tricks of
the trade. BTW, adding a single index on one 'TEXT' field to the table that I
added 165,000 records to raised the insert time from under 5 minutes to 20
minutes....
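
One workaround worth trying is to drop the index before the bulk load and
rebuild it afterwards; something like this (index, table, and column names
are hypothetical):

# Sketch: drop the index before the big load, rebuild it once afterwards.
import pg

db = pg.connect('testdb')
rows = [('alpha', 1), ('beta', 2), ('gamma', 3)]    # stand-in for the real data

db.query("DROP INDEX widgets_name_idx")             # no index maintenance during the load

db.query("BEGIN")
for name, qty in rows:
    db.query("INSERT INTO widgets (name, qty) VALUES ('%s', %d)"
             % (name, qty))
db.query("COMMIT")

db.query("CREATE INDEX widgets_name_idx ON widgets (name)")   # rebuild in one pass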
 
>     Eric> And, finally, the documentation is cryptic, incomplete, and
>     Eric> scattered all over the place... I had to reference two
>     Eric> different manuals in order to figure out how to create (and
>     Eric> destroy) a PL/pgSQL stored procedure, for example.
> 
> the book on the postgresql web site covers all of this stuff
> pretty nicely.  <URL:http://www.postgresql.org/docs/awbook.html>.

Glad to see the book is coming along. Last time I looked, it had only two
chapters :-). 
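
For anyone else who ends up hunting through two manuals for it, here is a
minimal sketch of creating and dropping a PL/pgSQL function through pygresql.
The add_one function is a throwaway example, and it assumes plpgsql has
already been installed in the database (e.g. with "createlang plpgsql testdb"):

# Minimal sketch of creating and dropping a PL/pgSQL function via pygresql.
import pg

db = pg.connect('testdb')

db.query("""
CREATE FUNCTION add_one (integer) RETURNS integer AS '
    BEGIN
        RETURN $1 + 1;
    END;
' LANGUAGE 'plpgsql'
""")

result = db.query("SELECT add_one(41)").getresult()    # -> [(42,)]

db.query("DROP FUNCTION add_one (integer)")            # argument types are required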

> as CGI anyway), but the mod_python and zope docs are pretty nice
> (httpdapy docs are not totally nonexistent either).

Last time I looked at Zope, it was not clear how to integrate new Python
methods with it. A shame, because it looks like a great product. Maybe I'll
get a chance later on to really get into it and figure all that stuff out;
it looks like it would solve some of the performance problems I've encountered
using PostgreSQL and Python with Apache.

-- 
Eric Lee Green                         eric at estinc.com
Software Engineer                      Visit our Web page:
Enhanced Software Technologies, Inc.   http://www.estinc.com/
(602) 470-1115 voice                   (602) 470-1116 fax


