[DB-SIG] Python/PostgreSQL API performance comparison

Kevin Jacobs jacobs at penguin.theopalgroup.com
Tue Jun 3 15:21:11 EDT 2003


[Updated to include results from pgdb and pyPgSQL]

Due to the recent interest in PostgreSQL DB-API driver comparisons, I
thought I'd jump on the proverbial bandwagon.  As some of you already know,
my company has developed very substantial financial applications using
Python, and frequently PostgreSQL.

We've been using the PsycoPg driver for the past two years, because we found
it to be the most solid and least flawed of all the available drivers at the
time.  I've updated our driver suite to include recent versions of PoPy,
pgdb, and pyPgSQL, and then ran one of our very substantial test suites.
The suite comprises a mix of the following tasks:

  1) Many simple OLTP queries.
  2) Many simple and complex OLAP queries, some returning as many as a
     hundred thousand rows of data.
  3) Data-cube construction and manipulation
  4) Business report generation

The application server and database run on the same (otherwise quiescent)
test server, the working data set is all in-core (though quite large), and
only serial requests (i.e., only a single active worker thread) are issued.

Results running Python CVS on the same application server with only the
driver setting changed:

Wall times are the average of 4 runs; little variation was observed.

Driver   Version  Wall Time
-------  -------  ---------
PsycoPg  1.0.13     10m41s
PoPy     2.0.8      11m33s   [1]
pgdb     3.3        14m40s
pyPgSQL  2.3       >15m51s   [2] Did not complete all tests!
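
For readers who want to repeat this kind of measurement, here is a minimal
sketch of a driver-agnostic wall-time harness. It is not the harness used for
the numbers above. The names time_suite, sample_workload, and
format_wall_time are hypothetical, and sqlite3 stands in for a PostgreSQL
driver only so that the example runs without a database server.

```python
import sqlite3
import statistics
import time


def time_suite(connect, workload, runs=4):
    """Return the wall time in seconds of each run of workload(connection)."""
    timings = []
    for _ in range(runs):
        conn = connect()
        try:
            start = time.perf_counter()
            workload(conn)
            timings.append(time.perf_counter() - start)
        finally:
            conn.close()
    return timings


def sample_workload(conn):
    # Hypothetical stand-in for a real test suite: many small queries.
    cur = conn.cursor()
    cur.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, amount REAL)")
    cur.executemany("INSERT INTO t (amount) VALUES (?)",
                    [(float(i),) for i in range(1000)])
    for i in range(1, 200):
        cur.execute("SELECT amount FROM t WHERE id = ?", (i,))
        cur.fetchone()


def format_wall_time(seconds):
    """Format seconds in the 10m41s style used in the table."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}m{secs:02d}s"


# Only the connect callable changes between drivers; the workload is fixed.
timings = time_suite(lambda: sqlite3.connect(":memory:"), sample_workload)
print("mean:", format_wall_time(statistics.mean(timings)),
      "spread (s):", max(timings) - min(timings))
```

Passing the connect callable in as a parameter mirrors the setup described
above, where only the driver setting changes between runs.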


Notes:

[1] Unfortunately, other than for this simple test suite, PoPy is basically
    unusable for us.  This is because it does not return proper PostgreSQL
    type codes, only vague type strings ('NUMBER','DATETIME','MISSING'?!).
    Thus, it does not provide enough information to distinguish, e.g.,
    booleans from numbers, numeric from floating-point values, dates from
    datetimes, etc.  As previously reported, it does not translate large
    integers correctly, and it mangles some date interval types.  These
    deficiencies may not affect simpler or less-demanding applications, but
    to us they qualify as unacceptable information loss.
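
To illustrate why coarse type strings lose information, here is a small
sketch, assuming the driver reports PostgreSQL type OIDs in
cursor.description. The TypeGroup class, the CONVERTERS table, and the
convert function are hypothetical and are not taken from any of the drivers
discussed; the OIDs are the standard values from the pg_type catalog.

```python
from decimal import Decimal

# PostgreSQL type OIDs from the pg_type catalog.
BOOL, INT8, INT4, FLOAT8 = 16, 20, 23, 701
DATE, TIMESTAMP, NUMERIC = 1082, 1114, 1700


class TypeGroup:
    """DB-API style type object: compares equal to any type code in its group."""

    def __init__(self, name, *codes):
        self.name = name
        self.codes = frozenset(codes)

    def __eq__(self, other):
        if isinstance(other, int):
            return other in self.codes
        return NotImplemented

    def __repr__(self):
        return self.name


NUMBER = TypeGroup("NUMBER", INT4, INT8, FLOAT8, NUMERIC)
DATETIME = TypeGroup("DATETIME", DATE, TIMESTAMP)

# Hypothetical per-type converters for values received in text form.
CONVERTERS = {
    BOOL: lambda text: text == "t",
    INT4: int,
    INT8: int,
    FLOAT8: float,
    NUMERIC: Decimal,
}


def convert(type_code, text):
    """Pick a converter by precise type code; fall back to the raw string."""
    return CONVERTERS.get(type_code, str)(text)


# Precise codes still compare equal to the coarse groups...
print(NUMERIC == NUMBER, FLOAT8 == NUMBER, DATE == DATETIME)
# ...but only the precise code says whether "0.10" is exact or binary floating point.
print(repr(convert(NUMERIC, "0.10")), repr(convert(FLOAT8, "0.10")))
```

If a driver reports only 'NUMBER' for both columns, an application has no
basis for choosing between Decimal and float, which matters for financial
data.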

[2] pyPgSQL required several minor modifications to work properly.  First,
    compilation failed if LONG_LONG was not defined/detected.  Second, it
    returns a non-standard cursor.description with an extra (eighth) element
    indicating array status.  While potentially useful, this departs from
    the DB-API specification.  Finally, the cursor descriptions use
    non-hashable type codes, which caused problems for our OR-mapper.  I
    modified the type objects, adding an appropriate __hash__ method, and
    all was well.

    Once it was running, several test vectors failed deterministically with
    the following error: 

      /usr/local/python-cvs/lib/python2.3/site-packages/pyPgSQL/PgSQL.py:
      OperationalError: RelationForgetRelation: relation xxxxxxx is still open
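
The hashing and cursor.description issues in note [2] can be illustrated with
a short sketch. This is not pyPgSQL's code and not the actual patch; the
TypeCode class and the standard_description helper are hypothetical, and the
example uses current Python semantics, where a class that defines __eq__
without __hash__ is unhashable.

```python
class TypeCode:
    """Hypothetical type-code object wrapping a single PostgreSQL type OID."""

    def __init__(self, oid, name):
        self.oid = oid
        self.name = name

    def __eq__(self, other):
        if isinstance(other, TypeCode):
            return self.oid == other.oid
        if isinstance(other, int):
            return self.oid == other
        return NotImplemented

    # Without this method the class would be unhashable. Hashing on the OID
    # keeps the hash consistent with __eq__, including equality with plain ints.
    def __hash__(self):
        return hash(self.oid)

    def __repr__(self):
        return f"TypeCode({self.oid}, {self.name!r})"


def standard_description(description):
    """Trim each column entry to the seven items DB-API 2.0 specifies."""
    return [tuple(column[:7]) for column in description]


NUMERIC = TypeCode(1700, "numeric")

# A mapper can now key a dictionary on type codes.
adapters = {NUMERIC: "Decimal"}
print(adapters[TypeCode(1700, "numeric")], adapters[1700])

# An eight-item entry (extra trailing array flag) reduced to the standard seven:
# name, type_code, display_size, internal_size, precision, scale, null_ok.
raw = [("amount", NUMERIC, None, None, 12, 2, None, False)]
print(standard_description(raw))
```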

Regards,
-Kevin


-- 
Kevin Jacobs
The OPAL Group - Enterprise Systems Architect
Voice: (216) 986-0710 x 19         E-mail: jacobs at theopalgroup.com
Fax:   (216) 986-0714              WWW:    http://www.theopalgroup.com



