[DB-SIG] Python/PostgreSQL API performance comparison
Kevin Jacobs
jacobs at penguin.theopalgroup.com
Tue Jun 3 15:21:11 EDT 2003
[Updated to include results from pgdb and pyPgSQL]
Due to the recent interest in PostgreSQL DB-API driver comparisons, I
thought I'd jump on the proverbial bandwagon. As some of you already know,
my company has developed very substantial financial applications using
Python, and frequently PostgreSQL.
We've been using the PsycoPg driver for the past two years, because we found
it to be the most solid and least flawed of all the available drivers at the
time. I've updated our driver suite to include a recent versions PoPy,
pgdb, and pyPgSQL, and then run one of our very substantial test suites.
The suite constitutes a mix the following tasks:
1) Many simple OLTP queries.
2) Many simple and complex OLAP queries, some returning as many as a
hundred thousand rows of data.
3) Data-cube construction and manipulation
4) Business report generation
The application server and database run on the same (otherwise quiescent)
test server, the working data set is all in-core (though quite large), and
only serial requests (i.e., only a single active worker thread) are issued.
Results running Python CVS on the same application server with only the
driver setting changed:
Driver / version Wall Time (average of 4 runs, though little variation was
---------------- --------- observed)
PsycoPg 1.0.13 10m41s
PoPy 2.0.8 11m33s [1]
pgdb 3.3 14m40s
pyPgSQL 2.3 >15m51s [2] Did not complete all tests!
Notes:
[1] Unfortunately, other than for this simple test suite, PoPy is basically
unusable for us. This is because it does not return proper PostgreSQL
type codes, only vague type strings ('NUMBER','DATETIME','MISSING'?!).
Thus, it does not provide enough information to distinguish, e.g.,
booleans from numbers, numeric from floating-point values, dates from
datetimes, etc. As previously reported, it does not translate large
integers correctly, and it mangles some date interval types. These
deficiencies may not affect simpler or less-demanding applications, but
to us they qualify as unacceptable information loss.
[2] pyPgSQL required several minor modifications to work properly. First,
the compile failed if LONG_LONG was not defined/detected. Second, it
returns a non-standard cursor.description with an extra element
indicating array status. While a potentially useful thing, it is very
very non-standard. Finally, the cursor descriptions use non-hashable
type codes, which caused problems for our OR-mapper. I modified the
type objects, adding an appropriate __hash__ method, and all was well.
Once it was running, several test vectors failed deterministically with
the following error:
/usr/local/python-cvs/lib/python2.3/site-packages/pyPgSQL/PgSQL.py:
OperationalError: RelationForgetRelation: relation xxxxxxx is still open
Regards,
-Kevin
--
--
Kevin Jacobs
The OPAL Group - Enterprise Systems Architect
Voice: (216) 986-0710 x 19 E-mail: jacobs at theopalgroup.com
Fax: (216) 986-0714 WWW: http://www.theopalgroup.com
More information about the DB-SIG
mailing list