[DB-SIG] Python/PostgreSQL API performance comparison

Billy G. Allie bill.allie at mug.org
Tue Jun 3 23:42:23 EDT 2003


Kevin Jacobs wrote:

>[Updated to include results from pgdb and pyPgSQL]
>
>Due to the recent interest in PostgreSQL DB-API driver comparisons, I
>thought I'd jump on the proverbial bandwagon.  As some of you already know,
>my company has developed very substantial financial applications using
>Python, and frequently PostgreSQL.
>
>We've been using the PsycoPg driver for the past two years, because we found
>it to be the most solid and least flawed of all the available drivers at the
>time.  I've updated our driver suite to include a recent versions PoPy,
>pgdb, and pyPgSQL, and then run one of our very substantial test suites. 
>The suite constitutes a mix the following tasks:
>
>  1) Many simple OLTP queries.
>  2) Many simple and complex OLAP queries, some returning as many as a
>     hundred thousand rows of data.
>  3) Data-cube construction and manipulation
>  4) Business report generation
>
>The application server and database run on the same (otherwise quiescent)
>test server, the working data set is all in-core (though quite large), and
>only serial requests (i.e., only a single active worker thread) are issued.
>
>Results running Python CVS on the same application server with only the
>driver setting changed:
>
>Driver / version  Wall Time (average of 4 runs, though little variation was
>----------------  ---------  observed)
>PsycoPg  1.0.13     10m41s
>PoPy     2.0.8      11m33s   [1]
>pgdb     3.3        14m40s
>pyPgSQL  2.3       >15m51s   [2] Did not complete all tests!
>
>
>Notes:
>
>[1] Unfortunately, other than for this simple test suite, PoPy is basically
>    unusable for us.  This is because it does not return proper PostgreSQL
>    type codes, only vague type strings ('NUMBER','DATETIME','MISSING'?!).
>    Thus, it does not provide enough information to distinguish, e.g.,
>    booleans from numbers, numeric from floating-point values, dates from
>    datetimes, etc.  As previously reported, it does not translate large
>    integers correctly, and it mangles some date interval types.  These
>    deficiencies may not affect simpler or less-demanding applications, but
>    to us they qualify as unacceptable information loss.
>
>[2] pyPgSQL required several minor modifications to work properly.  First,
>    the compile failed if LONG_LONG was not defined/detected.  Second, it
>    returns a non-standard cursor.description with an extra element
>    indicating array status.  While a potentially useful thing, it is very
>    very non-standard.  Finally, the cursor descriptions use non-hashable
>    type codes, which caused problems for our OR-mapper.  I modified the
>    type objects, adding an appropriate __hash__ method, and all was well.
>
Kevin, what system did you compile pyPgSQL on?  Setup.py tries to detect 
if long long is supported on the system it is compiled on.  If it is 
detected, then the LONG_LONG definition from pyport.h (via Python.h) is 
used.  If long long support isn't detected, then it doesn't compile in 
pgint8 support. I'm curious as to your configuration.

In regards to the description, pyPgSQL adds a array indicator as an 
eigth item.  The symantics of the first seven items have not changed, 
they match the DB-API specification.  How is this /very/ non-standard 
and how would this break code using the descriptor (descriptor[x][y] 
where x is the field (column) index and x is 0 through 6 will return 
what the DB-API 2.0 specification mandates)?

You're right.  PgTypes is not hashable.  I never had a reason to have it 
hashable before.  It will be hashable in the next release (the hash of a 
PgTypes will return the hash of the underlying type OID).

>    Once it was running, several test vectors failed deterministically with
>    the following error: 
>
>      /usr/local/python-cvs/lib/python2.3/site-packages/pyPgSQL/PgSQL.py:
>      OperationalError: RelationForgetRelation: relation xxxxxxx is still open
>
>  
>
I've never run across that error before.   Were there perhaps multiple 
threads sharing a connection?  PostgreSQL's connections are not thread 
safe and pyPgSQL does not support threads sharing a connection 
(threadsafety == 1).

-- 
___________________________________________________________________________
____       | Billy G. Allie    | Domain....: Bill.Allie at mug.org
|  /|      | 7436 Hartwell     | MSN.......: B_G_Allie at email.msn.com
|-/-|----- | Dearborn, MI 48126|
|/  |LLIE  | (313) 582-1540    |

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/db-sig/attachments/20030603/7a0df2b3/attachment-0001.htm


More information about the DB-SIG mailing list