[DB-SIG] Fetching result sets

Matthew T. Kromer matt@zope.com
Tue, 18 Sep 2001 12:12:43 -0400


Keating, Tim wrote:

>>>On a side note, I've always been a bit disappointed by how long it takes
>>>to actually wrap data from the RDBMS to Python.  For example, my C layer
>>>API can retrieve 39100 results from a Zip code table in .7 seconds
>>>(actually fetching the data into an opaque result set); yet the
>>>operation of turning these into Python values adds an additional 2.4
>>>seconds -- by the time it goes through all of the Python layered above,
>>>it blows out to about 9 seconds for a fetchall(). Granted, part of that
>>>is because that code is not highly optimized, but it still irks me.
>>>
>>Well, this is most certainly due to the fact that Python stores
>>the values in objects rather than a C array. (You could, however,
>>try to use the array module for faster access to these values.)
>>
>
>Actually, the way to do large homogeneous ops in Oracle is to use bulk
>binding w/ PL/SQL tables. Unfortunately, I don't believe DCOracle2 supports
>this currently. (For that matter, does the Python DB API?)
>
>I recently had to do this, writing an application-specific module on top of
>OTL. Inserting 1000 rows manually (i.e. doing 1000 INSERT statements) took
>about 4 seconds; bulk binding took .4 seconds. However, while 10K rows took
>40 seconds with individual inserts (as expected), the cost of bulk binding
>merely doubled (.8 seconds).
>
Actually, I do do bulk binds, as you put it...  the downside is that I 
normally use dynamic rather than static binds, and Oracle tends to puke 
on dynamic fetches -- it will randomly SKIP the callback on Linux.  So I 
also have a setStatic() tweak (which is currently used) to change them 
into static binds when they are read-only.  When I was last actively 
looking at it, THAT puked on Solaris (usually one or the other works, 
though).

The executemany() op, when given an array, will set it up as a large array 
bind, although I think the limit imposed in the Python layer is something 
like 200 rows at a time.
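
For reference, the calling pattern looks roughly like this.  This is a 
generic sketch written against the portable DB API 2.0 signatures 
(execute(sql, params) / executemany(sql, seq_of_params)), not 
DCOracle2-specific code -- the connect string, table, and column names are 
made up, and individual drivers may differ in the bind styles they accept:

    import DCOracle2                    # or any other DB API 2.0 module

    db = DCOracle2.connect('scott/tiger')        # example connect string
    cursor = db.cursor()

    rows = [('12203', 'Albany', 'NY'), ('90210', 'Beverly Hills', 'CA')]
    sql = "insert into zipcodes (zip, city, state) values (:1, :2, :3)"

    # One execute() per record: one statement round trip per row.
    for row in rows:
        cursor.execute(sql, row)

    # executemany() hands the whole sequence to the driver in one call,
    # so a driver that supports array binds can send it as a single bulk
    # operation (chunked internally if the layer imposes a row limit).
    cursor.executemany(sql, rows)
    db.commit()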

My test databases usually aren't set up for fast-write speeds though, so 
it's hard for me to give accurate "it's THIS much faster" numbers.  My 
test program is a zip code table creator, which puts in 39100 zip codes 
with zip, city, state, country, latitude, and longitude; the two test 
programs yielded the following results:

record-at-a-time:     148 seconds or 0.004 sec/record
1000 record batches:    38 seconds or 0.001 sec/record

The loader does a commit every 1000 records, which slows things down. 
I would expect a higher than 4x speedup, which is why I suspect either 
something is a bit wrong or it just takes a LOT of overhead to move data 
from Python objects to buffers.  I'm sure my Python-side code could be 
accelerated.
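
For what it's worth, the batch version of the loader boils down to 
something like the sketch below.  The function name and column names are 
mine, not the actual test program's; the 1000-row slices and the commit 
per batch match what is described above:

    def load_zipcodes(db, rows, batchsize=1000):
        """Insert zip code records in batches, committing per batch."""
        cursor = db.cursor()
        sql = ("insert into zipcodes "
               "(zip, city, state, country, latitude, longitude) "
               "values (:1, :2, :3, :4, :5, :6)")
        for start in range(0, len(rows), batchsize):
            # Each slice goes through executemany() as an array bind;
            # the Python layer may chunk it further (~200 rows at a time).
            cursor.executemany(sql, rows[start:start + batchsize])
            db.commit()      # the per-batch commit is part of the slowdown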