[DB-SIG] Fetching result sets
Matthew T. Kromer
matt@zope.com
Tue, 18 Sep 2001 12:12:43 -0400
Keating, Tim wrote:
>>> On a side note, I've always been a bit disappointed by how long it
>>> takes to actually wrap data from the RDBMS to Python. For example,
>>> my C layer API can retrieve 39100 results from a Zip code table in
>>> .7 seconds (actually fetching the data into an opaque result set);
>>> yet the operation of turning these into Python values adds an
>>> additional 2.4 seconds -- by the time it goes through all of the
>>> Python layered above, it blows out to about 9 seconds for a
>>> fetchall(). Granted, part of that is because that code is not
>>> highly optimized, but it still irks me.
>>
>> Well, this is most certainly due to the fact that Python stores
>> the values in objects rather than a C array. (You could, however,
>> try to use the array module for faster access to these values.)
>
> Actually, the way to do large homogeneous ops in Oracle is to use bulk
> binding w/ PL/SQL tables. Unfortunately, I don't believe DCOracle2
> supports this currently. (For that matter, does the Python DB API?)
>
> I recently had to do this, writing an application-specific module on
> top of OTL. Inserting 1000 rows manually (i.e. doing 1000 INSERT
> statements) took about 4 seconds; bulk binding was .4 seconds.
> However, while 10K rows took 40 seconds with individual inserts (as
> expected), the cost of bulk binding merely doubled (.8 seconds).
>
Actually, I do do bulk binds, as you put it... the downside is that I
normally use dynamic rather than static binds, and Oracle tends to puke
on dynamic fetches -- it will randomly SKIP the callback on Linux. So I
also have a setStatic() tweak (which is currently used) to change them
into static binds when they are read-only. When I was last actively
looking at it, THAT puked on Solaris (usually one or the other works,
though).
The executemany() op, when given an array, will set it up as a large
array bind, although I think the limit imposed in the Python layer is
around 200 rows at a time.
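
For reference, executemany() is also the standard DB API 2.0 entry
point for this kind of array insert; the difference between the two
styles is roughly the sketch below. The cursor, SQL text, and 200-row
chunk size are placeholders for illustration, not DCOracle2 specifics.

    def insert_rows_one_at_a_time(cursor, sql, rows):
        # One round trip (and one bind) per row -- the slow path.
        for row in rows:
            cursor.execute(sql, row)

    def insert_rows_array_bound(cursor, sql, rows, chunk=200):
        # One array bind per chunk -- roughly the 200-row batches
        # mentioned above.
        for start in range(0, len(rows), chunk):
            cursor.executemany(sql, rows[start:start + chunk])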
My test databases usually aren't set up for fast write speeds, though,
so it's hard for me to give accurate "it's THIS much faster" numbers.
My test program is a zipcode table creator, which puts in 39100 zip
codes with zip, city, state, country, latitude, and longitude; the two
test programs yielded the following results:

    record-at-a-time:     148 seconds, or 0.004 sec/record
    1000-record batches:   38 seconds, or 0.001 sec/record

The loader does a commit every 1000 records, which slows things down.
I would expect a higher than 4x speedup, which is why I suspect either
something is a bit wrong or it just takes a LOT of overhead to move data
from Python objects to buffers. I'm sure my Python-side code could be
accelerated.
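
For concreteness, the batched loader boils down to something like this
(a rough sketch only -- the table name, column list, :1-style numeric
binds, and the timing wrapper are illustrative, not the actual test
program; adjust the paramstyle for your driver):

    import time

    BATCH = 1000   # commit interval, matching the test above

    def load_zipcodes(db, rows):
        # rows is a list of (zip, city, state, country, latitude,
        # longitude) tuples; db is an already-open DB API connection.
        cursor = db.cursor()
        sql = ("INSERT INTO zipcodes "
               "(zip, city, state, country, latitude, longitude) "
               "VALUES (:1, :2, :3, :4, :5, :6)")
        start = time.time()
        for i in range(0, len(rows), BATCH):
            cursor.executemany(sql, rows[i:i + BATCH])
            db.commit()   # the per-batch commit is part of the measured cost
        elapsed = time.time() - start
        print("%d records in %.1f seconds (%.4f sec/record)"
              % (len(rows), elapsed, elapsed / len(rows)))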