Python, MS SQL, and batch inserts
Philip Semanchuk
philip at semanchuk.com
Tue Apr 21 14:15:42 EDT 2009
On Apr 21, 2009, at 2:02 PM, ericwoodworth at gmail.com wrote:
> Hi,
> I have a python script I'm writing that grabs some data from a
> com object, does a little formatting, and then inserts that data into
> a MS SQL 2003 DB. Because I'm using COM objects I'm importing
> win32com.client. That also allows me to use ADODB.connection and
> ADODB.command objects for working with SQL.
>
> The program works fine but it's a little slow. Inserting ~5500
> rows of data takes about 10 seconds using a DB that is on the same
> machine running the script.
>
> I've done some general searches on how to speed this up and in
> other languages people suggest sending batches of inserts off at a
> time instead of executing 1 insert at a time. For java and .net
> people recommend using a stringbuilder function to create strings
> quickly. I don't know of such a function in python s I tried grouping
> my inserts into a single string using string += syntax. I knew that
> would be terrible but I wanted to see how terrible. Final reults: It
> was pretty terrible. Script went from taking ~18sec to taking
> 240sec. The overhead for recreating the strings was monster. No real
> surprise there.
>
> So I then loaded up the commands into a list and at the end I
> used the strong join method to create the string. This was far faster
> than using += to create my strings but still took twice as long as
> just running my inserts one at a time. So I'm looking for
> suggestions.
>
> Basically I have 5000 SQL inserts that I want to do as quickly as
> possible. This is purely academic as I can live with the 18 seconds
> the script needs to run (9 to talk to the com object and format the
> data and 10 to write to SQL) but I'm still curious how to improve on
> what I have running.
Are you sure your logjam is in Python? Inserting 5500 rows can take a
few seconds if you're COMMITting after each INSERT. Wrap the whole
thing in an explicit transaction and see if that helps.
Also, toss in a few print statements containing timestamps so you know
more about where the script is spending time.
bye
Philip
More information about the Python-list
mailing list