Python, MS SQL, and batch inserts
ericwoodworth at gmail.com
ericwoodworth at gmail.com
Tue Apr 21 14:30:20 EDT 2009
On Apr 21, 2:15 pm, Philip Semanchuk <phi... at semanchuk.com> wrote:
> On Apr 21, 2009, at 2:02 PM, ericwoodwo... at gmail.com wrote:
>
>
>
> > Hi,
> > I have a python script I'm writing that grabs some data from a
> > com object, does a little formatting, and then inserts that data into
> > a MS SQL 2003 DB. Because I'm using COM objects I'm importing
> > win32com.client. That also allows me to use ADODB.connection and
> > ADODB.command objects for working with SQL.
>
> > The program works fine but it's a little slow. Inserting ~5500
> > rows of data takes about 10 seconds using a DB that is on the same
> > machine running the script.
>
> > I've done some general searches on how to speed this up and in
> > other languages people suggest sending batches of inserts off at a
> > time instead of executing 1 insert at a time. For java and .net
> > people recommend using a stringbuilder function to create strings
> > quickly. I don't know of such a function in python s I tried grouping
> > my inserts into a single string using string += syntax. I knew that
> > would be terrible but I wanted to see how terrible. Final reults: It
> > was pretty terrible. Script went from taking ~18sec to taking
> > 240sec. The overhead for recreating the strings was monster. No real
> > surprise there.
>
> > So I then loaded up the commands into a list and at the end I
> > used the strong join method to create the string. This was far faster
> > than using += to create my strings but still took twice as long as
> > just running my inserts one at a time. So I'm looking for
> > suggestions.
>
> > Basically I have 5000 SQL inserts that I want to do as quickly as
> > possible. This is purely academic as I can live with the 18 seconds
> > the script needs to run (9 to talk to the com object and format the
> > data and 10 to write to SQL) but I'm still curious how to improve on
> > what I have running.
>
> Are you sure your logjam is in Python? Inserting 5500 rows can take a
> few seconds if you're COMMITting after each INSERT. Wrap the whole
> thing in an explicit transaction and see if that helps.
>
> Also, toss in a few print statements containing timestamps so you know
> more about where the script is spending time.
>
> bye
> Philip
I'm not 100% sure it's python and not SQL but I do suspect there's a
better way to do this than just serial inserts. I could be wrong
about that which is what i'm trying to explore.
I already do use the time stamps and this is what I see:
at 9 secs in I've gotten my data, formatted it, and placed it on the
list
at 9.047 secs in the string.join() is done and I have my command
string
at 35 secs the program ends. So somehow my SQL is taking a lot longer
when I format it as single string.
How would I make the whole thing one transaction? Just insert BEGIN
TRANSACTION at the start and COMMIT at the end? Is that enough to do
it?
More information about the Python-list
mailing list