Python, MS SQL, and batch inserts
Philip Semanchuk
philip at semanchuk.com
Tue Apr 21 15:11:12 EDT 2009
On Apr 21, 2009, at 2:30 PM, ericwoodworth at gmail.com wrote:
>
> I'm not 100% sure it's python and not SQL but I do suspect there's a
> better way to do this than just serial inserts. I could be wrong
> about that which is what I'm trying to explore.
I don't think the SQL standard provides a way to do bulk inserts, and
as a result most DB vendors have extended the standard to address this
common need (e.g. the COPY command in Postgres).
If you're doing a mass insert to populate a blank table, it also often
helps to postpone index creation until after the table is populated.
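Even without a vendor extension, a DB-API compliant adapter gives you
cursor.executemany(), which sends one parameterized statement with many
row tuples instead of one giant hand-built SQL string. A minimal sketch
using the stdlib sqlite3 module as a stand-in (with pyodbc and SQL
Server the pattern is the same; only the connection setup and the
parameter marker differ):

```python
import sqlite3

# In-memory database standing in for the real SQL Server connection.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE readings (id INTEGER, value TEXT)")

# One parameterized INSERT, many rows -- the adapter handles the batching.
rows = [(i, "value-%d" % i) for i in range(5500)]
cur.executemany("INSERT INTO readings (id, value) VALUES (?, ?)", rows)
conn.commit()

cur.execute("SELECT COUNT(*) FROM readings")
print(cur.fetchone()[0])  # 5500
```

The table name and data here are made up for illustration; the point is
the shape of the executemany() call.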
>
> I already do use the time stamps and this is what I see:
> at 9 secs in I've gotten my data, formatted it, and placed it on the
> list
> at 9.047 secs in the string.join() is done and I have my command
> string
You said you're inserting ~5500 rows, so are you calling .join() on a
list of 5500 items? If so, 9 seconds seems painfully slow unless
you're on old hardware.
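For a rough sense of scale, joining a few thousand short strings is
normally a sub-millisecond operation. A quick, hypothetical measurement
(the fragment format is invented; it just mimics a VALUES list):

```python
import timeit

# 5500 fake VALUES fragments, roughly what's being joined here.
rows = ["(%d, 'sample')" % i for i in range(5500)]

# Time 100 full joins; on any modern machine this is well under a second.
elapsed = timeit.timeit(lambda: ", ".join(rows), number=100)
print("100 joins took %.4f seconds" % elapsed)
```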
> at 35 secs the program ends. So somehow my SQL is taking a lot longer
> when I format it as single string.
>
> How would I make the whole thing one transaction? Just insert BEGIN
> TRANSACTION at the start and COMMIT at the end? Is that enough to do
> it?
That's the basic idea, but your database adapter (i.e. the Python
wrapper that talks to SQL Server) might want you to do it another way.
For instance, the connection object has a .commit() method, and using
that is probably a better idea than calling cursor.execute("COMMIT").
Per the Python DB API, your connection should be opened in
transactional mode by default. ("Note that if the database supports an
auto-commit feature, this must be initially off.")
http://www.python.org/dev/peps/pep-0249/
In other words, if your adapter is DB-API compliant, then the only
reason you're not using a transaction is that you're explicitly
turning transactions off.
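Concretely, under the DB API the transaction is already open for you;
you just decide its fate with conn.commit() or conn.rollback(). A
sketch of that behavior, again using sqlite3 as a stand-in for the SQL
Server adapter:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (n INTEGER)")
conn.commit()

# These inserts run inside the implicit transaction...
cur.executemany("INSERT INTO t (n) VALUES (?)", [(i,) for i in range(10)])

# ...so a rollback discards all of them at once.
conn.rollback()
cur.execute("SELECT COUNT(*) FROM t")
print(cur.fetchone()[0])  # 0 -- nothing survived the rollback

# Do the work again and make it permanent with commit().
cur.executemany("INSERT INTO t (n) VALUES (?)", [(i,) for i in range(10)])
conn.commit()
cur.execute("SELECT COUNT(*) FROM t")
print(cur.fetchone()[0])  # 10
```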
bye
P