Python, MS SQL, and batch inserts
philip at semanchuk.com
Tue Apr 21 21:11:12 CEST 2009
On Apr 21, 2009, at 2:30 PM, ericwoodworth at gmail.com wrote:
> I'm not 100% sure it's python and not SQL but I do suspect there's a
> better way to do this than just serial inserts. I could be wrong
> about that which is what i'm trying to explore.
I don't think the SQL standard provides a way to do bulk inserts, and
as a result most DB vendors have extended the standard to address this
common need (e.g. the COPY command in Postgres).
If you're doing a mass insert to populate a blank table it also often
helps to postpone index creation until after the table is populated.
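As a sketch of that pattern (bulk-load first, build indexes afterward), here's a minimal example using sqlite3 from the standard library as a stand-in for the SQL Server adapter; the table and column names are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create the table bare -- no indexes yet.
cur.execute("CREATE TABLE readings (id INTEGER, value REAL)")

# Bulk-load with executemany(); the statement is parsed once and
# executed for every parameter tuple.
rows = [(i, i * 0.5) for i in range(5500)]
cur.executemany("INSERT INTO readings (id, value) VALUES (?, ?)", rows)
conn.commit()

# Build the index only after the table is populated, so it is
# constructed once instead of being updated on every insert.
cur.execute("CREATE INDEX idx_readings_id ON readings (id)")
conn.commit()
```

The same shape works with a SQL Server adapter like pyodbc (which also has a `fast_executemany` flag on the cursor that batches the parameter sets at the driver level), though the placeholder style and connection setup differ.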
> I already do use the time stamps and this is what I see:
> at 9 secs in I've gotten my data, formatted it, and placed it on the
> at 9.047 secs in the string.join() is done and I have my command
You said you're inserting ~5500 rows, so are you calling .join() on a
list of 5500 items? If so, 9 seconds seems painfully slow unless
you're on old hardware.
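For what it's worth, joining ~5500 short strings should take well under a millisecond on any recent machine, so the join itself is unlikely to be the bottleneck. A quick sanity check (with made-up values; note that building SQL by string concatenation like this is also an injection risk, which is another reason to prefer parameterized inserts):

```python
import time

# Fake the 5500 pre-formatted value tuples from the thread.
values = ["(%d, 'row %d')" % (i, i) for i in range(5500)]

start = time.perf_counter()
sql = "INSERT INTO t (id, label) VALUES " + ", ".join(values)
elapsed = time.perf_counter() - start

print("join took %.6f seconds for %d items" % (elapsed, len(values)))
```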
> at 35 secs the program ends. So somehow my SQL is taking a lot longer
> when I format it as single string.
> How would I make the whole thing one transaction? Just insert BEGIN
> TRANSACTION at the start and COMMIT at the end? Is that enough to do
That's the basic idea, but your database adapter (i.e. the Python
wrapper that talks to SQLServer) might want you to do it another way.
For instance, the connection object has a .commit() method and using
that is probably a better idea than calling cursor.execute("COMMIT").
Per the Python DB API, your connection should be opened in
transactional mode by default. ("Note that if the database supports an
auto-commit feature, this must be initially off.")
In other words, if your adapter is DB-API compliant, then the only
way you're *not* already using a transaction is if you've explicitly
turned them off.
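A minimal illustration of that DB-API pattern, again with sqlite3 standing in for the SQL Server adapter:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (n INTEGER)")
conn.commit()

# Per the DB API, auto-commit is off by default, so these inserts
# all accumulate in one open transaction...
cur.executemany("INSERT INTO t (n) VALUES (?)", [(i,) for i in range(10)])

# ...until the connection-level commit, which is the preferred way
# to end it (rather than cursor.execute("COMMIT")).
conn.commit()
```

If something goes wrong partway through, conn.rollback() discards the whole batch, which is exactly the all-or-nothing behavior you want from a bulk load.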