Python, MS SQL, and batch inserts
Philip Semanchuk
philip at semanchuk.com
Tue Apr 21 15:11:12 EDT 2009
On Apr 21, 2009, at 2:30 PM, ericwoodworth at gmail.com wrote:
>
> I'm not 100% sure it's python and not SQL but I do suspect there's a
> better way to do this than just serial inserts. I could be wrong
> about that which is what I'm trying to explore.
I don't think the SQL standard provides a way to do bulk inserts, and
as a result most DB vendors have extended the standard to address this
common need (e.g. the COPY command in Postgres).
If you're doing a mass insert to populate a blank table, it also often
helps to postpone index creation until after the table is populated.
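Even without a vendor extension, a DB-API compliant adapter gives you
cursor.executemany(), which sends one parameterized statement with many
row tuples instead of one giant hand-built SQL string. A minimal sketch
using the stdlib sqlite3 module as a stand-in (with pyodbc and SQL
Server the pattern is the same; only the connection setup and the
parameter marker differ):

```python
import sqlite3

# In-memory database standing in for the real SQL Server connection.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE readings (id INTEGER, value TEXT)")

# One parameterized INSERT, many rows -- the adapter handles the batching.
rows = [(i, "value-%d" % i) for i in range(5500)]
cur.executemany("INSERT INTO readings (id, value) VALUES (?, ?)", rows)
conn.commit()

cur.execute("SELECT COUNT(*) FROM readings")
print(cur.fetchone()[0])  # 5500
```

The table name and data here are made up for illustration; the point is
the shape of the executemany() call.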
>
> I already do use the time stamps and this is what I see:
> at 9 secs in I've gotten my data, formatted it, and placed it on the
> list
> at 9.047 secs in the string.join() is done and I have my command
> string
You said you're inserting ~5500 rows, so are you calling .join() on a
list of 5500 items? If so, 9 seconds seems painfully slow unless
you're on old hardware.
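For a rough sense of scale, joining a few thousand short strings is
normally a sub-millisecond operation. A quick, hypothetical measurement
(the fragment format is invented; it just mimics a VALUES list):

```python
import timeit

# 5500 fake VALUES fragments, roughly what's being joined here.
rows = ["(%d, 'sample')" % i for i in range(5500)]

# Time 100 full joins; on any modern machine this is well under a second.
elapsed = timeit.timeit(lambda: ", ".join(rows), number=100)
print("100 joins took %.4f seconds" % elapsed)
```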
> at 35 secs the program ends. So somehow my SQL is taking a lot longer
> when I format it as single string.
>
> How would I make the whole thing one transaction? Just insert BEGIN
> TRANSACTION at the start and COMMIT at the end? Is that enough to do
> it?
That's the basic idea, but your database adapter (i.e. the Python
wrapper that talks to SQL Server) might want you to do it another way.
For instance, the connection object has a .commit() method, and using
that is probably a better idea than calling cursor.execute("COMMIT").
Per the Python DB API, your connection should be opened in
transactional mode by default. ("Note that if the database supports an
auto-commit feature, this must be initially off.")
http://www.python.org/dev/peps/pep-0249/
In other words, if your adapter is DB-API compliant, then the only
reason you're not using a transaction is that you're explicitly
turning transactions off.
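Concretely, under the DB API the transaction is already open for you;
you just decide its fate with conn.commit() or conn.rollback(). A
sketch of that behavior, again using sqlite3 as a stand-in for the SQL
Server adapter:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (n INTEGER)")
conn.commit()

# These inserts run inside the implicit transaction...
cur.executemany("INSERT INTO t (n) VALUES (?)", [(i,) for i in range(10)])

# ...so a rollback discards all of them at once.
conn.rollback()
cur.execute("SELECT COUNT(*) FROM t")
print(cur.fetchone()[0])  # 0 -- nothing survived the rollback

# Do the work again and make it permanent with commit().
cur.executemany("INSERT INTO t (n) VALUES (?)", [(i,) for i in range(10)])
conn.commit()
cur.execute("SELECT COUNT(*) FROM t")
print(cur.fetchone()[0])  # 10
```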
bye
P