PEP 249 (database api) -- executemany() with iterable?

Jon Clements joncle at googlemail.com
Tue Oct 12 14:35:23 EDT 2010


On 12 Oct, 18:53, Jon Clements <jon... at googlemail.com> wrote:
> On 12 Oct, 18:32, Roy Smith <r... at panix.com> wrote:
>
>
>
> > On Oct 12, 1:20 pm, Jon Clements <jon... at googlemail.com> wrote:
>
> > > On 12 Oct, 16:10, Roy Smith <r... at panix.com> wrote:
>
> > > > PEP 249 says about executemany():
>
> > > >         Prepare a database operation (query or command) and then
> > > >         execute it against all parameter sequences or mappings
> > > >         found in the sequence seq_of_parameters.
>
> > > > are there any plans to update the api to allow an iterable instead of
> > > > a sequence?
>
> > > I'm not understanding (probably me). Example?
>
> > I have a dictionary, d, which has a million items in it, and want to
> > do something like:
>
> >     executemany("update foo set bar = %s where id = %s",
> > d.iteritems())
>
> > If executemany accepted an iterable, that would work.  But, it only
> > accepts a sequence, so I need to do:
>
> >     executemany("update foo set bar = %s where id = %s", d.items())
>
> > which generates a million-item temporary list.  Or am I mis-
> > understanding the PEP?
>
> Interesting, but here's my guess...
>
> Replace d.items() with itertools.repeat( ('a', 'b') )
>
> So, if you have a sequence, which has a length and known size, at
> least you can have an attempt at the DB operations: whether the
> transaction fails or not is another thing...In short, a sequence is
> finite, while an iterable may be infinite.
>
> That's just my guess and makes sense to me at least!
>
> Jon.

Actually, thinking about it some more, I would take the following
approach:
(this is only loosely do with the Python DB API mind you...)

1) Start a transaction
2) Create a temporary table
3) Bulk insert your million update records to the temp table (from my
understanding of the PEP, executemany(), is allowed to repeatedly call
execute() unless it can do something "cleverer")
4) Execute an update with a from statement joining your main table and
temp table (pretty sure that's ANSI standard, and DB's should support
it -- embedded one's may not though, but if you're dealing with 1mil
records, I'm taking a guess you're not dealing with embedded)
5) End the transaction

Far more efficient as a repeated execute of 'update' will only just
hit the DB once, while an update statement with a from should allow
the DB a chance to optimise it.

Hope that makes sense, lemme know.

Cheers,

Jon.







More information about the Python-list mailing list