PEP 249 (database api) -- executemany() with iterable?

John Nagle nagle at animats.com
Tue Oct 12 15:43:12 EDT 2010


On 10/12/2010 11:35 AM, Jon Clements wrote:
> On 12 Oct, 18:53, Jon Clements<jon... at googlemail.com>  wrote:
>> On 12 Oct, 18:32, Roy Smith<r... at panix.com>  wrote:
>>
>>
>>
>>> On Oct 12, 1:20 pm, Jon Clements<jon... at googlemail.com>  wrote:
>>
>>>> On 12 Oct, 16:10, Roy Smith<r... at panix.com>  wrote:
>>
>>>>> PEP 249 says about executemany():
>>
>>>>>          Prepare a database operation (query or command) and then
>>>>>          execute it against all parameter sequences or mappings
>>>>>          found in the sequence seq_of_parameters.
>>
>>>>> are there any plans to update the api to allow an iterable instead of
>>>>> a sequence?
>>
>>>> I'm not understanding (probably me). Example?
>>
>>> I have a dictionary, d, which has a million items in it, and want to
>>> do something like:
>>
>>>      executemany("update foo set bar = %s where id = %s",
>>> d.iteritems())
>>
>>> If executemany accepted an iterable, that would work.  But, it only
>>> accepts a sequence, so I need to do:
>>
>>>      executemany("update foo set bar = %s where id = %s", d.items())
>>
>>> which generates a million-item temporary list.  Or am I mis-
>>> understanding the PEP?
>>
>> Interesting, but here's my guess...
>>
>> Replace d.items() with itertools.repeat( ('a', 'b') )
>>
>> So, if you have a sequence, which has a length and known size, at
>> least you can have an attempt at the DB operations: whether the
>> transaction fails or not is another thing...In short, a sequence is
>> finite, while an iterable may be infinite.
>>
>> That's just my guess and makes sense to me at least!
>>
>> Jon.
>
> Actually, thinking about it some more, I would take the following
> approach:
> (this is only loosely do with the Python DB API mind you...)
>
> 1) Start a transaction
> 2) Create a temporary table
> 3) Bulk insert your million update records to the temp table (from my
> understanding of the PEP, executemany(), is allowed to repeatedly call
> execute() unless it can do something "cleverer")
> 4) Execute an update with a from statement joining your main table and
> temp table (pretty sure that's ANSI standard, and DB's should support
> it -- embedded one's may not though, but if you're dealing with 1mil
> records, I'm taking a guess you're not dealing with embedded)
> 5) End the transaction
>
> Far more efficient as a repeated execute of 'update' will only just
> hit the DB once, while an update statement with a from should allow
> the DB a chance to optimise it.
>
> Hope that makes sense, lemme know.

     Right.  In general, if you find yourself making millions of
SQL database requests in a loop, you're doing it wrong.

     Big database loads are usually done by creating a text file
with the desired data, then using a LOAD DATA INFILE command.
This (in MySQL) is tens to hundreds of times faster than
doing individual INSERT or UPDATE commands.

				John Nagle



More information about the Python-list mailing list