PEP 249 (database api) -- executemany() with iterable?
nagle at animats.com
Tue Oct 12 21:43:12 CEST 2010
On 10/12/2010 11:35 AM, Jon Clements wrote:
> On 12 Oct, 18:53, Jon Clements<jon... at googlemail.com> wrote:
>> On 12 Oct, 18:32, Roy Smith<r... at panix.com> wrote:
>>> On Oct 12, 1:20 pm, Jon Clements<jon... at googlemail.com> wrote:
>>>> On 12 Oct, 16:10, Roy Smith<r... at panix.com> wrote:
>>>>> PEP 249 says about executemany():
>>>>> Prepare a database operation (query or command) and then
>>>>> execute it against all parameter sequences or mappings
>>>>> found in the sequence seq_of_parameters.
>>>>> Are there any plans to update the API to allow an iterable instead
>>>>> of a sequence?
>>>> I'm not understanding (probably me). Example?
>>> I have a dictionary, d, which has a million items in it, and want to
>>> do something like:
>>> executemany("update foo set bar = %s where id = %s", d.iteritems())
>>> If executemany accepted an iterable, that would work. But, it only
>>> accepts a sequence, so I need to do:
>>> executemany("update foo set bar = %s where id = %s", d.items())
>>> which generates a million-item temporary list. Or am I
>>> misunderstanding the PEP?
>> Interesting, but here's my guess...
>> Replace d.items() with itertools.repeat( ('a', 'b') )
>> So, if you have a sequence, which has a known, finite length, the
>> driver can at least attempt the DB operations; whether the
>> transaction succeeds is another matter. In short, a sequence is
>> finite, while an iterable may be infinite.
>> That's just my guess, and it makes sense to me at least!
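As an aside (not suggested in the thread itself): one way to keep the memory behaviour of an iterable while satisfying the sequence-only API is to feed executemany() fixed-size chunks of the iterable. A minimal sketch, using sqlite3 and qmark parameters so it is self-contained; the table, column, and helper names are made up:

```python
# Workaround sketch: call executemany() repeatedly on fixed-size slices
# of an arbitrary iterable, so only one chunk is in memory at a time.
import itertools
import sqlite3

def executemany_chunked(cursor, sql, iterable, chunk_size=1000):
    """Run executemany() over an iterable in chunks of chunk_size rows."""
    it = iter(iterable)
    while True:
        chunk = list(itertools.islice(it, chunk_size))
        if not chunk:
            break
        cursor.executemany(sql, chunk)

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("create table foo (id integer primary key, bar text)")
cur.executemany("insert into foo values (?, ?)", [(1, "a"), (2, "b")])

d = {1: "x", 2: "y"}  # new bar values keyed by id
# note parameter order: SET bar = ? WHERE id = ?
executemany_chunked(cur, "update foo set bar = ? where id = ?",
                    ((bar, id_) for id_, bar in d.items()))
conn.commit()
print(cur.execute("select bar from foo order by id").fetchall())
# → [('x',), ('y',)]
```

Only one chunk of the generator is ever materialised, at the cost of one executemany() round per chunk.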
> Actually, thinking about it some more, I would take the following
> approach (this is only loosely to do with the Python DB API, mind you...)
> 1) Start a transaction
> 2) Create a temporary table
> 3) Bulk insert your million update records into the temp table (from my
> understanding of the PEP, executemany() is allowed to repeatedly call
> execute() unless it can do something "cleverer")
> 4) Execute an update with a FROM clause joining your main table and
> temp table (pretty sure that's ANSI standard, and DBs should support
> it -- embedded ones may not, but if you're dealing with 1mil
> records, I'm guessing you're not dealing with embedded)
> 5) End the transaction
> Far more efficient: instead of repeatedly executing 'update' for each
> record, the final UPDATE ... FROM statement hits the DB only once,
> giving it a chance to optimise.
> Hope that makes sense, lemme know.
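The five steps above can be sketched with the DB-API. This uses sqlite3 so it is self-contained; the table and column names (foo, bar, upd) are made up, and a portable correlated subquery stands in for the UPDATE ... FROM join, whose exact syntax varies by database:

```python
import sqlite3

# isolation_level=None gives manual transaction control, so the
# explicit BEGIN/COMMIT below work as written
conn = sqlite3.connect(":memory:", isolation_level=None)
cur = conn.cursor()
cur.execute("create table foo (id integer primary key, bar text)")
cur.executemany("insert into foo values (?, ?)", [(1, "old"), (2, "old")])

updates = {1: "new1", 2: "new2"}  # the million-item dict, in miniature

cur.execute("begin")                                  # 1) start a transaction
cur.execute("create temp table upd (id integer primary key, bar text)")  # 2)
cur.executemany("insert into upd values (?, ?)", list(updates.items()))  # 3)
cur.execute("""update foo                             -- 4) join-style update
               set bar = (select bar from upd where upd.id = foo.id)
               where id in (select id from upd)""")
cur.execute("commit")                                 # 5) end the transaction
print(cur.execute("select bar from foo order by id").fetchall())
# → [('new1',), ('new2',)]
```

The bulk insert in step 3 can itself be chunked, and the single UPDATE in step 4 is the only statement that touches the main table.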
Right. In general, if you find yourself making millions of
SQL database requests in a loop, you're doing it wrong.
Big database loads are usually done by creating a text file
with the desired data, then using a LOAD DATA INFILE command.
This (in MySQL) is tens to hundreds of times faster than
doing individual INSERT or UPDATE commands.
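A minimal sketch of that bulk-load approach (MySQL-specific; the table name and file layout are made up). The rows are written to a tab-delimited file, and the whole file is then handed to the server in one LOAD DATA INFILE statement; the statement is only printed here, since running it requires a live MySQL connection:

```python
# Write the rows to a temp file, then bulk-load it server-side
# instead of issuing a million individual INSERT/UPDATE commands.
import csv
import tempfile

rows = [(1, "alpha"), (2, "beta")]  # stand-in for the million records

with tempfile.NamedTemporaryFile("w", newline="", suffix=".tsv",
                                 delete=False) as f:
    writer = csv.writer(f, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    path = f.name

# Against a real MySQL connection, this loads the whole file in one
# server-side operation:
load_sql = ("LOAD DATA INFILE '%s' INTO TABLE foo "
            "FIELDS TERMINATED BY '\\t'" % path)
print(load_sql)
```

Note that LOAD DATA INFILE reads the file on the server host; for a remote server, MySQL's LOAD DATA LOCAL INFILE variant reads it on the client instead.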