PEP 249 (database API) -- executemany() with iterable?

Martin Gregorie martin at address-in-sig.invalid
Thu Oct 14 08:25:02 EDT 2010


On Thu, 14 Oct 2010 16:36:34 +1300, Lawrence D'Oliveiro wrote:

> In message <4cb5e659$0$1650$742ec2ed at news.sonic.net>, John Nagle wrote:
> 
>>      Also note that there are some issues with doing a huge volume of
>> updates in one MySQL InnoDB transaction.  The system has to keep the
>> data needed to undo the updates, and there's a limit on the amount of
>> pending transaction history that can be stored.
> 
> How does “load data” avoid this? Is that not a transaction too?
>
Not usually. It's faster because there's no journalling overhead. The 
loader takes out an exclusive table lock, dumps the data into the table, 
rebuilds the indexes and releases the lock. I can't comment on MySQL 
(I don't use it), but this has been the case with every RDBMS I have 
used.
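
For what it's worth, here's a rough sketch of the two paths from Python. 
It assumes the MySQLdb driver, a hypothetical readings(id, value) table, 
and that LOAD DATA LOCAL is enabled on both client and server - the file 
path and column names are invented for illustration:

import MySQLdb

conn = MySQLdb.connect(db="test", local_infile=1)
cur = conn.cursor()

# Transactional path: every row generates undo/journal data.
rows = [(i, i * 0.5) for i in range(100000)]
cur.executemany("INSERT INTO readings (id, value) VALUES (%s, %s)", rows)
conn.commit()

# Bulk-load path: the server reads the file directly under a table
# lock, bypassing most of the per-row transactional bookkeeping.
cur.execute(
    "LOAD DATA LOCAL INFILE '/tmp/readings.csv' "
    "INTO TABLE readings "
    "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n'")
conn.commit()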
 
> Seems to me this isn’t going to help, since both old and new tables are
> on the same disk, after all. And it’s the disk access that’s the
> bottleneck.
>
There's a lot of overhead in journalling - much more than in applying 
changes to a table. The before and after images *must* be flushed to disk 
on commit: in UNIX terms, fsync() must be called on the journal file(s). 
That's an expensive operation on every OS, because committing a series of 
small transactions can cause the same disk block to be written several 
times. The table pages, however, can safely be left in the DBMS cache and 
flushed as part of normal cache operation, since after a crash the table 
changes can always be recovered by rolling the journal forward. A good 
DBMS will do that automatically when it's restarted.
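
You can see the per-commit fsync cost easily with the stdlib sqlite3 
module standing in for any journalling RDBMS - a quick sketch, with the 
table name and row count made up for the demonstration:

import sqlite3
import time

conn = sqlite3.connect("/tmp/demo.db")
conn.execute("CREATE TABLE IF NOT EXISTS t (n INTEGER)")

# One commit per row: the journal is flushed on every commit, so the
# same disk blocks get rewritten over and over.
start = time.time()
for n in range(1000):
    conn.execute("INSERT INTO t (n) VALUES (?)", (n,))
    conn.commit()
print("per-row commits: %.2fs" % (time.time() - start))

# One commit for the whole batch: a single journal flush. Note that
# executemany() happily takes a generator, per the thread subject.
start = time.time()
conn.executemany("INSERT INTO t (n) VALUES (?)",
                 ((n,) for n in range(1000)))
conn.commit()
print("single commit: %.2fs" % (time.time() - start))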


-- 
martin@   | Martin Gregorie
gregorie. | Essex, UK
org       |


